<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SPARQL-Proxy: a generic proxy server for SPARQL endpoints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shuichi Kawashima</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Toshiaki Katayama</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Database Center for Life Science</institution>
          ,
          <addr-line>Wakashiba 178-4-4, Kashiwa-shi, Chiba 277-0871</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>SPARQL-proxy is a portable Web application that works as a proxy server for a SPARQL endpoint. It provides functions such as job scheduling for SPARQL queries, validation of the safety of query statements, and caching of SPARQL search results to improve response times. Providing a SPARQL endpoint is one of the most effective ways to let users easily utilize RDF data. SPARQL endpoints are available for major RDF datasets in the life sciences, such as UniProt and the EBI RDF platform. We also provide SPARQL endpoints for our own RDF data services, including TogoGenome [1] and the NBDC RDF portal [2]. When providing a SPARQL endpoint, submitted queries must be properly controlled so that the underlying RDF data management system is not brought down by heavy queries; a function for filtering unsafe queries is also needed. To make such functionality easily available for any SPARQL endpoint, regardless of the hosting environment or the underlying RDF store, we have developed a portable web application named SPARQL-proxy.</p>
      </abstract>
      <kwd-group>
        <kwd>SPARQL endpoint</kwd>
        <kwd>proxy server</kwd>
        <kwd>RDF</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Methods and Results</title>
      <p>SPARQL-proxy is implemented in Node.js. To start it, the user simply executes the
following command from the directory where it was built.</p>
      <p>$ PORT=3000 SPARQL_BACKEND=&lt;url&gt; npm start</p>
      <p>SPARQL-proxy works as a proxy server for the SPARQL endpoint whose URL is
specified via the SPARQL_BACKEND environment variable. The provider of the SPARQL
endpoint can then expose the proxy URL instead of the original endpoint URL. In the
example above, port 3000 is assigned, but it can be 80, or the provider can configure
an HTTP reverse proxy to forward requests to that port. All other options, such as the
cache system of choice, can also be set via environment variables. SPARQL-proxy
provides two web interfaces: a dashboard for administrators to monitor the execution of
jobs (Fig. 1) and a query submission form for debugging. Administrators can view
execution logs, cancel running or queued jobs, and remove cached results. Submitted
queries are checked for unsafe instructions, such as SPARQL Update operations, before
being passed to the backend triplestore. The job timeout and the number of concurrent
requests can also be specified.</p>
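      <p>The query-safety check can be illustrated with a minimal sketch. The function name and the keyword-matching approach below are illustrative assumptions, not the actual SPARQL-proxy implementation: it rejects queries containing SPARQL Update operations before they reach the backend.</p>

```javascript
// Illustrative sketch only: reject queries containing SPARQL Update
// operations. (A production validator would parse the query rather
// than match keywords.)
const UPDATE_KEYWORDS = /\b(INSERT|DELETE|LOAD|CLEAR|CREATE|DROP|COPY|MOVE|ADD)\b/i;

function isSafeQuery(query) {
  // Strip string literals and comments first, so update keywords that
  // appear inside them do not trigger false positives.
  const stripped = query
    .replace(/"(?:[^"\\]|\\.)*"/g, '""')
    .replace(/'(?:[^'\\]|\\.)*'/g, "''")
    .replace(/#[^\n]*/g, '');
  return !UPDATE_KEYWORDS.test(stripped);
}

console.log(isSafeQuery('SELECT * WHERE { ?s ?p ?o } LIMIT 10')); // true
console.log(isSafeQuery('DELETE WHERE { ?s ?p ?o }'));            // false
```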
      <p>To improve the response time of requested queries, SPARQL-proxy
provides a function that caches each SPARQL result and returns the cached result when
the same query is submitted again. The provider of the service can select the caching
mechanism from a local file, memory, Redis, and Memcached. To reduce the size of the
cache, cached results can be compressed using snappy.js, a JavaScript
implementation of Google's Snappy compression library. SPARQL-proxy is freely
available and the source code is provided in the GitHub repository [3].</p>
      <p>1. http://togogenome.org/
2. https://integbio.jp/rdf/
3. https://github.com/dbcls/sparql-proxy</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>