=Paper= {{Paper |id=Vol-2042/paper46 |storemode=property |title=SPARQL-Proxy: A Generic Proxy Server for SPARQL Endpoint |pdfUrl=https://ceur-ws.org/Vol-2042/paper46.pdf |volume=Vol-2042 |authors=Shuichi Kawashima,Toshiaki Katayama |dblpUrl=https://dblp.org/rec/conf/swat4ls/KawashimaK17 }} ==SPARQL-Proxy: A Generic Proxy Server for SPARQL Endpoint== https://ceur-ws.org/Vol-2042/paper46.pdf
     SPARQL-Proxy: a generic proxy server for SPARQL
                       endpoints

                       Shuichi Kawashima1 and Toshiaki Katayama1
 1
     Database Center for Life Science, Wakashiba 178-4-4, Kashiwa-shi, Chiba 277-0871, Japan
                      kwsm@dbcls.rois.ac.jp, ktym@dbcls.jp



         Abstract. SPARQL-Proxy is a portable Web application that works as a proxy
         server for a SPARQL endpoint. It provides job scheduling for SPARQL queries,
         validation of the safety of query statements, and caching of SPARQL results
         to improve response time.

         Keywords: SPARQL endpoint, proxy server, RDF.


1        Introduction

Providing a SPARQL endpoint is one of the most effective ways to let users easily
utilize RDF data. Various SPARQL endpoints are available for major RDF
datasets in the life sciences, such as UniProt and the EBI RDF platform. We also
provide SPARQL endpoints for our RDF data services, including TogoGenome [1]
and the NBDC RDF portal [2]. A provider of a SPARQL endpoint needs to control
the submitted queries properly so that heavy queries do not bring down the RDF
data management system. Filtering of unsafe queries is also needed. To make such
functionality readily available for any SPARQL endpoint, regardless of the
environment or the RDF store it runs on, we have developed a portable web
application named SPARQL-proxy.


2        Methods and Results
SPARQL-proxy is implemented in Node.js. To start it, the user just executes the
following command from the directory where it was built.
                       $ PORT=3000 SPARQL_BACKEND= npm start

It works as a proxy server for the SPARQL endpoint whose URL is specified
via the SPARQL_BACKEND environment variable. The provider of the SPARQL
endpoint can expose the proxy URL instead of the original endpoint URL. In the
above example, port 3000 is assigned, but it can be port 80, or the provider can
configure an HTTP reverse proxy to point to that port. All other options, such as
the cache system of choice, can also be set via environment variables.
SPARQL-proxy provides two web interfaces: one is the dashboard for
administrators to monitor the execution of
jobs (Fig. 1) and the other is a query submission form for debugging.
Administrators can view the execution logs, cancel running or queued jobs, and
remove cached results. Before a submitted query is passed to the backend
triplestore, it is checked for unsafe instructions such as SPARQL Update
operations. The job timeout and the maximum number of concurrent requests can
also be specified.
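
The check for unsafe instructions described above could, in its simplest form, look like the following sketch. It is not taken from the SPARQL-proxy source: the names UPDATE_KEYWORDS, stripNonKeywordTokens, and looksLikeUpdate are hypothetical, the keyword list is assumed from the SPARQL 1.1 Update operations, and a production validator would more likely rely on a full SPARQL parser. The sketch removes string literals, IRIs, comments, variables, and prefixed names, then looks for an update keyword in what remains, so it may occasionally flag a safe query.

```javascript
// Assumption: these are the operation keywords of SPARQL 1.1 Update.
const UPDATE_KEYWORDS = [
  "INSERT", "DELETE", "LOAD", "CLEAR", "CREATE", "DROP", "COPY", "MOVE", "ADD",
];

// Hypothetical helper: blank out tokens that may legitimately contain
// keyword-like text without being SPARQL keywords.
function stripNonKeywordTokens(query) {
  return query
    .replace(/"""[\s\S]*?"""|'''[\s\S]*?'''/g, " ")            // long string literals
    .replace(/"(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*'/g, " ")  // short string literals
    .replace(/<[^<>"{}|^`\\\s]*>/g, " ")                       // IRIs (may contain '#')
    .replace(/#[^\n]*/g, " ")                                  // comments
    .replace(/[?$]\w+/g, " ")                                  // variables such as ?delete
    .replace(/[\w-]*:[\w.-]*/g, " ");                          // prefixed names such as ex:drop
}

// Returns true when the query appears to contain a SPARQL Update operation.
function looksLikeUpdate(query) {
  const pattern = new RegExp("\\b(" + UPDATE_KEYWORDS.join("|") + ")\\b", "i");
  return pattern.test(stripNonKeywordTokens(query));
}
```

A proxy built along these lines would call looksLikeUpdate on each incoming query and decline to forward it to the backend when the result is true.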




                Fig. 1. The dashboard interface of SPARQL-proxy.


In order to improve the response time of the requested query, SPARQL-proxy
provides a function that caches each SPARQL result and returns the cached result
when the same query is submitted again. The provider of the service can choose
the caching mechanism from among a local file, memory, Redis, and Memcached.
To reduce the size of the cache, cached results can be compressed using
snappy.js, a JavaScript implementation of Google's Snappy compression library.
SPARQL-proxy is freely available, and its source code is provided in a GitHub
repository [3].
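
The caching behaviour described above can be illustrated with a minimal in-memory sketch. It is an illustration only, not the SPARQL-proxy implementation: createCachingProxy and its backend argument are hypothetical names, the cache key is assumed to be the exact query text, and the file, Redis, and Memcached stores as well as Snappy compression are left out.

```javascript
// Minimal in-memory result cache keyed by the exact query text.
// `backend` is a hypothetical function (queryText) => result or Promise of result,
// standing in for the request to the real SPARQL endpoint.
function createCachingProxy(backend) {
  const cache = new Map();
  let hits = 0;

  return {
    // Returns a promise for the result; identical queries share one backend call.
    query(queryText) {
      if (cache.has(queryText)) {
        hits += 1;
        return cache.get(queryText);
      }
      // The executor runs synchronously, and a thrown error becomes a rejection.
      const pending = new Promise((resolve) => resolve(backend(queryText)));
      cache.set(queryText, pending);
      // Do not keep failed requests in the cache.
      pending.catch(() => cache.delete(queryText));
      return pending;
    },

    // Lets an administrator drop a cached result.
    remove(queryText) {
      return cache.delete(queryText);
    },

    stats() {
      return { entries: cache.size, hits };
    },
  };
}
```

Storing the pending promise rather than the finished result means that two identical queries arriving at nearly the same time trigger only one backend request, which is one possible design choice among several.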


References
 1. http://togogenome.org/
 2. https://integbio.jp/rdf/
 3. https://github.com/dbcls/sparql-proxy