=Paper=
{{Paper
|id=Vol-2042/paper46
|storemode=property
|title=SPARQL-Proxy: A Generic Proxy Server for SPARQL Endpoint
|pdfUrl=https://ceur-ws.org/Vol-2042/paper46.pdf
|volume=Vol-2042
|authors=Shuichi Kawashima,Toshiaki Katayama
|dblpUrl=https://dblp.org/rec/conf/swat4ls/KawashimaK17
}}
==SPARQL-Proxy: A Generic Proxy Server for SPARQL Endpoint==
Shuichi Kawashima and Toshiaki Katayama

Database Center for Life Science, Wakashiba 178-4-4, Kashiwa-shi, Chiba 277-0871, Japan

kwsm@dbcls.rois.ac.jp, ktym@dbcls.jp

Abstract. SPARQL-Proxy is a portable web application that works as a proxy server for a SPARQL endpoint. It provides functions such as job scheduling for SPARQL queries, validation of the safety of query statements, and caching of SPARQL results to improve response time.

Keywords: SPARQL endpoint, proxy server, RDF.

===1 Introduction===
Providing a SPARQL endpoint is one of the most effective ways to let users easily utilize RDF data. Various SPARQL endpoints are available for major RDF datasets in the life sciences, such as UniProt and the EBI RDF platform. We also provide SPARQL endpoints for our RDF data services, including TogoGenome [1] and the NBDC RDF portal [2]. A provider of a SPARQL endpoint needs to control the submitted queries properly so that heavy queries do not bring down the RDF data management system. A function for filtering unsafe queries is also needed. To make such functionality easily available for any SPARQL endpoint, regardless of the environment it runs in and the RDF store behind it, we have developed a portable web application named SPARQL-proxy.

===2 Methods and Results===
SPARQL-proxy is implemented in Node.js. To start it, the user simply executes the following command from the directory where it was built.

$ PORT=3000 SPARQL_BACKEND=npm start

It works as a proxy server for the SPARQL endpoint whose URL is specified via the SPARQL_BACKEND environment variable. The provider of the SPARQL endpoint can expose the proxy URL instead of the original endpoint URL. In the above case, port 3000 is assigned, but it can be 80, or the provider can configure an HTTP reverse proxy to point to that port.
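As a rough illustration of this configuration step, the sketch below reads PORT and SPARQL_BACKEND from an environment-like object and builds a SPARQL Protocol GET request against the proxy. It is not taken from the SPARQL-proxy source: the helper names loadConfig and buildQueryUrl, the default port, the example.org backend URL, and the /sparql path are all assumptions made for the example.

```javascript
// Hypothetical helper: read proxy settings from environment-style variables.
// PORT and SPARQL_BACKEND are the variables named in the text; the default
// port of 3000 is an assumption made for this sketch.
function loadConfig(env) {
  const backend = env.SPARQL_BACKEND;
  if (!backend) {
    throw new Error("SPARQL_BACKEND must be set to the backend endpoint URL");
  }
  // new URL() throws a TypeError on a malformed URL, so bad values fail early.
  const backendUrl = new URL(backend);
  const port = Number.parseInt(env.PORT ?? "3000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid PORT: ${env.PORT}`);
  }
  return { port, backend: backendUrl.href };
}

// Hypothetical helper: build a SPARQL Protocol GET request URL, passing the
// query text in the "query" parameter.
function buildQueryUrl(endpoint, sparql) {
  const url = new URL(endpoint);
  url.searchParams.set("query", sparql);
  return url.href;
}

// Placeholder backend URL; a real deployment would read process.env instead.
const config = loadConfig({ PORT: "3000", SPARQL_BACKEND: "http://example.org/sparql" });

// Clients are pointed at the proxy, not at the backend. The path is an assumption.
const proxyUrl = `http://localhost:${config.port}/sparql`;
console.log(buildQueryUrl(proxyUrl, "SELECT * WHERE { ?s ?p ?o } LIMIT 1"));
```

Because clients only ever see the proxy URL, the backend endpoint can be moved or replaced without changing client code.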
All other options, such as the cache system of choice, can also be set via environment variables. SPARQL-proxy provides two web interfaces: a dashboard for administrators to monitor the execution of jobs (Fig. 1), and a query submission form for debugging. Administrators can view the execution logs, cancel running or queued jobs, and remove cached results. Before a submitted query is passed to the backend triplestore, it is checked for unsafe instructions such as SPARQL Update operations. The job timeout and the number of concurrent requests can also be specified.

Fig. 1. The dashboard interface of SPARQL-proxy.

To improve the response time, SPARQL-proxy caches each SPARQL result and returns the cached result when the same query is submitted again. The provider of the service can select the caching mechanism from a local file, memory, Redis, and Memcached. To reduce the size of the cache, cached results can be compressed with snappy.js, a JavaScript implementation of Google's Snappy compression library.

SPARQL-proxy is freely available, and the source code is provided in a GitHub repository [3].

===References===
1. http://togogenome.org/

2. https://integbio.jp/rdf/

3. https://github.com/dbcls/sparql-proxy
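The validation and caching behaviour described in Section 2 can be illustrated with a minimal sketch. It does not reproduce the SPARQL-proxy implementation: the names isReadOnlyQuery and handleQuery are hypothetical, the safety check is a naive keyword test rather than a full SPARQL parse, and a plain in-memory Map stands in for the selectable cache stores (local file, memory, Redis, Memcached).

```javascript
// Naive read-only check (an assumption, not SPARQL-proxy's actual validator):
// skip the prologue and accept only the four SPARQL query forms.
const QUERY_FORMS = ["SELECT", "CONSTRUCT", "ASK", "DESCRIBE"];

function isReadOnlyQuery(sparql) {
  const stripped = sparql
    .replace(/<[^<>\s]*>/g, "<>") // collapse IRIs so a '#' inside one is not taken as a comment
    .replace(/#.*$/gm, "");       // remove comments
  // Drop leading PREFIX/BASE declarations.
  const body = stripped.replace(/^\s*(?:(?:PREFIX\s+[^\s:]*:\s*<>|BASE\s*<>)\s*)*/i, "");
  const keyword = /^[A-Za-z]+/.exec(body);
  return keyword !== null && QUERY_FORMS.includes(keyword[0].toUpperCase());
}

// Validate the query, then serve it from the cache or forward it to the backend.
// runQuery is a caller-supplied async function standing in for the triplestore.
async function handleQuery(sparql, runQuery, cache) {
  if (!isReadOnlyQuery(sparql)) {
    throw new Error("query rejected: only SELECT, CONSTRUCT, ASK and DESCRIBE are allowed");
  }
  if (cache.has(sparql)) {
    return { cached: true, result: cache.get(sparql) };
  }
  const result = await runQuery(sparql);
  cache.set(sparql, result); // keyed on the exact query text
  return { cached: false, result };
}

// Usage with a fake backend that counts how often it is called.
let backendCalls = 0;
const fakeBackend = async (_query) => {
  backendCalls += 1;
  return { bindings: [] };
};
const demoCache = new Map();

handleQuery("ASK { ?s ?p ?o }", fakeBackend, demoCache)
  .then(() => handleQuery("ASK { ?s ?p ?o }", fakeBackend, demoCache))
  .then((second) => console.log("served from cache:", second.cached, "backend calls:", backendCalls));
```

Keying the cache on the exact query text matches the "same query" wording above; a production cache would also need expiry and a size limit, which this sketch leaves out.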