<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SPARQL-Proxy: a generic proxy server for SPARQL endpoints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shuichi Kawashima</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Toshiaki Katayama</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Database Center for Life Science</institution>
          ,
          <addr-line>Wakashiba 178-4-4, Kashiwa-shi, Chiba 277-0871</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>SPARQL-proxy is a portable Web application that works as a proxy server for a SPARQL endpoint. It provides functions such as job scheduling for SPARQL queries, validation of the safety of query statements, and caching of SPARQL search results to improve response times. Providing a SPARQL endpoint is one of the most effective ways to let users easily utilize RDF data. SPARQL endpoints are available for major RDF datasets in the life sciences, such as UniProt and the EBI RDF platform. We also provide SPARQL endpoints for our own RDF data services, including TogoGenome [1] and the NBDC RDF portal [2]. When providing a SPARQL endpoint, submitted queries must be properly controlled so that the underlying RDF data management system is not brought down by heavy queries; a function for filtering unsafe queries is also needed. To make such functionality easily available for any SPARQL endpoint, regardless of the hosting environment or the underlying RDF store, we have developed a portable web application named SPARQL-proxy.</p>
      </abstract>
      <kwd-group>
        <kwd>SPARQL endpoint</kwd>
        <kwd>proxy server</kwd>
        <kwd>RDF</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Methods and Results</title>
      <p>SPARQL-proxy is implemented in Node.js. To start it, the user simply executes the
following command from the directory where it was built.</p>
      <p>$ PORT=3000 SPARQL_BACKEND=&lt;url&gt; npm start</p>
      <p>SPARQL-proxy works as a proxy server for the SPARQL endpoint whose URL is
specified via the SPARQL_BACKEND environment variable. The provider of the SPARQL
endpoint can then expose the proxy URL instead of the original endpoint URL. In the
example above, port 3000 is assigned, but it can be 80, or the provider can configure
an HTTP reverse proxy to forward requests to that port. All other options, such as the
cache system of choice, can also be set via environment variables. SPARQL-proxy
provides two web interfaces: a dashboard for administrators to monitor the execution of
jobs (Fig. 1) and a query submission form for debugging. Administrators can view
execution logs, cancel running or queued jobs, and remove cached results. Submitted
queries are checked for unsafe instructions, such as SPARQL Update operations, before
being passed to the backend triplestore. The job timeout and the number of concurrent
requests can also be specified.</p>
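      <p>The query-safety check can be illustrated with a minimal sketch. The function name and the keyword-matching approach below are illustrative assumptions, not the actual SPARQL-proxy implementation: it rejects queries containing SPARQL Update operations before they reach the backend.</p>

```javascript
// Illustrative sketch only: reject queries containing SPARQL Update
// operations. (A production validator would parse the query rather
// than match keywords.)
const UPDATE_KEYWORDS = /\b(INSERT|DELETE|LOAD|CLEAR|CREATE|DROP|COPY|MOVE|ADD)\b/i;

function isSafeQuery(query) {
  // Strip string literals and comments first, so update keywords that
  // appear inside them do not trigger false positives.
  const stripped = query
    .replace(/"(?:[^"\\]|\\.)*"/g, '""')
    .replace(/'(?:[^'\\]|\\.)*'/g, "''")
    .replace(/#[^\n]*/g, '');
  return !UPDATE_KEYWORDS.test(stripped);
}

console.log(isSafeQuery('SELECT * WHERE { ?s ?p ?o } LIMIT 10')); // true
console.log(isSafeQuery('DELETE WHERE { ?s ?p ?o }'));            // false
```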
      <p>To improve the response time of requested queries, SPARQL-proxy
provides a function that caches each SPARQL result and returns the cached result when
the same query is submitted again. The provider of the service can select the caching
mechanism from a local file, memory, Redis, and Memcached. To reduce the size of the
cache, cached results can be compressed using snappy.js, a JavaScript
implementation of Google's Snappy compression library. SPARQL-proxy is freely
available and the source code is provided in the GitHub repository [3].</p>
      <p>1. http://togogenome.org/
2. https://integbio.jp/rdf/
3. https://github.com/dbcls/sparql-proxy</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>