Incorporating API data into SPARQL query answers

Matías Jünemann, Juan L. Reutter, Adrián Soto, and Domagoj Vrgoč

PUC Chile and Center for Semantic Web Research

Abstract. In this demo we present an extension of SPARQL that allows queries to connect to JSON APIs and integrate the obtained information into query answers. We achieve this by adding a new operator to SPARQL, and we implement this extension on top of the Jena framework in order to illustrate how it functions with real-world APIs.

1 Introduction and motivation

The Semantic Web provides a platform for publishing data on the Web via the Resource Description Framework (RDF). Having a common format for data dissemination allows for applications of increasing complexity, since it enables them to access data obtained from different sources. However, the majority of data available on the Web today is still not published in RDF, and is thus not (directly) available to Semantic Web services. A huge amount of this data is accessed through Web APIs, which use a variety of different formats to provide data to the users. Therefore it would be useful to allow SPARQL, the standard query language for the Semantic Web, to access and use all of this data.

In this demo we make a first step in this direction by extending SPARQL with the capability of communicating with JSON APIs. We picked JSON because it is currently the most popular data format in Web APIs. However, the results presented here can easily be extended to any API format; we stick with JSON simply to keep the presentation manageable. By allowing SPARQL to connect to an API we can utilise not just the information that is available locally in our dataset, but also extend the query answer with data obtained from a Web service. Use cases for such an extension are numerous, and it can be particularly practical when the data obtained from the API changes very often (such as current weather conditions, sensor data, etc.).
We illustrate how this extension works and why one might want to use it by means of an example.

Example 1. Suppose that you are travelling around Japan in order to do some skiing. You find yourself on the island of Hokkaido and wish to find all the ski resorts close to your location. It is easy to obtain this information by querying e.g. the YAGO database using SPARQL. However, you will probably want to go skiing in a resort where the weather conditions are favourable (i.e. it is neither raining nor snowing). Although you cannot obtain weather information directly using SPARQL and join it with the list of resorts obtained previously, this information is available through a weather service API called weather.api. This API accepts HTTP requests, so for example to retrieve the weather in Sapporo you use the URI

    http://weather.api/request?q=Sapporo

to which the API responds with a JSON document containing weather information, say of the form

    {"timestamp": "14/04/2016 11:59:07", "temperature": 11, "description": "Sunny"}

Thus, all you need is to produce one call to the weather API for each ski resort in Hokkaido, and filter out all those where the description is not Sunny. One can do this manually by e.g. querying with SPARQL first, and then executing an API call for each obtained resort. However, this might be cumbersome, particularly when the number of answers is large, and it does not allow us to incorporate this information into our query answers. Instead, we propose to extend SPARQL with the BIND_API operator, which allows us to easily obtain all of the desired information using the following query.

    SELECT ?x ?n WHERE {
      ?x yago:isLocatedIn yago:Hokkaido .
      ?x rdf:type yago:wikicat_Ski_areas_and_resorts_in_Japan .
      ?x rdfs:label ?n
      BIND_API <http://weather.api/request?q={n}> (["description"]) AS (?t)
      FILTER regex(?t,"Sunny")
    }

The first part of our query is executed over the YAGO database and obtains the IRI representing the resort and the label of its location.
We pass the label of the location as a parameter to the URI template used to consult the API. The newly introduced operator BIND_API takes the (instantiated) URI template and, upon executing the API call, processes the received JSON document using the expression ["description"], which obtains the value of the key description of the received JSON and binds it to the variable ?t. Generally, the answer we receive is going to be a collection of key-value pairs, so we need to specify which value we want to obtain and store using the BIND_API operator.

In this demo we showcase an extension of the Jena framework implementing the BIND_API operator, and do live testing (on a remote server available at http://107.170.168.31/query/#/) using either preselected queries or the ones provided by the visitors at the time of the demonstration. To keep the presentation manageable we will use YAGO as our base dataset, but allow arbitrary APIs to be used.

Related work. There have been several proposals to allow SPARQL to communicate with APIs (see e.g. [2]), the main difference here being that we offer communication capability with an arbitrary API as an integral part of the SPARQL query processor. On the other hand, there have been many attempts to transform data residing in other formats to RDF, the most popular paradigm here being that of mappings [1]. The main difference from the work we present is that mappings generally do not support querying "on-the-fly", which can be an issue when the data changes frequently, as is the case with weather information.

2 The proposed extension

In order to allow SPARQL to obtain API data, we propose to extend the language with a new operator called BIND_API. Formally, we allow SPARQL to contain patterns of the form

    P1 BIND_API U (N1, N2, ..., Nm) AS (?x1, ?x2, ..., ?xm),    (1)

which we call BIND-from-API patterns. Here P1 is an arbitrary SPARQL graph pattern [4], U a URI template [3] used to contact the API, and N1, ...
, Nm a sequence of JSON navigation instructions [5] which tell us how to extract the desired values from the retrieved JSON document. By our definition, BIND-from-API patterns can appear anywhere usual SPARQL patterns can. This can be e.g. inside a WHERE clause, such as in Example 1, or even as P1 in (1), thus allowing us to obtain results from multiple APIs inside a single query.

Evaluating this operator over an RDF dataset G is done as follows. For each mapping µ in ⟦P1⟧_G (the evaluation of P1 over G) we instantiate every variable ?y in the URI template U with the value µ(?y), thus obtaining an IRI which is a valid API call. We call the API with this instantiated IRI, obtaining a JSON document, say J. We then apply the navigation instruction N1 to J and store the obtained value into ?x1. If the API call produced an error, or if the returned value is not a literal, we do not assign a value to ?x1. Similarly, the value of N2 applied to J is stored into ?x2, etc. After this is done, the mapping µ is extended with the new variables ?x1, ..., ?xm, which have been assigned values according to J and the instructions N1, ..., Nm.

The implementation is done on top of the Jena framework, and does not alter the inner workings of the standard BIND operator. Full definitions of the syntax and the semantics are available at http://dvrgoc.ing.puc.cl/APIs/. We have also made the implementation available on GitHub for the readers who would like to further test the capabilities of our extension: https://github.com/CSWR/SPARQL-JSONAPI.

3 Demonstration overview

The main focus of this demonstration will be the live query interface available at http://107.170.168.31/query/#/, which will allow the demo visitors to test out arbitrary queries which use API calls. The service we provide runs the extended version of the Jena TDB framework through Fuseki, and the query interface connects to this implementation using a Python script. The interface checks every five seconds whether the results are available.
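For visitors who want to reason about what the service computes, the evaluation of a BIND-from-API pattern described in Section 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Jena-based implementation: mappings are modelled as dictionaries from variable names to values, template variables are assumed to be written as {y} and to be bound in every mapping, a navigation instruction is modelled as a list of keys and array indices, and the function names (instantiate, navigate, fetch_json, bind_api) are hypothetical.

```python
import json
import re
from urllib.parse import quote
from urllib.request import urlopen


def instantiate(template, mu):
    """Replace each {y} in the template by the URL-encoded value mu['?y'].

    Assumes every template variable is bound in mu.
    """
    return re.sub(
        r"\{(\w+)\}",
        lambda m: quote(str(mu["?" + m.group(1)]), safe=""),
        template,
    )


def navigate(doc, path):
    """Follow a list of keys/indices into a JSON value; None if the path fails."""
    for step in path:
        try:
            doc = doc[step]
        except (KeyError, IndexError, TypeError):
            return None
    return doc


def fetch_json(uri):
    """Call the API and parse the reply as JSON; None on any error."""
    try:
        with urlopen(uri, timeout=10) as response:
            return json.load(response)
    except (OSError, ValueError):
        return None


def bind_api(mappings, template, paths, variables, fetch=fetch_json):
    """Extend each mapping with the values extracted from one API call."""
    for mu in mappings:
        extended = dict(mu)
        doc = fetch(instantiate(template, mu))
        for path, var in zip(paths, variables):
            value = navigate(doc, path)
            # A variable stays unbound if the call failed or the value
            # is not a literal (i.e. it is an object, an array, or null).
            if isinstance(value, (str, int, float, bool)):
                extended[var] = value
        yield extended
```

For the query of Example 1, one would call bind_api with the mappings produced by the three triple patterns, the template "http://weather.api/request?q={n}", the single path ["description"] and the variable "?t", and afterwards keep only the mappings whose ?t matches "Sunny". The fetch parameter lets a stub replace the network call when experimenting offline.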
In order to make the presentation more streamlined, we have decided to use a reasonably sized (2 GB) portion of the YAGO and DBpedia datasets containing information about geographical locations as our base dataset. Apart from the usual query window available in SPARQL endpoints, we also have a separate window where the users can enter their strategies for accessing the APIs. As a default, we provide support for OAuth and the typical strategy of providing a personalised token to access the API. The visual presentation of the query interface is illustrated in Figure 1.

Fig. 1. Live query interface available at http://107.170.168.31/query/#/. The user types the query in the window to the left, and provides a strategy in the window to the right.

The demonstration will consist of two parts:
– First, we will give a brief introduction by example showcasing the different functionalities supported by our implementation.
– Second, we will allow the users to specify queries using API calls which they wish to execute (here we allow arbitrary APIs).

The aim of the demo is to emphasise the potential uses of such an extension to SPARQL through a series of examples, and also to show that our implementation can handle multiple (and simultaneous) user-provided queries in real time on a remote server, thus simulating a typical SPARQL endpoint.

Acknowledgements. Work funded by the Millennium Nucleus Center for Semantic Web Research under Grant NC120004.

References

1. A. Dimou, M. V. Sande, P. Colpaert, R. Verborgh, E. Mannens, and R. V. de Walle. RML: A generic language for integrated RDF mappings of heterogeneous data. In LDOW, 2014.
2. P. Fafalios and Y. Tzitzikas. SPARQL-LD: a SPARQL extension for fetching and querying linked data. In ISWC Demos, 2015.
3. IETF. URI Template. https://tools.ietf.org/html/rfc6570, 2012.
4. J. Pérez, M. Arenas, and C. Gutierrez. Semantics and complexity of SPARQL. ACM Transactions on Database Systems, 34(3), 2009.
5. F. Pezoa, J. L. Reutter, F.
Suarez, M. Ugarte, and D. Vrgoč. Foundations of JSON Schema. In WWW 2016, pages 263–273, 2016.