=Paper=
{{Paper
|id=Vol-2063/sisiot-paper1
|storemode=property
|title=Towards Federated Queries for Web of Things Devices
|pdfUrl=https://ceur-ws.org/Vol-2063/sisiot-paper1.pdf
|volume=Vol-2063
|authors=Sejal Jaiswal,Maxime Lefrançois
|dblpUrl=https://dblp.org/rec/conf/i-semantics/JaiswalL17
}}
==Towards Federated Queries for Web of Things Devices==
Sejal Jaiswal
Université Jean Monnet, CNRS, F-42023 Saint-Étienne, France
cj.sejal@gmail.com

Maxime Lefrançois
Univ. Lyon, Mines Saint-Étienne, CNRS, Laboratoire Hubert Curien UMR 5516, F-42023 Saint-Étienne, France
maxime.lefrancois@emse.fr

© 2017 Copyright held by the author/owners.
SEMANTiCS 2017 workshops proceedings: SIS-IoT
September 11-14, 2017, Amsterdam, Netherlands
SIS-IoT’17, July 2017, Amsterdam, Netherlands Sejal Jaiswal and Maxime Lefrançois

===ABSTRACT===
In the emerging Web of Things (WoT), a vast majority of devices neither consume nor produce RDF, notably because of their inherent constraints that prevent them from manipulating textual RDF syntax. However, it would be relevant to query against these sources irrespective of the (lightweight) formats they use, combined with other Linked Data. In fact, the data these WoT devices expose contains crucial real-time information, and being able to tap directly into this information could enable smarter industrial applications. In this paper, we are interested in uniformly querying SPARQL endpoints, RDF documents, and non-RDF documents exposed by WoT devices in a flexible and extensible way. The core of this solution is an extension of SPARQL-LD. The SPARQL SERVICE clause is extended and can also be used to query non-RDF sources for which we know some RDF lifting mechanism.

===1 INTRODUCTION===
Even though RDF was adopted as a W3C recommendation for data interchange on the Web, not all data producers in the IoT/WoT industry follow RDF as a standard model. Constrained devices or things in the WoT ecosystem tend to prefer lightweight formats, mostly binary ones such as EXI (https://www.w3.org/TR/exi/) or CBOR (https://tools.ietf.org/html/rfc7049), due to their inherent bandwidth, memory, storage, and/or battery constraints. However, it would be relevant to query against these sources irrespective of the formats they use, combined with other Linked Data. In fact, the data these WoT devices expose contains crucial real-time information, and being able to tap directly into this information could enable smarter industrial applications.

The SPARQL query language makes it possible to retrieve and manipulate an RDF dataset, which consists of a default graph and a set of named graphs. SPARQL 1.1 Federated Query [14] introduces the SERVICE clause, thanks to which a query author can direct a portion of a query to a particular SPARQL endpoint that is potentially working with a different RDF dataset. SPARQL-LD [5] extends the applicability of the SERVICE operator to RDF Sources so as to exploit the Web of Linked Data: the SPARQL engine attempts to retrieve the RDF Graph located at the endpoint URL, and executes the portion of the query against this RDF Graph.

In this paper, we propose to extend this solution further to also query non-RDF Web resources for which some RDF lifting mechanism is known. This allows one to semantically query WoT devices while allowing them to expose the data in the format and structure they prefer. Doing so also allows for standardized access to all formats of data through the use of SPARQL. This should help the current industry adapt to the principles of the Semantic Web without having to change their existing solutions much. Such an extended SERVICE clause allows one to write queries with portions that explicitly target specific WoT devices that host their own HTTP server. This allows for low-level querying scenarios such as “What temperature value does sensor x observe?”. On the other hand, we argue that a higher level of abstraction could be beneficial for other querying scenarios such as “What is the temperature on the second floor?”.

The rest of the paper is organized as follows: Section 2 introduces a motivating example for the proposed solution. Section 3 discusses related work done in the field. Section 4 describes the proposed extension of SPARQL-LD. Section 5 describes how the proposed solution can be integrated with principles of query federation to query non-RDF data sources as well. Section 6 reuses the motivating example to explain the working of the solution. Section 7 briefly discusses the evaluation and implementation. Finally, Section 8 concludes the paper and suggests future work planned for further enhancement of the solution.

===2 MOTIVATING EXAMPLE===
Consider a WoT-enabled smart office (https://w3c.github.io/wot-architecture/) with two floors. For simplicity, we consider the case of only the 2nd Floor with just one room that has connected sensors and actuators. Floor 2 () houses a heater that exposes temperature property () and an occupancy sensor with occupancy property (). We assume that some description of the building and the devices is available in a Data Catalogue (see Section 5.1) and that the data generated by the devices and their sensors is hosted on some URL. The listing below is a condensed version of the Data Catalogue and shows some details for Floor 2. The namespaces are those available with the online service http://prefix.cc/.

 a seas:Room , sosa:FeatureOfInterest ;
   ssn:hasProperty ;
   seas:onFloor 2 .
 a sosa:ObservableProperty,
     seas:TemperatureProperty .
 a sosa:Sensor, rdfp:RDFSource ;
   sosa:observes ;
   rdfp:describedBy [
     a rdfp:GraphDescription ;
     rdfp:validationRule """
       ?temp_obs a sosa:Observation ;
         sosa:hasFeatureOfInterest ?room ;
         sosa:observedProperty ?temperature ;
         sosa:hasSimpleResult ?temp_value ."""^^ex:bgp ;
     rdfp:presentedBy [ a rdfp:GraphPresentation ;
       rdfp:mediaType "application/json" ;
       rdfp:liftingRule ] ] .

We want to allow an end user to query the devices by launching a SPARQL query such as the one in the listing below without having to worry about the various data formats used by the devices/sensors or even the distributed nature of the data sources. This query should answer the question: “What are the rooms that have the properties ‘temperature’ and ‘occupancy’? What values do the sensors that have these properties report?”

 SELECT ?room WHERE {
   ?room a seas:Room , sosa:FeatureOfInterest ;
     ssn:hasProperty ?temperature, ?occupancy ;
     seas:onFloor ?floor .
   ?temperature a sosa:ObservableProperty, seas:TemperatureProperty .
   ?occupancy a sosa:ObservableProperty, seas:OccupancyProperty .
   ?temp_obs a sosa:Observation ;
     sosa:hasFeatureOfInterest ?room ;
     sosa:observedProperty ?temperature ;
     sosa:hasSimpleResult ?temp_value .
   ?occ_obs a sosa:Observation ;
     sosa:hasFeatureOfInterest ?room ;
     sosa:observedProperty ?occupancy ;
     sosa:hasSimpleResult ?occ_value .
 }

There are multiple problems to be tackled here, such as the physical setup during deployment, setting up APIs to access the sensor’s data along with the content lifting rule [12], publishing of the data catalogue, etc. However, our major focus for this paper is integrating the proposed solution with query federation principles, such that it allows us to query the lightweight formats used by the sensors and the devices to expose their data.

===3 RELATED WORK===
To execute queries over the Web of Linked Data, two main infrastructures exist based on the data source location: central repository and distributed repository. With a central repository, the query service is provided over a repository where data is collected from various sources on the Web. With a distributed repository, the data need not be available at a single location for the query service. The distributed method of data access can be further divided into two different querying approaches: Link Traversal [8] and federation [6]. In link traversal, data is discovered by following HTTP URIs. Link Traversal could also be an efficient method for query federation [1]; however, the scenario we work with will not be able to exploit the advantages of Link Traversal fully due to the lack of links, within the retrieved data, that identify further data. Hence, we choose only Federated Query principles for federating the results from various sources. Federated query is the ability to take a query and provide solutions based on information from many different sources. In query federation, a user query is transformed into several sub-queries and the result is generated by combining the intermediate results from the integrated data sources. Some studies explain the infrastructure of query federation [6], while others focus on federated query processing strategies [9].

SPARQL-LD [5] is an extension of SPARQL 1.1 Federated Query [14] that exploits the Web of Linked Data by extending the applicability of the SERVICE operator. This makes it possible to query any HTTP Web source containing RDF data (what is called an RDF Source in RDF 1.1). We shall use an extended version of SPARQL-LD to exploit the real-time and dynamic nature of Linked Data in our proposed solution.

One of the primary visions of the Semantic Web is to enable machines to exchange and process web content easily. This vision is hampered by the coexistence and usage of many heterogeneous data formats and models. For data conversion from various formats to RDF (called RDF lifting), which can then be easily queried using SPARQL, we have used principles of SPARQL-Generate [11, 12] with respect to constrained devices that produce binary data. SPARQL-Generate is based on SPARQL and leverages its expressive and extensible nature to be able to support RDF lifting for new data formats. Although we chose to use SPARQL-Generate, one could design similar solutions that make use of any other existing RDF-lifting mechanisms such as JSON-LD contexts [16], RML mappings [4], or XSPARQL rules [13].

The benefit of extending SPARQL-LD with the principles of RDF lifting is that we shall be able to integrate in the same SPARQL query: i) RDF data stored in an RDF dataset, ii) data from SPARQL endpoints, iii) RDF data fetched from any RDF source (in any of the RDF syntaxes), and iv) non-RDF data obtained in any arbitrary data format, but for which an RDF lifting mechanism is known.

===4 QUERYING DOCUMENTS IN ARBITRARY FORMATS===
In order to query one or more SPARQL Protocol services, one can use the SPARQL 1.1 SERVICE operator for federated SPARQL queries [3]. SPARQL 1.1 Federated Query allows for combining group graph patterns that are to be evaluated over several SPARQL Protocol services within a single query. The endpoints of the services to be queried are provided as parameters to the SERVICE operator.

However, for data that is published/available in an RDF format but not necessarily set up through an endpoint, we make use of SPARQL-LD to directly access and exploit the RDF graphs. SPARQL-LD has an extended SERVICE definition that tries to fetch and query the RDF triples that may exist in the given resource at execution time.

SPARQL-LD does not cater to resources that do not have an RDF representation. Hence we make an extension to SPARQL-LD, enabling the use of non-RDF web documents published by constrained devices that host their own HTTP server. Our implementation uses the Content-Lifting-Rule HTTP response header field as defined in [10]. The value of this header field is an absolute URI that identifies some RDF lifting mechanism (SPARQL-Generate, JSON-LD
Context, RML Mapping, etc.). Our extension of SPARQL-LD implements the support of lifting rules expressed as SPARQL-Generate queries [11]. This allows us to execute portions of a query against the RDF generated from lightweight documents in arbitrary formats exposed by constrained WoT devices.

Such a lifting rule could be hosted on the device manufacturer’s website, for example.

===5 QUERY FEDERATION===
The proposed solution is described and integrated within the three main phases of any query federation engine: Sources Selection, Federated Query Formation and Federated Query Execution, as shown in Figure 1.

====5.1 Sources Selection====
It would not be efficient to send every piece of the query to all the data sources; we need to determine the relevant data sources. In fact, it is crucial in constrained environments to preserve the longevity of the devices (battery life) and the bandwidth. For the solution we propose, we assume that contextual information about the devices and what they expose is available in a Data Catalogue. Such a Data Catalogue could be constructed by the dataset publisher, the installer of the devices, or automatically thanks to automatic registration from the devices. We issue queries against this catalogue to identify which of the data sources are relevant for a particular part of the query. More precisely, we suppose that each data source is linked to some Basic Graph Pattern (i.e., an RDF Graph with variables) that describes the type of RDF graph that would be the result of lifting the document retrieved at any time. This allows us to check whether a source has a partial solution to the high-level query. It is worth noting here that many mechanisms other than a Data Catalogue exist [7, 15] to identify proper data sources, and the proposed solution can be adjusted with respect to the metadata source available.

====5.2 Federated Query Formation====
In the Federated Query Formation phase, we decompose the input query and build a new query with the union of multiple SERVICE clauses that are to be issued to the source endpoints selected in the previous phase: that is, those (i) whose context is relevant (e.g., that are on the floor one wants to query the temperature of), and (ii) that are capable of providing some relevant information. Each sub-query is built by combining the biggest possible subset that is common to the Basic Graph Pattern the source exposes and the Basic Graph Pattern in the input query. Grouping several triples together like this reduces the number of look-ups to the same source and minimizes the intermediate join process.

====5.3 Federated Query Execution====
In the Federated Query Execution phase, the sub-queries built in the Federated Query Formation phase are executed upon the relevant sources as identified in the Sources Selection phase. This phase involves the use of the extended SERVICE clause as described in Section 4. If the endpoint/document to be queried against is not in an RDF format, we launch a GET request to get the lifting rule and the document whose content is to be lifted or transformed. We then transform the data as defined in the lifting rule and execute the sub-query against the resulting RDF graph. The obtained intermediate results are then federated and the final result is passed back to the user.

===6 USAGE SCENARIO: EXAMPLE===
In the Sources Selection phase, we learn from the data catalogue that not all triple patterns of the input query can be answered from a single data source. Hence, we issue a SELECT query to determine which triples can be answered by each of the data sources. Each such data source is added in the SERVICE clause of the corresponding sub-query. In the Federated Query Formation phase, we use the information drawn from the Sources Selection phase to put together the triples aimed at the same data source, and this creates a sub-query. However, the target of the SERVICE clause is not necessarily a SPARQL endpoint; such a case is the major focus of the proposed solution. If the data source is in a format not handled by SPARQL 1.1 or SPARQL-LD, we look for lifting rule information in the data catalogue or directly in the HTTP response header field Content-Lifting-Rule. This lifting rule is used during the Federated Query Execution phase, and the intermediate results are then federated to form the final result.

As a real-world use case, the end user can use the result to launch another command to change the temperature through the heater in Room 2 based on the occupancy sensor data.

===7 EVALUATION AND IMPLEMENTATION===
We ran 100 tests against various data sources. Figure 2 shows the run-time for the tests in increasing order. 50 tests were run against direct SPARQL endpoints and RDF sources, and another 50 tests against data sources in arbitrary formats. The information for all data sources was provided through a Data Catalogue.

As expected, data sources with either a SPARQL endpoint or RDF formats generally have a lower run-time than the arbitrary data format sources. The results are highly affected by the network status, the number of calls made to the Data Catalogue and, most importantly, the number of endpoints present, the time taken for lifting the data to RDF, and the size of the data. The average run-time for querying against SPARQL endpoints and RDF sources was 30.31 seconds, and the average run-time against arbitrary formats was 38.02 seconds.

The experiments were run on a computer with an Intel Core i5 processor at 2.5 GHz and 4 GB of RAM, running macOS Sierra (64 bit). The implementation is done in Java 1.8.

The extension of SPARQL-LD to leverage the RDF Presentation protocol is implemented as a fork of Apache Jena v3.3, and is available on GitHub (https://github.com/thesmartenergy/sparql-ld-extended). The motivating example is provided as a test case. We aim to get better results on a more advanced system and by using the optimization techniques mentioned in the future work.

===8 CONCLUSION AND FUTURE WORK===
The problem of exploiting data from heterogeneous sources and formats is common in the linked data world. In this paper, we have proposed a solution that draws on the benefits of SPARQL-LD and is able to directly fetch and query RDF data from any HTTP Web
document. The solution comprises an extension of SPARQL-LD that also allows querying data that is not directly presented in an RDF format, but for which an RDF lifting rule is known. We then fetch the URI of the RDF lifting rule along with the document, lift the data to RDF, and then query the resulting RDF Graph. This solution allows us to exploit the dynamic nature of data sources such as lightweight sensors or devices. We have discussed extensively the working principles and phases of the proposed solution through the use of a motivating example, which is very close to a real-world use-case scenario.

The proposed solution has the potential to spawn research directions towards a plethora of exciting new use cases and services, as well as to contribute towards the larger picture of flexible and scalable semantic interoperability for devices and services on the Web of Things or the Internet of Things at large, making these devices seemingly follow the principles of Linked Data [2].

The HTTP header field Content-Lifting-Rule we use could be worth standardizing at the W3C to also include non-native RDF serializations as Linked Data sources.

Future work planned for the solution includes implementing the federated query optimization techniques [7, 15] and including more test cases.

Figure 1: Proposed solution integrated in Query Federation Phases

Figure 2: Run time against various Data Sources

===REFERENCES===
[1] F. Alahmari. Evaluating SPARQL using query federation and link traversal. In Digital Information Management (ICDIM), 2011 Sixth International Conference on, pages 79–84. IEEE, 2011.

[2] T. Berners-Lee. Linked data. https://www.w3.org/DesignIssues/LinkedData.html, July 2006. Last Accessed: 2017/05/02.

[3] C. Buil-Aranda, M. Arenas, O. Corcho, and A. Polleres. Federating queries in SPARQL 1.1: Syntax, semantics and evaluation. Web Semantics: Science, Services and Agents on the World Wide Web, 18(1):1–17, 2013.

[4] A. Dimou, M. V. Sande, P. Colpaert, R. Verborgh, E. Mannens, and R. V. de Walle. RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. In Proceedings of the Workshop on Linked Data on the Web, co-located with the 23rd International World Wide Web Conference (WWW 2014), Seoul, Korea, 2014.

[5] P. Fafalios and Y. Tzitzikas. SPARQL-LD: a SPARQL Extension for Fetching and Querying Linked Data. In International Semantic Web Conference (Posters & Demos), 2015.

[6] O. Görlitz and S. Staab. Federated data management and query optimization for linked open data. In New Directions in Web Data Management 1, pages 109–137. Springer, 2011.

[7] O. Görlitz and S. Staab. Splendid: SPARQL endpoint federation exploiting void descriptions. In Proceedings of the Second International Conference on Consuming Linked Data-Volume 782, pages 13–24. CEUR-WS.org, 2011.

[8] O. Hartig, C. Bizer, and J. C. Freytag. Executing SPARQL queries over the web of linked data. The Semantic Web-ISWC 2009, pages 293–309, 2009.

[9] G. Ladwig and T. Tran. Linked data query processing strategies. The Semantic Web–ISWC 2010, pages 453–469, 2010.

[10] M. Lefrançois. Interopérabilité sémantique libérale pour les services et les objets. In Actes de la 17ème conférence Extraction et Gestion des Connaissances (EGC’17), Grenoble, France, Jan. 2017.

[11] M. Lefrançois, A. Zimmermann, and N. Bakerally. A SPARQL extension for generating RDF from heterogeneous formats. In Proc. Extended Semantic Web Conference (ESWC’17), Portoroz, Slovenia, May 2017.

[12] M. Lefrançois, A. Zimmermann, and N. Bakerally. Flexible RDF generation from RDF and heterogeneous data sources with SPARQL-generate. In Proc. of the 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW’16), 2016.

[13] A. Polleres, T. Krennwallner, N. Lopes, J. Kopecký, and S. Decker. XSPARQL Language Specification. W3C Member Submission, Jan. 20 2009.

[14] E. Prud’hommeaux, C. Buil-Aranda, et al. SPARQL 1.1 federated query. W3C Recommendation, 21, 2013.

[15] B. Quilitz. DARQ–Federated Queries with SPARQL, 2007.

[16] M. Sporny, G. Kellogg, and M. Lanthaler. A JSON-based Serialization for Linked Data. W3C Recommendation, Jan. 16 2014.
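As an illustration of the Sources Selection and Federated Query Formation phases of Section 5, the following is a minimal sketch in Python. It is not the Java/Jena implementation evaluated in this paper. The function names (`matches`, `select_sources`, `form_federated_query`), the tuple encoding of triple patterns, the `example.org` source URL, and the matching rule itself (two patterns match when each position is equal or a variable on either side) are all assumptions made for the example. The sketch assigns each triple pattern of the input query to the sources whose catalogue Basic Graph Pattern may answer it, and groups the patterns aimed at the same source in a single SERVICE clause.

```python
def is_var(term):
    """SPARQL variables are written with a leading '?' in this sketch."""
    return term.startswith("?")


def matches(query_tp, source_tp):
    """Simplified, hypothetical matching rule: two triple patterns match
    when every position is equal or is a variable on at least one side."""
    return all(q == s or is_var(q) or is_var(s)
               for q, s in zip(query_tp, source_tp))


def select_sources(query_bgp, catalogue):
    """Sources Selection: map each source URL to the query triple patterns
    that its catalogue Basic Graph Pattern may answer."""
    selection = {}
    for url, source_bgp in catalogue.items():
        answerable = [tp for tp in query_bgp
                      if any(matches(tp, stp) for stp in source_bgp)]
        if answerable:
            selection[url] = answerable
    return selection


def form_federated_query(select_vars, query_bgp, catalogue):
    """Federated Query Formation: group the patterns aimed at the same
    source in one SERVICE clause; unmatched patterns stay outside."""
    selection = select_sources(query_bgp, catalogue)
    covered = {tp for tps in selection.values() for tp in tps}
    lines = ["SELECT " + " ".join(select_vars) + " WHERE {"]
    for tp in query_bgp:
        if tp not in covered:
            lines.append("  " + " ".join(tp) + " .")
    for url, tps in selection.items():
        lines.append("  SERVICE <" + url + "> {")
        lines.extend("    " + " ".join(tp) + " ." for tp in tps)
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical catalogue entry: one sensor document and the BGP it exposes.
    catalogue = {
        "http://example.org/floor2/heater": [
            ("?obs", "a", "sosa:Observation"),
            ("?obs", "sosa:hasSimpleResult", "?value"),
        ],
    }
    # A fragment of the motivating query, one tuple per triple pattern.
    query_bgp = [
        ("?room", "a", "seas:Room"),
        ("?temp_obs", "a", "sosa:Observation"),
        ("?temp_obs", "sosa:hasSimpleResult", "?temp_value"),
    ]
    print(form_federated_query(["?room", "?temp_value"], query_bgp, catalogue))
```

In this sketch a pattern that several sources can answer is repeated in each of their SERVICE clauses. A real engine would additionally have to decide how to combine such clauses and would use the context information of the catalogue (for instance the floor) to prune sources, which the sketch leaves out.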