Querying Dynamic Datasources
              with Continuously Mapped Sensor Data*

                               Ruben Taelman, Pieter Heyvaert,
                              Ruben Verborgh, and Erik Mannens

                           Data Science Lab (Ghent University - iMinds)
                                  firstname.lastname@ugent.be


         Abstract. The world contains a large number of sensors that produce new data at
         a high frequency. It is currently very hard to find public services that expose these
         measurements as dynamic Linked Data. We investigate how sensor data can be
         published continuously on the Web at a low cost. This paper describes how
         various sensor data sources can be published by continuously mapping
         raw sensor data to RDF and inserting it into a live, low-cost server. This makes it
         possible for clients to continuously evaluate dynamic queries using public sensor
         data. For our demonstration, we illustrate how this pipeline works for the
         publication of temperature and humidity data originating from a microcontroller,
         and how this data can be queried.


Keywords: Linked Data, Linked Data Fragments, RML, SPARQL, dynamic data


1     Introduction
Countless sensors that are connected to the Internet produce various streams of continuous
data. Even though there are many sensors, no public interfaces are available through which
these measurements can be consumed as dynamic Linked Data [2], which is essential when
these measurements need to be linked to related information.
    Dynamic Linked Data can be published through RDF Stream Processing (RSP) query
engines, like C-SPARQL [1] and CQELS [9]. These RSP query engines publish
sensor data through an extended SPARQL [7] endpoint that enables continuous querying over
this data. Hosting such data on the Web with these engines requires expensive machines, because
queries can be complex, are continuous, and originate from an unlimited number of
clients. These expensive solutions limit the development of sensor-based applications [4].
    In this work, we present a continuous pipeline for publishing dynamic sensor data
using a low-cost Triple Pattern Fragments (TPF) [11] server. Publishing the data this way makes it
possible to evaluate queries continuously at the client using a TPF Query Streamer [10].
In order to demonstrate this pipeline, we publish and query the continuous data from a
temperature and humidity sensor. In the demonstration, we will show the different steps
conducted on the data, from the raw measurements to the final query results.
* The described research activities were funded by Ghent University, iMinds, the Institute for the
    Promotion of Innovation by Science and Technology in Flanders (IWT), the Fund for Scientific
    Research Flanders (FWO Flanders), and the European Union.
2     Architecture
Our continuous pipeline for publishing sensor data consists of a data reader, a mapper, and
a publisher. These components connect a sensor with a dynamic data source, as can be seen
in Fig. 1. The reader receives data from a sensor. The mapper converts this data to RDF
using the RDF Mapping Language (RML) [6]. Finally, the publisher inserts the dynamic
RDF triples into a TPF server. These three components are explained hereafter.


      Sensor  ->  Reader  ->  Mapper  ->  Publisher  ->  Data source

               Fig. 1: Overview of the continuous pipeline for publishing sensor data.


2.1    Reader
For this demonstration, we use a Tessel¹ microcontroller to capture sensor data from
the environment. Tessel provides a Node.js² module that can be used to listen to sensor
measurements from the device using JavaScript events. A JSON object is emitted for each
measurement at a configurable rate. These objects contain all required data to describe
each measurement.
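
The reader's behaviour can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Tessel Node.js module used in the demonstration: `read_climate` is a hypothetical stand-in for the sensor driver, and the names `measurements`, `interval_s`, and `sleep` are introduced only for this example. The emitted object follows the shape of the example measurement in Section 3.

```python
import itertools
import json
import time
from typing import Callable, Dict, Iterator, Tuple

# Device identifier taken from the example measurement in Section 3.
DEVICE_ID = "TM-00-04-f000da30-0062473d-20a82586"


def read_climate() -> Tuple[float, float]:
    """Hypothetical stand-in for the sensor driver: (temperature, humidity)."""
    return 30.0167749023, 47.0815124512


def measurements(
    read: Callable[[], Tuple[float, float]] = read_climate,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict]:
    """Yield one JSON-compatible measurement, then pause for `interval_s` seconds."""
    while True:
        temperature, humidity = read()
        yield {
            "humidity": {"value": humidity},
            "device": DEVICE_ID,
            "module": "climate",
            "temperature": {"value": temperature},
        }
        sleep(interval_s)


# Take two measurements without waiting and print them as JSON lines.
for m in itertools.islice(measurements(interval_s=0, sleep=lambda _: None), 2):
    print(json.dumps(m))
```

Passing the `sleep` function as a parameter is a design choice of this sketch: it keeps the emission rate configurable while letting the example run without delays.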

2.2    Mapper
The mapper component [8] is responsible for converting the JSON output that it receives
from the reader component to RDF. This conversion is done using the RMLMapper³. For
each measurement, the RMLMapper reads the RML mapping file, maps the JSON object to
RDF, and forwards the result to the publisher component. The mapping file depends on the
type of sensor that is being used, because not all sensors output the same data in the same
structure and format, and the data is not necessarily annotated using the same ontologies.
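
To make the mapping step concrete, the following Python sketch converts one measurement to N-Triples lines. It is a hand-written illustration and does not use RML or the mapper tool of the pipeline: the table `CLIMATE_MAPPING` and the function `map_measurement` are hypothetical names, and the table only mimics the role that the sensor-specific mapping file plays. The IRIs are those of the example in Section 3.

```python
from typing import Dict, List

XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SENSOR_CLASS = "https://w3id.org/saref#Sensor"

# Hypothetical per-sensor mapping table: JSON key -> RDF predicate.
# A different sensor type would need a different table.
CLIMATE_MAPPING = {
    "humidity": "http://vocab.datex.org/terms#humidity",
    "temperature": "http://dbpedia.org/ontology/temperature",
}


def map_measurement(
    measurement: Dict, mapping: Dict[str, str] = CLIMATE_MAPPING
) -> List[str]:
    """Convert one JSON measurement into a list of N-Triples lines."""
    subject = f"<http://www.example.com/{measurement['device']}>"
    triples = [f"{subject} <{RDF_TYPE}> <{SENSOR_CLASS}> ."]
    for key, predicate in mapping.items():
        value = measurement[key]["value"]
        triples.append(f'{subject} <{predicate}> "{value}"^^<{XSD_DOUBLE}> .')
    return triples


example = {
    "humidity": {"value": 47.0815124512},
    "device": "TM-00-04-f000da30-0062473d-20a82586",
    "module": "climate",
    "temperature": {"value": 30.0167749023},
}
print("\n".join(map_measurement(example)))
```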

2.3    Publisher
For the final step in the continuous pipeline, the publisher component takes the measurement
data in RDF and converts it to dynamic data. The publisher does this by adding
a time annotation to it, as we described in previous work [10]. We chose to represent
our time annotations as expiration times, and we only keep the latest version of each
measurement, because for this demonstration only the latest sensor measurements are
needed. Alternatively, our system also supports storing all measurement versions, which
enables historical analysis. We serialized our annotations as RDF graphs [5], because
we showed that this is the most efficient method for time annotation [10]. Finally, each
time-annotated measurement is inserted into the TPF server and the previous measurement
for that sensor is removed.
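
The keep-latest policy with expiration times can be sketched as below. This is an in-memory Python illustration only: `LatestOnlyStore` is a hypothetical stand-in for the data source behind the server, and the fixed `validity` period is an assumption of the example, since the text does not state how expiration times are chosen.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass
class AnnotatedMeasurement:
    triples: List[str]    # the measurement's RDF triples
    expiration: datetime  # the triples are valid until this instant


class LatestOnlyStore:
    """Hypothetical in-memory stand-in for the dynamic data source."""

    def __init__(self, validity: timedelta) -> None:
        self.validity = validity
        self._by_sensor: Dict[str, AnnotatedMeasurement] = {}

    def publish(self, sensor: str, triples: List[str], now: datetime) -> None:
        # Assigning to the sensor's key replaces the previous measurement,
        # so only the latest version per sensor is kept.
        self._by_sensor[sensor] = AnnotatedMeasurement(triples, now + self.validity)

    def latest(self, sensor: str) -> AnnotatedMeasurement:
        return self._by_sensor[sensor]

    def __len__(self) -> int:
        return len(self._by_sensor)


store = LatestOnlyStore(validity=timedelta(seconds=10))
t0 = datetime(2016, 6, 27, 9, 48, 37, tzinfo=timezone.utc)
store.publish("sensor-1", ["first measurement"], now=t0)
store.publish("sensor-1", ["second measurement"], now=t0 + timedelta(seconds=5))
print(len(store), store.latest("sensor-1").triples)
```

Storing all measurement versions instead would amount to appending to a list per sensor rather than overwriting the entry.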
 ¹ https://tessel.io/
 ² https://nodejs.org/
 ³ https://github.com/RMLio/RML-Mapper
3   Demonstration Overview

We used our pipeline to publish the data produced by a Tessel sensor that reads
temperature and humidity. The reader component starts by emitting measurements
in JSON, such as the following.
    { "humidity"   : { "value":47.0815124512 },
      "device"     : "TM-00-04-f000da30-0062473d-20a82586",
      "module"     : "climate",
      "temperature": { "value":30.0167749023 } }

In order to add semantic value to the measurements, the mapper component maps each
JSON object to RDF.
    <http://www.example.com/TM-00-04-f000da30-0062473d-20a82586>
      a <https://w3id.org/saref#Sensor> ;
      <http://vocab.datex.org/terms#humidity> "47.0815124512"^^xsd:double ;
      <http://dbpedia.org/ontology/temperature> "30.0167749023"^^xsd:double .

Next, the triples are time-annotated by the publisher component to make the triples valid
up until a certain time. This is represented with the TriG [3] serialization.
    _:g12 {
      <http://www.example.com/TM-00-04-f000da30-0062473d-20a82586>
        a <https://w3id.org/saref#Sensor> ;
        <http://vocab.datex.org/terms#humidity> "47.0815124512"^^xsd:double ;
        <http://dbpedia.org/ontology/temperature> "30.0167749023"^^xsd:double .
    }
    _:g12 <http://example.org/temporal#expiration>
          "2016-06-27T09:48:47.808Z"^^xsd:dateTimeStamp .
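
A serialization of this shape could be produced along the following lines. The Python function `annotate` is a hypothetical helper written for illustration, not the publisher's actual code. It takes triples as N-Triples lines, wraps them in a blank-node-labelled graph, and attaches the expiration time to that graph, writing datatype IRIs in full so that no prefix declarations are needed.

```python
from datetime import datetime, timezone
from typing import List

EXPIRATION = "http://example.org/temporal#expiration"
XSD_DATETIMESTAMP = "http://www.w3.org/2001/XMLSchema#dateTimeStamp"


def annotate(triples: List[str], graph_label: str, expires_at: datetime) -> str:
    """Wrap triples in a named graph and attach an expiration time to it.

    `expires_at` must be timezone-aware, because xsd:dateTimeStamp
    requires an explicit time zone.
    """
    if expires_at.tzinfo is None:
        raise ValueError("expires_at must be timezone-aware")
    # Normalize to UTC, keep millisecond precision, and use the "Z" suffix.
    stamp = expires_at.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    stamp = stamp.replace("+00:00", "Z")
    body = "\n".join(f"  {t}" for t in triples)
    return (
        f"_:{graph_label} {{\n{body}\n}}\n"
        f'_:{graph_label} <{EXPIRATION}> "{stamp}"^^<{XSD_DATETIMESTAMP}> .'
    )


expires = datetime(2016, 6, 27, 9, 48, 47, 808000, tzinfo=timezone.utc)
print(annotate(["<http://a.example/s> <http://a.example/p> <http://a.example/o> ."], "g12", expires))
```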

After the data is published, it can be consumed by clients. We use the client-side TPF
Query Streamer to continuously evaluate the following SPARQL query in order to fetch
the published measurements from our pipeline.
    PREFIX dat: <http://vocab.datex.org/terms#>
    PREFIX saref: <https://w3id.org/saref#>
    PREFIX dbp: <http://dbpedia.org/ontology/>

    SELECT ?temperature ?humidity WHERE {
        ?sensor a saref:Sensor;
                dat:humidity ?humidity;
                dbp:temperature ?temperature.
    }

This query selects the temperature and humidity data from all sensors. The TPF Query
Streamer is able to interpret this query as being dynamic, so it will automatically
re-evaluate the query as soon as new measurements become available. The query
can, for example, show the following intermediary result.
    Temperature: "30.0167749023"         Humidity: "47.0815124512"
These continuously updating query results could, instead of being printed as text, be used as
input to various types of applications.
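
One plausible way for a client to drive such re-evaluation is to schedule the next evaluation at the expiration time of the current results. The Python sketch below illustrates that idea under stated assumptions; it is not the TPF Query Streamer's implementation. `stream_results` and `fake_evaluate` are hypothetical names, the evaluator returns fixed values instead of querying a server, and the clock is simulated so that the example runs instantly.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Iterator, List, Tuple

Result = Dict[str, str]
# Hypothetical evaluator: returns the current results and their expiration time.
Evaluate = Callable[[datetime], Tuple[List[Result], datetime]]


def stream_results(
    evaluate: Evaluate, start: datetime, rounds: int
) -> Iterator[Tuple[datetime, List[Result]]]:
    """Re-evaluate a query each time the previous results expire."""
    now = start
    for _ in range(rounds):
        results, expiration = evaluate(now)
        yield now, results
        # Jump the simulated clock to the expiration time (a real client
        # would wait instead); always advance by at least one millisecond.
        now = max(expiration, now + timedelta(milliseconds=1))


def fake_evaluate(now: datetime) -> Tuple[List[Result], datetime]:
    """Stand-in for query evaluation: fixed values, valid for ten seconds."""
    reading = {"temperature": "30.0167749023", "humidity": "47.0815124512"}
    return [reading], now + timedelta(seconds=10)


start = datetime(2016, 6, 27, 9, 48, 37, tzinfo=timezone.utc)
for when, results in stream_results(fake_evaluate, start, rounds=3):
    for r in results:
        print(when.isoformat(), "Temperature:", r["temperature"], "Humidity:", r["humidity"])
```

Because the results are yielded rather than printed inside the loop, an application can consume them directly as input.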
    During the demo, we will show the different steps conducted on temperature and
humidity measurements by the different components of the pipeline, as well as the query
results. These steps will be shown as continuously updating log outputs for each
component. The Tessel microcontroller will be part of the demonstration, which will make
it possible for users to modify the temperature or humidity and immediately see the
changes in the query output. We recorded a screencast⁴ containing these live logs for
temperature and humidity data. The source code of our pipeline prototype can be found
at https://github.com/mmlab/demo-tessel-continuous-datasource.
    The demonstration illustrates the simplicity of hosting any kind of sensor data using
this technique. For future work, this solution can be improved and extended to support
all kinds of sensors and sensor streams. The pipeline could be extended to support
composing different pipelines and streams in order to create more complex dynamic data.
    With this demonstration we aim to show that publishing dynamic Linked Data does
not have to be complex or costly. We hope to trigger more efforts for making dynamic data
publicly available, in order to promote the development of more dynamic applications.


References
 1. Barbieri, D.F., Braga, D., Ceri, S., Valle, E.D., Grossniklaus, M.: Querying RDF streams with
    C-SPARQL. SIGMOD Rec. 39(1), 20–26 (Sep 2010)
 2. Berners-Lee, T.: Linked Data (July 2006), http://www.w3.org/DesignIssues/LinkedData.html
 3. Bizer, C., Cyganiak, R.: RDF 1.1 TriG. Recommendation, W3C (Feb 2014), https://www.w3.org/TR/trig/
 4. Corcho, O., García-Castro, R.: Five challenges for the Semantic Sensor Web. Semantic Web
    1(1, 2), 121–125 (2010)
 5. Cyganiak, R., Wood, D., Lanthaler, M.: RDF 1.1: Concepts and abstract syntax. Recommendation,
    W3C (Feb 2014), http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/
 6. Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R.: RML:
    A generic language for integrated RDF mappings of heterogeneous data. In: LDOW (2014)
 7. Feigenbaum, L., Todd Williams, G., Grant Clark, K., Torres, E.: SPARQL 1.1 protocol. Recommendation,
    W3C (Mar 2013), http://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/
 8. Heyvaert, P., Taelman, R., Verborgh, R., Mannens, E.: Linked Sensor Data generation using
    queryable RML mappings. In: Proceedings of the 2016 International Semantic Web Conference
    Posters & Demonstrations Track (2016), accepted for publication
 9. Le-Phuoc, D., Dao-Tran, M., Parreira, J.X., Hauswirth, M.: A native and adaptive approach
    for unified processing of linked streams and Linked Data. In: The Semantic Web–ISWC 2011,
    pp. 370–388 (2011)
10. Taelman, R., Verborgh, R., Colpaert, P., Mannens, E., Van de Walle, R.: Continuously updating
    query results over real-time Linked Data. In: Proceedings of the 2nd Workshop on Managing
    the Evolution and Preservation of the Data Web (May 2016)
11. Verborgh, R., Vander Sande, M., Hartig, O., Van Herwegen, J., De Vocht, L., De Meester,
    B., Haesendonck, G., Colpaert, P.: Triple Pattern Fragments: a low-cost knowledge graph
    interface for the Web. Journal of Web Semantics 37–38, 184–206 (2016)


 ⁴ https://vimeo.com/172409187