<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>RDF Stream Processing Prototyping with Streaming MASSIF</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Streaming MASSIF Prototypes</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IDLab, Ghent University - imec</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Stream Reasoning (SR) is the research field that aims to solve the heterogeneity, noisiness, and incompleteness problems that come with the processing of continuously produced data. Semantic Web technologies help to mitigate many of these problems. RDF Stream Processing (RSP), a subfield of SR, specifically deals with the processing of RDF data streams. Many RSP engines and solutions exist; however, none provide the flexibility to rapidly create a prototype for new RSP applications or aid in the design of new components. In this demonstration paper, we showcase Streaming MASSIF Prototypes, an RSP prototyping platform that provides the necessary tooling to easily create new RSP applications. Furthermore, the available tooling and the inherent flexibility of the platform allow researchers and developers to focus fully on the design of new RSP components by hiding superfluous complexities and configurations.</p>
      </abstract>
      <kwd-group>
        <kwd>Stream Reasoning</kwd>
        <kwd>Prototyping</kwd>
        <kwd>RDF Stream Processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Data is being produced at a volume and velocity that exceed the ability to consume
the data as a whole [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Therefore, data-intensive applications consume data
continuously as it is being produced. However, data streams typically suffer from
heterogeneity, noisiness, and incompleteness, challenges that the Stream Reasoning (SR)
research field aims to solve [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Semantic Web technologies help to mitigate
many of these problems. RDF Stream Processing (RSP), a subfield of SR, focuses on
processing RDF data streams in particular. Many RSP engines and solutions exist [
        <xref ref-type="bibr" rid="ref1 ref2 ref6">1, 2, 6</xref>
        ], but they lack the tooling for easy prototyping of RSP applications, which limits
the development and evaluation of new RSP components. RSP applications are typically
composed of multiple parts, e.g. a windowing function to chop the continuous stream
into processable chunks, a SPARQL engine for selecting elements from the streams,
(expressive) reasoning for inferring implicit and missing facts, and optionally temporal
reasoning for detecting time-related dependencies [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In this paper, we demonstrate
Streaming MASSIF Prototypes, an easy-to-use RSP prototyping platform, built upon
the Streaming MASSIF platform [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Our prototyping platform offers the tooling to
easily compose new RSP applications and provides the flexibility to focus on building
new RSP components. We have decomposed various RSP applications and engines into
abstract building blocks. These building blocks can be composed in any
order, allowing the creation of a flexible RSP processing graph. With Streaming
MASSIF Prototypes, RSP applications can easily be created because we offer the necessary
tooling to get started and to focus on the definition of the business logic. New components
can easily be integrated, as we provide abstractions, flexible interfaces, and monitoring
functionality out of the box.
      </p>
    </sec>
    <sec id="sec-2">
      <title>State of the art</title>
      <p>Looking at most RSP engines and applications we can see that they can be decomposed
into some abstract building blocks:</p>
      <p>
        C-SPARQL [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] first windows the data stream into processable chunks and optionally
performs RDFS reasoning to infer missing facts while selecting data from the
window using a SPARQL engine.
      </p>
      <p>
        CQELS [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] integrates the windowing directly into the evaluation of the queries.
      </p>
      <p>
        CityPulse [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] uses CQELS to select data from the data streams, ASP to reason
upon the selection, and CEP to enable temporal reasoning.
      </p>
      <p>
        Streaming MASSIF [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] uses C-SPARQL to select data from the stream, a Description
Logic (DL) reasoner to perform expressive reasoning and abstract events, and CEP on
top of the abstractions.
      </p>
      <p>Many more engines exist, and most can be abstracted to the following building blocks:</p>
      <list list-type="bullet">
        <list-item>
          <p>Windowing: a function to divide the continuous stream into processable chunks.</p>
        </list-item>
        <list-item>
          <p>Selection: a function to select and filter data according to a registered query.</p>
        </list-item>
        <list-item>
          <p>Abstraction: a reasoning component to infer implicit and missing facts.</p>
        </list-item>
        <list-item>
          <p>Temporal: a temporal reasoning component to infer temporal dependencies.</p>
        </list-item>
      </list>
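      <p>The four building blocks can be illustrated with a minimal sketch. It is not taken from any of the engines discussed here: triples are plain tuples, and the toy stream, the class hierarchy, and all function names are assumptions made for illustration only.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

# Hypothetical toy stream: (timestamp in seconds, (subject, predicate, object)).
STREAM = [
    (1, ("obs1", "rdf:type", "HighTemperature")),
    (2, ("obs1", "hasValue", "71")),
    (6, ("obs2", "rdf:type", "Smoke")),
    (12, ("obs3", "rdf:type", "Humidity")),
]

# Hypothetical class hierarchy used by the abstraction step.
SUBCLASS_OF = {"HighTemperature": "FireIndicator", "Smoke": "FireIndicator"}


def tumbling_windows(stream, width):
    """Windowing: split the stream into consecutive, non-overlapping windows."""
    if not stream:
        return []
    windows = defaultdict(list)
    for timestamp, triple in stream:
        windows[timestamp // width].append(triple)
    last = max(timestamp for timestamp, _ in stream) // width
    # Include empty windows so that list neighbours are also neighbours in time.
    return [windows[index] for index in range(last + 1)]


def select(window, predicate):
    """Selection: keep only the triples that use the given predicate."""
    return [triple for triple in window if triple[1] == predicate]


def abstract(window):
    """Abstraction: add the superclass typing implied by SUBCLASS_OF."""
    inferred = [
        (s, p, SUBCLASS_OF[o])
        for s, p, o in window
        if p == "rdf:type" and o in SUBCLASS_OF
    ]
    return window + inferred


def followed_by(windows, first_class, second_class):
    """Temporal: indices where first_class occurs and second_class occurs in the next window."""
    def has(window, cls):
        return any(p == "rdf:type" and o == cls for _, p, o in window)
    return [
        index for index in range(len(windows) - 1)
        if has(windows[index], first_class) and has(windows[index + 1], second_class)
    ]


windows = tumbling_windows(STREAM, width=5)
selected = [select(window, "rdf:type") for window in windows]
abstracted = [abstract(window) for window in selected]
matches = followed_by(abstracted, "FireIndicator", "FireIndicator")
print(matches)
```
]]></preformat>
      <p>In this sketch the temporal pattern only matches because the abstraction step has lifted both observations to the shared class FireIndicator, which is the reason for placing reasoning before temporal detection.</p>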
      <p>However, none of the existing engines or platforms provides tools to easily load
different data sources, handle the results, monitor the different building blocks,
reorder the building blocks, or add new components. We provide the
tooling needed for easy prototyping of new RSP applications and components.</p>
    </sec>
    <sec id="sec-3">
      <title>Stream Reasoning Prototyping</title>
      <p>To enable easy prototyping and integration of new RSP components, Streaming
MASSIF Prototypes has been built on the following principles:</p>
      <list list-type="order">
        <list-item>
          <p>Flexible interfaces: to integrate new components, or to compose building blocks in any order, the interfaces between components should be flexible and transparent. Inspired by the flexibility of Unix pipelines, we chose to use text as the interface, which makes it easy to plug in new components. Throughout the platform, we support semantic data using the Turtle serialization (or column-separated tabular data if the data has not been mapped to a semantic model yet).</p>
        </list-item>
        <list-item>
          <p>Abstract components: at the highest level, all components are abstracted to PipeLineComponents, so that upstream and downstream components are decoupled and do not depend on any specific implementation. To make prototyping easier, we support the following components:</p>
          <list list-type="alpha-lower">
            <list-item>
              <p>Sources: the data ingestion points that load the data streams into the platform. Support for Kafka, HTTP POST and GET (with configurable timeouts), WebSockets, and reading from file (with configurable timeouts).</p>
            </list-item>
            <list-item>
              <p>Windows: functions that window the data stream into processable chunks. Support for sliding and tumbling windows. Current implementation: Esper (http://www.espertech.com/).</p>
            </list-item>
            <list-item>
              <p>Filters: select and filter certain parts of the data stream through SPARQL queries. Current implementation: Apache Jena (https://jena.apache.org/).</p>
            </list-item>
            <list-item>
              <p>Abstractors: a reasoning step that infers implicit facts and abstracts events to high-level concepts. Current implementations: HermiT [<xref ref-type="bibr" rid="ref5">5</xref>] for expressive ontology reasoning and C-Sprite [<xref ref-type="bibr" rid="ref2">2</xref>] for RDFS reasoning.</p>
            </list-item>
            <list-item>
              <p>CEP: a Complex Event Processing (CEP) module that detects temporal patterns. Current implementation: Esper.</p>
            </list-item>
            <list-item>
              <p>Mappers: map raw data to semantic data.</p>
            </list-item>
            <list-item>
              <p>Enrichment: enriches the data stream with static data.</p>
            </list-item>
            <list-item>
              <p>Sinks: the end points of the data stream. Support for Kafka, HTTP, WebSockets, writing to file, and print sinks (to the terminal or the web interface).</p>
            </list-item>
          </list>
          <p>The benefit of these abstractions is that multiple implementations are possible. Note that additional components can easily be integrated. Figure 1 depicts an example pipeline configuration using the abstract components.</p>
        </list-item>
        <list-item>
          <p>Monitoring: as depicted in Figure 1, each component is wrapped in a PipeLineComponent that has its own queue for asynchronous processing and a monitoring layer that allows easy evaluation of the component's metrics. The metrics can be accessed through an HTTP endpoint and contain the processing times, the number of events in the queue, and the throughput.</p>
        </list-item>
        <list-item>
          <p>Processing graphs: each component can have multiple inputs and outputs, allowing the creation of a processing graph for RSP applications.</p>
        </list-item>
        <list-item>
          <p>Graphical interface: to simplify the creation of the processing graph, we provide a drag-and-drop web interface for composing RSP workflows. Figure 2 visualizes an example composition on the right pane. The left pane shows the registered query of the filter component and its monitoring details. Alternatively, the configuration can be loaded in JSON-LD format.</p>
        </list-item>
      </list>
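      <p>The combination of a text interface, a per-component queue, metrics, and multiple outputs can be sketched as follows. This is a simplified illustration and not the platform's actual PipeLineComponent: the class SketchComponent, its methods, and the example messages are hypothetical, and the metrics are returned as a dictionary rather than served over HTTP.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import queue
import threading
import time


class SketchComponent:
    """Hypothetical wrapper: a text-in/text-out function with its own queue and metrics."""

    _STOP = object()  # sentinel that tells the worker thread to finish

    def __init__(self, name, func):
        self.name = name
        self.func = func            # maps one text message to a text message or None
        self.inbox = queue.Queue()  # per-component queue for asynchronous processing
        self.outputs = []           # any number of downstream components
        self.processed = 0
        self.total_seconds = 0.0
        self._thread = threading.Thread(target=self._run, daemon=True)

    def connect(self, downstream):
        """Add a downstream component and return it, so calls can be chained."""
        self.outputs.append(downstream)
        return downstream

    def start(self):
        self._thread.start()

    def send(self, message):
        self.inbox.put(message)

    def stop(self):
        """Process everything queued so far, then stop the worker thread."""
        self.inbox.put(self._STOP)
        self._thread.join()

    def metrics(self):
        average = self.total_seconds / self.processed if self.processed else 0.0
        return {"component": self.name, "queued": self.inbox.qsize(),
                "processed": self.processed, "avg_seconds": average}

    def _run(self):
        while True:
            message = self.inbox.get()
            if message is self._STOP:
                return
            started = time.perf_counter()
            result = self.func(message)
            self.total_seconds += time.perf_counter() - started
            self.processed += 1
            if result is not None:  # None means the message was dropped or consumed
                for downstream in self.outputs:
                    downstream.send(result)


# A small processing graph: the source feeds both a filter branch and an audit sink.
filtered, everything = [], []
source = SketchComponent("source", lambda text: text)
only_temperature = SketchComponent(
    "filter", lambda text: text if "Temperature" in text else None)
filtered_sink = SketchComponent("filtered-sink", filtered.append)
audit_sink = SketchComponent("audit-sink", everything.append)

source.connect(only_temperature).connect(filtered_sink)
source.connect(audit_sink)

components = [source, only_temperature, filtered_sink, audit_sink]
for component in components:
    component.start()
for line in ["<http://example.org/obs1> a <http://example.org/Temperature> .",
             "<http://example.org/obs2> a <http://example.org/Smoke> ."]:
    source.send(line)
for component in components:  # stop upstream components before downstream ones
    component.stop()

for component in components:
    print(component.metrics())
```
]]></preformat>
      <p>Because every component only exchanges text with its neighbours, replacing the filter function with a different implementation leaves the rest of the graph in this sketch unchanged.</p>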
    </sec>
    <sec id="sec-4">
      <title>Demonstrator</title>
      <p>In this demonstrator, we show how RSP applications, consisting of various components,
can be built using the workflow composer. The demonstrator shows how complex
applications can be built through drag-and-drop interaction and declarative definitions of the
components. It also shows how RSP applications can be debugged by investigating the
intermediate results of each component in the user interface, and how each component
can be monitored in order to detect bottlenecks. A video of the demonstrator can be
found at https://github.com/IBCNServices/StreamingMASSIF/wiki/Streaming-MASSIF-Prototypes.</p>
      <p>Acknowledgments: Pieter Bonte is funded by a postdoctoral fellowship of Fonds
Wetenschappelijk Onderzoek Vlaanderen (FWO) (1266521N).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Barbieri</surname>
            ,
            <given-names>D.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Braga</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ceri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Della Valle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grossniklaus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>C-SPARQL: a continuous query language for RDF data streams</article-title>
          .
          <source>International Journal of Semantic Computing</source>
          <volume>4</volume>
          (
          <issue>01</issue>
          ),
          <fpage>3</fpage>
          -
          <lpage>25</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bonte</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tommasini</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Turck</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ongenae</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Della Valle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>C-Sprite: Efficient hierarchical reasoning for rapid RDF stream processing</article-title>
          .
          <source>In: Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems</source>
          . pp.
          <fpage>103</fpage>
          -
          <lpage>114</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bonte</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tommasini</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Della Valle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Turck</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ongenae</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Streaming MASSIF: cascading reasoning for efficient processing of IoT data streams</article-title>
          .
          <source>Sensors</source>
          <volume>18</volume>
          (
          <issue>11</issue>
          ),
          <fpage>3832</fpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dell'Aglio</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Della Valle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Stream reasoning: A survey and outlook</article-title>
          .
          <source>Data Science</source>
          <volume>1</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>59</fpage>
          -
          <lpage>83</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>HermiT: an OWL 2 reasoner</article-title>
          .
          <source>Journal of Automated Reasoning</source>
          <volume>53</volume>
          (
          <issue>3</issue>
          ),
          <fpage>245</fpage>
          -
          <lpage>269</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Le-Phuoc</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dao-Tran</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parreira</surname>
            ,
            <given-names>J.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hauswirth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A native and adaptive approach for unified processing of linked streams and linked data</article-title>
          .
          <source>In: International Semantic Web Conference</source>
          . pp.
          <fpage>370</fpage>
          -
          <lpage>388</lpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Puiu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barnaghi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tönjes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kümper</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mileo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parreira</surname>
            ,
            <given-names>J.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolozali</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farajidavar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , et al.:
          <article-title>CityPulse: Large scale data analytics framework for smart cities</article-title>
          .
          <source>IEEE Access</source>
          <volume>4</volume>
          ,
          <fpage>1086</fpage>
          -
          <lpage>1108</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Tommasini</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvaresi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calbimonte</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          :
          <article-title>Stream reasoning agents: Blue sky ideas track</article-title>
          .
          <source>In: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems</source>
          . pp.
          <fpage>1664</fpage>
          -
          <lpage>1680</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>