=Paper=
{{Paper
|id=Vol-1963/paper487
|storemode=property
|title=Yasper 1.0: Towards an RSP-QL Engine
|pdfUrl=https://ceur-ws.org/Vol-1963/paper487.pdf
|volume=Vol-1963
|authors=Riccardo Tommasini,Emanuele Della Valle
|dblpUrl=https://dblp.org/rec/conf/semweb/0001V17a
}}
==Yasper 1.0: Towards an RSP-QL Engine==
Yasper 1.0: Towards an RSP-QL Engine Riccardo Tommasini1 and Emanuele Della Valle1 Politecnico di Milano, DEIB, Milan, Italy {riccardo.tommasini,emanuele.dellavalle}@polimi.it Abstract. In the Stream Reasoning (SR) research, working prototypes often came along with foundational investigations. For RDF Stream Pro- cessing (RSP) in particular, RSP engines empirically proved the ap- proach feasibility and paved the road to application design and com- parative analyses. Observing these real systems highlighted their het- erogeneity and fosters new foundational achievements: RSP-QL, i.e. a reference a model that explains, unifies and can be used for correctness checking and optimization of RSP approaches. In this paper, we present Yasper 1.0 a brand new RSP engine that implements RSP-QL semantics and, we hope, will foster new empirical research on RSP. 1 Introduction Since 2008, the Stream Reasoning (SR) community proposed SPARQL exten- sions to address the problem of continuous processing streams of RDF data (RSP) [4]. C-SPARQL, CEQLS and SPARQLstream emerged among many pro- posals for RSP and also thanks to the availability of working prototypes. These prototypes, called RSP Engines, enabled to design stream reasoning applications, i.e. they empirically proved SR feasibility. They paved the road to comparative researches and benchmarks [5](cfr. § 2). Last but not least, RSP engines fostered more foundational research. Indeed, Dell’Aglio et al. [3] ob- served heterogeneities of existing prototypes and proposed RSP-QL, i.e. a unify- ing model for continuous SPARQL extensions. The application scope of RSP-QL comprises, but is not limited to, correctness checking, query planning and opti- mization. Although RSP-QL is more than a query language, the community is designing a syntax [2]1 . We agree with this decision and we are also convinced that an RSP engine will encourage RSP-QL adoption and foster empirical anal- ysis as it happened with the aforementioned approaches. In this paper, we present Yasper 1.0 (Yet Another RSP Engine), a brand new RSP engine that implements RSP-QL semantics. Yasper can answer queries using the syntax proposed in [2] and incorporates some lessons learned as well as new ideas on RSP, which can be studied empirically. Yasper is released open- source2 under Apache 2.0 License and we welcome researchers and practitioners to test, study, benchmark and enhance it. 1 https://github.com/streamreasoning/rsp-ql 2 https://github.com/streamreasoning/yasper 2 R. Tommasini and Emanuele Della Valle. 2 Background In this section, we summarize the notions of RSP-QL [3] that are required to understand how Yasper works. An RDF Stream is a sequence of pairs (Oi , ti ), where ti is a non-decreasing timestamp and Oi is either an RDF Graph or an RDF Triple. The time-based sliding window operator W is a Stream-to-Relation (S2R) operator [1]. It is defined as a triple (α, β, t0 ) that, starting at the timestamp t0 , defines a series of windows of width (α) and that slide of (β). A Time-Varying Graph is a function that takes a time instant as input and produces as output RDF Graph which is called Instantaneous RDF Graph. The application of W on a RDF Stream S produces a Time-Varying Graph TVGW,S that for any given time instant t at which W is defined outputs an Instantaneous RDF Graph, which is the result of coalescing all the RDF Graphs or triples that the current window contains3 . An streaming dataset (SDS) is an extension of SPARQL dataset4 that is composed by: an optional default graph A0 , n (n ≥ 0) named Time-Varying Graphs, and m (m ≥ 0) named sliding windows over k (k ≤ m) data streams. An RSP-QL query is continuously evaluated against an SDS by an RSP en- gine. The set ET of all the instants at which the evaluation occurs is determined by the reporting policy of the RSP engine that executes the query. RSP-QL defines a reporting policy for an RSP as a combination of one or more of the following strategies: CC Content Change – the engine reports if the content of the current window changes–, WC Window Close – the engine reports if the current window closes –, NC Non-empty Content – the engine reports if the the current window is not empty –, and P Periodic – the engine reports periodically. The evaluation of a RSP-QL query outputs an instantaneous multiset of solution mappings for each evaluation time instant in ”ET”. Relation-to-Stream (R2S) [1] operators are required to transform the instantaneous multiset into a stream. RSP-QL comprises the following R2S operators: the RStream that emits each solution mappings; the IStream that emits the difference between the current solution mappings and previous ones, and; the DStream that emits the difference between the previous solution mappings and the current ones. 3 Yet Another RDF Stream Processing Engine In this section, we introduce Yasper’s architecture and how it implements RSP- QL concepts. Figure 1 shows the following Yasper’s modules: Streams, Window- ing, SDS, Querying and Reasoning. Modules that expose Yasper as a REST service are available5 but not discussed. The Stream module, Fig 1 (a), contains the classes to represent a Stream, which is identified by an URI and its content. A StreamItem is identified by 3 The current window identified by W with the oldest closing time instant at t 4 https://www.w3.org/TR/rdf-sparql-query/#specifyingDataset 5 https://github.com/streamreasoning/rspservices Yasper 1.0: Towards an RSP-QL Engine 3 Fig. 1: Yasper’s Modules: (a) Streams, (b) Windowing, (c) SDS, (d) Querying and (e) Reasoning. Only relevant architectural details are represented. the triple < ti , te , O > where ti is the time when s entered the system (Inges- tion Time); te is the time when s occurred (Event Time) and O is a generic representation of the data in s, i.e. for RSP either a RDF Graph or a Triple6 . The Windowing module, Fig 1 (b), is based on Esper7 , i.e. an open-source Data Stream Management Systems that relies on the Event Processing Lan- guage (EPL) and special objects called listeners that continuously receive the EPL queries outputs. Each EPL statement together with a dedicated listener represents a (named) time-based sliding window operator (henceforth referred as just window operator) on a RDF stream. The windowing is performed by means of temporal annotations of the StreamItems. Yasper works by default using Event Time, but it can be configured to work with Ingestion Time8 . An Esper-based window operator maintains one time-varying graph. A win- dow operator has two alternative ways to deliver content to Time-Varying Graph: (i) as a Snapshot – i.e. the window operator pushes to the Time-Varying Graph the whole window content – or (ii) as Deltas – i.e. window operator pushes to the Time-Varying Graph only the differences between the current window and the previous one in terms of additions and deletions. By default Yasper works in Snapshot mode, but it can be configured to work with Deltas. In both the ways, the Time-Varying Graphs reactively generates a Instantaneous RDF Graph any time the engine reports. Yasper reports the results on window close (WC) with- out empty content (NC). Since the set ET is determined reading the StreamItems, at each evaluation time instant we can identify an Instantaneous SDS to evaluate a RSP-QL query against. A streaming dataset SDS, Fig 1 (c), can be reactively consolidated into a set of (named) Instantaneous Graphs9 at the time t at which a Time-Varying 6 We implement O using Apache Jena 3. 7 www.espertech.com 8 Event Time does not guarantee total ordering of StreamItems 9 Slowly evolving RDF graph are represented as a (named) Time-Varying Graph too. 4 R. Tommasini and Emanuele Della Valle. REGISTER STREAMAS CONSTRUCT ISTREAM {? s ?p ? o } FROM NAMED WINDOW : win1 [RANGE 5 s , SLIDE 2 s ] ON STREAM : s t r e a m 1 FROM NAMED WINDOW : win2 [RANGE 5 s , SLIDE 5 s ] ON STREAM : s t r e a m 2 WHERE { WINDOW ?w1 {? s ?p ? o } WINDOW ?w2 {? s ?p ? o } FILTER ( ? w1 != ?w2 ) } Listing 1.1: An example of RSP-QL Query. Graph is updated. We implemented the SDS and this mechanism by extending Apache Jena 3 in-memory dataset. The Querying module, Fig 1 (d), contains the elements for query instantiation and continuous execution. At this stage of development, Yasper accepts SELECT and CONSTRUCT queries written in RSP-QL syntax (e.g. Listing 1.1). More- over, as the reader can observe in Listing 1.1, Yasper supports multi-streams query, despite [3] does not discuss how to handle them. The following Relation- To-Stream (R2S) operators are available R-, I- and DStream. Finally, the reasoning module, Fig 1 (e), allows to answer RSP-QL queries under OWL entailment by means of Jena Rule reasoner. In the future we want to integrate RDFox, Ontop and DL reasoners e.g. Pellet or Hermit. 4 Conclusion In this paper, we presented Yasper, an RSP engine for RSP-QL queries. Yasper adopts a generic stream representation, it can work with event time or inges- tion time and it implements multi-stream queries evaluation. At the moment of writing, Yasper supports SELECT and CONSTRUCT queries. ASK support is a work in progress, while DESCRIBE requires further investigations for RSP. We plan to support all the reporting Policies in Yasper, either by configuration or as an extension of the proposed syntax. Finally, we plan to perform an empirical evaluation [5] that compares Yasper architectural configurations and the existing RSP engine implementations. References 1. Arasu, A., Babu, S., Widom, J.: The CQL continuous query language: semantic foundations and query execution. VLDB J. 15(2), 121–142 (2006) 2. Dell’Aglio, D., Calbimonte, J., Della Valle, E., Corcho, Ó.: Towards a unified lan- guage for RDF stream query processing. In: ESWC 2015 Satellite Events Portorož, Slovenia, May 31 - June 4, 2015, Revised Selected Papers. pp. 353–363 (2015) 3. Dell’Aglio, D., Della Valle, E., Calbimonte, J., Corcho, Ó.: RSP-QL semantics: A unifying query model to explain heterogeneity of RDF stream processing systems. Int. J. Semantic Web Inf. Syst. 10(4), 17–44 (2014) 4. DellAglio, D., Della Valle, E., van Harmelen, F., Bernstein, A.: Stream reasoning: A survey and outlook. Data Science (Preprint), 1–24 5. Tommasini, R., Della Valle, E., Balduini, M., Dell’Aglio, D.: Heaven: A framework for systematic comparative research approach for RSP engines. In: The 13th Inter- national ESWC, Heraklion, Crete, Greece, 2016, Proceedings. pp. 250–265 (2016)