Linked Data Notifications for RDF Streams

Jean-Paul Calbimonte
Institute of Information Systems, University of Applied Sciences and Arts Western Switzerland, HES-SO Valais-Wallis, Sierre, Switzerland
{name.surname}@hevs.ch

Abstract. Linked Data Notifications (LDN) is a W3C Recommendation for interchanging notifications on the Web through a decentralized protocol. As LDN is not specific to any application domain, this paper analyzes how it can be used to enable decentralized communication among senders, receivers and consumers of RDF streams. We propose extensions to the protocol for this particular use case, and we show their feasibility with an initial implementation of an LDN-based RDF stream interface.

1 Introduction

Linked Data Notifications (LDN) [7] is a new W3C Recommendation¹ for the decentralized interchange of notifications on the Web. The protocol specified by LDN has the potential to be used for virtually any type of notification, including social media activity, sensor updates, or document updates, to name some examples. Even though the adoption of this recommendation is still to be assessed, its generality and simplicity make it an interesting option for different types of applications on the Web, for which extensions and/or profiles could be defined.

Fig. 1. Network of RSP actors communicating and sharing RDF streams with one another.

In this paper we focus on the case where the notification data has a streaming nature and is represented using RDF [8]. Streams can be seen as potentially infinite sequences of data items, where recent items are usually more relevant than older ones. The requirements commonly related to processing data streams [13] establish, among other things, that reactivity, data flow handling, scalability, and querying should be guaranteed by a stream processor.
In the case of RDF streams, RSP (RDF Stream Processing)² engines have been implemented with the goal of providing continuous querying, complex event processing and/or incremental reasoning [3, 10, 6, 9, 1]. However, in most cases these engines stop at the processing level, without indicating how the streams would be fed or how the processing results would be consumed on the Web.

¹ https://www.w3.org/TR/ldn/
² For more details, see https://www.w3.org/community/rsp/

In this context, we explore the possibility of using LDN as the backbone protocol for sending, receiving and consuming RDF stream elements, which in this case would be seen as notifications. We first explore the feasibility of using the protocol as-is, with the assumption that underneath the LDN actors there might be RSP engines handling the RDF streams. This scenario was envisioned years ago [5], as depicted in Figure 1, and we believe that the protocol presented in this paper will contribute to its achievement. We show that the usage of LDN with certain extensions has promising characteristics that indicate its possible adoption as a Web-based interchange mechanism for RDF streams. We describe the proposed adaptation of LDN, as well as a working prototype that encapsulates a well-known RSP engine in an LDN-enabled interface. With this feasibility experiment, we intend to contribute to the wider usage of RDF stream technologies on the Web.

2 Linked Data Notifications

The LDN protocol defines three basic types of actors: sender, receiver, and consumer. Notifications refer to (or are about) a certain target. The target is detached from its inbox, which is the endpoint where notifications can be sent or consumed. As the names suggest, senders may send notifications to an inbox, receivers may accept them and make them available, and consumers may retrieve them.
The fact that a target is not necessarily attached to its inbox makes it possible to separate a Web resource from the endpoint where notifications will be handled. As can be seen in Figure 2 (left), a discovery process allows senders and consumers to retrieve the inbox location through a simple GET/HEAD HTTP request. Once the inbox location is known, senders can POST notifications to it, and consumers may GET the references to the notifications contained in the inbox (see Figure 2, right).

Fig. 2. LDN. Left: discovery process of a target inbox from a sender and a consumer. Right: sending and retrieving notifications from an LDN inbox.

The design principles of LDN imply that notifications are not merely transient messages that flow within a system, but Web resources that can be identified, accessed and shared across applications. In this way, notifications that were posted by one application can be retrieved by a different one, without the need of any inter-dependence between the two, or the notifications themselves, which reside in the inbox (Figure 3).

Fig. 3. Retrieving individual notifications from a receiver in LDN.

The LDN specification does not provide further details on certain aspects, such as how the inbox contents should be handled, how notifications should be persisted, or how optimizations could be made for accessing and producing them. This openness leaves implementers free to use it as a core interaction model, upon which specific formats, profiles and constraints can be added.

3 Notifications for RDF Stream Processors

Handling data flow in RDF streams differs from traditional RDF data management in several ways.

Notifications & storage. First, the streaming nature of the data implies that conventional RDF storage is not an option. RDF stream elements flow through an RSP engine that processes them as they arrive, and produces results continuously.
This is the case, for instance, in an RSP query processor, where queries produce continuous answers over the stream of data. The stream is not stored anywhere; it flows through the engines. If we view an RDF stream element as a notification, then we are confronted with the problem that the notification is almost certainly bound to fade as time progresses, and therefore it might be impossible (although desirable) to reference it again.

Notification resolvability. A second important difference, which comes as a consequence of the first one, is that an incoming RDF stream element, once processed by an engine, is consumed and may not be retrievable again. Instead, an engine will produce a streaming response of elements, based on the input stream (or streams). These characteristics lead to a slightly different model, where a notification can be part of an RDF stream input or of an output. Still, nothing prevents having an inbox of RDF stream notifications that acts as both input and output. In this case the inbox is a special case of RSP engine that works as an identity function: it streams out the same input it receives.

Push notifications. A third observation is that the pull-based mechanism described in the LDN specification is not always the most appropriate way of getting updates (i.e. notifications) from an RSP engine (e.g. continuous answers from a standing query). Although LDN explicitly mentions the possibility of using other mechanisms that implement a push approach, such as WebSocket, it does not develop further how this would be put in place, as it is regarded as out of the scope of the generic definition of LDN. We consider that for the case of RDF streams it is important to explicitly describe how the push-based mechanism is specified, so that this type of interaction can be implemented in a compatible way among adopters.

Querying.
Finally, in RSP engines a key access pattern is querying, which may happen in windowed query engines, complex event processors, or even in stream reasoners. Therefore, it would be natural to include explicit interaction specifications for registering standing queries, as well as for accessing their results as streaming notifications.

4 LDN for RDF streams

In this section, we present a proposal of how LDN could be used as a generic protocol upon which RSP engines could share RDF stream data among themselves as notifications. In this proposal we take into account the considerations made in the previous section, while keeping most of the principles behind LDN. We organize the presentation of our approach according to its key aspects.

Stream identification. First, an RDF stream is uniquely identified by an IRI. This IRI is a Web resource, and it can be used to obtain information about the stream: what endpoints are available to retrieve its data, or to push data to it. An RDF stream is therefore a read/write Web resource detached from the potentially multiple endpoints used to interact with its contents.

Endpoint discovery. The endpoints of an RDF stream are discoverable by performing a GET operation over the stream IRI, e.g.:

    GET http://example.org/streams/my-stream

The response should then include metadata about the stream, including the endpoint information. For instance, if a JSON-LD representation was requested (through the Accept header), the response could include an inbox URI, as in LDN:

    {
      "@context": "http://www.w3.org/ns/ldp",
      "@id": "http://example.org/streams/my-stream",
      "inbox": "http://example.org/streams/my-stream/inbox"
    }

Similarly, this type of discovery could be performed using a HEAD request and a Link HTTP header, as in LDN (see Figure 4).

Fig. 4. The discovery interactions follow those of LDN: endpoint information is available at the target, which is an RDF stream.

Stream input and output.
The inbox, as described in LDN, allows both senders and consumers to post and retrieve notifications through that Web resource. In our case, we propose to constrain the inbox and specialize it in two distinct types: an input inbox and an output inbox. The rationale behind this choice is that some streams are published on the Web only with the intention of receiving notifications (i.e. to be fed) by senders. In this case the receiver is expected to process these streaming notifications, and the stream is not meant to be consumed by other actors on the Web. Conversely, other streams are only meant to be consumed, as they are produced by an RSP engine. This is the case, for instance, of the results of a continuous query, or the output of a stream reasoner. As a result, the discovery process is similar, except that instead of simply returning an inbox endpoint, the RDF stream may reference input and output endpoints. As an example, an input stream would be exposed as:

    {
      "@context": "http://w3id.org/rsp/ldn-s",
      "@id": "http://example.org/streams/my-stream",
      "input": "http://example.org/streams/my-stream/input"
    }

As specified in LDN, this type of write-only input could also be reflected in the response to an OPTIONS request, through an Allow header:

    Allow: OPTIONS, POST

Notice that, to differentiate it from the inbox term of the LDP³ vocabulary, we have used a new input term from a new vocabulary. This vocabulary is yet to be specified, but we use its base IRI for the rest of the examples in this paper.

³ https://www.w3.org/TR/ldp/

Sending a stream notification. An RSP sender may POST notifications to an RDF stream input endpoint, in the same way as specified in LDN. Essentially, the POST body should contain the stream element (e.g. an RDF graph) that will be fed to the stream (as in Figure 5).
As an example, consider the JSON-LD representation of a humidity observation posted as a timestamped graph:

    POST /streams/my-stream/input HTTP/1.1
    Host: example.org
    Content-Type: application/ld+json

    {
      "prov:generatedAtTime": "2017-07-22T05:00:00.000Z",
      "@id": "ex:Graph1",
      "@graph": [
        { "@id": "ex:humidityObservation", "ex:hasValue": 34.5 }
      ],
      "@context": {
        "prov": "http://www.w3.org/ns/prov#",
        "ex": "http://example.org#"
      }
    }

The RSP receiver can respond with a 201 Created or 202 Accepted code, if successful. However, in this case, as there is no interest in sending a Location header back to the sender, this part of the protocol would differ from standard LDN.

Fig. 5. An RSP sender sends a notification to a receiver stream input, and a consumer retrieves elements from a stream output.

Publicizing stream elements. As depicted in Figure 5, a consumer may GET stream elements from an RDF stream output endpoint. LDN specifies that performing a GET over an inbox should return the notification URIs, listed as objects of the ldp:contains predicate. Given that an RDF stream output endpoint behaves similarly to an inbox, this is also the expected behavior. However, as stream elements fade with time, depending on the stream fluctuations and the server configuration, the listed stream contents may progressively change. This means that if the stream updates are very frequent, the list of notifications that a consumer retrieves from the output endpoint may already be outdated when the consumer tries to access one of them individually. In any case, it is left to the implementations to configure how many notifications should be kept, and for how long.
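The retention behavior described above can be illustrated with a minimal sketch, assuming a receiver that bounds its output endpoint both by number of elements and by age. The class name `OutputBuffer` and the parameters `max_items` and `max_age` are hypothetical and are not part of LDN or of the prototype described in this paper; the sketch only shows how the `contains` listing may shrink as elements fade.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class OutputBuffer:
    """Hypothetical store for an output endpoint, bounded by count and by age."""

    def __init__(self, output_uri, max_items=100, max_age=timedelta(minutes=10)):
        self.output_uri = output_uri
        self.max_age = max_age
        # A deque with maxlen drops its oldest entry when a new one overflows it.
        self.items = deque(maxlen=max_items)

    def add(self, element_id, graph, now):
        """Store one stream element together with its arrival time."""
        self.items.append((now, element_id, graph))

    def _evict(self, now):
        # Entries are kept in arrival order, so expired ones sit at the left end.
        while self.items and now - self.items[0][0] > self.max_age:
            self.items.popleft()

    def listing(self, now):
        """Return an LDN-style listing of the elements that are still available."""
        self._evict(now)
        return {
            "@context": "http://www.w3.org/ns/ldp",
            "@id": self.output_uri,
            "contains": [f"{self.output_uri}/{eid}" for _, eid, _ in self.items],
        }


if __name__ == "__main__":
    t0 = datetime(2017, 7, 22, 5, 0, tzinfo=timezone.utc)
    buf = OutputBuffer("http://example.org/streams/my-stream/output", max_items=2)
    for i in range(1, 4):
        buf.add(f"graph{i}", {}, t0 + timedelta(minutes=i))
    # Only the two most recent elements remain listed.
    print(buf.listing(t0 + timedelta(minutes=4))["contains"])
```

A consumer that dereferences a listed URI after it has been evicted would have to be answered with an error status; which status to use is not discussed in the text and is left open here.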
As an example, consider the following JSON-LD response containing the list of timestamped graphs:

    {
      "@context": "http://www.w3.org/ns/ldp",
      "@id": "http://example.org/streams/my-stream/output",
      "contains": [
        "http://example.org/streams/my-stream/output/graph1",
        "http://example.org/streams/my-stream/output/graph2"
      ]
    }

Pulling stream elements. While individual stream elements can be retrieved as notifications in LDN (i.e. with a GET to the resource URI obtained as described above), this method is not very practical: it requires first fetching the list of available stream items (notifications), and only then fetching them individually. This strategy may be ineffective in common streaming data scenarios, so we propose a more direct approach, which consists in returning entire sequences of stream elements at once. The size or inclusion constraints of these sequences could be specified through parameters (e.g. the latest 10 minutes of data, or the latest 10 elements). As an example, consider the following time-annotated graphs about sensor observations, returned as JSON-LD for a given stream:

    {
      "@context": {
        "prov": "http://www.w3.org/ns/prov#",
        "ex": "http://example.org#"
      },
      "@graph": [
        {
          "prov:generatedAtTime": "2017-07-22T05:00:00.000Z",
          "@id": "ex:Graph1",
          "@graph": [ { "@id": "ex:humidityObservation", "ex:hasValue": 34.5 } ]
        },
        {
          "prov:generatedAtTime": "2017-07-22T06:00:00.000Z",
          "@id": "ex:Graph2",
          "@graph": [ { "@id": "ex:humidityObservation", "ex:hasValue": 44.5 } ]
        }
      ]
    }

Pushing stream elements. While the previous data access method gives the consumer control over when it requests data from a stream, it is not always convenient, especially for applications that require immediate access to the data produced on a stream. As an alternative, we propose using push-based mechanisms to retrieve the data. One example is the Server-Sent Events⁴ protocol, which is based on HTTP.
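To make the push option more concrete, the following sketch shows how a stream element could be framed as a Server-Sent Events message and decoded again on the consumer side. It relies only on the text framing defined by the Server-Sent Events specification (`id:` and `data:` fields, with a blank line ending each event). The function names `sse_event` and `parse_sse` are hypothetical, and this is not the Akka-based code of the prototype.

```python
import json


def sse_event(element, event_id=None):
    """Serialize one stream element (a JSON-LD dict) as one SSE message."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # json.dumps without indentation yields a single line; splitting anyway
    # keeps the framing valid if a multi-line payload is ever passed in.
    for line in json.dumps(element).splitlines():
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def parse_sse(stream_text):
    """Yield the decoded JSON payload of each event found in an SSE text stream."""
    for block in stream_text.split("\n\n"):
        data = []
        for line in block.split("\n"):
            if line.startswith("data:"):
                value = line[len("data:"):]
                # The specification strips a single leading space from the value.
                data.append(value[1:] if value.startswith(" ") else value)
        if data:
            yield json.loads("\n".join(data))


if __name__ == "__main__":
    graph = {
        "@id": "ex:Graph1",
        "prov:generatedAtTime": "2017-07-22T05:00:00.000Z",
    }
    message = sse_event(graph, event_id="1")
    print(message, end="")
    print(list(parse_sse(message)) == [graph])
```

In a deployment, the server would write such messages to a long-lived HTTP response with the `text/event-stream` content type, and a browser client could read them with the standard `EventSource` interface.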
Using this W3C Recommendation, it is possible to continuously push data, in this case RDF stream elements, from the server to the client in a one-directional way (as opposed to the bidirectional communication in WebSocket). Each data item is prefixed by the data: annotation. Other push protocols could also be supported, which in this case would mean adding an additional endpoint to the RDF stream. In this regard, our proposal also diverges from LDN in that the latter can only advertise one inbox, while we propose having multiple endpoints for an RDF stream.

Register a query. One final aspect concerns query processing. Although this feature would be restricted to query-based RSPs, we consider it important to include it, as these are among the most prominent types of processors for RDF streams. An actor may POST a query to an RSP endpoint, provided that the query contains a reference to a valid registered RDF stream. The RSP endpoint should return the URI of the resulting output stream, so that its results can be retrieved, by either pulling or pushing. As an example, consider the following CQELS query over the stream http://example.org/streams/my-stream. Notice that this type of query could include references to more than one input stream.

    SELECT ?s ?p ?o
    WHERE {
      STREAM <http://example.org/streams/my-stream> [RANGE 2s] {?s ?p ?o}
    }

⁴ https://www.w3.org/TR/eventsource/

5 Implementation

We have developed a minimal implementation of the LDN protocol that complies with most of the specification for a Sender, Receiver and Consumer⁵. The implementation is written in Scala, is based on the Akka HTTP library, and is available on GitHub⁶. We have also implemented the proposed behavior for the case of handling RDF streams. Although this is a preliminary implementation, it demonstrates the feasibility of the approach and showcases its main features.
We can summarize the main implementation characteristics as follows:

– A fully asynchronous HTTP processing mechanism has been chosen, as it avoids the usage of blocking operators and is better suited for streaming HTTP responses. This mechanism is natively supported by the Akka library.
– Streams and their input/output endpoints are managed by the server implementation.
– A continuous query processor, CQELS [10], was used as an example of how an RSP engine can be encapsulated with ldn-streams.
– Push-based notifications have been implemented using Server-Sent Events (SSE).

6 Related Work

Most of the RSP engines developed to date [3, 6, 10, 9, 1] focus on the processing aspects of RDF streams (i.e. incremental reasoning, continuous querying, complex event processing, etc.), but disregard to a certain degree the Web dimension. Early attempts to design Web-based services on top of RDF streams were explored in [4, 12], although they focused mostly on interfaces for query processing engines, and only had partial implementations. A more recent development in this line is the RSP Service Interface⁷, which further develops the ideas in [4] and provides a generic implementable programming API for continuous query engines.

As a step forward towards connectivity in networks of RSP engines, we can mention the SLD Revolution framework [2], which optimizes a distributed workflow of RSP engines. Another long-disregarded issue in RSP was the availability of RDF streams on the Web. This theme was the main concern of efforts like TripleWave [11], which facilitates the publication of streams with a variety of possible modes and configurations. Furthermore, a more generic vision of how these streams could be not only published but also consumed on the Web was formalized in the WeSP proposal⁸. This paper follows the WeSP vision, while going into the details of one possible implementation path.
The simplicity and generality of LDN are, in our view, positive aspects that make it a good candidate for a comprehensive RSP interchange protocol.

⁵ https://linkedresearch.org/ldn/tests/summary
⁶ https://github.com/jpcik/ldn-streams
⁷ http://streamreasoning.org/resources/rsp-services
⁸ http://w3id.org/wesp/web-data-streams

7 Discussion

One of the key motivations for using RDF to represent and process streams of data is that it provides a Web-native model that facilitates the integration and interpretation of data. However, while for stored data the standards for producing, publishing and consuming RDF are well established, there is not yet a well-supported and agreed specification for RDF streams. Although recent efforts provide partial solutions to the problem, e.g. TripleWave for publication or SLD Revolution for orchestration, there is a pressing need for a standardized Web method for communication among RSP actors in general. The work in progress described in this paper provides initial evidence that a very generic protocol such as LDN could serve as a starting point towards this goal. The decentralized nature of LDN, along with its simplicity and extensibility, are arguments in favor of its use. Nevertheless, it is important to consider that even in a case as generic as LDN, there are certain assumptions about the data, in this case notifications, which are fundamentally different when dealing with RDF streams. We believe that the proposed extensions could be used to formalize an LDN profile for RDF streams in general.

In contrast with previous works, the protocol that we propose is not targeted only towards querying, but towards any type of processing over RDF streams, which can even include traditional SPARQL engines, reasoners, or machine learning processors. The current trends in Big Data processing show that even stored data (i.e.
data that was not inherently represented as a stream) is now more and more processed in a streaming fashion, typically for efficiency reasons. This trend extends to the RDF processing world in general, and the lessons learned in the RSP community could be of great benefit to a larger audience.

References

1. D. Anicic, P. Fodor, S. Rudolph, and N. Stojanovic. EP-SPARQL: a unified language for event processing and stream reasoning. In WWW, pages 635–644, 2011.
2. M. Balduini, E. D. Valle, and R. Tommasini. SLD revolution: A cheaper, faster yet more accurate streaming linked data framework. In RSP, pages 1–15, 2017.
3. D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle, and M. Grossniklaus. C-SPARQL: a continuous query language for RDF data streams. Intl. J. Semantic Computing, 4(01):3–25, 2010.
4. D. F. Barbieri and E. Della Valle. A proposal for publishing data streams as linked data - A position paper. In LDOW, 2010.
5. J.-P. Calbimonte. RDF stream processing: let's react. In OrdRing, pages 1–10, 2014.
6. J.-P. Calbimonte, H. Jeung, O. Corcho, and K. Aberer. Enabling query technologies for the semantic sensor web. Int. J. Semantic Web Inf. Syst., 8:43–63, 2012.
7. S. Capadisli, A. Guy, C. Lange, S. Auer, A. Sambra, and T. Berners-Lee. Linked data notifications: a resource-centric communication protocol. In ESWC, pages 537–553, 2017.
8. D. Dell'Aglio, M. Dao-Tran, J.-P. Calbimonte, D. L. Phuoc, and E. Della Valle. A Query Model to Capture Event Pattern Matching in RDF Stream Processing Query Languages. In EKAW, pages 145–162, 2016.
9. S. Komazec, D. Cerri, and D. Fensel. Sparkwave: continuous schema-enhanced pattern matching over RDF data streams. In DEBS, pages 58–68, 2012.
10. D. Le-Phuoc, M. Dao-Tran, J. X. Parreira, and M. Hauswirth. A native and adaptive approach for unified processing of linked streams and linked data. In ISWC, pages 370–388, 2011.
11. A. Mauri, J.-P. Calbimonte, D. Dell'Aglio, M. Balduini, M. Brambilla, E. D. Valle, and K. Aberer. TripleWave: Spreading RDF Streams on the Web. In ISWC, pages 140–149, 2016.
12. J. F. Sequeda and O. Corcho. Linked stream data: A position paper. In SSN, pages 148–157. CEUR-WS.org, 2009.
13. M. Stonebraker, U. Çetintemel, and S. B. Zdonik. The 8 requirements of real-time stream processing. SIGMOD Record, 34(4):42–47, 2005.