VoCaLS: Describing Streams on the Web

Riccardo Tommasini¹,ᵃ, Yehia Abo Sedira¹,ᵇ, Daniele Dell’Aglio², Marco Balduini¹,ᵃ, Muhammad Intizar Ali³, Danh Le Phuoc⁴, Emanuele Della Valle¹,ᵃ, Jean-Paul Calbimonte⁵

¹ Politecnico di Milano, DEIB, Milan, Italy
  ᵃ {name.lastname}@polimi.it | ᵇ yehiamohamed.abosedera@mail.polimi.it
² University of Zurich, Zurich, Switzerland
  dellaglio@ifi.uzh.ch
³ Insight Center for Data Analytics, Galway, Ireland
  ali.intizar@insight-centre.org
⁴ Technical University of Berlin, Berlin, Germany
  danh.lephuoc@tu-berlin.de
⁵ University of Applied Sciences and Arts Western Switzerland, Sierre, Switzerland
  jean-paul.calbimonte@hevs.ch

1 Introduction & Motivation

Stream publication and consumption mechanisms on the Web have recently gained attention [1,2], leveraging the progress in Stream Reasoning systems and approaches. Systems that consume streams for processing (e.g., reasoning, filtering, learning, event detection) require standards for interchanging data about the streams, including endpoint information, processing capabilities, data structure, and pull and push retrieval options. Although previous efforts have partially tackled these problems, there is still no general agreement on a shared set of principles and vocabularies for streaming data catalogs, as is the case with static Linked Data.

This paper presents the highlights of the Vocabulary for Cataloging and Linking Streams and streaming services on the Web (VoCaLS¹). This work complements the full VoCaLS paper [3] and focuses on the reuse, dissemination, and adoption activities related to this vocabulary. VoCaLS includes concepts related not only to the publication of streams but also to their consumption and processing, regardless of the implementation details and design choices of different RDF Stream Processing (RSP) and Stream Reasoning systems and languages.
This vocabulary constitutes a key step towards the long-term goal of allowing Web-centered interactions among RDF Stream Processing services. VoCaLS has been engineered as a collaborative effort, following the discussions and results of the work of the W3C RSP Community Group². The vocabulary is openly available through a permanent URI, has been submitted to the Linked Open Vocabularies (LOV) repository, is published under a CC-BY 4.0 license, and is documented using the Widoco toolset³.

¹ VoCaLS URI: https://w3id.org/rsp/vocals#
² https://www.w3.org/community/rsp/
³ Widoco: https://doi.org/10.5281/zenodo.591294

2 Use-Cases & Requirements

Several use-cases motivate the design and the adoption of a vocabulary for describing streams and streaming services [1,2]. The adoption of a shared vocabulary would (i) allow decentralized and automated discovery of streaming data publishers and consumers at Web scale. Moreover, (ii) it would support interactions between RDF Stream Processing (RSP) engines on the Web, standardizing the communication between them and, thus, enabling service discovery and query federation. Finally, (iii) experimentation and empirical research would benefit from cataloging available streams, profiling engine features, and tracking the provenance of experiments.

From the aforementioned use-cases we identified the following challenges:

Publication & discovery. A stream description should characterize the contents of an (RDF) stream and describe the capabilities of the stream source. Moreover, a streaming service description should describe the available endpoints from which streams can be accessed, processed, or generated.

Access & processing. It is crucial to describe the capabilities of streaming services, such as stream processing engines and reasoners, in terms of their features.
Moreover, it is important to allow the selection of stream partitions and windows, which can be dumped, transmitted, or filtered.

Provenance & Licensing. Descriptions must allow tracking the transformations that involve streaming data and those that occur on the streams, as well as the contracts that regulate data access by the actors involved in such transformations.

VoCaLS addresses these challenges and complies with the following requirements, which were elicited during the design phase of the vocabulary, as detailed in [3]. In summary, such a vocabulary must:
(i) enable the description of streams, i.e., their content, relevant statistics, and the license of use;
(ii) enable the description of streaming services, i.e., characterizing their capabilities, their APIs, and the license of use;
(iii) enable historical stream processing/analysis and replay, i.e., allowing stream storage and dumping of stream samples;
(iv) enable provenance tracking at any level, i.e., characterizing stream (a) creation, (b) publication, and (c) storage, but also denoting manipulation and management with respect to existing theoretical frameworks;
(v) tame velocity for streaming data management, i.e., prioritizing push-based content provisioning over pull-based provisioning, and encouraging the adoption of an active stream processing paradigm;
(vi) tame variety for streaming data management, i.e., not binding the specification to any domain-specific vocabulary or to any specific data model, e.g., RDF Streams.

3 The VoCaLS Vocabulary of Linked Streams

The vocabulary is organized in three modules: VoCaLS Core, which describes the core elements of the vocabulary; VoCaLS Service Description, which describes RDF stream services; and VoCaLS Provenance, which focuses on streaming data transformation and manipulation. We introduce each module separately, along with illustrative examples.

Fig. 1. VoCaLS Core module

Core Vocabulary: The VoCaLS Core concepts extend DCAT to represent streams on the Web. As depicted in Figure 1, the model introduces the basic abstractions to represent streams. A (i) vocals:StreamDescriptor is a document accessible via HTTP that holds metadata about the stream and its contents. A (ii) vocals:Stream represents a Web stream, i.e., an unbounded sequence of time-varying data elements that might be findable and accessible on the Web, and which can be consumed via a (iii) vocals:StreamEndpoint. Finally, a (iv) vocals:FiniteStreamPartition is a portion of the stream that regular Linked Data services can access and process.

Streaming Service Description: VoCaLS Service Description focuses on metadata related to streaming services and their capabilities, enabling consumers to discover and select services suitable to their needs. The vsd:StreamingService is an abstraction that represents a service dealing with data streams of any type. Continuous query engines, stream reasoners, and RDF stream publishers are valid examples. Three classes of RDF streaming services were identified, although others could be added if needed: (i) vsd:CatalogService, a service that may provide metadata about streams, their content, query endpoints, and more; (ii) vsd:PublishingService, which represents a service that publishes RDF streams, possibly following a Linked Data compliant scheme; and (iii) vsd:ProcessingService, which models a stream processing service that performs any kind of transformation on streaming data, e.g., querying, reasoning, filtering.

Stream Transformation Provenance: The VoCaLS Provenance module focuses on tracking the provenance of stream processing services, i.e., tracing the consequences of operations performed over the streams. The module defines four main classes: (i) vprov:R2ROperator refers to operators that produce RDF mappings (relations) from other RDF mappings.
(ii) vprov:R2SOperator represents operators that produce a stream from a relation. (iii) vprov:S2ROperator refers to operators that produce relations from streams, e.g., windowing. Finally, (iv) vprov:S2SOperator allows describing operators that produce a stream from another stream.

4 Discussion

Dataset description vocabularies (e.g., DCAT, DCterms, VoID) were designed primarily with static and stored (linked) data in mind, and provide metadata descriptions for any sort of dataset published on the Web. Nevertheless, as stated before, they do not allow describing Web streams and streaming services. VoCaLS, on the other hand, is a vocabulary designed for describing streams and streaming services, and it includes the capability of describing stream transformations: the operations that detail how streaming data is generated or processed.

Previous attempts to cover this gap are VoIS [2] and WeSP [1], although they have several limitations regarding scope, quality, and coverage of the requirements detailed earlier in this work. These two early attempts have been used as the basis for VoCaLS, which builds on the lessons learned from them. VoCaLS is a generic resource that can, and should, be combined with domain-specific vocabularies. The design of VoCaLS has followed a community-driven approach, starting from the results of the W3C RSP Community Group and the requirement analysis described in [3]. Last but not least, VoCaLS has been published following established practices for vocabulary publication, including the setup of permanent URIs, the availability of full open documentation using Widoco, the availability of sources on GitHub⁴, and its inclusion in the LOV repository.

Road Map: Regarding the adoption and sustainability plans for VoCaLS, several steps have been taken in this direction.
Given that the establishment of a common vocabulary is one of the main goals of the W3C RSP Community Group, we have started the process of elevating this vocabulary to an official Group Note. The adoption and support from the authors, as a relevant part of this community, will contribute positively to this endeavor. Another important goal is to foster the adoption of VoCaLS within relevant communities. For this purpose, we initiated the creation of a catalog of stream descriptions⁵. Moreover, we developed a simple utility⁶ to support the annotation of new streams. Finally, in order to lead by example, we have launched the integration of VoCaLS within relevant services and software available for the RSP community: the RSP Services, RSPLab, and TripleWave.

References

1. Dell’Aglio, D., Le Phuoc, D., Le-Tuan, A., Ali, M.I., Calbimonte, J.P.: On a web of data streams. In: ISWC DeSemWeb (2017)
2. Sedira, Y.A., Tommasini, R., Della Valle, E.: Towards VoIS: a vocabulary of interlinked streams. In: ISWC DeSemWeb (2017)
3. Sedira, Y.A., Tommasini, R., Dell’Aglio, D., Balduini, M., Ali, M.I., Le Phuoc, D., Della Valle, E., Calbimonte, J.P.: Describing a web of streams. In: ISWC (2018)

⁴ https://github.com/ysedira/vocals
⁵ https://github.com/ysedira/vocals/tree/master/catalog
⁶ https://github.com/ysedira/stream-annotation-tool
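
Illustrative sketch. To make the Core module of Section 3 more concrete, the following minimal Python sketch builds a description of a single stream as plain RDF triples and prints it as N-Triples. It is only an illustration under stated assumptions, not the authors' tooling. The `vocals:` namespace is the VoCaLS URI given in the footnote, and the class names are those listed in Section 3. Everything else is assumed for the example: the IRIs under `http://example.org/` are hypothetical, and the linking properties (`dcat:dataset`, `vocals:hasEndpoint`, `dcat:accessURL`) are plausible choices that should be checked against the published vocabulary before use.

```python
# Namespace taken from the VoCaLS URI footnote of the paper.
VOCALS = "https://w3id.org/rsp/vocals#"
# Standard W3C namespaces for DCAT and rdf:type.
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base IRI, used only for illustration.
BASE = "http://example.org/streams/"


def describe_stream(name, access_url):
    """Return (subject, predicate, object) triples describing one Web stream."""
    descriptor = BASE + name + "/descriptor"
    stream = BASE + name
    endpoint = BASE + name + "/endpoint"
    return [
        # Classes named in Section 3 (Core module).
        (descriptor, RDF_TYPE, VOCALS + "StreamDescriptor"),
        (stream, RDF_TYPE, VOCALS + "Stream"),
        (endpoint, RDF_TYPE, VOCALS + "StreamEndpoint"),
        # ASSUMPTION: the linking properties below are not given in this
        # text; verify them against the published vocabulary.
        (descriptor, DCAT + "dataset", stream),
        (stream, VOCALS + "hasEndpoint", endpoint),
        (endpoint, DCAT + "accessURL", access_url),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples; every term in this sketch is an IRI."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    # Hypothetical stream name and WebSocket address.
    print(to_ntriples(describe_stream("traffic", "ws://example.org/streams/traffic")))
```

The sketch keeps the descriptor, the stream, and the endpoint as three separate resources, mirroring the distinction drawn in Section 3 between the metadata document, the unbounded stream itself, and the access point through which it is consumed.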