<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>VoCaLS: Describing Streams on the Web</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Riccardo Tommasini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yehia Abo Sedira</string-name>
          <email>byehiamohamed.abosedera@mail.polimi.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniele Dell'Aglio</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Balduini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Intizar Ali</string-name>
          <email>ali.intizar@insight-centre.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danh Le Phuoc</string-name>
          <email>danh.lephuoc@tu-berlin.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emanuele Della Valle</string-name>
          <email>dellaglio@i</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jean-Paul Calbimonte</string-name>
          <email>jean-paul.calbimonte@hevs.ch</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Insight Center for Data Analytics</institution>
          ,
          <addr-line>Galway</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Introduction &amp;</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Politecnico di Milano, DEIB</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Technical University of Berlin</institution>
          ,
          <addr-line>Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Applied Sciences and Arts Western Switzerland</institution>
          ,
          <addr-line>Sierre</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>University of Zurich</institution>
          ,
          <addr-line>Zurich</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p><sup>1</sup> VoCaLS URI: https://w3id.org/rsp/vocals# <sup>2</sup> W3C RSP Community Group: https://www.w3.org/community/rsp/ <sup>3</sup> Widoco: https://doi.org/10.5281/zenodo.591294</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction &amp; Motivation</title>
      <p>
        Stream publication and consumption mechanisms on the Web have recently
gained attention [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ], leveraging the progress in Stream
Reasoning systems and approaches. Systems that consume streams for
processing (e.g., reasoning, filtering, learning, event detection) require standards for
interchanging data about the streams, including endpoint information,
processing capabilities, data structure, and pull and push retrieval options. Although
previous efforts partially tackled these problems, there is still no
general agreement on a shared set of principles and vocabularies for streaming
data catalogs, as is the case with static Linked Data.
      </p>
      <p>
        This paper presents the highlights of the Vocabulary for Cataloging and
Linking Streams and streaming services on the Web (VoCaLS<sup>1</sup>). This work
complements the full VoCaLS paper [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and focuses on the reuse, dissemination,
and adoption activities related to this vocabulary. VoCaLS includes concepts
related not only to the publication of streams but also to their consumption and
processing, regardless of the implementation details and design choices of different
RDF Stream Processing (RSP) and Stream Reasoning systems and languages.
This vocabulary constitutes a key step towards the long-term goal of allowing
Web-centered interactions among RDF Stream Processing services. VoCaLS has
been engineered as a collaborative effort, following the discussions and results of
the W3C RSP Community Group<sup>2</sup>. The vocabulary is openly available
through a permanent URI, has been submitted to the Linked
Open Vocabularies (LOV) repository, is published under a CC-BY 4.0 license,
and is documented using the Widoco toolset<sup>3</sup>.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Use-Cases &amp; Requirements</title>
      <p>
        Several use-cases motivate the design and the adoption of a vocabulary for
describing streams and streaming services [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ].
      </p>
      <p>The adoption of a shared vocabulary would (i) allow decentralized and
automated discovery of streaming data publishers and consumers at Web scale.
Moreover, (ii) it would support interactions between RDF Stream
Processing (RSP) engines on the Web, standardizing the communication between them
and, thus, enabling service discovery and query federation. Finally,
(iii) experimentation and empirical research would benefit from cataloging available streams,
profiling the engine features, and tracking the provenance of the experiments.</p>
      <p>From the aforementioned use-cases we identified the following challenges:</p>
      <p>Publication &amp; discovery. A stream description should characterize the
contents of a (RDF) stream and describe the capabilities of the stream source.
Moreover, a streaming service description should describe the available endpoints
from which streams can be accessed, processed, or generated.</p>
      <p>Access &amp; processing. It is crucial to describe the capabilities of streaming
services, such as stream processing engines and reasoners, in terms of their
features. Moreover, it is important to allow the selection of stream partitions and
windows, which can be dumped, transmitted, or filtered.</p>
      <p>Provenance &amp; Licensing. The vocabulary must allow tracking the
transformations that involve streaming data and those that occur on the streams, as well
as the contracts that regulate data access by the actors involved in such transformations.</p>
      <p>
        VoCaLS addresses these challenges and complies with the following
requirements, which were elicited during the design phase of the vocabulary, as
detailed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In summary, such a vocabulary must: (i) enable the description
of streams, i.e., their content, relevant statistics, and the license of use; (ii) enable
the description of streaming services, i.e., characterizing their capabilities, their
APIs, and the license of use; (iii) enable historical stream processing/analysis and
replay, i.e., allowing stream storage and dumping of stream samples; (iv) enable
provenance tracking at any level, i.e., characterizing stream (a) creation, (b)
publication, and (c) storage, but also denoting manipulation and management
with respect to existing theoretical frameworks; (v) tame velocity for streaming
data management, i.e., prioritizing push-based content provisioning over pull-based
provisioning, and encouraging the adoption of an active stream processing paradigm; and (vi)
tame variety for streaming data management, i.e., not binding the specification
to any domain-specific vocabulary or to any specific data model, e.g., RDF
Streams.
      </p>
    </sec>
    <sec id="sec-3">
      <title>The VoCaLS Vocabulary of Linked Streams</title>
      <p>The vocabulary is organized in three modules: VoCaLS Core, which describes
the core elements of the vocabulary; VoCaLS Service Description, which
describes RDF streaming services; and VoCaLS Provenance, which focuses on
streaming data transformation and manipulation. We will introduce each module
separately, along with illustrative examples.</p>
      <p>Core Vocabulary: VoCaLS Core concepts are based on an extension of DCAT
to represent streams on the Web. As depicted in Figure 1, the model introduces
the basic abstractions to represent streams. A (i) vocals:StreamDescriptor
is a document accessible via HTTP that holds metadata about the stream and
its contents. A (ii) vocals:Stream represents a Web stream, i.e., an unbounded
sequence of time-varying data elements that might be findable and accessible
on the Web, and which can be consumed via a (iii) vocals:StreamEndpoint.
Finally, a (iv) vocals:FiniteStreamPartition is a portion of the stream
available for regular Linked Data services to access and process its content.</p>
      <p>Streaming Service Description: VoCaLS Service Description focuses on
metadata related to streaming services and their capabilities, enabling consumers to
discover and select services suitable to their needs. The vsd:StreamingService
is an abstraction to represent a service that deals with data streams of any type.
Continuous query engines, stream reasoners, and RDF stream publishers are valid
examples. Three classes of RDF streaming services were identified, although
others could be added if needed:
(i) vsd:CatalogService, a service that may provide metadata about streams,
their content, query endpoints, and more; (ii) vsd:PublishingService, which
represents a service that publishes RDF streams, possibly following a Linked
Data compliant scheme; and (iii) vsd:ProcessingService, which models a
stream processing service that performs any kind of transformation on
streaming data, e.g., querying, reasoning, or filtering.</p>
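      <p>As a rough illustration of how the Core classes relate to each other, the following Python sketch represents a stream description as plain (subject, predicate, object) triples and looks up the access URLs of a stream. Only the namespace URI and the class names vocals:StreamDescriptor, vocals:Stream, and vocals:StreamEndpoint come from the text. The example.org namespace, the helper functions, and the three ex: properties are hypothetical placeholders and should not be read as actual VoCaLS terms.</p>
      <preformat>
```python
# Minimal sketch: a stream description as (subject, predicate, object) triples.
# Class names come from the text; every "ex:" property below is an assumed,
# hypothetical placeholder and NOT a confirmed VoCaLS term.

VOCALS = "https://w3id.org/rsp/vocals#"   # namespace URI given in the paper
RDF_TYPE = "rdf:type"
EX = "http://example.org/"                # hypothetical publisher namespace


def vocals(term):
    """Expand a local name into a full VoCaLS IRI."""
    return VOCALS + term


def describe_stream(name, endpoint_url):
    """Return triples describing one stream, its descriptor, and its endpoint."""
    descriptor = EX + name + "/descriptor"
    stream = EX + name
    endpoint = EX + name + "/endpoint"
    return [
        (descriptor, RDF_TYPE, vocals("StreamDescriptor")),
        (descriptor, "ex:describes", stream),        # assumed property
        (stream, RDF_TYPE, vocals("Stream")),
        (stream, "ex:hasEndpoint", endpoint),        # assumed property
        (endpoint, RDF_TYPE, vocals("StreamEndpoint")),
        (endpoint, "ex:accessURL", endpoint_url),    # assumed property
    ]


def endpoints_of(triples, stream):
    """Follow the assumed ex:hasEndpoint link and return the access URLs."""
    endpoints = {o for (s, p, o) in triples
                 if s == stream and p == "ex:hasEndpoint"}
    return sorted(o for (s, p, o) in triples
                  if s in endpoints and p == "ex:accessURL")


triples = describe_stream("colors", "ws://example.org:8080/colors")
print(endpoints_of(triples, EX + "colors"))
```
      </preformat>
      <p>A consumer that only knows the stream identifier can thus reach a concrete endpoint by following links in the description, which is the kind of discovery the descriptor is meant to support. A real deployment would use an RDF library and the published vocabulary terms instead of these placeholders.</p>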
      <p>Stream Transformation Provenance: The VoCaLS Provenance module focuses
on tracking the provenance of stream processing services, i.e., tracing the
consequences of operations performed over the streams. The module defines four
main classes: (i) vprov:R2ROperator refers to operators that produce RDF
mappings (relations) from other RDF mappings. (ii) vprov:R2SOperator
represents operators that produce a stream from a relation. (iii) vprov:S2ROperator
refers to operators that produce relations from streams, e.g., windowing. Finally,
(iv) vprov:S2SOperator allows describing operators that produce a stream from
another stream.</p>
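      <p>The four operator families can be made concrete with a toy example. The Python sketch below is only an illustration under simplifying assumptions: a stream is a list of (timestamp, item) pairs, a relation is a plain list, and the function names (window, keep_even, rstream, s2s) are hypothetical and not part of VoCaLS. In particular, building the stream-to-stream operator as a composition of the other three is one possible reading, chosen here for illustration.</p>
      <preformat>
```python
# Sketch of the four operator families on a toy timestamped stream.
# A stream is a list of (timestamp, item) pairs; a relation is a plain list.
# All function names are hypothetical and chosen for illustration only.


def window(stream, start, width):
    """S2R: select items whose timestamp falls in [start, start + width)."""
    end = start + width
    return [item for (ts, item) in stream if ts >= start and end > ts]


def keep_even(relation):
    """R2R: derive a relation from a relation (here, a simple filter)."""
    return [item for item in relation if item % 2 == 0]


def rstream(relation, now):
    """R2S: emit every item of the relation as a stream element stamped `now`."""
    return [(now, item) for item in relation]


def s2s(stream, start, width):
    """S2S: stream to stream, composed here from S2R, R2R, and R2S."""
    return rstream(keep_even(window(stream, start, width)), start + width)


source = [(1, 10), (2, 11), (3, 12), (6, 14)]
print(s2s(source, 0, 5))   # [(5, 10), (5, 12)]
```
      </preformat>
      <p>Recording which of these operators produced a given output stream, and from which inputs, is the kind of information the provenance module is meant to capture.</p>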
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>Dataset description vocabularies (e.g., DCAT, DCterms, VoID) were designed
primarily with static and stored (linked) data in mind, and provide metadata
descriptions for any sort of dataset published on the Web. Nevertheless, as
stated before, they do not allow describing Web streams and streaming services.</p>
      <p>
        On the other hand, VoCaLS is a vocabulary designed for describing streams
and streaming services, and it includes the capability of describing stream
transformations, i.e., the operations that detail how streaming data is generated or processed.
Previous attempts to cover this gap are VoIS [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and WeSP [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], although they
have several limitations regarding scope, quality, and coverage of the
requirements detailed earlier in this work. These two early attempts have been used
as the basis for VoCaLS, which builds on the lessons learned from them. VoCaLS
is a generic resource that can, and should, be combined with domain-specific
vocabularies. The design of VoCaLS has followed a community-driven approach,
starting from the results of the W3C RSP Community Group and the requirement
analysis described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Last but not least, VoCaLS has been published following
well-principled practices for vocabulary publication, including the setup
of permanent URIs, the availability of full open documentation using Widoco,
the availability of sources on GitHub<sup>4</sup>, and its inclusion in the LOV repository.
      </p>
      <p>Road Map: Several steps have been taken towards the adoption and
sustainability of VoCaLS. Given that the establishment of
a common vocabulary is one of the main goals of the W3C RSP Community
Group, we have started the process of elevating this vocabulary to an official
Group Note. The adoption and support from the authors, as a relevant part of
this community, will contribute positively to this endeavor. Another important
goal is to foster the adoption of VoCaLS within relevant communities. For this
purpose, we initiated the creation of a catalog of stream descriptions<sup>5</sup>.
Moreover, we developed a simple utility<sup>6</sup> to support the annotation of new streams.
Finally, in order to lead by example, we have launched the integration of VoCaLS
within relevant services and software available for the RSP community: the RSP
Services, RSPLab, and TripleWave.</p>
      <p><sup>4</sup> https://github.com/ysedira/vocals <sup>5</sup> https://github.com/ysedira/vocals/tree/master/catalog <sup>6</sup> https://github.com/ysedira/stream-annotation-tool</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Dell'Aglio</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le Phuoc</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le-Tuan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calbimonte</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          :
          <article-title>On a web of data streams</article-title>
          .
          <source>In: ISWC DeSemWeb</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Sedira</surname>
            ,
            <given-names>Y.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tommasini</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Della Valle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Towards VoIS: a vocabulary of interlinked streams</article-title>
          .
          <source>In: ISWC DeSemWeb</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Sedira</surname>
            ,
            <given-names>Y.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tommasini</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dell'Aglio</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balduini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le Phuoc</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Della Valle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calbimonte</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          :
          <article-title>Describing a web of streams</article-title>
          .
          <source>In: ISWC</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>