=Paper= {{Paper |id=Vol-3339/paper4 |storemode=property |title=Linked Data Event Streams in Solid LDP containers |pdfUrl=https://ceur-ws.org/Vol-3339/paper4.pdf |volume=Vol-3339 |authors=Wout Slabbinck,Ruben Dedecker,Sindhu Vasireddy,Ruben Verborgh,Pieter Colpaert |dblpUrl=https://dblp.org/rec/conf/semweb/SlabbinckDVVC22 }} ==Linked Data Event Streams in Solid LDP containers== https://ceur-ws.org/Vol-3339/paper4.pdf
Linked Data Event Streams in Solid LDP containers
Wout Slabbinck1,* , Ruben Dedecker1 , Sindhu Vasireddy1 , Ruben Verborgh1 and
Pieter Colpaert1
1
    IDLab, Departement of Electronics and Information Systems, Ghent University - imec, Belgium


                                         Abstract
                                         The Solid Project – at the time of writing – uses containers with resources in them as defined in the
                                         LDP specification as a way to give developers the flexibility to write to a storage in the way they see
                                         fit. With cross-app interoperability and read performance in mind, choosing an application profile and
                                         container-resource structure becomes guesswork for the app writing the data, as all possible apps reading
                                         from the storage are not yet defined. Event sourcing is a technique used in data architecture to decouple
                                         writing from reading. Multiple views will always stay in-sync with an event source, or allow one to
                                         view a historic state or study the changes that happened over time. In this paper, we study whether we
                                         can use the current version of the Solid protocol to store an event source using the Linked Data Event
                                         Streams (LDES) specification. We successfully implemented a client library, which we tested on the use
                                         case of storing your live location with history, for both reading and writing in two modes: version aware
                                         and version agnostic. However, the current Solid protocol based on LDP also shows some limitations
                                         towards event sourcing: (i) re-balancing the hypermedia structure publishing the LDES is not possible
                                         due to slash semantics, (ii) as the event source is fully managed by clients, a faulty client may corrupt
                                         the event source, and (iii) the client is also in charge of enforcing the retention policy, having to delete
                                         older resources one by one, while they have no information about the internal limits of the Solid storage.
                                         We conclude that the Solid spec as-is can be used to store an event source, and that client libraries
                                         can create an abstraction of the history without any server-specific functionality. However, we also
                                         had to work our way around some limitations, putting more strain on the client, and want to open the
                                         discussion on whether the Solid server protocol needs to be extended for more native support of the
                                         event sourcing pattern.

                                         Keywords
                                         Solid, LDP, LDES, Event Sourcing




1. Introduction
In the Solid Project1 , app developers decide the strategy to write – and thus also read – data
within the Solid ecosystem. Leaving those decisions to the developer that first writes the


Managing the Evolution and Preservation of the Data Web (MEPDaW 2022)
*
 Corresponding author.
$ wout.slabbinck@ugent.be (W. Slabbinck); ruben.dedecker@ugent.be (R. Dedecker); sindhu.vasireddy@ugent.be
(S. Vasireddy); ruben.verborgh@ugent.be (R. Verborgh); pieter.colpaert@ugent.be (P. Colpaert)
€ https://rubendedecker.be/ (R. Dedecker); https://ruben.verborgh.org/ (R. Verborgh); https://pietercolpaert.be/
(P. Colpaert)
 0000-0002-3287-7312 (W. Slabbinck); 0000-0002-3257-3394 (R. Dedecker); 0000-0002-3522-5504 (S. Vasireddy);
0000-0002-8596-222X (R. Verborgh); 0000-0001-6917-2167 (P. Colpaert)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
              CEUR Workshop Proceedings (CEUR-WS.org)
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073




1
    https://solidproject.org/
data (e.g., using footprints2 or by planting shape trees3 ) raises the read efficiency for only
a few use cases, and limits it for others. For example, if an app writing my live location
writes each update in a file location/{timestamp}.ttl, then we can imagine an app –
even if it uses exactly the same application profile – that wants to check when I was last in
Paris will take a long time, whereas a write strategy with a geospatial sub-structure such as
location/{geohash}/{timestamp.ttl} would work faster if this structure is transparent
to the client. The application writing is thus in some way biased towards a specific type of
reuse, and cross-app interoperability is hampered.
   Since cross-app interoperability is one of the main ideas behind the Solid project, we want to
introduce a generic write-strategy in which any other kind of “view” (historic or live) on top of
this data can be built. A view can stay in-sync with the event source and translate the data to
something that can then be used specifically by the application at hand. In data architecture, the
strategy to build up a history, and keep derived view in-sync with the source is called the event
sourcing pattern [1]. In event sourcing, each change by an application results in appending
an immutable event to an event log.
   In this paper, we study how we can support event sourcing within Solid without changing the
current server specification. We (i) introduce a specification to write an event source to a Solid
storage following the current specification, (ii) create an implementation of a client library to
facilitate the management of the event source, and (iii) demo a live location sharing application
following this specification. In the conclusion, we then discuss the limitations of this approach.


2. Preliminaries
The Solid protocol4 is a web standard designed to give people control over their online data. The
specification is built with existing W3C standards that together form a basis for decentralized
data storage.
   An agent, such as an end-user, can interact with a Solid storage through create, read, update,
and delete (CRUD) operations using the Linked Data Platform (LDP) API.5 Each operation
interacts with an LDP Resource. A special kind of resource is the LDP Container, which
represents a collection of resources. Creating hierarchical structures within a storage is achieved
using containers. This is what we will further refer to as the container-resource pattern.
   An agent is identifiable with a WebID, which is an HTTP IRI. Derefencing the WebID leads to
a profile document where various aspects of the agent are stored. Solid servers can authenticate
agents by following the Solid OpenID Connect (Solid-OIDC) specification.6 Authorization over
resources in a storage is defined by the Web Access Control (WAC) specification.7 It defines
how to give specific permissions on resources to agents with Access Control Lists (ACLs).
   The Linked Data Event Streams (LDES) specification8 defines an immutable collection of
2
  https://www.w3.org/DesignIssues/Footprints.html
3
  https://shapetrees.org/
4
  https://solidproject.org/TR/protocol
5
  https://www.w3.org/TR/ldp/
6
  https://solidproject.org/TR/oidc
7
  https://solidproject.org/TR/wac
8
  https://w3id.org/ldes/specification
members and is an extension of the TREE hypermedia specification.9 TREE defines hypermedia
controls to navigate over a large collection of members. LDES is designed to be an append-only
publishing interface [2] which indicates that it is an Event Source. A versioned LDES is a type
of event stream that deals with versioned members. Each versioned member has a timestamp
and a link to the entity of which it is a version.


3. Related Work
Meinhardt et al [3] discussed five different reasons or use cases to adopt a version-based approach
for entities in knowledge graphs: (i) Version References and Data Consistency, (ii) Change
Inspection, (iii) Data Quality Assessment, (iv) Dynamic Processes, and (v) Data Dynamics. The
fact that the authors provide these arguments indicates that to allow as many use cases to build
on data from a Solid pod, it is required that the data is stored in a structure that is version-based.
   The Memento framework10 can be used for DateTime negotiation to interact with all the
versions in the event source. Fedora11 and Trellis12 [4, 5] show how Memento can work in
combination with LDP. With this protocol, clients can negotiate the DateTime of resources,
through HTTP requests, with those enhanced LDP servers. Server-side support for the event
source is thus required in order to interact with different versions with Memento.
   As a running example in this paper, we will use writing your live location to a Solid pod. Van
de Winkel et al [6] already created a Solid location app that works with multiple versions of the
location points. However, they update the same LDP resource to store new versions, which is a
custom implementation that this way will only work for their app. Our challenge is to generally
solve this problem for all use cases with live data, from slow-moving to fast-moving.


4. Event Sourcing in Solid with the LDES in LDP specification
Solid is currently built on the container-resource pattern of the LDP protocol without Memento
or Fedora. In this chapter, we assume we cannot influence the server specification, and thus
want to find a specification to describe an Event Source within this container-resource pattern.
We identified a versioned LDES as a perfect match as it does not per se require any changes to
the server specification:
     1. LDES is an RDFS vocabulary that allows to describe a collection of immutable members.
        Each member can also be described as a versioned entity if they indicate this through a
        dct:isVersionOf property (or something similar).
     2. LDES uses TREE to fragment itself into a materializable interface. An interface that is
        materializable can be stored into basic files, which thus becomes straightforward for
        usage on top of a read-write interface that employs the container-resource pattern.


9
 https://w3id.org/tree/specification
10
   https://rfc-editor.org/rfc/rfc7089.txt
11
   https://fedora.info/2018/11/22/spec/
12
   https://www.trellisldp.org/
Figure 1: The strategy used for storing a Linked Data Event Stream in an LDP container-resource
structure.


   In order to write an LDES into an LDP container-resource structure, we introduce the LDES
in LDP specification13 . While the up-to-date specification can be fully read on its webpage,
we discuss a couple of highlights here.
   In Figure 1, the container-resource structure is provided. Data is always being POST-ed to a
specific container until it is full. This container is indicated by the root container’s ldp:inbox
property. When the container is full, the client needs to create a new container, start writing
there, and update the ldp:inbox property of the root container. Furthermore, the client will
also keep the root.ttl file in-sync with all sub-containers, now describing the event source.
This event source contains links to all containers, on which also TREE metadata is added through
the .meta resource.


5. Client implementation
An implementation of the LDES in LDP specification was made and applied to the use case of
storing and reading location data to an Event Source.
   In the following subsections, we elaborate on an implementation of a client library of the
LDES in LDP specification. Next, we explain how location data can be stored and retrieved with
the demo application.

5.1. Implementation of LDES in LDP specification
We published @treecg/versionawareldesinldp14 on npm. This client library allows end-users to
both interact with an LDES in LDP as with a versioned LDES in LDP.

5.1.1. LDES in LDP Protocol
The first component of the library interacts with the LDES in LDP Protocol as is. For this the
class LDESinLDP is used for initializing an LDES in LDP, appending a member and creating
13
     https://woutslabbinck.github.io/LDESinLDP/
14
     https://doi.org/10.5281/zenodo.7234355
a new fragment.
   The initialization method executes the following operations in order: (i) creates the root
container, (ii) stores the LDES and first view information, (iii) creates the first fragment and (iv)
adds the ldp:inbox property to the root container.
   Adding data to the LDES in LDP is done with the append method. First, it retrieves the write
location, through extracting the inbox URL from the root container. This inbox URL is then used
to send an HTTP POST request with as body the data that an end-user wants to be appended to
the LDES in LDP.
   When a container is full, the newFragment method must be executed. It takes care of
the creation of a new container, updates the ldp:inbox property and finally adds a new
tree:Relation to the view to keep root.ttl in sync.


5.1.2. Versioned LDES in LDP Protocol
The second component of the library provides the tools to interact with the LDES in LDP in a
version-based manner. With a version-based approach, the component has to deal with both the
object identifier and the timestamp of the version-objects. In listing 1, an example of a versioned
LDES is given with one version-object. The object identifier, marked by dct:isVersionOf, is
ex:resource and the timestamp, marked by dct:issued is
2021-12-15T10:00:00.000Z.

               Listing 1: Example of a versioned LDES with one version-object.
 1 ex:ES a ldes:EventStream;
 2    ldes:versionOfPath dct:isVersionOf;
 3    ldes:timestampPath dct:issued;
 4    tree:member ex:resource1v0.
 5
 6 ex:resource1v0
 7    dct:isVersionOf ex:resource1;
 8    dct:issued "2021-12-15T10:00:00.000Z"^^xsd:dateTime;
 9    dct:title "First version of the title".

   The class VersionAwareLDESinLDP abstracts the LDES in LDP protocol by providing CRUD
methods to interact with version-objects. When reading an object, a snapshot is taken over
all version-objects and the most recent version of the object is returned. When a timestamp
𝑡 is provided, the result of the snapshot is the most recent version of the object until time 𝑡.
Additionally, when the client want to provide a version-agnostic view to the developer, the read
method can return a materialization of the version-object.
   Creating, updating and deleting a member is done by appending a version-object to the LDES
in LDP together with a timestamp to indicate when this action happened. Furthermore, there is
the extractVersions method that extracts all the versions for a given object identifier. Optionally,
this can be constrained by a time window.
5.2. Live location demo
To verify the implementation of the protocol, we applied it to the use case of live location data
and created two applications:

     • A Solid application with a User Interface that tracks the location of a user in real-time
       and has the functionality to share this location with another user and vice versa.15
     • A command-line interface (CLI) service that can transform location points and store them
       in an event source in Solid.16

   In this demo, we take into account multiple ways of generating location data. We have created
a service that can digest those different inputs and append the measurements into the Solid
event source.

   In the first approach, the Location History application is used to track your location and
store them in a container using the model described in the previous subsection. The service
reads this container and follows the LDES in LDP specification to store the measurements in
the event source.

   A second approach uses data generated from third-party applications which can be exported
to the GPX file format.17 The service supports adding track points from a GPX file to the event
source. As GPX is not RDF, there first needs to be a transformation from the points to the model,
which we accomplish with the RDF Mapping Language (RML) [7].18 After the mapping, the
measurements are appended to the event source.


6. Conclusion
We conclude that writing an event source within Solid LDP containers is possible thanks to the
Linked Data Event Streams vocabulary and the TREE specification. However, we have stumbled
on three limitations when we use the Solid protocol on the server as-is:

     1. In the LDES in LDP specification, a B+ Tree structure is introduced as a way to optimize
        the LDES to deal with a large number of resources. An important aspect of having a
        B+ Tree is the fact that it can increase or decrease in size easily through re-balancing.
        However, in LDP, re-balancing would require moving resources from one container to the
        other, but that would entail their URIs need to change due to slash semantics19 . This
        hinders the long-term scalability of the event source.
     2. Clients are required to write to an LDES in LDP by strictly following the protocol.
        However, when there is a bug in one of the clients that write to the storage, it may
        corrupt the event source for everyone. As an illustration, there could be a client that
15
   https://doi.org/10.5281/zenodo.7234420
16
   https://doi.org/10.5281/zenodo.7234316
17
   https://wiki.openstreetmap.org/wiki/GPX
18
   https://rml.io/specs/rml/
19
   https://solidproject.org/TR/protocol#uri-slash-semantics
          does not update the ldp:inbox property with the consequence that new members are
          added in the wrong fragment. The LDES view is now invalid, which leads to clients using
          the TREE hypermedia relations not being able to retrieve the correct objects.
       3. Retention policies on top of a tree:ViewDescription of an LDES make sure that
          the storage requirements are controlled. With a client fully organizing and maintaining
          the event source, the client will also be in charge of enforcing the retention policy. This
          means that it needs to run regularly to actively delete older resources one by one. While
          it is not desirable for the client for efficiency reasons, it is also not desirable for the server,
          as probably it is the server that will need to indicate these policies based on its physical
          storage capacity.

    However, if cross-app interoperability (cfr. the introduction) is indeed core to the ideas in
the Solid Project, then we express our opinion that event sourcing should be in the core of the
Solid specification. We propose to open a discussion within the Solid specification for a new
kind of append-only container fully managed by the server. This would allow the server to:
(i) create a balanced search tree, and possibly apply slow storage techniques for historic data;
(ii) let multiple apps write to a single inbox, and let the server organize it into one or multiple
read-views; and (iii) let the server enforce a negotiable retention policy based on application
needs and internal storage capabilities. Furthermore, it would allow the server to also support
a version agnostic view on the data when developers do not want to be bothered with the
complexities of working with version objects.
    In future work, we will prototype such a container in the Solid LDES server20 and propose
the design to be added to the Solid protocol.


Acknowledgements
Supported by SolidLab Vlaanderen (Flemish Government, EWI and RRF project VV023/10) and
the Flemish Smart Data Space (Flemish Government, Digital Flanders and RRF project VV073).


References
[1] M. Kleppmann, Designing data-intensive applications: the big ideas behind reliable, scal-
    able, and maintainable systems, first edition ed., O’Reilly Media, Boston, 2017. OCLC:
    ocn893895983.
[2] D. Van Lancker, P. Colpaert, H. Delva, B. Van de Vyvere, J. Rojas Meléndez, R. Dedecker,
    P. Michiels, R. Buyle, A. De Craene, R. Verborgh, Publishing base registries as Linked Data
    Event Streams, in: M. Brambilla, R. Chbeir, F. Frasincar, I. Manolescu (Eds.), Proceedings
    of the 21th International Conference on Web Engineering, volume 12840 of Lecture Notes
    in Computer Science, Springer, 2021, pp. 28–36. URL: https://link.springer.com/chapter/10.
    1007/978-3-030-74296-6_3. doi:10.1007/978-3-030-74296-6_3.
[3] P. Meinhardt, M. Knuth, H. Sack, TailR: a platform for preserving history on the web
    of data, in: Proceedings of the 11th International Conference on Semantic Systems,
20
     https://github.com/TREEcg/ldes-solid-server
    ACM, Vienna Austria, 2015, pp. 57–64. URL: https://dl.acm.org/doi/10.1145/2814864.2814875.
    doi:10.1145/2814864.2814875.
[4] G. Jansen, A. Coburn, A. Soroka, R. Marciano, Using Data Partitions and Stateless Servers
    to Scale Up Fedora Repositories., in: IEEE BigData, 2019, pp. 3098–3102.
[5] G. Jansen, A. Coburn, A. Soroka, W. Thomas, R. Marciano, DRAS-TIC Linked data: Evenly
    distributing the past, Publications 7 (2019) 50. Publisher: Multidisciplinary Digital Publishing
    Institute.
[6] M. Van de Wynckel, B. Signer, A Solid-based Architecture for Decentralised Interopera-
    ble Location Data, in: 12th International Conference on Indoor Positioning and Indoor
    Navigation, CEUR Workshop Proceedings, 2022.
[7] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, R. Van de Walle, RML:
    A Generic Language for Integrated RDF Mappings of Heterogeneous Data, in: C. Bizer,
    T. Heath, S. Auer, T. Berners-Lee (Eds.), Proceedings of the 7th Workshop on Linked Data
    on the Web, volume 1184 of CEUR Workshop Proceedings, 2014. URL: http://ceur-ws.org/
    Vol-1184/ldow2014_paper_01.pdf, iSSN: 1613-0073.