=Paper= {{Paper |id=Vol-2941/paper1 |storemode=property |title=m-ld: Realtime Information Sharing with RDF |pdfUrl=https://ceur-ws.org/Vol-2941/paper1.pdf |volume=Vol-2941 |authors=George Svarovsky |dblpUrl=https://dblp.org/rec/conf/i-semantics/Svarovsky21 }} ==m-ld: Realtime Information Sharing with RDF== https://ceur-ws.org/Vol-2941/paper1.pdf
    m-ld: Realtime Information Sharing with RDF

                       George Svarovsky[0000-0002-7480-2888]

       m-ld.io Ltd., Lyndale House, 24 High Street, Addlestone, KT15 1TN, UK
                          https://m-ld.org/ info@m-ld.io



        Abstract. Users of information systems increasingly expect information
        to be available to edit from multiple devices and by multiple users, online
        and offline. Strategies exist for shared data types with strong eventual
        consistency guarantees. These can be complex and fault-prone to
        implement de novo for application data, and library implementations do
        not present standard APIs. To improve data interoperability, portability
        and extensibility, these strategies can be applied to a standard and
        self-describing data format, RDF. We introduce m-ld, a component
        providing eventual consistency for RDF data, showing how it can be used
        to create a collaborative message board program.

        Keywords: Realtime collaborative editing · CRDT · RDF.


1     Introduction
Real-time collaborative editing of documents is now a well-established pattern in
groupware programs, popularised by Google Docs. Other notable ad hoc
implementations include Figma [14] and Wikidocs [7]. Evan Wallace (Figma) justifies
their investment with the comment “it just felt wrong not to offer multiplayer
as a tool on the web”.
    However, collaborative editing features are particularly complex to
implement from scratch. Reasons for this relate to ensuring strong eventual
consistency in the face of concurrent edits for a non-trivial data structure
[10]; implementing the consistency algorithm in the face of network and
compute failures; and testing the implementation. Haymo Meran (Wikidocs)
comments, in relation to his subsequent work integrating collaborative editing
into Atlassian’s tools, “you need a lot of endurance”
(https://youtu.be/EgCYd6ei7QI?t=21).
    To address this, re-usable software components have been developed that
provide abstract shared data types. Active open-source projects include Yjs [9]
and Automerge [5]. These tools alleviate the implementation complexity of
real-time collaborative editing for stand-alone programs.
    If two different programs are to support collaborative editing on the same
information, it is necessary for the shared data types’ behaviours to be correctly
implemented in both. This will generally require changes to source code, whether
to re-implement an ad hoc data type, or to include a library.
    Copyright © 2021 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).

    We can conceive of improvements that reduce this integration complexity.
First, if each program were to publish the identities of the shared data types
in use, other programs would be able to dynamically select and apply a suitable
implementation. However, this would still require a specific mapping of the
program’s presentation layer to each of the supported data types’ interfaces.
This could be addressed with a common representation such as the Resource
Description Framework (RDF).
    A key feature of RDF is that it is extensible, including with new data
structures, which can be strongly identified in the data so that consuming
applications can adapt to them. Extensions to RDF use vocabularies of known
Internationalised Resource Identifiers (IRIs), given meanings by
specifications or conventions. An extension may define an entailment regime,
that is, a set of logical consequences of the statements in an RDF graph. It
may also impose syntactic conditions or restrictions.
    The m-ld project is exploring how to use these properties for shared data
types. The goal is to facilitate the re-use of shared information in different
contexts and by different applications.


2   Related Work

m-ld builds directly on prior work on the theoretical basis for real-time
collaborative editing of RDF [3,6,15]. Besides implementing strong eventual
consistency for RDF graphs, this project also seeks to consider the automatic
maintenance of extended semantics in real-time.
    Other work has focused on a Distributed Version Control System (DVCS)
approach [2,13], such as for improving the quality of Linked Open Data [4]. The
DVCS model supports extended semantics by requiring an external intervention
if concurrent operations give rise to an inconsistent state. So, this approach
is better suited to systems without a real-time constraint, or with a central
authority such as a database.
    Consistency can also be achieved in a decentralised system using distributed
consensus, often used with blockchains. RDF is being considered in this space, for
example, for indexing [12]. Consensus does not itself address the merge algorithm
for concurrent or offline edits, but is a means for an observer to know that a state
has been agreed. This property is likely to be useful in future work on m-ld.


3   Approach

The m-ld project (https://m-ld.org/) approaches RDF shared data types by:

 1. Implementing the RDF graph itself as a shared data type.
 2. Establishing a pattern for the composition of extended shared data types
    within the RDF graph, using vocabularies.

3.1   Shared RDF
The realisation of an RDF graph as a shared data type is a direct implementation
of SU-Set, a Conflict-free Replicated Data Type (CRDT) [3]. Sharing is effected
by making a copy of the graph (a clone), with a new process identity. Upon
mutation, clones publish operations which are delivered to every peer clone and
merged into their graphs. The SU-Set permits these mutations to be concurrent,
while guaranteeing that all clones will eventually converge on the same graph.
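
The behaviour described above can be illustrated with a minimal sketch of the SU-Set idea as we read it from [3]: each inserted triple is paired with a unique tag, and a delete removes only the pairs that the deleting clone has observed. The class and method names below are hypothetical and are not the m-ld API; the sketch also assumes that operations reach every clone in causal order.

```python
import uuid


class SUSetClone:
    """Toy clone of a shared RDF graph, held as a set of (triple, tag) pairs."""

    def __init__(self):
        self.pairs = set()

    def graph(self):
        # The visible graph: every triple with at least one live tag.
        return {triple for triple, _ in self.pairs}

    def insert(self, triple):
        # Local insert: tag the triple uniquely; return the operation to publish.
        op = ("insert", frozenset({(triple, uuid.uuid4().hex)}))
        self.apply(op)
        return op

    def delete(self, triple):
        # Local delete: remove only the pairs this clone has observed.
        observed = frozenset(p for p in self.pairs if p[0] == triple)
        op = ("delete", observed)
        self.apply(op)
        return op

    def apply(self, op):
        # Merge an operation, whether local or delivered from a peer clone.
        kind, pairs = op
        if kind == "insert":
            self.pairs |= pairs
        else:
            self.pairs -= pairs


a, b = SUSetClone(), SUSetClone()
t = ("ex:msg1", "ex:text", "hello")  # illustrative triple
b.apply(a.insert(t))

# Concurrent mutations: a deletes the triple while b re-inserts it.
op_a = a.delete(t)
op_b = b.insert(t)
a.apply(op_b)
b.apply(op_a)
```

After the exchange both clones hold the same single pair, the one created by b's concurrent insert, so the triple remains in both graphs. The delete from a could not remove that pair because a had not observed it.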
    As a CRDT, the SU-Set does not require central coordination for a total
ordering of operation messages. Instead, it requires causal ordering. This is
realised in m-ld using a logical clock and re-ordering of incoming messages, as
necessary, in each clone. The clock chosen is a simplification of an Interval Tree
Clock [1], which is more space-efficient than a vector clock.
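
As a rough illustration of causal re-ordering, the sketch below buffers incoming messages until everything they depend on has been delivered. For brevity it uses a plain vector clock (a dictionary of per-process counters), not the Interval Tree Clock variant used by m-ld; the class name and message shape are assumptions made for this example only.

```python
class CausalInbox:
    """Buffers remote messages until their causal predecessors are delivered."""

    def __init__(self):
        self.seen = {}       # process id -> number of operations delivered from it
        self.pending = []    # messages that arrived too early
        self.delivered = []  # payloads, in delivery order

    def _ready(self, sender, clock):
        # Must be the next operation from the sender ...
        if clock.get(sender, 0) != self.seen.get(sender, 0) + 1:
            return False
        # ... and must not depend on anything not yet delivered here.
        return all(n <= self.seen.get(p, 0)
                   for p, n in clock.items() if p != sender)

    def receive(self, sender, clock, payload):
        self.pending.append((sender, clock, payload))
        progress = True
        while progress:  # delivering one message may unblock others
            progress = False
            for msg in list(self.pending):
                s, c, data = msg
                if self._ready(s, c):
                    self.pending.remove(msg)
                    self.seen[s] = c[s]
                    self.delivered.append(data)
                    progress = True


inbox = CausalInbox()
# B's delete causally follows A's insert, but arrives first.
inbox.receive("B", {"A": 1, "B": 1}, "delete t")
early = list(inbox.delivered)
inbox.receive("A", {"A": 1}, "insert t")
```

The delete is held back until the insert it depends on has arrived, so the clone applies the two operations in causal order even though the network delivered them reversed.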
    The SU-Set also does not require “tombstones” (markers for deleted data).
However, this means that it is not possible to arbitrarily merge clone graphs
without knowledge of the operations applied to each since they diverged. For
this reason, m-ld clones maintain a journal of operations, to allow clones to
rev-up from a peer if they have missed operation messages.
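
The role of the journal can be sketched as follows. This is a deliberate simplification: it assumes the lagging clone's journal is a prefix of its peer's, so that a simple count identifies the missed operations, whereas m-ld uses its logical clock for this. All names are hypothetical.

```python
class JournalledClone:
    """Toy clone that records every applied operation in a journal."""

    def __init__(self):
        self.graph = set()
        self.journal = []  # operations applied, oldest first

    def apply(self, op):
        kind, triple = op
        if kind == "insert":
            self.graph.add(triple)
        else:
            self.graph.discard(triple)
        self.journal.append(op)

    def missed_since(self, count):
        # Operations applied after the first `count` journal entries.
        return self.journal[count:]

    def rev_up_from(self, peer):
        # Replay what this clone missed (prefix assumption, see lead-in).
        for op in peer.missed_since(len(self.journal)):
            self.apply(op)


t1 = ("ex:msg1", "ex:text", "hello")  # illustrative triples
t2 = ("ex:msg2", "ex:text", "world")

online, offline = JournalledClone(), JournalledClone()
for clone in (online, offline):
    clone.apply(("insert", t1))

# The offline clone misses two operations.
online.apply(("delete", t1))
online.apply(("insert", t2))
diverged = offline.graph != online.graph

offline.rev_up_from(online)
```

Before the rev-up, comparing the two graphs alone cannot tell whether t1 was deleted by one clone or newly inserted by the other, since no tombstone records the deletion. Replaying the journalled operations resolves the ambiguity.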

3.2   Shared Data Type Vocabularies
The realisation of the RDF graph as a shared data type ensures eventual
consistency for all graph mutations. However, it does not guarantee correctness of
the graph content according to any applicable extended semantics.
    The m-ld project has explored the embedding of ordered lists in the shared
graph, using a sequence CRDT inspired by LSEQ [8]. The CRDT behaviour is
not core to m-ld but rather implemented using a constraint, which encapsulates
the list syntax and semantics. The input into a constraint is the current state
and the proposed operation, whether local or received from a remote clone. The
constraint is able to reject an invalid local operation, rewrite the operation,
entail consequences of the operation, or even assert new data. The constraint is
dynamically selected and executed based on the vocabulary used in the data (in
this case, it is also the default list implementation in m-ld [11]).
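
The shape of a constraint can be sketched with a much simpler rule than the list constraint: a hypothetical "single-valued property" rule. The interface below (a `check` method for local operations and an `apply` method for remote ones) is an assumption made for illustration, not the m-ld constraint API. It also assumes that remote operations cannot be rejected, since the text mentions rejection only for local operations, and it considers inserted triples only.

```python
class SingleValued:
    """Hypothetical constraint: a subject holds at most one value for `prop`."""

    def __init__(self, prop):
        self.prop = prop

    def _values(self, graph, subject):
        return {o for s, p, o in graph if s == subject and p == self.prop}

    def check(self, graph, inserts):
        # Local operation: reject it if a second value would result.
        for s, p, o in inserts:
            if p == self.prop and self._values(graph, s) - {o}:
                raise ValueError(f"{s} already has a value for {p}")

    def apply(self, graph, inserts):
        # Remote operation: accept it, but return extra deletes that resolve
        # the conflict the same way on every clone (keep the greatest value).
        deletes = set()
        for s, p, o in inserts:
            if p == self.prop:
                candidates = self._values(graph, s) | {o}
                keep = max(candidates)
                deletes |= {(s, p, v) for v in candidates if v != keep}
        return deletes


constraint = SingleValued("ex:text")
graph = {("ex:msg1", "ex:text", "hello")}

# A local insert of a second value is rejected.
try:
    constraint.check(graph, {("ex:msg1", "ex:text", "hi")})
    rejected = False
except ValueError:
    rejected = True

# A remote insert is accepted, with a consequence entailed by the constraint.
remote = {("ex:msg1", "ex:text", "howdy")}
extra_deletes = constraint.apply(graph, remote)
graph = (graph | remote) - extra_deletes
```

Because the rule for choosing the surviving value is deterministic, every clone that sees both values arrives at the same result without coordination.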
    In principle the same process, of defining a vocabulary and providing a
constraint implementation, can be used to implement other shared data type
extensions.


4     Demonstration
The m-ld website provides two web applications that demonstrate the
component, using an engine running in the browser. In both cases the JavaScript of
the web application loads and uses the engine as a library. The dynamic RDF
graph is stored locally in the browser; services are only used for website content
delivery and operation message delivery.
    The demo web application (https://m-ld.org/demo/; Fig. 1, left) presents
a “message board” which can be edited concurrently by multiple users worldwide.
The playground (https://m-ld.org/playground/; Fig. 1, right) is a utility for
interrogating a m-ld graph using the API syntax. In the figure, the playground
has been directed to share a graph with the message board, and the
representation of one of the messages is visible, as JSON-LD.




               Fig. 1. The m-ld demo and playground web applications




5      Evaluation & Ongoing Work

The m-ld project has built a shared data types engine using RDF as its
foundational data representation. m-ld supports real-time collaborative editing of
information, including maintenance of extended semantics for ordered lists.
Further work is needed to validate the extensibility pattern against other shared
data types, and in realistic use-cases.
    Other ongoing and future work in the m-ld project includes:

    – Publication of the m-ld protocol specification.
    – Research into supporting strong cryptographic assurance of data integrity
      and traceability, with authority assignable to identified users or groups.
    – Research into tunable strategies to truncate or compress the clone journal
      to balance availability against storage consumption.
    – Performance characterisation and tuning.


References
 1. Almeida, P.S., Baquero, C., Fonte, V.: Interval Tree Clocks. In: Baker,
    T.P., Bui, A., Tixeuil, S. (eds.) Principles of Distributed Systems. pp. 259–274.
    Lecture Notes in Computer Science, Springer, Berlin, Heidelberg (2008).
    https://doi.org/10.1007/978-3-540-92221-6_18
 2. Cassidy, S., Ballantine, J.: Version Control for RDF Triple Stores. ICSOFT 2007
    - 2nd International Conference on Software and Data Technologies, Proceedings
    p. 12 (Jan 2007)
 3. Ibáñez, L.D., Skaf-Molli, H., Molli, P., Corby, O.: Live linked data:
    Synchronising semantic stores with commutative replicated data types. International
    Journal of Metadata, Semantics and Ontologies 8(2), 119–133 (Jan 2013).
    https://doi.org/10.1504/IJMSO.2013.056605
 4. Ibáñez, L.D., Skaf-Molli, H., Molli, P., Corby, O.: Col-Graph: Towards Writable
    and Scalable Linked Open Data. In: Mika, P., Tudorache, T., Bernstein, A., Welty,
    C., Knoblock, C., Vrandečić, D., Groth, P., Noy, N., Janowicz, K., Goble, C. (eds.)
    The Semantic Web – ISWC 2014. pp. 325–340. Lecture Notes in Computer Science,
    Springer International Publishing, Cham (2014).
    https://doi.org/10.1007/978-3-319-11964-9_21
 5. Kleppmann, M., Beresford, A.R.: Automerge: Realtime data sync between
    edge devices. In: 1st UK Mobile, Wearable and Ubiquitous Systems Research
    Symposium (MobiUK 2018) (2018),
    https://mobiuk.org/abstract/S4-P5-Kleppmann-Automerge.pdf
 6. Mechaoui, M.D., Guetmi, N., Imine, A.: Towards Real-Time Co-authoring of
    Linked-Data on the Web. In: Amine, A., Bellatreche, L., Elberrichi, Z., Neuhold,
    E.J., Wrembel, R. (eds.) Computer Science and Its Applications. pp. 538–548. IFIP
    Advances in Information and Communication Technology, Springer International
    Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-19578-0_44
 7. Meran, H.: Wikidocs - Real time collaborative editing for HTML (Oct 2013),
    https://www.slideshare.net/draftkraft/wikidocs-real-time-collaborative
 8. Nédelec, B., Molli, P., Mostefaoui, A., Desmontils, E.: LSEQ: An adaptive
    structure for sequences in distributed collaborative editing. In: Proceedings
    of the 2013 ACM Symposium on Document Engineering. pp. 37–46. DocEng
    ’13, Association for Computing Machinery, New York, NY, USA (Sep 2013).
    https://doi.org/10.1145/2494266.2494278
 9. Nicolaescu, P., Jahns, K., Derntl, M., Klamma, R.: Yjs: A Framework for Near
    Real-Time P2P Shared Editing on Arbitrary Data Types. In: Cimiano, P.,
    Frasincar, F., Houben, G.J., Schwabe, D. (eds.) Engineering the Web in the Big Data
    Era. pp. 675–678. Lecture Notes in Computer Science, Springer International
    Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-19890-3_55
10. Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M.: Conflict-Free Replicated
    Data Types. In: Défago, X., Petit, F., Villain, V. (eds.) Stabilization, Safety, and
    Security of Distributed Systems. pp. 386–400. Lecture Notes in Computer Science,
    Springer, Berlin, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24550-3_29
11. Svarovsky, G.: Truth and Just Lists (Feb 2021),
    https://codeburst.io/truth-and-just-lists-67c0e0e22a9d
12. Third, A., Domingue, J.: Linked Data Indexing of Distributed Ledgers. In:
    Proceedings of the 26th International Conference on World Wide Web Companion.
    pp. 1431–1436. WWW ’17 Companion, International World Wide Web Conferences
    Steering Committee, Republic and Canton of Geneva, CHE (Apr 2017).
    https://doi.org/10.1145/3041021.3053895
13. van Otterdijk, M., Mendel-Gleason, G., Feeney, K.: Succinct Data Structures and
    Delta Encoding for Modern Databases (Jan 2020),
    https://terminusdb.com/t/papers/terminusdb-git.pdf
14. Wallace, E.: How Figma’s multiplayer technology works (Oct 2019),
    https://www.figma.com/blog/how-figmas-multiplayer-technology-works/
15. Zarzour, H., Sellami, M.: srCE: A collaborative editing of scalable semantic stores
    on P2P networks. International Journal of Computer Applications in Technology
    48(1), 1–13 (Jan 2013). https://doi.org/10.1504/IJCAT.2013.055562