Concurrency Effects Over Variable-size Identifiers in Distributed Collaborative Editing Brice Nédelec, Pascal Molli, Achour Mostefaoui, Emmanuel Desmontils LINA, 2 rue de la Houssinière BP92208, 44322 Nantes Cedex 03 first.last@univ-nantes.fr ABSTRACT 1. INTRODUCTION Distributed collaborative editors such as Google Docs or Distributed collaborative editors allow to distribute the Etherpad allow to distribute the work across time, space work across space, time and organizations. Some trending and organizations. In this paper, we focus on distributed editors such as Google Docs [5] use the Operational Trans- collaborative editors based on the Conflict-free Replicated form (OT) approach [14, 15]. However, an alternative based Data Type approach (CRDT). CRDTs encompass a set of on Conflict-free Replicated Data Types (CRDTs) [12, 13] well-known data types such as sets, graphs, sequences, etc. exists. Compared to OT, CRDTs are more decentralized CRDTs for sequences model a document as a set of ele- and scale better. ments (character, line, paragraph, etc.) with unique iden- The CRDTs belong to the optimistic replication [10, 11] tifiers, providing two commutative update operations: in- approach. Therefore, replicas involved in the collaboration sert and delete. The identifiers of elements can be either of are guaranteed eventual convergence to an identical state. fixed-size or variable-size. Recently, a strategy for assigning To provide convergence in a replicated sequence, CRDTs use variable-size identifiers called LSEQ has been proposed for unique identifiers to link elements. When the sequence is a CRDTs for sequences. LSEQ lowers the space complexity document, the elements can be characters, lines, paragraphs, of variable-size identifiers CRDTs from linear to sub-linear. etc. While experiments show that it works locally, it fails to pro- We distinguish two types of sequence CRDTs: (i) The vide this bound with multiple users and latency. In this tombstone CRDTs [1, 3, 7, 8, 9, 16, 18, 19] use fixed-size paper, we propose h-LSEQ, an improvement of LSEQ that identifiers. However, the delete operations only mark and preserves its space complexity among multiple collaborators, hide elements to the users. Consequently, these deleted el- regardless of the latency. Ultimately, this improvement al- ements permanently affect performances. (ii) The variable- lows to safely build distributed collaborative editors based size identifiers CRDTs [8, 17] directly encode the order re- on CRDTs. We validate our approach with simulations in- lation in the identifiers. However, their identifiers can grow volving latency and multiple users. linearly depending on the editing behaviour. Consequently, they remain unsafe in the distributed collaborative editing Categories and Subject Descriptors context. To overcome their respective limitations, these approaches I.7.1 [Document and Text Processing]: Document need an additional protocol [4, 9] related to garbage col- and Text Editing—Document management; C.2.4 [Computer- lection mechanisms. Nevertheless, these protocols require Communication Networks]: Distributed Systems—Dis- global knowledge over participants. In the general case, such tributed applications; D.2.8 [Software Engineering]: Met- knowledge remains prohibitively expensive in the context of rics—Complexity measures distributed networks subject to churn. Recently, LSEQ [6] allowed variable-size identifiers CRDTs Keywords to get rid of this costly additional protocol by lowering their upper bound on space complexity from linear to sub-linear. Distributed Documents; Document Authoring Tools and Yet, LSEQ does not guarantee a safe allocation. Indeed, it Systems; Distributed Collaborative Editing; Real-time Edit- uses multiple antagonist strategies which, without any coor- ing; Conflict-free Replicated Data Types dination between collaborators, leads to an quadratic growth of identifiers. The contributions of this paper are: (i) experiments that highlight the negative effect of multiple users edition on the size of LSEQ identifiers (ii) experiments that highlight that an increasing latency does not have a negative impact on the size of LSEQ identifiers. (iii) an improvement of LSEQ called h-LSEQ that extends the single user sub-linear upper This work is licensed under the Creative Commons Attribution- bound on space complexity to any number of users, regard- ShareAlike 3.0 Unported License (CC BY-SA 3.0). To view a copy of the license, visit http://creativecommons.org/licenses/by-sa/3.0/. less of the latency, and without any additional cost. Ultimately, h-LSEQ allows distributed collaborative edi- DChanges 2013, September 10th, 2013, Florence, Italy. tors to safely use variable-size identifiers CRDTs. ceur-ws.org Volume 1008, http://ceur-ws.org/Vol-1008/paper5.pdf. The rest of the paper is organized as follow: Section 2 de- previously mandatory garbage collecting protocol. Three tails variable-size identifiers CRDTs for sequences, the allo- aspects define LSEQ: cation strategy LSEQ, and highlights the motivation of this – base doubling: regarding the underlying tree model, paper. Section 3 presents h-LSEQ a combination of LSEQ each node can have twice more children than its par- and a hash-based choice strategy to overcome the LSEQ ent. It follows the properties of exponential trees [2]. limitations. Section 4 shows the results of experiments over – two antagonist allocation strategies: boundary+ and LSEQ and h-LSEQ considering latency and multiple users. boundary– that are designed for end-edition and front- Finally, Section 5 reviews the related works. edition respectively. – random strategy choice: it randomly assigns an allo- cation strategy to each depth of the tree. It follows 2. BACKGROUND the intuition: since we have no prior knowledge of the Conflict-free Replicated Data Types belong to the opti- editing behaviour, the strategy choice should not favor mistic replication approach [10, 11]. In the context of dis- any editing behaviour. Consequently, the frequencies of tributed collaborative editing, the replicated data is a doc- appearance of each allocation strategy have to be equal. ument, where: (1) each insert/delete operation is prepared Despite the fact that numerous experiments have been locally and broadcast, (2) each remote replica receives and performed using LSEQ including real documents extracted integrates the changes, (3) all involved replicas eventually from Wikipedia, they did not include any concurrency. Al- converge to an identical state. though, Wikipedia’s revisions already present a serialization To provide the convergence property, CRDTs for sequences of the document without explicit concurrency, experiments model a document as a set of couples helt, idi where id ∈ with this type of documents are of great importance [1]. (I, egy fails to provide sub-linearly upper-bounded identifiers in 5: collaboration involving multiple users. 6: function alloc(p, q ∈ I) The solution is to reach a global agreement over replicas 7: let depth := 0; interval := 0; on which is the allocation strategy used at each depth of the 8: while (interval < 1) do  Not enough for 1 insert underlying tree model. Since LSEQ aims to get rid of proto- 9: depth + +; cols related to garbage collecting mechanisms, any solution 10: interval := pref ix(q, depth) − pref ix(p, depth) − 1; that requires additional communication is inconceivable. 11: end while 12: let step := min(boundary, interval);  Process the We propose to use a hash strategy choice component which maximum step to stay between p and q have the same local behaviour than the original random 13: let idStrat := h(depth); strategy choice, but it also provides the required implicit 14:  Call the hash function global agreement. Indeed, instead of synchronizing the set 15: let id := S.get(idStrat).invoke(p, q, depth, step); of users, we provide an a priori agreement by the mean of 16:  Call the allocation strategy according to hash a hash function. This agreement avoids the possibility of 17: return id; antagonist choices which would have led to a bad global al- 18: end function location of identifiers. Furthermore, it does not introduce 19: any additional cost to LSEQ. 20: function h ( depth ∈ N∗ ) Algorithm 1 details the h-LSEQ allocation strategy. The 21: return Random(seed ∗ depth).nextInt(0, 1); changes over LSEQ are simply highlighted. This section ex- 22: end function plains this algorithm. Extracted from [6], the prefix function 23: return a copy of the identifier id, i.e., a series of numbers 24: function boundary+(p, q ∈ I, depth, step ∈ N∗ ) truncated at depth or increased until depth. The function 25: let addV al := RandInt(0, step) + 1; carefully encodes each number of this copy in a base depend- 26: return pref ix(p, depth) + addV al; ing on its depth in order to directly apply the arithmetic 27: end function 28: operations of line 26 and line 31. For instance, assuming 29: function boundary–(p, q ∈ I, depth, step ∈ N∗ ) that the departure base is 25 , a call to pref ix([13.42.37], 2) 30: let subV al := RandInt(0, step) + 1; encodes [13.42] using 5 + 6 bits. 31: return pref ix(q, depth) − subV al; The functions boundary+ and boundary− are two allo- 32: end function cation strategies, note that they do not strictly respect the 33: alloc function signature in order to factorize the computa- 34: function prefix(id ∈ I, depth ∈ N∗ ) 35: let idCopy := [ ]; tion of depth and step. 36: for (cpt := 1 to depth) do 37: if (cpt < id.size) then  Copy the value 3.1 Locally 38: idCopy := idCopy.append(id.at(cpt)); The strategy choice function is surjective and its signature 39: else  Add 0 encoded in the right base 40: idCopy := idCopy.append(0base(cpt) ); is: h(depth ∈ N∗ ) : N. The returned value corresponds to an 41: end if allocation strategy unique identifier. The algorithm of the 42: end for allocation strategy h-LSEQ follows 3 steps: 43: return idCopy; 1. lines 8-11: computes the depth of the future identifiers 44: end function to allocate 2. line 13: calls the strategy choice function h using the depth 3.2 Remotely Each collaborator generates the same hash function thanks 3. line 15: calls the allocation strategy using the identifier to the shared seed. Also, all collaborators use the same map- returned by h ping id → allocation strategy (cf. line 3). Consequently, all The strategy choice function returns strategy identifiers hash functions map the same depth to the same allocation following a uniform law. Thus, frequencies of appearance strategies. The idea is to reach a consensus on which strate- of each strategy are equal and does not favor any editing gies to employ in order to avoid the waste of identifiers’ space behaviour. Furthermore, the unpredictability due to ran- due to the choice of antagonist strategies. domness preserves the model from intentional and malicious Figure 2 highlights the differences between the original attacks. random strategy choice and our hash strategy choice. Both The hash function fulfills the requirement of the strategy cases present two collaborators inserting 2 elements one af- choice function h. However, contrarily to the random strat- ter the other. The expected result is the sequence “abcd”. egy choice component, it requires an initialization. Indeed, Hence, Collaborator 1 generates the “a” and “c” and Collab- the creator of the document must share a hidden seed (cf. orator 2 generates the “b” and “d”. In both cases they draw line 2) within the document that each collaborator will use the same numbers. However, in the case of random strategy to generate the hash function specific to this document. In choice, one collaborator randomly chooses boundary+ twice Algorithm 1 at line 21, we simply use the common random while the other randomly chooses boundary– twice. When function initialized with the seed and the depth. the editing behaviour is monotonic, the size of identifiers LSEQ h-LSEQ 100 revision 90 80 9 30 9 10 17 18 70 60 n rev 0 31 0 31 b 50 Begin a 20 ids End Begin a b c d End 40 30 7 63 20 10 c d 0 55 ids 0 20 40 60 80 100 Collaborator 1 Collaborator 2 250 LSEQ+rand LSEQ+hash 200 Figure 2: Representation of the underlying exponential tree id bit-length model of LSEQ and h-LSEQ. Two collaborators insert two 150 elements at the end of the document in order to get the 100 sequence “abcd”. On the random side, the collaborators em- 50 ploy antagonist strategies. On the hash side, the collabora- 0 tors use the same allocation strategy. Both sides draw the 0 20 40 60 80 100 n line same random numbers. Figure 3: Experiments on a synthetic document of 100 lines edited at the end by 10 collaborators with 10 operations quickly grows. On the opposite, the collaborators using h- per user. The top figure shows the revision number of the LSEQ implicitly agree on using boundary+ at depth-1, con- operation. The bottom figure shows the bit-length of each sequently the allocation follows the properties of the one identifier allocated to each line. On average, the bit-length user edition presented in [6]. of LSEQ and h-LSEQ identifiers are 94.7 and 39.1 bit/id The next section aims to corroborate our assumptions by respectively. performing experiments on different setups with varying the number of collaborators and the latency of the network. (hash). Both of these setups use boundary+ and boundary– and have the same variable values boundary = 10 and base = 4. EXPERIMENTS 24+depth . A set of 10 collaborators produces a document of This section is composed of two parts. First, the experi- 100 lines by performing 10 insert operations each. mentation focuses on the influence of the number of collab- Results: Figure 3 shows on the top part the spectrum of orators on the size of LSEQ and h-LSEQ identifiers. A set the synthetic document, i.e., the revision date of each line. of synthetic collaborators generates a document by succes- Since the bars are getting taller when they are closer of the sively performing insert operations at the end. This experi- end of the document, it indicates that the users edited at ment aims to show the behaviour of the two strategy choices, the end monotonically. On the second part of the Figure 3, and compare the results with the expected sub-linear space it shows the identifier bit-length associated to each line. We complexity. observe that the identifiers of rand setup quickly increase The second part of experiments consists in highlighting while the hash identifiers remain similar to the ones in the the effect of latency on a document edited by multiple users. experiment made for one user in [6]. Consequently, the hash Once again, it aims to compare LSEQ and h-LSEQ and setup is the most suitable setup in the context of distributed to show the impact of concurrency on the average size of collaborative editing. identifiers in the document. Reasons: both setups use boundary+ and boundary– al- The experiments focus on the digit bit-length of generated location strategies. However, each collaborator in the rand identifiers. Indeed, all variable-size CRDTs rely on source setup makes independent choices of allocation strategies when and clock to guarantee the unicity of identifiers, neverthe- required. Thus, if a collaborator chooses a particular strat- less, the space complexity mainly depends on the digit choice egy, and another user chooses the antagonist strategy, then made by the allocation strategy. a large number of identifiers is wasted when their operations To perform these experiments, we implemented a simu- are delivered to each other. On the other hand, when the lation framework called HumbleSimulator. The sources are same hash function spread over all collaborators generates available on the Github platform under the terms of the the same strategy choices, it keeps the random behaviour lo- GPL licence 1 . cally and also makes an implicit agreement on which strategy to employ at any given depth. 4.1 Multiple users experiment Objective: show that the random strategy choice with- 4.2 Latency experiment out taking into account the strategies employed by other Objective: show that increasing the latency, i.e., the collaborators leads to a quick growth in the size of identi- time between the generation and the delivery of an oper- fiers. On the opposite, when all collaborators uses a common ation, does not imply a growth in the size of identifiers. On allocation strategy at a given depth, the space complexity the contrary, when the latency increases, the average size of remains sub-linearly upper-bounded. allocated identifiers decreases. This behaviour is expected Description: we evaluate LSEQ and h-LSEQ strategies for LSEQ and h-LSEQ strategies. on a group of 10 collaborators. Thus, the setups are (i) a ran- Description: we simulate the production of a 100 lines dom strategy choice (rand) and (ii) a hash strategy choice document by a set of 10 collaborators. Each one of these col- 1. https://github.com/Chat-Wane/HumbleSimulator laborators performs 10 insert operations at the end of the 160 LSEQ+rand allocation strategy permits to reach this agreement. Conse- LSEQ+hash quently, the improvement of LSEQ called h-LSEQ can be 140 safely used in distributed collaborative editing. The latency experiment shows that increasing the latency parameter only 120 results in sharing the identifier space, and consequently low- 100 ers the size of identifiers. As a corollary, considering an im- mediate delivery of operations constitutes an upper-bound id bit-length 80 on the behaviour of the overall system for any latency. 60 5. RELATED WORK 40 Current distributed collaborative editors use optimistic replication [10, 11] to provide availability of data at a con- 20 sistency cost. Optimistic replication designs operations in two phases. First a replica prepares the result of an opera- 0 0 100 200 300 400 500 600 700 tion and then broadcasts it to other collaborators. Second, latency remote replicas integrate the newly received data. Operational Transform approach (OT) and Conflict-free Figure 4: Experiments of latency effects over the average Replicated Data Type approach (CRDT) belong to the op- bit-length of identifiers. The x-axis shows the latency varia- timistic class of replication. While CRDTs share the com- tion in number of rounds, i.e., the average time between the putational cost of operations between the local and remote send and the receive of an operation. The y-axis shows the parts of the replication protocol, the OT approaches prefer average bit-length of identifiers. A group of 10 collaborators to offer free local generation and pay an additional cost on generates a synthetic document of 100 lines by performing the integration of remote operations. Since the number of 10 operations each. Two configurations of LSEQ: the orig- remote operations quadratically grows when the number of inal random strategy choice and the hash strategy choice collaborators increases, the CRDTs scale better than OT (h-LSEQ). approaches. We distinguish two classes of CRDTs for sequences. The tombstone based CRDT approaches use death certificate on removed elements. Henceforth, these elements are hidden document. The users generate operations following the Pois- to the users, but still remain in the underlying model and son distribution of parameter λ = 100, i.e., users more likely undermine the performance. For instance, the number of generate an operation after 100 rounds (arbitrary unit) from tombstones in popular or controversial pages in Wikipedia their previous operation. Measurements are done at a la- pages would be very high due to vandalism and recoveries. tency of 5, 10, 20, 40, 80, 160, 320, 640 rounds. They concern The representatives of this class of CRDTs are WOOT [7], the average bit-length of identifiers in the document. We WOOTO [16], WOOTH [1], CT [3], Treedoc [8], PPS [18], evaluate two setups: the random strategy choice (rand) and RGA [9] and [19]. the hash strategy choice (hash). Both setups use the alloca- To get rid of these marked elements, the tombstone based tion strategies boundary+ and boundary− with boundary = CRDTs require additional protocols related to garbage col- 10 and departure base = 24+depth . lecting mechanisms. In [4, 9] they propose approaches to Results: Figure 4 shows the results of the latency ex- safely purge the tombstones. However, it requires a global periments. As expected, the two setups allocate identifiers knowledge over collaborators. In [9], they describe an ap- with an average bit-length that quickly decreases when the proach that obtains the global knowledge by maintaining a latency increases. Therefore, the latency does not badly vector clock with one entry per participant although it is affect LSEQ and h-LSEQ allocation strategies. Also, it con- too costly in large distributed network where collaborators firms observations made in the previous experiment: hash can enter and leave the system. In [4], they describe the performs better than rand, especially under low latency. core nebula approach that permits to reach the consensus Reasons: increasing the latency delays the arrival of op- over participants. However, this approach constrains the erations to the remote collaborators. Thus, each collabora- network topology to make the consensus reachable and uses tor works locally until operations generated by other users a costly catch up protocol. are delivered. Considering the extreme case with the maxi- The alternative to tombstones is the variable-size identi- mum latency, each collaborator generates his 10 insertions, fiers CRDT approach. These CRDTs directly encode the and then merges the others’ 90 operations. Consequently, total order of elements into their unique identifiers. Thus, the 100 operations share the same space, i.e., the digit part they do not require tombstones. However, the identifiers of identifiers can be the same for multiple elements, and the can grow. In Logoot [17] and Treedoc [8], identifiers have total order is maintained by the source and clock. a linear space complexity compared to the number of in- sert operations. Furthermore, their size heavily depends on 4.3 Synthesis the position of insertions. For example, performing inser- The experiments evaluated the effect of concurrency on tions repetitively at the beginning of a document leads to the average bit-length of identifiers by varying the number identifiers with a size equals to the number of insert opera- of collaborators and the latency. They showed that an im- tions. Consequently, such approaches require an additional plicit and a priori agreement on allocation strategy choice re-balance protocol, like tombstone CRDTs. However, their is required to maintain the sub-linear upper-bound on space underlying allocation strategies can be replaced by LSEQ [6] complexity. Employing the same hash function to choose the that provides a sub-linear upper-bound on the space com- plexity regardless of the editing behaviour. Symposium on Document Engineering, pages 103–112, The LSEQ approach uses two antagonist allocation strate- Mountain View, California, États-Unis, Sept. 2011. gies. Although paper [1] argues about the importance of con- currency in CRDTs experiments, the original paper of LSEQ [2] A. Andersson and M. Thorup. Dynamic ordered sets does not consider this aspect. In this paper, we showed with exponential search trees. J. ACM, 54(3), June on synthetic documents that its original configuration does 2007. not scale in the number of users. The added value of syn- [3] V. Grishchenko. Deep hypertext with embedded thetic documents over real collaborative documents is the revision control implemented in regular expressions. In total flexibility on their parameters, especially on the edit- Proceedings of the 6th International Symposium on ing behaviour. Indeed, the number of available real collab- Wikis and Open Collaboration, WikiSym ’10, pages orative documents (including concurrency) is very limited. 3:1–3:10, New York, NY, USA, 2010. ACM. Moreover, such experiments are costly to set up, and often biased by the nature of the task to perform. [4] M. Letia, N. Preguiça, and M. Shapiro. Crdts: In this paper, we propose to replace the random strategy Consistency without concurrency control. Arxiv choice component of LSEQ by a hash-based choice strategy preprint arXiv:0907.0929, 2009. to reach an implicit consensus on which strategy to employ. This configuration called h-LSEQ, does not require addi- [5] D. A. Nichols, P. Curtis, M. Dixon, and J. Lamping. tional computation and extends the sub-linear complexity High-latency, low-bandwidth windowing in the jupiter bound of LSEQ from a single user to multiple collaborators collaboration system. In Proceedings of the 8th annual regardless of the latency. Thus, h-LSEQ constitutes a safe ACM symposium on User interface and software allocation strategy for sequences CRDTs, even in concurrent technology, pages 111–120. ACM, 1995. cases. [6] B. Nédelec, P. Molli, A. Mostefaoui, and E. Desmontils. LSEQ: an Adaptive Structure for Sequences in Distributed Collaborative Editing. In 6. CONCLUSION ACM, editor, 13th ACM Symposium on Document In this paper, we presented an improvement of the alloca- Engineering, Sept. 2013. tion strategy LSEQ called h-LSEQ. Contrarily to the origi- nal approach, our configuration supports multiple users re- [7] G. Oster, P. Urso, P. Molli, and A. Imine. Data gardless of the latency. Indeed, replacing the random strat- consistency for p2p collaborative editing. In egy choice by the hash-based choice strategy allows to reach Proceedings of the 2006 20th anniversary conference a global sub-linear upper-bound on space complexity. As a on Computer supported cooperative work, pages consequence, distributed collaborative editors can safely use 259–268. ACM, 2006. h-LSEQ and show better scalability than current trending [8] N. Preguiça, J. M. Marquès, M. Shapiro, and editors such as Google Docs, Etherpad, etc. M. Letia. A commutative replicated data type for The hash-based choice strategy is a common function over cooperative editing. In Distributed Computing participants that maps a depth in the underlying tree to an Systems, 2009. ICDCS’09. 29th IEEE International allocation strategy identifier. This surjective function ini- Conference on, pages 395–403. Ieee, 2009. tialized with a shared seed allows to reach an agreement on strategies with no additional synchronization cost. Con- [9] H.-G. Roh, M. Jeon, J.-S. Kim, and J. Lee. Replicated sequently, there is no loss in the identifiers’ space due to abstract data types: Building blocks for collaborative antagonist allocation strategies. applications. Journal of Parallel and Distributed This paper highlights the importance of multiple users Computing, 71(3):354–368, 2011. analysis in CRDTs for sequences, particularly in the case of multiple underlying allocation strategies. We also showed [10] Y. Saito and M. Shapiro. Replication: Optimistic that latency does not badly affect the size of identifiers in Approaches. technical report, 2002. variable-size identifiers CRDTs. However, we performed ex- [11] Y. Saito and M. Shapiro. Optimistic replication. ACM periments with synthetic documents in order to show the Comput. Surv., 37(1):42–81, Mar. 2005. general behaviour of the allocation without considering se- mantically consistent documents. [12] M. Shapiro, N. Preguiça, C. Baquero, and Future works include the formal proof of the sub-linear M. Zawirski. A comprehensive study of Convergent upper-bound on space complexity of h-LSEQ. A probability and Commutative Replicated Data Types. Rapport de analysis of the worst-case scenario is mandatory to show that recherche RR-7506, INRIA, Jan. 2011. it seldom happens. We plan to experiment h-LSEQ on a cor- pus of real documents including multiple collaborators and [13] M. Shapiro, N. Preguiça, C. Baquero, and concurrency. Finally, we aim to build a real distributed col- M. Zawirski. Conflict-free replicated data types. laborative editor based on a variable-size identifiers CRDT Stabilization, Safety, and Security of Distributed using h-LSEQ as allocation strategy. Systems, pages 386–400, 2011. [14] C. Sun and C. Ellis. Operational transformation in real-time group editors: issues, algorithms, and References achievements. In Proceedings of the 1998 ACM [1] M. Ahmed-Nacer, C.-L. Ignat, G. Oster, H.-G. Roh, conference on Computer supported cooperative work, and P. Urso. Evaluating CRDTs for Real-time CSCW ’98, pages 59–68, New York, NY, USA, 1998. Document Editing. In ACM, editor, 11th ACM ACM. [15] C. Sun, X. Jia, Y. Zhang, Y. Yang, and D. Chen. Achieving convergence, causality preservation, and intention preservation in real-time cooperative editing systems. ACM Transactions on Computer-Human Interaction (TOCHI), 5(1):63–108, 1998. [16] S. Weiss, P. Urso, and P. Molli. Wooki: A p2p wiki-based collaborative writing tool. In B. Benatallah, F. Casati, D. Georgakopoulos, C. Bartolini, W. Sadiq, and C. Godart, editors, Web Information Systems Engineering – WISE 2007, volume 4831 of Lecture Notes in Computer Science, pages 503–512. Springer Berlin Heidelberg, 2007. [17] S. Weiss, P. Urso, and P. Molli. Logoot: a scalable optimistic replication algorithm for collaborative editing on p2p networks. In Distributed Computing Systems, 2009. ICDCS’09. 29th IEEE International Conference on, pages 404–412. IEEE, 2009. [18] Q. Wu, C. Pu, and J. Ferreira. A partial persistent data structure to support consistency in real-time collaborative editing. In Data Engineering (ICDE), 2010 IEEE 26th International Conference on, pages 776–779, 2010. [19] W. Yu. A string-wise crdt for group editing. In Proceedings of the 17th ACM international conference on Supporting group work, GROUP ’12, pages 141–144, New York, NY, USA, 2012. ACM.