=Paper=
{{Paper
|id=None
|storemode=property
|title=Concurrency Effects Over Variable-size Identifiers in Distributed Collaborative Editing
|pdfUrl=https://ceur-ws.org/Vol-1008/paper5.pdf
|volume=Vol-1008
|dblpUrl=https://dblp.org/rec/conf/doceng/NedelecMMD13a
}}
==Concurrency Effects Over Variable-size Identifiers in Distributed Collaborative Editing==
Concurrency Effects Over Variable-size Identifiers in
Distributed Collaborative Editing
Brice Nédelec, Pascal Molli, Achour Mostefaoui, Emmanuel Desmontils
LINA, 2 rue de la Houssinière
BP92208, 44322 Nantes Cedex 03
first.last@univ-nantes.fr
ABSTRACT 1. INTRODUCTION
Distributed collaborative editors such as Google Docs or Distributed collaborative editors allow to distribute the
Etherpad allow to distribute the work across time, space work across space, time and organizations. Some trending
and organizations. In this paper, we focus on distributed editors such as Google Docs [5] use the Operational Trans-
collaborative editors based on the Conflict-free Replicated form (OT) approach [14, 15]. However, an alternative based
Data Type approach (CRDT). CRDTs encompass a set of on Conflict-free Replicated Data Types (CRDTs) [12, 13]
well-known data types such as sets, graphs, sequences, etc. exists. Compared to OT, CRDTs are more decentralized
CRDTs for sequences model a document as a set of ele- and scale better.
ments (character, line, paragraph, etc.) with unique iden- The CRDTs belong to the optimistic replication [10, 11]
tifiers, providing two commutative update operations: in- approach. Therefore, replicas involved in the collaboration
sert and delete. The identifiers of elements can be either of are guaranteed eventual convergence to an identical state.
fixed-size or variable-size. Recently, a strategy for assigning To provide convergence in a replicated sequence, CRDTs use
variable-size identifiers called LSEQ has been proposed for unique identifiers to link elements. When the sequence is a
CRDTs for sequences. LSEQ lowers the space complexity document, the elements can be characters, lines, paragraphs,
of variable-size identifiers CRDTs from linear to sub-linear. etc.
While experiments show that it works locally, it fails to pro- We distinguish two types of sequence CRDTs: (i) The
vide this bound with multiple users and latency. In this tombstone CRDTs [1, 3, 7, 8, 9, 16, 18, 19] use fixed-size
paper, we propose h-LSEQ, an improvement of LSEQ that identifiers. However, the delete operations only mark and
preserves its space complexity among multiple collaborators, hide elements to the users. Consequently, these deleted el-
regardless of the latency. Ultimately, this improvement al- ements permanently affect performances. (ii) The variable-
lows to safely build distributed collaborative editors based size identifiers CRDTs [8, 17] directly encode the order re-
on CRDTs. We validate our approach with simulations in- lation in the identifiers. However, their identifiers can grow
volving latency and multiple users. linearly depending on the editing behaviour. Consequently,
they remain unsafe in the distributed collaborative editing
Categories and Subject Descriptors context.
To overcome their respective limitations, these approaches
I.7.1 [Document and Text Processing]: Document need an additional protocol [4, 9] related to garbage col-
and Text Editing—Document management; C.2.4 [Computer- lection mechanisms. Nevertheless, these protocols require
Communication Networks]: Distributed Systems—Dis- global knowledge over participants. In the general case, such
tributed applications; D.2.8 [Software Engineering]: Met- knowledge remains prohibitively expensive in the context of
rics—Complexity measures distributed networks subject to churn.
Recently, LSEQ [6] allowed variable-size identifiers CRDTs
Keywords to get rid of this costly additional protocol by lowering their
upper bound on space complexity from linear to sub-linear.
Distributed Documents; Document Authoring Tools and
Yet, LSEQ does not guarantee a safe allocation. Indeed, it
Systems; Distributed Collaborative Editing; Real-time Edit-
uses multiple antagonist strategies which, without any coor-
ing; Conflict-free Replicated Data Types
dination between collaborators, leads to an quadratic growth
of identifiers.
The contributions of this paper are: (i) experiments that
highlight the negative effect of multiple users edition on the
size of LSEQ identifiers (ii) experiments that highlight that
an increasing latency does not have a negative impact on
the size of LSEQ identifiers. (iii) an improvement of LSEQ
called h-LSEQ that extends the single user sub-linear upper
This work is licensed under the Creative Commons Attribution-
bound on space complexity to any number of users, regard-
ShareAlike 3.0 Unported License (CC BY-SA 3.0). To view a copy
of the license, visit http://creativecommons.org/licenses/by-sa/3.0/. less of the latency, and without any additional cost.
Ultimately, h-LSEQ allows distributed collaborative edi-
DChanges 2013, September 10th, 2013, Florence, Italy. tors to safely use variable-size identifiers CRDTs.
ceur-ws.org Volume 1008, http://ceur-ws.org/Vol-1008/paper5.pdf.
The rest of the paper is organized as follow: Section 2 de- previously mandatory garbage collecting protocol. Three
tails variable-size identifiers CRDTs for sequences, the allo- aspects define LSEQ:
cation strategy LSEQ, and highlights the motivation of this – base doubling: regarding the underlying tree model,
paper. Section 3 presents h-LSEQ a combination of LSEQ each node can have twice more children than its par-
and a hash-based choice strategy to overcome the LSEQ ent. It follows the properties of exponential trees [2].
limitations. Section 4 shows the results of experiments over – two antagonist allocation strategies: boundary+ and
LSEQ and h-LSEQ considering latency and multiple users. boundary– that are designed for end-edition and front-
Finally, Section 5 reviews the related works. edition respectively.
– random strategy choice: it randomly assigns an allo-
cation strategy to each depth of the tree. It follows
2. BACKGROUND the intuition: since we have no prior knowledge of the
Conflict-free Replicated Data Types belong to the opti- editing behaviour, the strategy choice should not favor
mistic replication approach [10, 11]. In the context of dis- any editing behaviour. Consequently, the frequencies of
tributed collaborative editing, the replicated data is a doc- appearance of each allocation strategy have to be equal.
ument, where: (1) each insert/delete operation is prepared Despite the fact that numerous experiments have been
locally and broadcast, (2) each remote replica receives and performed using LSEQ including real documents extracted
integrates the changes, (3) all involved replicas eventually from Wikipedia, they did not include any concurrency. Al-
converge to an identical state. though, Wikipedia’s revisions already present a serialization
To provide the convergence property, CRDTs for sequences of the document without explicit concurrency, experiments
model a document as a set of couples helt, idi where id ∈ with this type of documents are of great importance [1].
(I,
egy fails to provide sub-linearly upper-bounded identifiers in
5:
collaboration involving multiple users. 6: function alloc(p, q ∈ I)
The solution is to reach a global agreement over replicas 7: let depth := 0; interval := 0;
on which is the allocation strategy used at each depth of the 8: while (interval < 1) do Not enough for 1 insert
underlying tree model. Since LSEQ aims to get rid of proto- 9: depth + +;
cols related to garbage collecting mechanisms, any solution 10: interval := pref ix(q, depth) − pref ix(p, depth) − 1;
that requires additional communication is inconceivable. 11: end while
12: let step := min(boundary, interval); Process the
We propose to use a hash strategy choice component which maximum step to stay between p and q
have the same local behaviour than the original random 13: let idStrat := h(depth);
strategy choice, but it also provides the required implicit
14: Call the hash function
global agreement. Indeed, instead of synchronizing the set
15: let id := S.get(idStrat).invoke(p, q, depth, step);
of users, we provide an a priori agreement by the mean of
16: Call the allocation strategy according to hash
a hash function. This agreement avoids the possibility of 17: return id;
antagonist choices which would have led to a bad global al- 18: end function
location of identifiers. Furthermore, it does not introduce 19:
any additional cost to LSEQ. 20: function h ( depth ∈ N∗ )
Algorithm 1 details the h-LSEQ allocation strategy. The 21: return Random(seed ∗ depth).nextInt(0, 1);
changes over LSEQ are simply highlighted. This section ex- 22: end function
plains this algorithm. Extracted from [6], the prefix function 23:
return a copy of the identifier id, i.e., a series of numbers 24: function boundary+(p, q ∈ I, depth, step ∈ N∗ )
truncated at depth or increased until depth. The function 25: let addV al := RandInt(0, step) + 1;
carefully encodes each number of this copy in a base depend- 26: return pref ix(p, depth) + addV al;
ing on its depth in order to directly apply the arithmetic 27: end function
28:
operations of line 26 and line 31. For instance, assuming 29: function boundary–(p, q ∈ I, depth, step ∈ N∗ )
that the departure base is 25 , a call to pref ix([13.42.37], 2) 30: let subV al := RandInt(0, step) + 1;
encodes [13.42] using 5 + 6 bits. 31: return pref ix(q, depth) − subV al;
The functions boundary+ and boundary− are two allo- 32: end function
cation strategies, note that they do not strictly respect the 33:
alloc function signature in order to factorize the computa- 34: function prefix(id ∈ I, depth ∈ N∗ )
35: let idCopy := [ ];
tion of depth and step. 36: for (cpt := 1 to depth) do
37: if (cpt < id.size) then Copy the value
3.1 Locally 38: idCopy := idCopy.append(id.at(cpt));
The strategy choice function is surjective and its signature 39: else Add 0 encoded in the right base
40: idCopy := idCopy.append(0base(cpt) );
is: h(depth ∈ N∗ ) : N. The returned value corresponds to an
41: end if
allocation strategy unique identifier. The algorithm of the 42: end for
allocation strategy h-LSEQ follows 3 steps: 43: return idCopy;
1. lines 8-11: computes the depth of the future identifiers 44: end function
to allocate
2. line 13: calls the strategy choice function h using the
depth 3.2 Remotely
Each collaborator generates the same hash function thanks
3. line 15: calls the allocation strategy using the identifier
to the shared seed. Also, all collaborators use the same map-
returned by h
ping id → allocation strategy (cf. line 3). Consequently, all
The strategy choice function returns strategy identifiers hash functions map the same depth to the same allocation
following a uniform law. Thus, frequencies of appearance strategies. The idea is to reach a consensus on which strate-
of each strategy are equal and does not favor any editing gies to employ in order to avoid the waste of identifiers’ space
behaviour. Furthermore, the unpredictability due to ran- due to the choice of antagonist strategies.
domness preserves the model from intentional and malicious Figure 2 highlights the differences between the original
attacks. random strategy choice and our hash strategy choice. Both
The hash function fulfills the requirement of the strategy cases present two collaborators inserting 2 elements one af-
choice function h. However, contrarily to the random strat- ter the other. The expected result is the sequence “abcd”.
egy choice component, it requires an initialization. Indeed, Hence, Collaborator 1 generates the “a” and “c” and Collab-
the creator of the document must share a hidden seed (cf. orator 2 generates the “b” and “d”. In both cases they draw
line 2) within the document that each collaborator will use the same numbers. However, in the case of random strategy
to generate the hash function specific to this document. In choice, one collaborator randomly chooses boundary+ twice
Algorithm 1 at line 21, we simply use the common random while the other randomly chooses boundary– twice. When
function initialized with the seed and the depth. the editing behaviour is monotonic, the size of identifiers
LSEQ h-LSEQ 100
revision
90
80
9 30 9 10 17 18 70
60
n rev
0 31 0 31
b 50
Begin a 20 ids End Begin a b c d End 40
30
7 63
20
10
c d 0
55 ids 0 20 40 60 80 100
Collaborator 1
Collaborator 2 250
LSEQ+rand
LSEQ+hash
200
Figure 2: Representation of the underlying exponential tree
id bit-length
model of LSEQ and h-LSEQ. Two collaborators insert two 150
elements at the end of the document in order to get the 100
sequence “abcd”. On the random side, the collaborators em- 50
ploy antagonist strategies. On the hash side, the collabora-
0
tors use the same allocation strategy. Both sides draw the 0 20 40 60 80 100
n line
same random numbers.
Figure 3: Experiments on a synthetic document of 100 lines
edited at the end by 10 collaborators with 10 operations
quickly grows. On the opposite, the collaborators using h- per user. The top figure shows the revision number of the
LSEQ implicitly agree on using boundary+ at depth-1, con- operation. The bottom figure shows the bit-length of each
sequently the allocation follows the properties of the one identifier allocated to each line. On average, the bit-length
user edition presented in [6]. of LSEQ and h-LSEQ identifiers are 94.7 and 39.1 bit/id
The next section aims to corroborate our assumptions by respectively.
performing experiments on different setups with varying the
number of collaborators and the latency of the network.
(hash). Both of these setups use boundary+ and boundary–
and have the same variable values boundary = 10 and base =
4. EXPERIMENTS 24+depth . A set of 10 collaborators produces a document of
This section is composed of two parts. First, the experi- 100 lines by performing 10 insert operations each.
mentation focuses on the influence of the number of collab- Results: Figure 3 shows on the top part the spectrum of
orators on the size of LSEQ and h-LSEQ identifiers. A set the synthetic document, i.e., the revision date of each line.
of synthetic collaborators generates a document by succes- Since the bars are getting taller when they are closer of the
sively performing insert operations at the end. This experi- end of the document, it indicates that the users edited at
ment aims to show the behaviour of the two strategy choices, the end monotonically. On the second part of the Figure 3,
and compare the results with the expected sub-linear space it shows the identifier bit-length associated to each line. We
complexity. observe that the identifiers of rand setup quickly increase
The second part of experiments consists in highlighting while the hash identifiers remain similar to the ones in the
the effect of latency on a document edited by multiple users. experiment made for one user in [6]. Consequently, the hash
Once again, it aims to compare LSEQ and h-LSEQ and setup is the most suitable setup in the context of distributed
to show the impact of concurrency on the average size of collaborative editing.
identifiers in the document. Reasons: both setups use boundary+ and boundary– al-
The experiments focus on the digit bit-length of generated location strategies. However, each collaborator in the rand
identifiers. Indeed, all variable-size CRDTs rely on source setup makes independent choices of allocation strategies when
and clock to guarantee the unicity of identifiers, neverthe- required. Thus, if a collaborator chooses a particular strat-
less, the space complexity mainly depends on the digit choice egy, and another user chooses the antagonist strategy, then
made by the allocation strategy. a large number of identifiers is wasted when their operations
To perform these experiments, we implemented a simu- are delivered to each other. On the other hand, when the
lation framework called HumbleSimulator. The sources are same hash function spread over all collaborators generates
available on the Github platform under the terms of the the same strategy choices, it keeps the random behaviour lo-
GPL licence 1 . cally and also makes an implicit agreement on which strategy
to employ at any given depth.
4.1 Multiple users experiment
Objective: show that the random strategy choice with- 4.2 Latency experiment
out taking into account the strategies employed by other Objective: show that increasing the latency, i.e., the
collaborators leads to a quick growth in the size of identi- time between the generation and the delivery of an oper-
fiers. On the opposite, when all collaborators uses a common ation, does not imply a growth in the size of identifiers. On
allocation strategy at a given depth, the space complexity the contrary, when the latency increases, the average size of
remains sub-linearly upper-bounded. allocated identifiers decreases. This behaviour is expected
Description: we evaluate LSEQ and h-LSEQ strategies for LSEQ and h-LSEQ strategies.
on a group of 10 collaborators. Thus, the setups are (i) a ran- Description: we simulate the production of a 100 lines
dom strategy choice (rand) and (ii) a hash strategy choice document by a set of 10 collaborators. Each one of these col-
1. https://github.com/Chat-Wane/HumbleSimulator laborators performs 10 insert operations at the end of the
160
LSEQ+rand
allocation strategy permits to reach this agreement. Conse-
LSEQ+hash
quently, the improvement of LSEQ called h-LSEQ can be
140
safely used in distributed collaborative editing. The latency
experiment shows that increasing the latency parameter only
120
results in sharing the identifier space, and consequently low-
100
ers the size of identifiers. As a corollary, considering an im-
mediate delivery of operations constitutes an upper-bound
id bit-length
80 on the behaviour of the overall system for any latency.
60
5. RELATED WORK
40 Current distributed collaborative editors use optimistic
replication [10, 11] to provide availability of data at a con-
20 sistency cost. Optimistic replication designs operations in
two phases. First a replica prepares the result of an opera-
0
0 100 200 300 400 500 600 700
tion and then broadcasts it to other collaborators. Second,
latency remote replicas integrate the newly received data.
Operational Transform approach (OT) and Conflict-free
Figure 4: Experiments of latency effects over the average Replicated Data Type approach (CRDT) belong to the op-
bit-length of identifiers. The x-axis shows the latency varia- timistic class of replication. While CRDTs share the com-
tion in number of rounds, i.e., the average time between the putational cost of operations between the local and remote
send and the receive of an operation. The y-axis shows the parts of the replication protocol, the OT approaches prefer
average bit-length of identifiers. A group of 10 collaborators to offer free local generation and pay an additional cost on
generates a synthetic document of 100 lines by performing the integration of remote operations. Since the number of
10 operations each. Two configurations of LSEQ: the orig- remote operations quadratically grows when the number of
inal random strategy choice and the hash strategy choice collaborators increases, the CRDTs scale better than OT
(h-LSEQ). approaches.
We distinguish two classes of CRDTs for sequences. The
tombstone based CRDT approaches use death certificate on
removed elements. Henceforth, these elements are hidden
document. The users generate operations following the Pois- to the users, but still remain in the underlying model and
son distribution of parameter λ = 100, i.e., users more likely undermine the performance. For instance, the number of
generate an operation after 100 rounds (arbitrary unit) from tombstones in popular or controversial pages in Wikipedia
their previous operation. Measurements are done at a la- pages would be very high due to vandalism and recoveries.
tency of 5, 10, 20, 40, 80, 160, 320, 640 rounds. They concern The representatives of this class of CRDTs are WOOT [7],
the average bit-length of identifiers in the document. We WOOTO [16], WOOTH [1], CT [3], Treedoc [8], PPS [18],
evaluate two setups: the random strategy choice (rand) and RGA [9] and [19].
the hash strategy choice (hash). Both setups use the alloca- To get rid of these marked elements, the tombstone based
tion strategies boundary+ and boundary− with boundary = CRDTs require additional protocols related to garbage col-
10 and departure base = 24+depth . lecting mechanisms. In [4, 9] they propose approaches to
Results: Figure 4 shows the results of the latency ex- safely purge the tombstones. However, it requires a global
periments. As expected, the two setups allocate identifiers knowledge over collaborators. In [9], they describe an ap-
with an average bit-length that quickly decreases when the proach that obtains the global knowledge by maintaining a
latency increases. Therefore, the latency does not badly vector clock with one entry per participant although it is
affect LSEQ and h-LSEQ allocation strategies. Also, it con- too costly in large distributed network where collaborators
firms observations made in the previous experiment: hash can enter and leave the system. In [4], they describe the
performs better than rand, especially under low latency. core nebula approach that permits to reach the consensus
Reasons: increasing the latency delays the arrival of op- over participants. However, this approach constrains the
erations to the remote collaborators. Thus, each collabora- network topology to make the consensus reachable and uses
tor works locally until operations generated by other users a costly catch up protocol.
are delivered. Considering the extreme case with the maxi- The alternative to tombstones is the variable-size identi-
mum latency, each collaborator generates his 10 insertions, fiers CRDT approach. These CRDTs directly encode the
and then merges the others’ 90 operations. Consequently, total order of elements into their unique identifiers. Thus,
the 100 operations share the same space, i.e., the digit part they do not require tombstones. However, the identifiers
of identifiers can be the same for multiple elements, and the can grow. In Logoot [17] and Treedoc [8], identifiers have
total order is maintained by the source and clock. a linear space complexity compared to the number of in-
sert operations. Furthermore, their size heavily depends on
4.3 Synthesis the position of insertions. For example, performing inser-
The experiments evaluated the effect of concurrency on tions repetitively at the beginning of a document leads to
the average bit-length of identifiers by varying the number identifiers with a size equals to the number of insert opera-
of collaborators and the latency. They showed that an im- tions. Consequently, such approaches require an additional
plicit and a priori agreement on allocation strategy choice re-balance protocol, like tombstone CRDTs. However, their
is required to maintain the sub-linear upper-bound on space underlying allocation strategies can be replaced by LSEQ [6]
complexity. Employing the same hash function to choose the that provides a sub-linear upper-bound on the space com-
plexity regardless of the editing behaviour. Symposium on Document Engineering, pages 103–112,
The LSEQ approach uses two antagonist allocation strate- Mountain View, California, États-Unis, Sept. 2011.
gies. Although paper [1] argues about the importance of con-
currency in CRDTs experiments, the original paper of LSEQ [2] A. Andersson and M. Thorup. Dynamic ordered sets
does not consider this aspect. In this paper, we showed with exponential search trees. J. ACM, 54(3), June
on synthetic documents that its original configuration does 2007.
not scale in the number of users. The added value of syn- [3] V. Grishchenko. Deep hypertext with embedded
thetic documents over real collaborative documents is the revision control implemented in regular expressions. In
total flexibility on their parameters, especially on the edit- Proceedings of the 6th International Symposium on
ing behaviour. Indeed, the number of available real collab- Wikis and Open Collaboration, WikiSym ’10, pages
orative documents (including concurrency) is very limited. 3:1–3:10, New York, NY, USA, 2010. ACM.
Moreover, such experiments are costly to set up, and often
biased by the nature of the task to perform. [4] M. Letia, N. Preguiça, and M. Shapiro. Crdts:
In this paper, we propose to replace the random strategy Consistency without concurrency control. Arxiv
choice component of LSEQ by a hash-based choice strategy preprint arXiv:0907.0929, 2009.
to reach an implicit consensus on which strategy to employ.
This configuration called h-LSEQ, does not require addi- [5] D. A. Nichols, P. Curtis, M. Dixon, and J. Lamping.
tional computation and extends the sub-linear complexity High-latency, low-bandwidth windowing in the jupiter
bound of LSEQ from a single user to multiple collaborators collaboration system. In Proceedings of the 8th annual
regardless of the latency. Thus, h-LSEQ constitutes a safe ACM symposium on User interface and software
allocation strategy for sequences CRDTs, even in concurrent technology, pages 111–120. ACM, 1995.
cases. [6] B. Nédelec, P. Molli, A. Mostefaoui, and
E. Desmontils. LSEQ: an Adaptive Structure for
Sequences in Distributed Collaborative Editing. In
6. CONCLUSION ACM, editor, 13th ACM Symposium on Document
In this paper, we presented an improvement of the alloca- Engineering, Sept. 2013.
tion strategy LSEQ called h-LSEQ. Contrarily to the origi-
nal approach, our configuration supports multiple users re- [7] G. Oster, P. Urso, P. Molli, and A. Imine. Data
gardless of the latency. Indeed, replacing the random strat- consistency for p2p collaborative editing. In
egy choice by the hash-based choice strategy allows to reach Proceedings of the 2006 20th anniversary conference
a global sub-linear upper-bound on space complexity. As a on Computer supported cooperative work, pages
consequence, distributed collaborative editors can safely use 259–268. ACM, 2006.
h-LSEQ and show better scalability than current trending [8] N. Preguiça, J. M. Marquès, M. Shapiro, and
editors such as Google Docs, Etherpad, etc. M. Letia. A commutative replicated data type for
The hash-based choice strategy is a common function over cooperative editing. In Distributed Computing
participants that maps a depth in the underlying tree to an Systems, 2009. ICDCS’09. 29th IEEE International
allocation strategy identifier. This surjective function ini- Conference on, pages 395–403. Ieee, 2009.
tialized with a shared seed allows to reach an agreement
on strategies with no additional synchronization cost. Con- [9] H.-G. Roh, M. Jeon, J.-S. Kim, and J. Lee. Replicated
sequently, there is no loss in the identifiers’ space due to abstract data types: Building blocks for collaborative
antagonist allocation strategies. applications. Journal of Parallel and Distributed
This paper highlights the importance of multiple users Computing, 71(3):354–368, 2011.
analysis in CRDTs for sequences, particularly in the case of
multiple underlying allocation strategies. We also showed [10] Y. Saito and M. Shapiro. Replication: Optimistic
that latency does not badly affect the size of identifiers in Approaches. technical report, 2002.
variable-size identifiers CRDTs. However, we performed ex- [11] Y. Saito and M. Shapiro. Optimistic replication. ACM
periments with synthetic documents in order to show the Comput. Surv., 37(1):42–81, Mar. 2005.
general behaviour of the allocation without considering se-
mantically consistent documents. [12] M. Shapiro, N. Preguiça, C. Baquero, and
Future works include the formal proof of the sub-linear M. Zawirski. A comprehensive study of Convergent
upper-bound on space complexity of h-LSEQ. A probability and Commutative Replicated Data Types. Rapport de
analysis of the worst-case scenario is mandatory to show that recherche RR-7506, INRIA, Jan. 2011.
it seldom happens. We plan to experiment h-LSEQ on a cor-
pus of real documents including multiple collaborators and [13] M. Shapiro, N. Preguiça, C. Baquero, and
concurrency. Finally, we aim to build a real distributed col- M. Zawirski. Conflict-free replicated data types.
laborative editor based on a variable-size identifiers CRDT Stabilization, Safety, and Security of Distributed
using h-LSEQ as allocation strategy. Systems, pages 386–400, 2011.
[14] C. Sun and C. Ellis. Operational transformation in
real-time group editors: issues, algorithms, and
References achievements. In Proceedings of the 1998 ACM
[1] M. Ahmed-Nacer, C.-L. Ignat, G. Oster, H.-G. Roh, conference on Computer supported cooperative work,
and P. Urso. Evaluating CRDTs for Real-time CSCW ’98, pages 59–68, New York, NY, USA, 1998.
Document Editing. In ACM, editor, 11th ACM ACM.
[15] C. Sun, X. Jia, Y. Zhang, Y. Yang, and D. Chen.
Achieving convergence, causality preservation, and
intention preservation in real-time cooperative editing
systems. ACM Transactions on Computer-Human
Interaction (TOCHI), 5(1):63–108, 1998.
[16] S. Weiss, P. Urso, and P. Molli. Wooki: A p2p
wiki-based collaborative writing tool. In
B. Benatallah, F. Casati, D. Georgakopoulos,
C. Bartolini, W. Sadiq, and C. Godart, editors, Web
Information Systems Engineering – WISE 2007,
volume 4831 of Lecture Notes in Computer Science,
pages 503–512. Springer Berlin Heidelberg, 2007.
[17] S. Weiss, P. Urso, and P. Molli. Logoot: a scalable
optimistic replication algorithm for collaborative
editing on p2p networks. In Distributed Computing
Systems, 2009. ICDCS’09. 29th IEEE International
Conference on, pages 404–412. IEEE, 2009.
[18] Q. Wu, C. Pu, and J. Ferreira. A partial persistent
data structure to support consistency in real-time
collaborative editing. In Data Engineering (ICDE),
2010 IEEE 26th International Conference on, pages
776–779, 2010.
[19] W. Yu. A string-wise crdt for group editing. In
Proceedings of the 17th ACM international conference
on Supporting group work, GROUP ’12, pages
141–144, New York, NY, USA, 2012. ACM.