<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Operation-based Merging of Hierarchical Documents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Claudia-Lavinia Ignat</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Moira C. Norrie</string-name>
          <email>norrieg@inf.ethz.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Information Systems</institution>
          ,
          <addr-line>ETH Zurich</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Version control systems allow a group of people to work together on a set of documents over a network by merging their changes into the same source repository. The existing version control systems o®er limited support concerning con°ict resolution and tracking of user activity. In this paper we propose a customisable operational transformation merging approach for hierarchical documents that o®ers the possibility to specify and resolve the con°icts at di®erent granularity levels. Our proposed approach also achieves better e±ciency compared to existing approaches for merging documents with linear structures.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Asynchronous collaborative editing systems have been developed to support a
group of people editing a document collaboratively over a network by
allowing the users to modify in isolation the copies of the document and afterwards
synchronise their copies in order to reestablish a common view of the data.</p>
      <p>
        Well known versioning systems such as CVS [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], RCS [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and Subversion [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
o®er limited support concerning con°ict resolution and tracking of user activity.
In these systems there is no °exible way of specifying the con°icts, the basic
con°ict unit being the line. The changes performed by two users are in con°ict
if they refer to the same line, meaning that multiple changes within a single
line cannot be handled by these systems. The merging approaches adopted by
these systems are state-based, i.e. the merging process uses only the states of the
documents. In this way con°icts are presented in the line order in which they
appear in the ¯nal document. On the other side, operation-based merging [
        <xref ref-type="bibr" rid="ref5 ref8">5,
8</xref>
        ] keeps the information about the evolution of one state of the document into
another. Therefore, it provides the possibility of tracking the user operations
and of presenting the con°icts in the context in which they were generated.
      </p>
      <p>
        The FORCE [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] operation based merging approach uses a °exible way of
de¯ning con°icts, but it assumes a linear representation of the document, the
operations do not taking into account the structure of the document.
      </p>
      <p>In this paper we propose a customisable operation-based merging algorithm
that works on a hierarchical representation of documents, allowing the
possibility of de¯ning and resolving the con°icts by using di®erent semantic units
corresponding to the document levels. An important advantage of our algorithm
based on the tree representation is its improved e±ciency compared to other
merging approaches that use a linear representation of the documents. Our
merging algorithm recursively applies over the di®erent document levels any existing
merging algorithm relying on the linear structure of the document.</p>
      <p>The paper is structured as follows. In the next section we present an
existing linear based merge algorithm that has been used by our tree-based merging
algorithm recursively over the document levels. We describe our tree-based
approach in Section 3. In Section 4 we compare our approach with some related
work. Concluding remarks are presented in the last section.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Operational Transformation Linear-based Merging</title>
      <p>In this section we are going to brie°y introduce the operational
transformation mechanism and the way this mechanism has been applied for linear-based
merging approaches.</p>
      <p>The basic operations supplied by a con¯guration management tool are
checkout, commit and update. A checkout operation creates a local working copy of
the document from the repository. A commit operation creates in the repository
a new version of the document based on the local copy of the document, given
that the repository does not contain a more recent version of the document to
be committed than the local copy. An update operation performs the merging of
the local copy of the document with the last version of that document stored in
the repository.</p>
      <p>
        In order to explain the basic operational transformation approach [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] let us
consider the following scenario. Suppose the repository contains the document
consisting of one sentence \He saw the movie." and two users check-out this
version of the document and perform some operations in their workspaces.
Further, suppose U ser1 performs the operation O11 = InsertW ord (\yesterday",5),
i.e. intending to insert the word \yesterday" as the 5th word, in order to
obtain \He saw the movie yesterday." Afterwards, U ser1 commits the changes
to the repository and the repository stores the list of operations performed by
U ser1 consisting of O11. Concurrently, U ser2 executes operation O21 =
InsertWord(\actually",2) in order to obtain \He actually saw the movie." Before
performing a commit, U ser2 needs to update the local copy of the document.
The operation O11 has to be transformed to include the e®ect of the operation
O21 when it is executed in the workspace of U ser2. This operation
transformation is called inclusion transformation. Because the operation O21 inserted a
word before the word to be inserted by operation O11, operation O21 will become
an insert operation of the word \yesterday" as the 6th word, the result being
\He actually saw the movie yesterday." The inverse operation of inclusion
transformation is exclusion transformation which transforms an operation Oa
against the operation Ob that precedes Oa such that the e®ect of Ob is excluded
from Oa.
      </p>
      <p>
        In what follows we are going to brie°y describe the merging algorithms
applied for the linear representation of documents as implemented in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>In the commit phase, the repository simply executes sequentially the
operations performed in the local workspace. In the checkout phase the local workspace
is emptied and all the operations from the repository representing the delta
between the version of the document the user wants to work on and the initial
version of the document are executed into the local workspace of the user.</p>
      <p>In the updating phase, the merging algorithm has to be performed between
the list of operations executed in the local workspace LL and the list of
operations DL representing the delta between the most recent version from the
repository and the version the local user started working on. Two basic steps
have to be performed. The ¯rst step consists of transforming the remote
operations from DL in order to include the e®ect of the local operations. The second
step consists of transforming the operations in LL in order to include the e®ects
of the operations in DL, the list of the transformed local operations representing
the new delta into the repository. In the case that operation Oi belonging to DL
is in con°ict with an operation from LL, Oi cannot be executed in the local
workspace and it needs to be included into the delta as its inverse in order to
cancel the e®ect of Oi. Moreover, all operations following it in the list DL need
to exclude its e®ect.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Merging of Hierarchical Documents</title>
      <p>In this section we are going to present a generalisation of the previously described
linear-based merging algorithm, that works for a hierarchical structure of the
document.</p>
      <p>We model the text document as consisting of the following levels of
granularity: document, paragraph, sentence, word and character, document being the
highest granularity level and character being the lowest granularity level.</p>
      <p>Each workspace stores locally a copy of the hierarchical structure of the
document. Each node (excluding leaf nodes) will keep a history of insertion
or deletion operations associated with its children nodes. The structure of the
document is illustrated in Figure 1.</p>
      <p>Levels Hiasttoprayrafgorraopphelreavtieolns</p>
      <p>Document Doc. Hist.</p>
      <p>Paragraph Pa 1 Pa1 Hist. Pa 2 Pa2 Hist.</p>
      <p>Pa 3 Pa3 Hist.</p>
      <p>History for operations on
sentences in paragraph Pa3
Document
Sentence
Word
Character
…</p>
      <p>Se 3.1 Se3.1 Hist.</p>
      <p>Se 3.2 Se3.2 Hist.</p>
      <p>W 3.1.1 W3.1.1 Hist. W 3.1.2 W3.1.2 Hist.
books, the hierarchical structure consisting of chapters, sections, paragraphs,
sentences, words and characters or on XML documents.</p>
      <p>The commit and checkout phase follow the same principles as described for
the linear representation of the documents, with the addition that, in the commit
phase, the hierarchical representation of the history of the document is linearised
by using a breadth-¯rst traversal of the tree. In this way, the ¯rst operations
in the log will be the ones belonging to the paragraph logs, followed by the
operations belonging to the sentence logs and ¯nally the operations belonging
to the word logs.</p>
      <p>The update procedure achieves the actual update of the local version of the
hierarchical document with the changes that have been committed by other users
to the repository and kept in a linear order into the remote log. The merging
procedure has as objective the computing of a new delta to be saved in the
repository, i.e. the replacement of the local log associated with each node with
a new one which includes the e®ects of all non con°icting operations from the
remote log and the execution of a modi¯ed version of the remote log on the local
version of the document in order to update it to the version on the repository.</p>
      <p>The update procedure is repeatedly applied for each level of the document
starting from the document level. The ¯rst step of the merging algorithm
consists in the selection of the remote level log containing those operations from the
remote log that have the level identical with the level of the operations from
the history bu®er of the current node. The operations belonging to the remote
level log are eliminated from the remote log. Since some nodes of the tree might
get deleted or inserted during the update process applied for the previous upper
levels of the document, the operations in the local log of the current node need
to adapt their positions to correspond to the current position in the tree.
Afterwards, the basic merging procedure for linear structures is applied to merge
the local log of the current node with the remote level log. As a result of the
merging procedure, the new remote log representing the operations that need to
be applied on the local document and the new local log representing the
operations to be saved on the repository are computed. After the new remote log is
applied locally, the operations from the remote log are transformed against the
operations in the local log and are divided among the children of the current
node. Afterwards, the merging procedure is recursively called for each child.</p>
      <p>
        As in the case of the real-time collaborative text editing [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], due to the
hierarchical structure of the document, only few transformations need to be performed
when an operation is integrated into the log as described above, because the
operations in the log are distributed throughout the tree model of the document.
Only those histories that are distributed along a certain path in the tree are
spanned and not the whole log as in the case of a linear model of the document.
Moreover, con°icts can be easily expressed using the semantic units (paragraphs,
sentences, words, characters), such as rules interdicting the concurrent insertion
of two di®erent words into the same sentence. We allow di®erent policies for
con°ict resolution, such as automatic resolution where the local changes are kept
in the case of a con°ict or manual resolution, where the user can choose the
modi¯cations to be kept.
      </p>
      <p>In what follows we will illustrate the asynchronous communication by means
of an example. Consider that the repository contains as version V0 the
document consisting of only one paragraph with one sentence: \Absence increase great
loves." Further suppose a con°ict is de¯ned between two operations concurrently
inserting a character in the same position of the word and the policy of merging
is that, in the case of con°ict, local modi¯cations are kept automatically.
Further, assume two users check out version V0 from the repository into their private
workspaces. The ¯rst user performs the operations O11=InsertChar(\d",1,1,2,9)
of inserting the character \d" in the ¯rst paragraph, ¯rst sentence, second word
as the last character and O12 = InsertW ord(\the"; 1; 1; 3) in order to obtain
the version \Absence increased the great loves." The second user performs the
operation O21, O21 =InsertSentence(\And diminishes small ones.",1,2) and
O22 =InsertChar(\s",1,1,2,9) in order to obtain \Absence increases great loves.
And diminishes small ones." Suppose that both users try to commit, but U ser1
gets access to the repository ¯rst, while U ser2's request will be queued.
After the commit operation of U ser1, the last version in the repository will be
V1 =\Absence increased the great loves." The di®erence between V1 and V0 in
the repository DL10 = [O12; O11] is obtained as a result of the linearisation of
the history bu®er distributed throughout the tree.</p>
      <p>When U ser2's request is processed, U ser2 has to update the local copy, and
therefore the update procedure is applied. First the document level history of
the local document is analysed. Because no remote operations of paragraph
level have to be merged, the update is then applied for the paragraph level by
analysing the history of paragraph 1. There are no remote operations of sentence
level, so the processing is applied for the sentence level. There are no operations
referring to sentence 2, so we will analyse the merging for sentence 1. Operation
O12 is of word level, and because there are no local operations of word level,
O12 will keep its original form. The update procedure will be recursively applied
for each of the words belonging to sentence 1. We will analyse only the update
applied for the second word of sentence 1, since the remote logs corresponding
to the other words in the sentence are empty. The merge procedure will be
applied between the list of operations consisting of O11 and the list consisting of
O22. O11 and O22 are con°icting and according to the assumed policy the local
operation will be kept. As result of this merging, the newLocalLog, i.e. the list of
operations to be transmitted to the repository, will be [inv(O11); O22] and the
newRemoteLog, i.e. the list of operations to be applied on the local copy of the
document will be empty. Therefore, the new local version of the document in the
workspace of U ser2 will be \Absence increases the great loves. And diminishes
small ones." This will be also the new version V2 of the document into the
repository after U ser2 commits. D21 will become D21 = [O21; inv(O11); O22].</p>
      <p>In order to highlight the fact that operations of higher level granularity do
not need to be transformed against the operations of lower level granularity,
consider the case that U ser1 updates his local version of the document with
version V2. Operation O21 of sentence level does not need to be transformed
against any of the local operations in the workspace of U ser1.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Related Work</title>
      <p>
        Due to the hierarchical structure of the document, in our approach the con°icts
can be de¯ned at di®erent semantic levels, as opposed to the RCS [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], CVS [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
and Subversion [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] systems and to the FORCE [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] approach.
      </p>
      <p>
        Other research works looked at the tree representation of documents for
asynchronous collaborative editing, such as working with XML documents. But
the merging approach described in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is state-based and the one proposed in
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] uses a linear log rather than a distributed log.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], an operational transformation approach has been used for
synchronising ¯le systems and ¯le contents. The ¯le systems have a hierarchical structure,
however, for the merging of the text documents the authors proposed using a
¯xed working unit, i.e. the block unit consisting of several lines of text.
5
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper we proposed a tree based approach for maintaining the consistency
in the case of the asynchronous collaborative text editing that o®ers also the
possibility of de¯ning and resolving the con°icts at di®erent granularity levels
corresponding to the document levels.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Berliner</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>CVS II: Parallelizing software development</article-title>
          .
          <source>Proceedings of USENIX</source>
          , Washington D.C.,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Collins-Sussman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fitzpatrick</surname>
            ,
            <given-names>B.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pilato</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          :
          <article-title>Version Control with Subversion</article-title>
          .
          <source>O'Reilly</source>
          ,
          <year>2004</year>
          , ISBN:
          <fpage>0</fpage>
          -
          <lpage>596</lpage>
          -00448-6
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ellis</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gibbs</surname>
            ,
            <given-names>S.J.:</given-names>
          </string-name>
          <article-title>Concurrency control in groupware systems</article-title>
          .
          <source>Proc. of the ACM SIGMOD Conf. on Management of Data, May</source>
          <year>1989</year>
          , pp.
          <fpage>399</fpage>
          -
          <lpage>407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ignat</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Norrie</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          :
          <source>Customizable Collaborative Editor Relying on treeOPT Algorithm, Proc. of ECSCW</source>
          , Helsinki, Finland, Sept.
          <year>2003</year>
          , pp.
          <fpage>315</fpage>
          -
          <lpage>334</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lippe</surname>
            , E., van Oosterom,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Operation-based merging</article-title>
          .
          <source>Proc. of SIGSOFT Symposium on Software development environments</source>
          ,
          <year>1992</year>
          , pp.
          <fpage>78</fpage>
          -
          <lpage>87</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Molli</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oster</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skaf-Molli</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Imine</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Using the transformational approach to build a safe and generic data synchronizer</article-title>
          .
          <source>Proc. of Group</source>
          , Nov.
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Molli</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skaf-Molli</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oster</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Jourdain</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Sams: Synchronous, asynchronous, multi-synchronous environments</article-title>
          .
          <source>Proc. of CSCWD</source>
          , Rio de Janeiro, Brazil, Sept. 2002
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Flexible merging for asynchronous collaborative systems</article-title>
          .
          <source>Proc. of CoopIS/DOA/ODBASE</source>
          <year>2002</year>
          , pp.
          <fpage>304</fpage>
          -
          <lpage>321</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Tichy</surname>
            ,
            <given-names>W.F.</given-names>
          </string-name>
          :
          <article-title>RCS- A system for version control</article-title>
          .
          <source>Software - Practice and Experience</source>
          ,
          <volume>15</volume>
          (
          <issue>7</issue>
          ), Jul.
          <year>1985</year>
          , pp.
          <fpage>637</fpage>
          -
          <lpage>654</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Torii</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kimura</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Segawa</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The consistency control system of XML documents</article-title>
          .
          <source>Symposium on Applications and the Internet, Jan</source>
          .
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>