Merging Uncertain Multi-Version XML Documents

                    M. Lamine Ba                             Talel Abdessalem                        Pierre Senellart
               Institut Mines–Télécom;                     Institut Mines–Télécom;               Institut Mines–Télécom;
               Télécom ParisTech; LTCI                     Télécom ParisTech; LTCI               Télécom ParisTech; LTCI
                     Paris, France                               Paris, France                         Paris, France
               mouhamadou.ba@                              talel.abdessalem@                        pierre.senellart@
               telecom-paristech.fr                        telecom-paristech.fr                   telecom-paristech.fr

ABSTRACT                                                                  they do not integrate this important property (the fact that the qual-
Merging is a fundamental operation in revision control systems that       ity and trust in every single update operation varies) in their model.
enables integrating different changes made to the same documents.         To tackle this lack, we propose in [2] an XML version control sys-
In open platforms, such as Wikipedia, uncertainty is ubiquitous,          tem tailored for uncertain tree-structured multi-version documents.
essentially due to a lack of knowledge about the reliability of con-      The targeted applications mostly handle tree-structured data, or for-
tributors. We propose in [2] a version control framework designed         mats akin to it like XML documents: HTML or XHTML docu-
for uncertain multi-version tree-structured documents, based on a         ments, office documents and structured Wiki formats. Our uncer-
probabilistic XML model. In this paper, we define a merge opera-          tain multi-version XML model is based on the general framework
tion that complements our framework and enables the conciliation          for updating probabilistic XML data proposed in [8]. This frame-
of uncertain versions. We devise an efficient algorithm that imple-       work specifies uncertainty modeling and assessing in a typical ver-
ments the merge operation and prove its correction.                       sion control process, ensuring the efficiency of updates.
                                                                          Problem statement. The current paper extends our uncertain
Categories and Subject Descriptors                                        version control model with merge capabilities. Merging is a fun-
                                                                          damental operation in version control systems. It allows integrat-
H.2.1 [Database Management]: Logical Design—Data models;                  ing different changes (revisions) made to the same document. This
I.7.1 [Document and Text Processing]: Document and Text Edit-             operation is particularly helpful for software configuration man-
ing—Version control                                                       agement, where the configurations and their components can be
                                                                          built based on a combination of different versions of different ele-
Keywords                                                                  mentary parts of the software (see [5]). In web-scale collaborative
XML, collaborative work, uncertain version control, merge                 platforms, the merging operation as known in traditional version
                                                                          control systems is not yet supported. Its use in this kind of environ-
                                                                          ment (large-scale, open and collaborative) is of interest, as soon as
1.    INTRODUCTION                                                        concurrent editing and alternative revisions are allowed. We detail
Uncertain version control. Version control of uncertain data              motivations next.
has concrete applicability in open environments such as web-scale         Motivations. Amongst the motivations for this work, we can
editing platforms. Most of these platforms, in particular Wikipedia 1 ,   cite the following ones. On one hand, users may trust only some
are facing (1) the rapid growth of the number of contributors with        contributors and want to see the document resulting from the con-
different level of reliability, and (2) the will to provide the users     ciliation of their contributions, i.e., the merge of the revisions pro-
with the most trustworthy content. This latter purpose is especially      duced by these contributors. If the user preferences are known (e.g.,
challenging because of uncertainties in data inherent to unreliable       based on her personal settings or past behaviour), a recommender
contributors, recurrent conflicts (contradictions or edit wars) and       system can be built on Wikipedia in order to propose to the user a
frequent malicious contributions (e.g., spam). Besides that, trust is     version resulting from the merge of the contributions of her trusted
a subjective notion which sorely depends on the user preferences.         authors. On the other hand, open platforms such as Wikipedia
   So far, within web-scale collaborative platforms like Wikipedia,       requires as a core functionality the merge of articles which over-
version control allows maintaining the integrity of each document         lap (articles related to the same topic or sharing a large common
by tracking all contributions, as well as their history and their au-     part). This operation is currently done manually and requires a lot
thors. This gives therefore the ability to revert to a given revision     of time and coordination between contributors. This results in a
when some editing problems such as vandalism acts and unsourced           tedious and error-prone supervised merge process. Providing an
information appear. Used version control approaches are however           automated integration processes of these articles is certainly use-
not necessarily intended to the versioning of uncertain data, and         ful. To this goal, a merging operation is needed and it has to take
1 http://www.wikipedia.org/
                                                                          into account the uncertainty associated to the merged data. In this
                                                                          paper, we present our uncertain version control model with a merg-
                                                                          ing operation that covers common deterministic merge scenarios
                                                                          over XML documents while managing uncertain data. We devise
This work is licensed under the Creative Commons Attribution-             an efficient algorithm for merging uncertain multi-version XML
ShareAlike 3.0 Unported License (CC BY-SA 3.0). To view a copy            documents and prove its correctness.
of the license, visit http://creativecommons.org/licenses/by-sa/3.0/.     Outline. First, we present in Section 2 the merge and edit detec-
DChanges 2013, September 10th, 2013, Florence, Italy.                     tion techniques used for XML documents. Then, we summarize in
ceur-ws.org Volume 1008, http://ceur-ws.org/Vol-1008/paper1.pdf .         Section 3 our uncertain muti-version model. Section 4 concretely
presents our merging operation, as well as a corresponding efficient     sists of the same steps as in deterministic settings. The main dis-
algorithm. Finally, we conclude the paper in Section 5.                  tinction with [11] is that its merge outcome is an XML document
                                                                         where nodes come with some elements modeling their amount of
                                                                         uncertainty (the synchronizer [11] is based on Dempster–Shafer
2.    XML MERGE ALGORITHMS                                               theory to deal with uncertainty, in the form of probability values,
    The increasing use of XML-based systems, in particular those         degrees of beliefs, or necessity measures, associated to data) that
with a built-in version control engine, has lead to the adoption of      does not retain enough information for retrieving back individual
new XML merge techniques, e.g., [6, 10, 13]. These algorithms,           versions merged. [1] is most closely related in spirit to the current
aware of the tree-like structure of XML documents, have arisen as        paper since both rely on the same general framework for manag-
a reliable alternative to classical methods within XML settings. In-     ing uncertain XML in a typical versioning process; merging is not
deed, traditional methods for merging text or binary files, cannot       formally considered in [1].
detect meaningfully the semantics of changes over trees. Most of
current XML merge algorithms share as a baseline the diff step (edit
detection) always preceding the generation of the merged docu-           3. UNCERTAIN MULTI-VERSION XML
ment. Some main differences can be stated as follows: (i) two-way           A multi-version (XML) document with uncertain data evolves,
versus three-way approaches, that is, the use or not of the common       through uncertain updates, and leads to uncertain versions. It is rep-
base document from which merged ones are derived; (ii) the set of        resented by the means of an uncertain multi-version XML document
handled edit operations; (iii) the compliance to ordered XML ele-        model, that describes the version space of this document together
ments or unordered ones and; (iv) the conflict management strategy.      with a probability distribution over the set of possible versions.
In the following, we briefly survey a few of these algorithms for           The following is a concise summary of the formal foundations of
merging XML documents. We refer to [3, 12] for a more exhaus-            our model and the evaluation of updates over it. For more details,
tive overview about deterministic XML merge and edit detection           see [2].
techniques.
   Merging in [13] and [10] has tackled ordered XML trees, more          Model: Formal Definition. A multi-version XML document
suitable in some human-edited contexts such as structured reports,       Tmv with uncertainty is a couple (G , Ω) where G is a directed
rich text formats, etc. In [13], the motivation was the synchro-         acyclic graph (DAG) over a set V ∪ {e0 } of events e0 . . . en rep-
nization of versions of documents edited by different users. The         resenting the version space of Tmv , and Ω is a function giving the
author has explored a structural two-way merge via a polynomial-         possible versions of the document, as we now detail.
time algorithm which directly computes two isomorphic trees rep-            An event ei in V has a random nature and happens with a cer-
resenting the merge output from the two input XML documents.             tain probability. It is defined as a conjunction of random Boolean
The trees are progressively built in a bottom-up fashion with nodes      variables b1 . . . bm that each model a given source of uncertainty
(having unique identifiers) from the two documents, while ensuring       (e.g., the source of information). This definition of the events using
their isomorphism during this construction by applying a series of       Boolean variables lies on the following: (i) variables are pairwise
node insertion, deletion and update when a difference is detected.       independent, that is, Pr(b j ∧ bk ) = Pr(b j ) × Pr(bk ) for all b j 6= bk ;
As a result, the process of generating isomorphic trees, thereby the     (ii) a variable b j , correlating different events, can be reused across
merge result, slightly involves a detection of the differences be-       events; (iii) one revision variable b(i) , representing more specifi-
tween merged XML documents. Therefore, there is an implicit pro-         cally the uncertainty in the content, is not shared across other events
cessing of edit changes. However, no details are given by the author     and only occurs in ei . Our version control system is state-based
about the processing of conflicts. As for [10], the focus was on the     with events modeling the different uncertain states of the evolution
reintegration 2 of changes to a document in cases where multiple         of the versioned document. A state, i.e., an event, has contextual
independently modified copies of the document have been made.            information about a given version (in the form of, first, Boolean
The paper has proposed a three-way XML merging algorithm with            random variables involved; second, an edit script ∆i ; third, possible
clear merge rules (e.g., node sameness, node context) and a cate-        other metadata).
gorization of conflicts based on real-world use cases. In contrast          The DAG G = (V ∪ {e0 }, E ) keeps the history of the evolution
to [13], the algorithm of Lindholm [10] uses a trees matching pro-       of Tmv with: (i) the particular event e0 6∈ V , which represents the
cess detecting move operations in addition to insertions, deletions      initial state of Tmv , as the root of G ; (ii) E ⊆ V 2 defines the set
and updates of nodes. In its merge step, core and optional con-          of directed edges of G that enable to implicitly track derivation
flicts are defined: a core conflict (e.g., update/update of a node)      relationships between the generated (uncertain) versions. A branch
will cause a failure of the merge, whereas an optional conflict (e.g.,   of G is a directed path in which the tail e j is reachable from the
delete/update of a sub-tree) may be tolerated. The system does           head ei by traversing a set of ordered edges in E . A rooted branch
not pretend to resolve all conflicts, but it always reports unresolved   is a branch with e0 as head node.
scenarios. La Fontaine, in [6], has focused more on the best XML            A version of Tmv is an XML document mapping to a set of events
matching strategy regarding node insertions and deletions. An in-        in G , the events whose edit scripts together made this version hap-
termediate (optimal) XML diff file encoding the matches is used          pen. Such an event set is always a rooted branch in G in a de-
to ease the merge process with the help of an XML transformation         terministic versioning case, whereas it can be arbitrary in the un-
language such as XSLT dialect. This algorithm was designed to            certain setting. In the model, formally an XML document is an
run both in a two-way setting and a three-way one regardless of the      unordered3 , unranked, and labeled tree T in which a node x has a
considered XML document model. Note that the aforementioned              unique identifier α (x) in I and a label φ (x) in L with I ∩ L = 0/
XML merge algorithms are all deterministic.                              (for brevity, we do not mention node identifiers when depicting ex-
   In contrast, two-way merging operations in [11] and [1] are in-       ample trees). In addition, all trees considered share the same root
tended for uncertain XML documents. The followed process con-            node (same label, same identifier). Given the set 2V of all sub-
                                                                         sets of V and the infinite set D of all XML trees, the mapping
2 Merging changes that led to two distinct documents and apply the
merge result into a third document.                                      3 We leave the extension to ordered trees open as in [2].
Ω : 2V → D associates sets of events to versions of Tmv in such a         evaluates operations in ∆ over the p-document P       c as follows. This
way that (a) Ω(0) / corresponds to the root-only XML tree of D and;       is the usual implementation of updates in probabilistic XML [8]
(b) for all i, for all F ⊆ 2V \{ei } , Ω({ei } ∪ F ) = [Ω(F )]∆i where    and we show that this is compatible with the semantics of multi-
∆i is the script attached to the event ei and [Ω(F )]∆i its evalua-       version encodings in [2].
tion over the document Ω(F ). A mapping Ω implicitly defines a                • For an insertion u = ins(i,x) in ∆: fie(x) of x in P    c is set to
probability distribution over the set of versions, as detailed in [2].           fie(x) ∨ (e′ ) if x already occurs in P;c otherwise, u inserts x
   We have just defined an abstract multi-version XML document                   in Pc with fie(x) = e′ .
– we now provide a general and concise syntax for it, that has for                                                                     c such that
semantics such a multi-version document.                                      • For a deletion u = del(i) in ∆: the node x in P
                                                                                 α (x) = i (if it exists) has its formula fie(x) updated to fie(x) ∧
Probabilistic Encoding: Syntax and Semantics. We have                            (¬e′ ).
                           cmv for an uncertain multi-version XML
introduced in [2] a syntax T
document, based on probabilistic XML [9]. An uncertain multi-                E XAMPLE 3.1. Figure 1 shows an uncertain multi-version doc-
                          cmv is defined by a pair (G , P)   c where      ument Tmv with: (a) the version space G with four staged events;
version XML encoding T
                                                                          (b) four event sets and associated versions; (c) the p-document
                                                 c
(a) G is as before a DAG of events and; (b) P is a PrXMLfie p-
                                                                          Pc encoding all the possible versions based on staged events and
document with random variables b1 . . . bm representing efficiently
                                                                          their attached edit scripts, resulting in formulas attached to nodes
all possible versions and their corresponding event sets. Formally,
                                                                          (shown above each node). As a sketch, (e1 ∧ ¬e2 ) reveals that s1
the PrXMLfie p-document P    c is an unordered, unranked, and la-
                                                                          was added at e1 and then removed at e2 . The given four versions
beled tree where every node (except the root) x may be annotated          are exactly those modeled by deterministic systems. In contrast,
with an arbitrary propositional formula fie(x) over b1 . . . bm . Dif-    the possible world mapping to {e1 , e4 } is only valid within our
ferent nodes in the p-document can be correlated by the use of com-       framework. It occurs with the reject of the changes introduced by
mon variables. A valuation ν of the variables b1 . . . bm is a Boolean    event e2 . Note that the probability of each possible version can be
function that sets some variables to true and the remaining to false.     evaluated based on event sets that map to it and their probabilities.
This valuation ν produces over P   c the particular XML document
    c
ν (P), also known as a possible world, in which only nodes an-            4. MERGE APPROACH
notated with formulas valuated at true by ν are kept (nodes whose
                                                                             We detail in this section the translation of the usual XML merge
formulas are valuated to false by ν are deleted from the tree, along
                                                                  c is    operation within our uncertain versioning model.
with their descendants). The probability of this document ν (P)              A merge operation considers a set of versions and integrates their
given by the sum of the probability of the valuations that yield the      content in a single new one. We view this outcome as obtained via
document. For a more detailed picture of the PrXMLfie representa-         a three-way merge 4 , that is, an integration of the changes from the
tion system, see [8, 9].                                                  inputs with respect to their common base version. We focus here
   The semantics of an encoding T  cmv , denoted JT cmv K, is an uncer-   on merging two versions which is the most common case in real
tain multi-version XML document (G , Ω). The DAG G does not               applications. However, an extension to n > 2 versions is straight-
change, whereas Ω is such that Ω(F ) := νF (P)    c for all F ⊆ V ,       forward. In addition, we also assume that all the merged versions
where νF is a valuation over variables b1 . . . bm defined as follows.    are only originated from updates over the base versions, i.e., we do
Let B+F be the set of all random variables occurring in one of the        not consider merging of versions with a different merge history –
events of F and set B−                                       (i)
                       F the set of all revision variables b ’s for ei
                                                                          this is again for the sake of clarity of the exposition.
not in F . Then νF sets variables of BF to true, variables of B−
                                          +                                  The merge process usually implies two steps: a) an extraction of
                                                                     F
to false, and other variables to an arbitrary value. This semantics       the different sets of edit scripts that have led to the input versions
                                                              c are ex-   and; b) a generation of the merge result by evaluating a unified
remains non-ambiguous as long as formulas occurring in P
                                                                          set of the extracted edit scripts over the initial data. This last step
pressed as formulas over the events of V , i.e., do not make use of
                                                                          must deal with possible conflicting edits (for the definition of con-
the Boolean variables separately of the events.
                                                                          flicts, see next) due to concurrent changes (i.e., when two editors
Updates: Semantics and Evaluation. An edit script ∆ is a                  independently changes the same piece of data). The resolution of
set of edit operation over XML nodes. An edit operation is either an      conflicts may yield several different content items for the merge.
insertion or a deletion of nodes. An insertion is formally defined        As a result, each possible merge outcome is obtained by making a
as ins(i,x) with i the identifier of the node where the insertion         choice between several possible edits. This naturally fits in the sys-
must take place and x the label of the new node to be added. As           tem with uncertainty handling because in such a setting there is no
for a deletion, it is introduced as del(i) where i represents the         longer only one truth but several different possibilities, each with a
identifier of the node to remove. The evaluation of ∆ over any XML        certain probability of validity.
document T produces the document [T ]∆ by applying insertions                We first present the process of computing the edit scripts to use
and deletions to T ; if no node is selected by a given insertion or       for the merge, as well as common merge scenarios. Then we intro-
deletion, it is simply ignored.                                           duce the semantics of merging uncertain multi-version XML doc-
   An update operation is set up in the uncertain multi-version XML       uments, as well as an efficient algorithm on the probabilistic XML
framework as updOP∆, e, e′ where ∆ is an edit script, e is an actual      encoding.
event pointing to the edited version and e′ is a fresh one assess-
ing the uncertainty in this update. Its semantics on Tmv = (G , Ω)        4.1 Detection of Edits and Merge Scenarios
(i) updates G to (G ∪ ({e′ }, {(e, e′ )}) and; (ii) extends Ω to Ω′          Assume an unordered XML document under version control.
by letting for all F ⊆ V ∪ {e′ }: Ω′ (F ) := Ω(F ) if e′ 6∈ F and         Let us consider two arbitrary versions T1 and T2 , along with their
Ω′ (F ) := [Ω(F \{e′ })]∆ otherwise. The translation of this seman-       common lowest ancestor Ta , of this.
tics on the general syntax Tcmv = (G , P)
                                        c is done through an update       4 A three-way merge enables a better matching of nodes and detec-
algorithm updPrXML that first modifies G as before, and then it           tion of conflicts.
   G)                          T1 ) r            T2 ) r         T3 )           r               T4 ) r                   c
                                                                                                                        P)
        e0                                                                                                                                    r
                                                                                                                             e1 ∧ ¬e2             e2
                                    s1               s2                       s1                    s2
        e1                                                                                                                         s1             s2
                                                                                                                                        e3
                                    p1                                 p1          p1               p2                                                  e4
   e2         e3
                                                                                                                              p1        p′1       p2
                                    t1                                 t1          t′1              t2
   e4
                                   {e1 }           {e1 , e2 }               {e1 , e3 }          {e1 , e2 , e4 }               t1        t′1        t2

   (a) Version space                      (b) Four event sets and corresponding versions                              (c) PrXMLfie p-document
                                         Figure 1: Encoding of an uncertain multi-version XML document

4.1.1 Computation of Edit Scripts                                                       A large majority of current versioning models provide three
    We do not assume here given any explicit edit script. Instead                   common merge scenarios that consider the resolution of possible
of this, we include edit detection as an integral part of the merge                 conflicts. Recall that in most cases, this resolution is manual, that
process for the sake of generality. We define the edit script speci-                is, it requires user involvement. Let Tm be the outcome of the
fying the merge of versions T1 and T2 through the three-way diff                    merge of T1 and T2 . We formalize the possible merge scenarios as
algorithm diff3(T1 , T2 , Ta ) on unordered trees with unique identi-               follows.
fiers for nodes. The algorithm will return a script with only node                     1. First, one would like to perform the merge based on T1 and
inserts and deletes as edit operations. Like in [7], we set up our                  by updating this with the non-conflicted edits from ∆2 . For this
                                                                                    case, we have Tm = [T1 ]∆2 −∆ .
                                                                                                                    C
diff3 based on the two-way diffs diff2(Ta , T1 ) and diff2(Ta , T2 )
as subroutines. These two-way functions separately compute two                         2. The second scenario is symmetric to the first one: it considers
intermediate edit scripts using the same process.                                   as a base version T2 and fetches the non-conflicted edits from ∆1 .
                                                                                    For this case, we set Tm = [T2 ]∆1 −∆ .
                                                                                                                           C
   • diff2(Ta , T1 ) initially matches the nodes in trees Ta and T1 in
order to find out the shared, deleted, and inserted nodes. Then, the                   3. Finally, the last case maps to the update of the common ver-
algorithm encodes the matches in terms of a set of node insertions                  sion Ta with the non-conflicted edits in ∆3 , that is, one would like
and deletions which evaluated on Ta give T1 . A node x ∈ Ta with                    to reject all the conflicting edits in the merge outcome. For this
                                                                                    case, we set Tm = [Ta ]∆3 −∆ .
                                                                                                                  C
no match in T1 is deleted, whereas a node y ∈ T1 with no match in
Ta is added. Let us denote this edit script by ∆1 .                                 It is straightforward to show that when ∆C = 0,    / then we obtain
   • diff2(Ta , T2 ) follows the same process and provides the script               the same content for the three merge scenarios. This observation is
∆2 leading to T2 from Ta .                                                          inherent to the computation of the edit scripts and the definition of
   A more global edit script, referred as ∆3 , models the final value               the merge outcome in each scenario. Observe that we do not deal
of the diff3; ∆3 is obtained by mixing ∆1 and ∆2 . We describe this                 with the (intuitive and naive) merge case where the user corrects
combination with three types of edits as follows.                                   the conflicting parts with new inputs. However, this case can be
Equivalent edits. An equivalence occurs between all edits in                        simply treated by first choosing one the three outcome above and
∆1 and ∆2 with the same semantics and the same arguments (same                      then by performing updates over this.
identifiers and same labels). Specifically, two insertions u2 ∈ ∆1
and u4 ∈ ∆2 are equivalent if they specify the same node identifier
and the same label to be added. As for deletions in ∆1 and ∆2 , there
                                                                                    4.2 Merging Uncertain Multi-Version XML
is an equivalence between two if these target the same node. Given                     We now introduce our abstraction of the merge operation (cov-
two equivalent edits, only one of the two operations is kept in ∆3 .                ering at least the set up of the merge scenarios above) within the
                                                                                    uncertain multi-version XML document model.
Conflicting edits. Any two given operations u2 ∈ ∆1 , u4 ∈ ∆2                          For sure, an uncertain context induces an inherent uncertain merge;
are conflicting edits when they come with different semantics, i.e,                 involved versions and diffs come with uncertainties. Let Tmv =
if u2 is an insertion, then u4 is a deletion (and conversely), and the              (G , Ω) be an uncertain multi-version XML document with n staged
insertion has added some new nodes as descendants of the node                                                                          cmv = (G , P)
                                                                                                                                                   c as
                                                                                    version control events. In addition, we consider T
that is removed with the delete operation. We introduce conflicted
                                                                                    the probabilistic XML encoding of Tmv . Recall again that each
edits in ∆3 to be those satisfying the properties given above. Given
                                                                                    version of Tmv is identified with a particular event in G , the one
that, we refer to the set of all conflicting edits in ∆3 with ∆C . We
                                                                                    representing the tail of the branch of G leading to this version. We
say that a node handled by conflicted edits is a conflicted node.
                                                                                    reason on events instead of full versions since these are here uncer-
Independent edits. Those edits in ∆1 and ∆2 that do not belong                      tain and can be defined in an arbitrary manner using events. This
to the two first classes. The set of equivalent and independent edits               section introduces the formalism of the merge operation over any
form the non-conflicted edits of a given diff algorithm. A node                     uncertain multi-version XML document and the mapping algorithm
impacted by a non-conflicted edit is a non-conflicted node for a                    over its probabilistic XML encoding.
given merge operation. (Note that conflicted and non-conflicted
nodes together form the set of all nodes impacted by edit scripts in
∆3 ).                                                                                4.2.1 Abstracting Uncertain Merge Operation
   Now, let us briefly present the merging scenarios (cf. usual merge                  With the help of the triple (e1 , e2 , e′ ), we refer in our setting with
options, especially mine-conflict and theirs-conflict, in tools like                uncertainty to a merge operation as MRGe1 ,e2 ,e′ where e1 and e2
SubVersion [4]) using diffs and input versions.                                     point to the two versions to be merged and e′ is a new event assess-
                                                                                    ing the amount of uncertainty in the merge operation. We evaluate
4.1.2 Deterministic Merge Scenarios                                                 the semantics of such a merge operation over Tmv with uncertainty
as follows.                                                                 notion of conflicted nodes in the PrXMLfie probabilistic encoding
                                                                            given the merge of events e1 and e2 .
    MRGe1 ,e2 ,e′ (Tmv ) := (G ∪ ({e′ }, {(e1 , e′ ), (e2 , e′ )}), Ω′ ).                                                      c is encoded with
                                                                               The history of edits over any specific node in P
   On the one hand, this evaluation inserts a new event and two             its attached formula. We base on this for detecting the conflicted
edges in the version space G . On the other hand, it generates a            nodes. Let us set the following valuations of events in G : (i) νs
new distribution Ω′ which represents an extension of Ω with new             setting the events in As to true and the revision variables of all
possible versions and event sets. Let Ae1 and Ae2 be the set of all         other events to false; (ii) ν1 assigning a true value to the events in
strict ancestor events in G of e1 and e2 respectively. We denote            Ae1 ∪ {e1 } and a false value to the revision variables of the other
the common set by As = Ae1 ∩ Ae2 . For all subset F ∈ 2V ∪{e } ,
                                                                  ′
                                                                            events and finally; (iii) ν2 setting the events in Ae2 ∪ {e2 } to true
formally we set:                                                            and all the revision variables in the remaining events to false.
    • if e′ 6∈ F : Ω′ (F ) := Ω(F );                                           We first introduce the lineage of an uncertain node in the PrXMLfie
    • if {e1 , e2 , e′ } ⊆ F : Ω′ (F ) := Ω(F \ {e′ });                     p-document.
    • if {e1 , e′ } ⊆ F and e2 6∈ F : Ω′ (F ) := [Ω((F \ {e′ }) \
                                                                               D EFINITION 4.1. (Node lineage) The lineage formula of a given
       (Ae2 \ As ))]∆2 −∆ ;
                            C

                                                                            node x ∈ P,c denoted by fie↑ (x), is the propositional formula re-
    • if {e2 , e } ⊆ F and e1 6∈ F : Ω′ (F ) := [Ω((F \ {e′ }) \
                ′
                                                                            sulting from the conjunction of the formula of this node x with the
       (Ae1 \ As ))]∆1 −∆ ;
                            C
                                                                                                                           c
                                                                            formulas attached to all its ancestor nodes in P.
    • if {e1 , e2 } ∩ F = 0/ and e′ ∈ F : Ω′ (F ) := [Ω((F \ {e′ }) \
       ((Ae1 \ As ) ∪ (Ae2 \ As )))]∆3 −∆ ;
                                           C
                                                                               Instead of its formula5 , the lineage of a given node in the p-
   We consider the aforementioned edit scripts as all obtained via          document encodes the entire history of edits, starting from the ini-
the diff3 process sketched in Section 4.1.1. For each involved case,        tial event, over the path leading to this node. Given that, we can
the diff3 is executed on the (uncertain) arbitrary versions T1 =            approach the conflicted nodes in the p-document using their lin-
Ω((F \ {e′ } ∩ Ae1 ) ∪ {e1 }) and T2 = Ω((F \ {e′ } ∩ Ae2 ) ∪ {e2 }),       eage formulas as follows.
and Ta = Ω(F \ {e′ } ∩ As ) where F is the subset of events in
V ∪ {e′ } considered as valid.                                                 D EFINITION 4.2. (Conflicted node) Under the general syntax
                                                                            Tcmv , we say that a given x in P  c is a conflicted node with respect to
    E XAMPLE 4.1. Figure 2 describes the process of merging two             the merge implying the events e1 and e2 when its lineage satisfies
possible versions, denoted by T1 and T2 , from Figure 1 given their         the following conditions:
common base Ta . In our proposal, this operation is simply encom-              1. fie↑ (x) |= νs ;
passed with the merge specified over events e3 and e4 which point              2. fie↑ (x) 6|= ν1 (or fie↑ (x) 6|= ν2 ) and;
to the two input versions. On the left-hand side of the example,               3. ∃y ∈ P, c desc(x, y): fie↑ (y) 6|= νs and fie↑ (y) |= ν2 (or fie↑ (y)
we provide the versions T1 , T2 and Ta together with edit scripts                  |= ν1 ) where desc(x, y) means that y is a descendant of the
{u2 , u4 } and {u3 } that led to them from the base Ta . Typically, we             node x.
view these scripts as given by diff functions outlined in Section 4.1.1
based on full versions. The right-hand side in Figure 2 explains the           P ROPOSITION 4.1. Definition 4.2 is consistent with the defini-
process of merging T1 and T2 (with the merge event e′ evaluating            tion of conflicted nodes given in Section 4.1.1.
the uncertainty in the merge) as follows: (i) First, all the edits in the
                                                                                P ROOF. (Sketch) Let x in P       c be a conflicted node such that
scripts above coming with no conflicts, i.e., here only u4 are vali-
dated for building the part of the merge (seen as an intermediate           1) fie↑ (x) |= νs ; 2) fie↑ (x) 6|= ν1 ; 3) fie↑ (y) 6|= νs and fie↑ (y) |= ν2
outcome) that is certain with the existence of e′ ; (ii) Then, gener-       with desc(x, y) true. The relation 1) yields x ∈ νs (P)        c which is a
ating the set of possible merge items by enumerating the different          document corresponding to the common lowest ancestor of the ver-
possibilities with the conflicting edits u2 and u3 . The two initial        sions ν1 (P)c and ν2 (P). c The relations 2) means that x 6∈ ν1 (P),       c
possible results are obtained by propagating respectively u2 and u3                                                          c             c
                                                                            i.e., in the history of edits that gave ν1 (P) from νs (P) there was
given the intermediate outcome. Such a propagation will give in the         at least a deletion u2 over the node x. This is implied by the way
first case a merged version that only contains the sub-tree s2 , and        updPrXML() proceeds. Besides that, 3) enables us to write in one
in the second case a merged version with the sub-tree s1 (includ-           side x ∈ ν2 (P)c since y ∈ ν2 (P)    c and on another side y 6∈ νs (P).    c
ing nodes p1 and p′1 ) in addition. Concretely, our merge approach                                                                       c from νs (P) c
                                                                            As a result, in the history of edits that led to ν2 (P)
will compute the same merged documents by first considering the
                                                                            there was an insertion u4 which added y as a child of x. In other
input versions T1 and T2 , and then by updating these with the edits
                                                                            words, u2 and u4 define two conflicted edits performed on the same
without conflicts respectively from {u3 } and {u2 , u4 }. Finally, the
                                                                            node x.
last possible content for the merge is obtained by discarding all the
conflicting edits and by combining the concurrent nodes in the base            A conflicted node in P  c results in conflicting descendants. We
version with the intermediate result.                                                                               c according to the merge of
                                                                            refer to the conflicted set of nodes in P
                                                                            events e1 and e2 as the restriction Pc
   The uncertain merging operation as formalized above remains                                                     |C{e1 , e2 } . Under this, we infer
however intractable since it requires to evaluate every possible ver-       below the non-conflicted set of nodes.
sion for computing the overall merge result. Below, we propose a               D EFINITION 4.3. (Non-conflicted node) For the merge of events
more convenient way to do this merge.                                                                                                       c\
                                                                            e1 and e2 , we define a non-conflicted node x as a node in P
4.2.2 Merging over Probabilistic XML Encoding                                c
                                                                            P|C{e , e } having a formula fie(x) satisfying one of the following
                                                                                  1 2
   We efficiently present the semantics of the merge operation in           conditions.
Tcmv as Algorithm 1, namely mergePrXML. Prior to a deeper de-               5 The formula just describes the semantics of edits from the event
scription of the proposed algorithm, we start by introducing the            where the node was inserted for the fist time.
                                                           T1 ) r                                                                                r

                                                                 s2                                                                              s2

                                                                p2                                                                              p2

                                                                 t2                                                                              t2
                                                                                                                              2
                                                                                                                      at   eu
                                                           {e1 , e2 , e4 }                                          ag                 {e1 , e2 , e4 , e′ }
                                                                                                                   p
                      u2 : Delete the subtree s1                                                             P   ro                                   r
  Ta ) r                                                                                      r
                      u4 : Insert the subtree s2
         s1                                                                    Merge         s2                                            s1                  s2
                                                                              (event e’)             Propagate u3
         p1                                                                                  p2                                   p1            p’1            p2

         t1                                                                                  t2
                                                                                                                                  t1            t’1            t2
        {e1 }         u3 : Insert the subtree p′1                                           {e′ }       Di
                      at the node s1                                                                         sc
                                                                                                               ard                         {e1 , e3 , e′ }
                                                                                                                     u2                          r
                                                    T2 )         r                                                        an
                                                                                                                            du
                                                                                                                              3
                                                                s1                                                                         s1             s2

                                                           p1         p’1                                                                  p1             p2

                                                           t1         t’1
                                                                                                                                           t1             t2
                                                             {e1 , e3 }
                                                                                                                          {e1 , e′ }
  a) Ta (common base); T1 and T2 (versions to merge);             b) First, validating u4 which does not have any conflict. Then,
            {u2 , u4 } and {u3 } (edit scripts)                               resolving the conflict between u2 and u3 .
                            Figure 2: Merge Operation: (a) Input versions and (b) Generation of Merge results
   1. fie(x) |= νs , fie(x) 6|= ν1 and fie(x) 6|= ν2 .                                     c e1 , e2 , e′
                                                                               Input: (G , P),
   2. fie(x) 6|= νs , fie(x) |= ν1 and fie(x) |= ν2 .                          Output: Merging Uncertain XML Versions in T        cmv
   3. fie(x) |= νs , fie(x) |= ν1 and fie(x) 6|= ν2 .                                       ′          ′          ′
                                                                               G := G ∪ ({e }, {(e1 , e ), (e2 , e )});
   4. fie(x) |= νs , fie(x) 6|= ν1 and fie(x) |= ν2 .
                                                                                                                     c\ P
                                                                                                                        c|C
   5. fie(x) 6|= νs , fie(x) |= ν1 and fie(x) 6|= ν2 .                         foreach non-conflicted node x in P          {e , e }
                                                                                                                                    do
                                                                                                                                       1   2
   6. fie(x) 6|= νs , fie(x) 6|= ν1 and fie(x) |= ν2 .                             replace(fie(x), e1 , (e1 ∨ e′ ));
                                                                                   replace(fie(x), e2 , (e2 ∨ e′ ));
   P ROPOSITION 4.2. Definition 4.3 is consistent with the defini-                         c
                                                                               return (G , P)
tion of non-conflicted nodes given in Section 4.1.1.                                 Algorithm 1: Merge Algorithm (mergePrXML)
  The proof is straightforward. To be exhaustive about non-conflicted
nodes, we infer the following lemma.
                                                                                Algorithm 1 considers as inputs the probabilistic encoding (G , P)   c
   L EMMA 4.1. Let us assume the merge over events e1 and e2 .               of an uncertain multi-version XML document Tmv , the events e1
Given the sets Fs ⊆ As , F1 ⊆ (Ae1 ∪ {e1 }) \ As and F2 ⊆ (Ae2 ∪             and e2 of G , and the new event e′ modeling both the merge con-
{e2 }) \ As , the expression of fie(x) for any non-conflicted node x ∈       tent items and the amount of uncertainty in these. Given that,
Pc\ P c
        |C{e , e } is consistent with one of the following formulas.         mergePrXML first updates G as specified in Section 4.2.1. Then,
              1   2
                            V                          
         V                                                                   the merge in P c will result in a slight change in formulas attached
   1.      ei ∈Fs (ei )  ∧ ¬ e j ∈(F1 ∪F2 ) (e j )
                                                                          to certain non-conflicting nodes in P  c\ P c
       V                     V                                                                                             |C{e1 , e2 } . The function
   2.   e ∈F (ei ) ∨           e j ∈F2 (e j )
       Vi 1                V                          V                replace modifies such formulas by substituting all occurrences of
   3.      ei ∈Fs (ei ) ∨         e j ∈F1 (e j )      ∧ ¬ ek ∈F2 (ek )       e1 and e2 by (e1 ∨ e′ ) and (e2 ∨ e′ ) respectively. The idea is that
       V                                                          
                                  V                        V                 each possible merge outcome, which occurs when e′ is valuated to
   4.      ei ∈Fs (ei ) ∧ ¬           e j ∈F1 (e j )    ∨ ek ∈F2 (ek )
       V                                                                    true regardless of the valuation of the other events, must come with
   5.            (ei )                                                                                                 c seen as valid with e1 and
       Vei ∈F1                                                              at least the non-conflicted nodes from P
   6.   ei ∈F2 (ei )
                                                                             e2 . The remaining non-conflicted nodes, whose existence are inde-
  P ROOF. The proof relies on Definition 4.3.                                pendent of e1 and e2 , will depend uniquely on the valuation of their
                                                                             ancestor events in each given valid event set including e′ . At least,
   Let us continue this section by first describing mergePrXML, then         the validity of a conflicting node in a merge result relies on the
by demonstrating its correctness with respect to the abstraction of          probability of e1 and e2 when the event is e′ certain. If e′ , together
the merge operation in Section 4.2.1.                                        with e1 , are only valuated to true, we say that e1 is more probable
 than e2 for the merge; in this case, only conflicted nodes valid with         using the relations Ω′ (F ) = ν (P),   c ν (P)  c = ν (P  c′ ) and ν (P
                                                                                                                                                     c′ ) =
 Ae1 ∪ {e1 } are chosen. The converse works in the same manner.                Ω′′ (F ) 6 .
 Any conflicted node will be rejected with a valuation setting e′ to               3. For each subset F such that {e1 , e′ } ∩ F 6= 0/ and e2 6∈ F ,
                                                                               we have Ω′ (F ) = [Ω((F \ {e′ }) \ (Ae2 \ As ))]∆2 −∆ . Let F1+ =
                                                                                                                                              C
 true and the revision variables in both e1 and e2 to false.
    Assume an uncertain multi-version XML document Tmv = (G , Ω)               ((F \ {e′ }) \ (Ae2 \ As )) be the set of valid events excepting the
 and the corresponding probabilistic XML encoding T     cmv = (G , P).
                                                                   c           valid ancestor events of e2 in F ∩ (Ae2 \ As ). For the valuation ν
 In addition, let us define J.K as the semantics operator which, ap-           setting all the events in F1+ to true and the revision variables in
            cmv , yields its correct semantics JT
                                                cmv K = (G , JPK)
                                                              c such           the remaining events to false, Scenarios 1 and 2 enable us to write
 plied on T
                                                                               ν (P)c = Ω(F + ) and ν (P)    c = ν (P    c′ ). Now, we just need to show
                                        c
 that G is the same as in Tmv and JPK defines the same probabil-                                 1
                                                                               that [ν (P c′ )]∆2 −∆ = ν ′ (P
                                                                                                     C
                                                                                                               c′ ) (ν ′ being the valuation that sets all
 ity distribution over a subset of documents in D than Ω. Given a
 merge operation MRGe1 ,e2 ,e′ , we now show the main result of this           the events in F to true and the revision variables in all others to
 paper:                                                                        false) to obtain the expected proof, that is, Ω′ (F ) = Ω′′ (F ). Let
                                                                               us set ∆ = ∆2 − ∆C . We distinguish the two following cases.
    P ROPOSITION 4.3. The definition of Algorithm 1 is correct with            (a) For the class of nodes x in ν (P       c′ ) unmodified by ∆. That is,
 respect to the semantics of the merge operation over the uncertain                      c                 c    ∆
                                                                               x ∈ ν (P ) and x ∈ [ν (P )] . For each node x in this class, fie↑ (x)
                                                                                           ′                 ′
 multi-version XML document. In other words, the following dia-                     c′ requires the trueness of events in F + since x ∈ ν (P          c′ ).
 gram commutes:                                                                in P                                                 1
                                                                               Moreover, by the definition of ∆, x may be either a conflicted node
                                     J.K                                       or its existence is independent of the values of events in (F ∩(Ae2 \
                   Tc
                    mv
                                                         cmv K
                                                        JT                     As )) ∪ {e2 }). If x is a conflicted node, it is intuitive to see that
                                                                               fie↑ (x) in P  c′ is satisfied if events in F + are all set to true and
                                                                                                                                 1
mergePrXML                                                                     not if only those in (F ∩ Ae2 ) are set to true (cf. Definition 4.2.1
                                                              MRGe1 ,e2 ,e′
 (e1 , e2 , e′ )                                                               and updPrXML). In the another case, fie↑ (x) is only function of the
                                                                               values of events in F1+ for the set F (typically when fie↑ (x) is Ex-
                   Tc′
                    mv
                                                         c′ K
                                                        JT mv                  pression 5 in Lemma 4.1 with F1 = F1+ ∩ ((Ae1 \ As ) ∪ {e1 })).
                                     J.K
                         (                                                     In both cases, we can state that fie↑ (x) |= ν ′ since ν ′ similarly to ν
                             MRGe1 ,e2 ,e′ (JT cmv K) = (G ′ , Ω′ )            sets all the events in F1+ to true. As a result, we have x ∈ ν ′ (P     c′ )
     P ROOF. Assume:                                                           when x belongs to this class.
                             c′       ′    c ′
                             Tmv = (G , P ) and JT     c′ K = (G ′ , Ω′′ )
                                                        mv
                                                                               (b) For the class of nodes x ∈ ν (P        c′ ) handled with ∆. Let x 6∈
      Seeing that we reach the same version space using mergePrXML                  c     ∆
                                                                               [ν (P )] , that is, there is an operation u in ∆ that removes x from
                                                                                       ′
 is trivial. Now, we have to show that to Ω′ will correspond JP        c′ K;        c′ ). By the construction of ∆, it is easy to show that fie(x)
                                                                               ν (P
 that is, Ω = Ω . Given each set F ⊆ V , five scenarios must be
             ′     ′′                            ′
                                                                                    c maps to Expression 3 in Lemma 4.1 with Fs = F + ∩ As ,
                                                                               in P                                                                1
 checked for this equality.
      1. For each subset F such that e′ 6∈ F , we have Ω′ (F ) = Ω(F ).        F1 = F1+ ∩((Ae1 \As )∪{e1 }) and F2 = (F ∩(Ae2 \As ))∪{e2 }.
                                  c where ν is a valuation setting the         In P c′ , this formula fie(x) is just updated across Algorithm 1 by re-
 By definition, Ω(F ) = ν (P)
 special revision variable in e′ to false and the other events to an           placing the events e1 (the disjunction) in the first member and e2 in
 arbitrary value. Abstracting out the formulas, we can claim that              the second one respectively by (e1 ∨ e′ ) and (e2 ∨ e′ ). Since ν ′ sets
 Pc∼ P     c′ regarding mergePrXML. Since e′ 6|= ν , the result of the         all the events in F to true, therefore the first member of fie(x) will
 evaluation of ν over (e1 ∨ e′ ) and (e2 ∨ e′ ) (or their negation) only       be valuated to true while the second member will be valuated to
 depends on the truth values of e1 and e2 respectively. Thus by re-            false. As a consequence, clearly we can state that fie(x) 6|= ν ′ , i.e.,
                            c′ all occurrences of (e1 ∨ e′ ) and (e2 ∨ e′ )    x 6∈ ν ′ (Pc′ ). In summary, we proven that for each node x in ν (P     c′ )
 placing in formulas of P
                                                                         c′′
                                                                                                                                      ′ c
                                                                               deleted with ∆, x is also not chosen by ν in ν (P ). Note that the
                                                                                                                                ′         ′
 by e1 and e2 respectively, we are sure to build a p-document P
            c          c                                                 c′′   case where x does not occur in ν (P)      c is trivial and it corresponds
 with ν (P ) = ν (P ). But by the definition of mergePrXML, P
               ′         ′′
                c                             c          c                     to a scenario in which fie(x) maps to Expression 1 in Lemma 4.1
 is exactly P. As a result, we obtain ν (P) = ν (P ). Knowing be-
                                                           ′
                                                                               with Fs = F1+ ∩ As , F1 = F1+ ∩ ((Ae1 \ As ) ∪ {e1 }) and F2 =
 forehand that Ω′′ (F ) = ν (P   c′ ), we can state that Ω′ (F ) = Ω′′ (F )
                                                                               (F ∩ (Ae2 \ As )) ∪ {e2 }. Now, let x 6∈ ν (P       c′ ) and x ∈ [ν (Pc′ )]∆ .
 for any F ⊆ V ′ \ {e′ }.
                                                                               That is, there is an insertion u in ∆ that adds the node x as a child
      2. For each subset F such that {e1 , e2 , e′ } ∩ F 6= 0,   / we have                            c′ ). Let F + = (F \ {e′ } ∩ Ae ) be the set of
                                                                               of a node y in ν (P
 Ω (F ) = Ω(F \ {e′ }). Let ν be a valuation setting all the events in
    ′                                                                                                               2                       2
                                                                               all valid ancestor events of e2 in F . By the definition of ∆, we can
 F \ {e′ } to true, the revision variable in e′ to an arbitrary value and
                                                                               state that the formula fie(x) of x in P    c maps to Expression 4 or 6 in
 the revision variables in the remaining events to false. Since e′ does
                             c we can write Ω(F \ {e′ }) = ν (P)     c for     Lemma 4.1 with Fs = F1 ∩ As , F1 = F1+ ∩ ((Ae1 \ As ) ∪ {e1 })
                                                                                                              +
 not occur in formulas in P,
                                                                               and F2 = (F ∩ (Ae2 \ As )) ∪ {e2 }. If fie(x) is Expression 4, this
 sure. At this step, we resort to the logical consequences (e1 |=
                                                                               formula in P    c′ is updated by Algorithm 1 which replaces e1 in the
 ν ) ⇒ ((e1 ∨ e′ ) |= ν ) and (e2 |= ν ) ⇒ ((e2 ∨ e′ ) |= ν ) regardless
 of the truth-value of the event e′ . In the same way, (¬e1 6|= ν ) ⇒          first member (the conjunction) and e2 in the second member by
 (¬(e1 ∨ e′ ) 6|= ν ) and (¬e2 6|= ν ) ⇒ (¬(e2 ∨ e′ ) 6|= ν ). Therefore, by   (e1 ∨ e′ ) and (e2 ∨ e′ ) respectively. Since ν ′ sets all the events in F
                                 c′ all occurrences of (e1 ∨ e′ ) and (e2 ∨    to true, the first member of fie(x) will be valuated to false. As for
 substituting in formulas of P
  ′                                                                  c with    the second member, it will be valuated to true under ν ′ because all
 e ) by e1 and e2 respectively, we obtain the old p-document P                 the events in F2 are set to true and (e2 ∨ e′ ) |= ν ′ . As a result, ν ′
       c          c
 ν (P ) = ν (P) given the semantics of mergePrXML. Moreover,
         ′

 Ω′ (F ) = ν (P) c because Ω′ (F ) = Ω(F \{e′ }) and Ω(F \{e′ }) =             6 The last relation is due by the fact that the events e and e are
                                                                                                                                       1     2
       c
 ν (P). So by inference, we can demonstrate that Ω′ (F ) = Ω′′ (F )            both valuated to true.
will select x in ν ′ (P   c′ ) since fie(x) is such that fie(x) |= ν ′ . Other-    constant time. Finally, the replace method only depends on the
wise, if fie(x) is Expression 6, it is updated in P       c′ by mergePrXML         lengths of formulas.
which substitutes e2 by (e2 ∨ e′ ). Given that it is straightforward to
prove that fie(x) |= ν ′ since all the events in F2 are set to true and            5. CONCLUSION
(e2 ∨ e′ ) |= ν ′ .                                                                    We presented in this paper a merge operation, as well as an
Scenarios (a) and (b) demonstrate that when x ∈ [ν (P             c′ )]∆ , then    efficient mapping algorithm, that supplements our uncertain multi-
        ′  c
x ∈ ν (P ). Similarly, the converse can be reached. We con-
             ′                                                                     version XML framework presented in [2]. The merging mecha-
clude [ν (P  c′ )]∆ = ν ′ (P  c′ ) which joint with ν (P  c′ ) = Ω(F + ) and       nism proposed covers common deterministic XML merge scenar-
                                                                        1
     c                                  + ∆                                        ios while managing uncertain data. Example applications include
ν (P ) = Ω (F ) yield [Ω(F1 )] = Ω (F ). At last, the result
  ′     ′       ′′                                 ′′
                                                                                   the merging of documents in uncertain version control platforms
Ω′ (F ) = Ω′′ (F ) relies on Ω(F ) = [Ω(F1+ )]∆ .                                  such as Wikipedia.
    4. For each subset F such that {e2 , e′ } ∩ F 6= 0/ and e1 6∈ F , we
have Ω′ (F ) = [Ω((F \{e′ })∩(Ae2 ∪{e2 }))]∆1 −∆ . This scenario
                                                              C


is entirely symmetric to Scenario 3.
                                                                                   6. ACKNOWLEDGEMENTS
    5. For each subset F such that {e1 , e2 } ∩ F = 0/ and e′ ∈ F , we                This work was partially supported by the Île-de-France regional
                                                                                   DROD project, and the French government under the STIC-Asia
have Ω′ (F ) = [Ω((F \ {e′ }) \ ((Ae1 \ As ) ∪ (Ae2 \ As )))]∆3 −∆ .
                                                                             C

             ′              ′                                                      program, CCIPX project. We would like to thank the anonymous
Let set F = (F \ {e }) \ ((Ae1 \ As ) ∪ (Ae2 \ As )). Given a val-
                                                                                   reviewers for their valuable suggestions on improving this paper.
uation ν setting all the events in F ′ to true and the revision vari-
ables in all other events to false, we can write Ω(F ′ ) = ν (P)        c and
ν (P)c = ν (P   c′ ) according to Scenarios 1 and 2. Similarly to Sce-             7. REFERENCES
                                                        c′ )]∆3 −∆C = ν ′ (Pc′ )    [1] T. Abdessalem, M. L. Ba, and P. Senellart. A probabilistic
nario 3, we have now to demonstrate that [ν (P
                                                                                        XML merging tool. In Proc. EDBT, 2011.
(where ν is the valuation setting all the events in F to true and the
           ′
revision variables of all the remaining events to false) to obtain that             [2] M. L. Ba, T. Abdessalem, and P. Senellart. Uncertain version
                                                                                        control in open collaborative editing of tree-structured
[Ω(F ′ )]∆3 −∆ = ν ′ (P
                 C
                             c′ ). This result will be sufficient for stating
                                                                                        documents. In Proc. DocEng, 2013.
that Ω (F ) = Ω (F ) since by definition Ω′ (F ) = [Ω(F ′ )]∆3 −∆
                                                                             C
         ′          ′′
                                                                                    [3] G. Cobena, T. Abdessalem, and Y. Hinnach. A comparative
           c
and ν (P ) = Ω (F ). Let us consider ∆ = ∆3 − ∆ . For the
        ′     ′        ′′                                         C
                                                                                        study for XML change detection. In BDA, 2002.
proof, the intuition here is to see that by implementation ∆ can be                 [4] B. Collins-Sussman, B. W. Fitzpatrick, and C. M. Pilato.
rewritten as (∆1 − ∆C ) ∪ (∆2 − ∆C ) on one side. So, if we ar-                         Version Control with Subversion. O’Reilly Media, 2008.
rive to show that for all the nodes x, y such that x ∈ [ν (P       c′ )]∆1 −∆C      [5] J. Estublier. Software configuration management: A
and y ∈ [ν (P   c′ )]∆2 −∆ , then x ∈ ν ′ (P
                           C
                                               c′ ) and y ∈ ν ′ (Pc′ ), we can          Roadmap. In Proc. ICSE, 2000.
                                                      c  ∆
trivially deduct that for each node x ∈ [ν (P)] , then x ∈ ν ′ (P          c′ ).    [6] R. L. Fontaine. Merging XML files: A new approach
These relations can be proven in the same spirit as in Scenario 3 by                    providing intelligent merge of XML data sets. In Proc. XML
just noting that:                                                                       Europe, 2002.
(a) For the set of unmodified nodes x in ν (P           c′ ) with ∆, fie(x) in      [7] S. Khanna, K. Kunal, and B. C. Pierce. A formal
       c
      P is compatible to Expression 1 in Lemma 4.1 with F1 =                            investigation of Diff3. In Proc. FSTTCS, 2007.
      F ′ ∩As , F1 = (F ∩(Ae1 \As ))∪{e1 }, and F1 = (F ∩(Ae2 \                     [8] E. Kharlamov, W. Nutt, and P. Senellart. Updating
      As )) ∪ {e2 }.                                                                    Probabilistic XML. In Proc. Updates in XML, 2010.
(b) For the set of nodes x in ν (P     c′ ) handled with ∆. If the operation        [9] B. Kimelfeld and P. Senellart. Probabilistic XML: Models
      comes from (∆1 − ∆C ), fie(x) is compatible to Expression 3 or                    and Complexity. In Advances in Probabilistic Databases for
      5 in case of insertions and this formula maps to Expression 4                     Uncertain Information Management. Springer-Verlag, 2013.
      for deletions. Concerning operations in (∆2 − ∆C ), fie(x) is                [10] T. Lindholm. A three-way merge for XML documents. In
      compatible to Expression 4 or 6 for insertions and to Expres-                     Proc. DocEng, 2004.
      sion 3 for deletions.                                                        [11] J. Ma, W. Liu, A. Hunter, and W. Zhang. An XML Based
Furthermore, the inclusion in the opposite direction, that is for                       Framework for Merging Incomplete and Inconsistent
each node x ∈ ν ′ (P      c′ ), then x ∈ [ν (P c′ )]∆ , is obvious for nodes            Statistical Information from Clinical Trials. In Z. Ma and
unchanged by ∆. For the other nodes, we only need to verify that                        L. Yan, editors, Software Computing in XML Data
                                              c′ ) depending whether it cor-            Management. Springer-Verlag, 2010.
they are correctly handled by ∆ in ν (P
responds to an addition or a deletion in this document. Following                  [12] L. Peters. Change detection in XML trees: a survey. In TSIT
that, we can state that Ω′ (F ) = Ω′′ (F ) holds for each subset F                      Conference, 2005.
of events in V that contains e′ but not e1 and e2 .                                [13] N. Suzuki. A Structural Merging Algorithm for XML
                                                                                        Documents. In Proc. ICWI, 2002.
   We conclude by showing efficiency of mergePrXML.
   P ROPOSITION 4.4. mergePrXML performs the merge over the
encoding of any uncertain multi-version XML document in time
proportional to the size of the formulas of nodes impacted by the
updates in merged branches.
   P ROOF. (Sketch) The intuition behind the time complexity is
that mergePrXML results in a constant-time update of DAG G , firstly.
Secondly, by viewing P  c as an amortized hash table, the algo-
rithm retrieves any node impacted by a (non-conflicting) update in