<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Merging Uncertain Multi-Version XML Documents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Talel Abdessalem Institut Mines-Télécom</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Télécom ParisTech</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>LTCI Paris</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>France talel.abdessalem@ telecom-paristech.fr</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>M. Lamine Ba Institut Mines-Télécom; Télécom ParisTech; LTCI Paris</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Pierre Senellart Institut Mines-Télécom; Télécom ParisTech; LTCI Paris</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <volume>1008</volume>
      <abstract>
        <p>Merging is a fundamental operation in revision control systems that enables integrating different changes made to the same documents. In open platforms, such as Wikipedia, uncertainty is ubiquitous, essentially due to a lack of knowledge about the reliability of contributors. We propose in [2] a version control framework designed for uncertain multi-version tree-structured documents, based on a probabilistic XML model. In this paper, we define a merge operation that complements our framework and enables the conciliation of uncertain versions. We devise an efficient algorithm that implements the merge operation and prove its correction.</p>
      </abstract>
      <kwd-group>
        <kwd>XML</kwd>
        <kwd>collaborative work</kwd>
        <kwd>uncertain version control</kwd>
        <kwd>merge</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Uncertain version control. Version control of uncertain data
has concrete applicability in open environments such as web-scale
editing platforms. Most of these platforms, in particular Wikipedia 1,
are facing (1) the rapid growth of the number of contributors with
different level of reliability, and (2) the will to provide the users
with the most trustworthy content. This latter purpose is especially
challenging because of uncertainties in data inherent to unreliable
contributors, recurrent conflicts (contradictions or edit wars) and
frequent malicious contributions (e.g., spam). Besides that, trust is
a subjective notion which sorely depends on the user preferences.</p>
      <p>So far, within web-scale collaborative platforms like Wikipedia,
version control allows maintaining the integrity of each document
by tracking all contributions, as well as their history and their
authors. This gives therefore the ability to revert to a given revision
when some editing problems such as vandalism acts and unsourced
information appear. Used version control approaches are however
not necessarily intended to the versioning of uncertain data, and</p>
      <p>Motivations. Amongst the motivations for this work, we can
cite the following ones. On one hand, users may trust only some
contributors and want to see the document resulting from the
conciliation of their contributions, i.e., the merge of the revisions
produced by these contributors. If the user preferences are known (e.g.,
based on her personal settings or past behaviour), a recommender
system can be built on Wikipedia in order to propose to the user a
version resulting from the merge of the contributions of her trusted
authors. On the other hand, open platforms such as Wikipedia
requires as a core functionality the merge of articles which
overlap (articles related to the same topic or sharing a large common
part). This operation is currently done manually and requires a lot
of time and coordination between contributors. This results in a
tedious and error-prone supervised merge process. Providing an
automated integration processes of these articles is certainly
useful. To this goal, a merging operation is needed and it has to take
into account the uncertainty associated to the merged data. In this
paper, we present our uncertain version control model with a
merging operation that covers common deterministic merge scenarios
over XML documents while managing uncertain data. We devise
an efficient algorithm for merging uncertain multi-version XML
documents and prove its correctness.</p>
      <p>Outline. First, we present in Section 2 the merge and edit
detection techniques used for XML documents. Then, we summarize in
Section 3 our uncertain muti-version model. Section 4 concretely
presents our merging operation, as well as a corresponding efficient
algorithm. Finally, we conclude the paper in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>XML MERGE ALGORITHMS</title>
      <p>
        The increasing use of XML-based systems, in particular those
with a built-in version control engine, has lead to the adoption of
new XML merge techniques, e.g., [
        <xref ref-type="bibr" rid="ref10 ref13 ref6">6, 10, 13</xref>
        ]. These algorithms,
aware of the tree-like structure of XML documents, have arisen as
a reliable alternative to classical methods within XML settings.
Indeed, traditional methods for merging text or binary files, cannot
detect meaningfully the semantics of changes over trees. Most of
current XML merge algorithms share as a baseline the diff step (edit
detection) always preceding the generation of the merged
document. Some main differences can be stated as follows: (i) two-way
versus three-way approaches, that is, the use or not of the common
base document from which merged ones are derived; (ii) the set of
handled edit operations; (iii) the compliance to ordered XML
elements or unordered ones and; (iv) the conflict management strategy.
In the following, we briefly survey a few of these algorithms for
merging XML documents. We refer to [
        <xref ref-type="bibr" rid="ref12 ref3">3, 12</xref>
        ] for a more
exhaustive overview about deterministic XML merge and edit detection
techniques.
      </p>
      <p>
        Merging in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] has tackled ordered XML trees, more
suitable in some human-edited contexts such as structured reports,
rich text formats, etc. In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the motivation was the
synchronization of versions of documents edited by different users. The
author has explored a structural two-way merge via a
polynomialtime algorithm which directly computes two isomorphic trees
representing the merge output from the two input XML documents.
The trees are progressively built in a bottom-up fashion with nodes
(having unique identifiers) from the two documents, while ensuring
their isomorphism during this construction by applying a series of
node insertion, deletion and update when a difference is detected.
As a result, the process of generating isomorphic trees, thereby the
merge result, slightly involves a detection of the differences
between merged XML documents. Therefore, there is an implicit
processing of edit changes. However, no details are given by the author
about the processing of conflicts. As for [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the focus was on the
reintegration 2 of changes to a document in cases where multiple
independently modified copies of the document have been made.
The paper has proposed a three-way XML merging algorithm with
clear merge rules (e.g., node sameness, node context) and a
categorization of conflicts based on real-world use cases. In contrast
to [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the algorithm of Lindholm [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] uses a trees matching
process detecting move operations in addition to insertions, deletions
and updates of nodes. In its merge step, core and optional
conflicts are defined: a core conflict (e.g., update/update of a node)
will cause a failure of the merge, whereas an optional conflict (e.g.,
delete/update of a sub-tree) may be tolerated. The system does
not pretend to resolve all conflicts, but it always reports unresolved
scenarios. La Fontaine, in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], has focused more on the best XML
matching strategy regarding node insertions and deletions. An
intermediate (optimal) XML diff file encoding the matches is used
to ease the merge process with the help of an XML transformation
language such as XSLT dialect. This algorithm was designed to
run both in a two-way setting and a three-way one regardless of the
considered XML document model. Note that the aforementioned
XML merge algorithms are all deterministic.
      </p>
      <p>
        In contrast, two-way merging operations in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are
intended for uncertain XML documents. The followed process
con2Merging changes that led to two distinct documents and apply the
merge result into a third document.
sists of the same steps as in deterministic settings. The main
distinction with [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is that its merge outcome is an XML document
where nodes come with some elements modeling their amount of
uncertainty (the synchronizer [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is based on Dempster–Shafer
theory to deal with uncertainty, in the form of probability values,
degrees of beliefs, or necessity measures, associated to data) that
does not retain enough information for retrieving back individual
versions merged. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is most closely related in spirit to the current
paper since both rely on the same general framework for
managing uncertain XML in a typical versioning process; merging is not
formally considered in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
3.
      </p>
    </sec>
    <sec id="sec-3">
      <title>UNCERTAIN MULTI-VERSION XML</title>
      <p>A multi-version (XML) document with uncertain data evolves,
through uncertain updates, and leads to uncertain versions. It is
represented by the means of an uncertain multi-version XML document
model, that describes the version space of this document together
with a probability distribution over the set of possible versions.</p>
      <p>
        The following is a concise summary of the formal foundations of
our model and the evaluation of updates over it. For more details,
see [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Model: Formal Definition. A multi-version XML document
Tmv with uncertainty is a couple (G , Ω) where G is a directed
acyclic graph (DAG) over a set V ∪ {e0} of events e0 . . . en
representing the version space of Tmv, and Ω is a function giving the
possible versions of the document, as we now detail.</p>
      <p>An event ei in V has a random nature and happens with a
certain probability. It is defined as a conjunction of random Boolean
variables b1 . . . bm that each model a given source of uncertainty
(e.g., the source of information). This definition of the events using
Boolean variables lies on the following: (i) variables are pairwise
independent, that is, Pr(b j ∧ bk) = Pr(b j) × Pr(bk) for all b j 6= bk;
(ii) a variable b j, correlating different events, can be reused across
events; (iii) one revision variable b(i), representing more
specifically the uncertainty in the content, is not shared across other events
and only occurs in ei. Our version control system is state-based
with events modeling the different uncertain states of the evolution
of the versioned document. A state, i.e., an event, has contextual
information about a given version (in the form of, first, Boolean
random variables involved; second, an edit script Δi; third, possible
other metadata).</p>
      <p>The DAG G = (V ∪ {e0}, E ) keeps the history of the evolution
of Tmv with: (i) the particular event e0 6∈ V , which represents the
initial state of Tmv, as the root of G ; (ii) E ⊆ V 2 defines the set
of directed edges of G that enable to implicitly track derivation
relationships between the generated (uncertain) versions. A branch
of G is a directed path in which the tail e j is reachable from the
head ei by traversing a set of ordered edges in E . A rooted branch
is a branch with e0 as head node.</p>
      <p>
        A version of Tmv is an XML document mapping to a set of events
in G , the events whose edit scripts together made this version
happen. Such an event set is always a rooted branch in G in a
deterministic versioning case, whereas it can be arbitrary in the
uncertain setting. In the model, formally an XML document is an
unordered3 , unranked, and labeled tree T in which a node x has a
unique identifier α (x) in I and a label ϕ(x) in L with I ∩ L = 0/
(for brevity, we do not mention node identifiers when depicting
example trees). In addition, all trees considered share the same root
node (same label, same identifier). Given the set 2V of all
subsets of V and the infinite set D of all XML trees, the mapping
3We leave the extension to ordered trees open as in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Ω : 2V → D associates sets of events to versions of Tmv in such a
way that (a) Ω( 0/) corresponds to the root-only XML tree of D and;
(b) for all i, for all F ⊆ 2V \{ei}, Ω({ei} ∪ F ) = [Ω(F )]Δi where
Δi is the script attached to the event ei and [Ω(F )]Δi its
evaluation over the document Ω(F ). A mapping Ω implicitly defines a
probability distribution over the set of versions, as detailed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>We have just defined an abstract multi-version XML document
– we now provide a general and concise syntax for it, that has for
semantics such a multi-version document.</p>
      <p>
        Probabilistic Encoding: Syntax and Semantics. We have
introduced in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] a syntax Tcmv for an uncertain multi-version XML
document, based on probabilistic XML [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. An uncertain
multiversion XML encoding Tcmv is defined by a pair (G , Pc) where
(a) G is as before a DAG of events and; (b) Pc is a PrXMLfie
pdocument with random variables b1 . . . bm representing efficiently
all possible versions and their corresponding event sets. Formally,
the PrXMLfie p-document Pc is an unordered, unranked, and
labeled tree where every node (except the root) x may be annotated
with an arbitrary propositional formula fie(x) over b1 . . . bm.
Different nodes in the p-document can be correlated by the use of
common variables. A valuation ν of the variables b1 . . . bm is a Boolean
function that sets some variables to true and the remaining to false.
This valuation ν produces over Pc the particular XML document
ν (Pc), also known as a possible world, in which only nodes
annotated with formulas valuated at true by ν are kept (nodes whose
formulas are valuated to false by ν are deleted from the tree, along
with their descendants). The probability of this document ν (Pc) is
given by the sum of the probability of the valuations that yield the
document. For a more detailed picture of the PrXMLfie
representation system, see [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ].
      </p>
      <p>The semantics of an encoding Tcmv, denoted JTcmvK, is an
uncertain multi-version XML document (G , Ω). The DAG G does not
change, whereas Ω is such that Ω(F ) := νF (Pc) for all F ⊆ V ,
where νF is a valuation over variables b1 . . . bm defined as follows.
Let B+F be the set of all random variables occurring in one of the
events of F and set B−F the set of all revision variables b(i)’s for ei
not in F . Then νF sets variables of B+F to true, variables of BF
−
to false, and other variables to an arbitrary value. This semantics
remains non-ambiguous as long as formulas occurring in Pcare
expressed as formulas over the events of V , i.e., do not make use of
the Boolean variables separately of the events.</p>
      <p>Updates: Semantics and Evaluation. An edit script Δ is a
set of edit operation over XML nodes. An edit operation is either an
insertion or a deletion of nodes. An insertion is formally defined
as ins(i,x) with i the identifier of the node where the insertion
must take place and x the label of the new node to be added. As
for a deletion, it is introduced as del(i) where i represents the
identifier of the node to remove. The evaluation of Δ over any XML
document T produces the document [T ]Δ by applying insertions
and deletions to T ; if no node is selected by a given insertion or
deletion, it is simply ignored.</p>
      <p>
        An update operation is set up in the uncertain multi-version XML
framework as updOPΔ, e, e′ where Δ is an edit script, e is an actual
event pointing to the edited version and e′ is a fresh one
assessing the uncertainty in this update. Its semantics on Tmv = (G , Ω)
(i) updates G to (G ∪ ({e′}, {(e, e′)}) and; (ii) extends Ω to Ω′
by letting for all F ⊆ V ∪ {e′}: Ω′(F ) := Ω(F ) if e′ 6∈ F and
Ω′(F ) := [Ω(F \{e′})]Δ otherwise. The translation of this
semantics on the general syntax Tcmv = (G , Pc) is done through an update
algorithm updPrXML that first modifies G as before, and then it
evaluates operations in Δ over the p-document Pc as follows. This
is the usual implementation of updates in probabilistic XML [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
and we show that this is compatible with the semantics of
multiversion encodings in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>• For an insertion u = ins(i,x) in Δ: fie(x) of x in Pc is set to
fie(x) ∨ (e′) if x already occurs in Pc; otherwise, u inserts x
in Pcwith fie(x) = e′.
• For a deletion u = del(i) in Δ: the node x in Pc such that
α (x) = i (if it exists) has its formula fie(x) updated to fie(x) ∧
(¬e′).</p>
      <p>EXAMPLE 3.1. Figure 1 shows an uncertain multi-version
document Tmv with: (a) the version space G with four staged events;
(b) four event sets and associated versions; (c) the p-document
Pc encoding all the possible versions based on staged events and
their attached edit scripts, resulting in formulas attached to nodes
(shown above each node). As a sketch, (e1 ∧ ¬e2) reveals that s1
was added at e1 and then removed at e2. The given four versions
are exactly those modeled by deterministic systems. In contrast,
the possible world mapping to {e1, e4} is only valid within our
framework. It occurs with the reject of the changes introduced by
event e2. Note that the probability of each possible version can be
evaluated based on event sets that map to it and their probabilities.
4.</p>
    </sec>
    <sec id="sec-4">
      <title>MERGE APPROACH</title>
      <p>We detail in this section the translation of the usual XML merge
operation within our uncertain versioning model.</p>
      <p>A merge operation considers a set of versions and integrates their
content in a single new one. We view this outcome as obtained via
a three-way merge 4, that is, an integration of the changes from the
inputs with respect to their common base version. We focus here
on merging two versions which is the most common case in real
applications. However, an extension to n &gt; 2 versions is
straightforward. In addition, we also assume that all the merged versions
are only originated from updates over the base versions, i.e., we do
not consider merging of versions with a different merge history –
this is again for the sake of clarity of the exposition.</p>
      <p>The merge process usually implies two steps: a) an extraction of
the different sets of edit scripts that have led to the input versions
and; b) a generation of the merge result by evaluating a unified
set of the extracted edit scripts over the initial data. This last step
must deal with possible conflicting edits (for the definition of
conflicts, see next) due to concurrent changes (i.e., when two editors
independently changes the same piece of data). The resolution of
conflicts may yield several different content items for the merge.
As a result, each possible merge outcome is obtained by making a
choice between several possible edits. This naturally fits in the
system with uncertainty handling because in such a setting there is no
longer only one truth but several different possibilities, each with a
certain probability of validity.</p>
      <p>We first present the process of computing the edit scripts to use
for the merge, as well as common merge scenarios. Then we
introduce the semantics of merging uncertain multi-version XML
documents, as well as an efficient algorithm on the probabilistic XML
encoding.
4.1</p>
    </sec>
    <sec id="sec-5">
      <title>Detection of Edits and Merge Scenarios</title>
      <p>Assume an unordered XML document under version control.
Let us consider two arbitrary versions T1 and T2, along with their
common lowest ancestor Ta, of this.
4A three-way merge enables a better matching of nodes and
detection of conflicts.</p>
      <p>G )</p>
      <sec id="sec-5-1">
        <title>Computation of Edit Scripts</title>
        <p>
          We do not assume here given any explicit edit script. Instead
of this, we include edit detection as an integral part of the merge
process for the sake of generality. We define the edit script
specifying the merge of versions T1 and T2 through the three-way diff
algorithm diff3(T1, T2, Ta) on unordered trees with unique
identifiers for nodes. The algorithm will return a script with only node
inserts and deletes as edit operations. Like in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], we set up our
diff3 based on the two-way diffs diff2(Ta, T1) and diff2(Ta, T2)
as subroutines. These two-way functions separately compute two
intermediate edit scripts using the same process.
        </p>
        <p>• diff2(Ta, T1) initially matches the nodes in trees Ta and T1 in
order to find out the shared, deleted, and inserted nodes. Then, the
algorithm encodes the matches in terms of a set of node insertions
and deletions which evaluated on Ta give T1. A node x ∈ Ta with
no match in T1 is deleted, whereas a node y ∈ T1 with no match in
Ta is added. Let us denote this edit script by Δ1.</p>
        <p>• diff2(Ta, T2) follows the same process and provides the script
Δ2 leading to T2 from Ta.</p>
        <p>A more global edit script, referred as Δ3, models the final value
of the diff3; Δ3 is obtained by mixing Δ1 and Δ2. We describe this
combination with three types of edits as follows.</p>
        <p>Equivalent edits. An equivalence occurs between all edits in
Δ1 and Δ2 with the same semantics and the same arguments (same
identifiers and same labels). Specifically, two insertions u2 ∈ Δ1
and u4 ∈ Δ2 are equivalent if they specify the same node identifier
and the same label to be added. As for deletions in Δ1 and Δ2, there
is an equivalence between two if these target the same node. Given
two equivalent edits, only one of the two operations is kept in Δ3.
Conflicting edits. Any two given operations u2 ∈ Δ1, u4 ∈ Δ2
are conflicting edits when they come with different semantics, i.e,
if u2 is an insertion, then u4 is a deletion (and conversely), and the
insertion has added some new nodes as descendants of the node
that is removed with the delete operation. We introduce conflicted
edits in Δ3 to be those satisfying the properties given above. Given
that, we refer to the set of all conflicting edits in Δ3 with ΔC . We
say that a node handled by conflicted edits is a conflicted node.
Independent edits. Those edits in Δ1 and Δ2 that do not belong
to the two first classes. The set of equivalent and independent edits
form the non-conflicted edits of a given diff algorithm. A node
impacted by a non-conflicted edit is a non-conflicted node for a
given merge operation. (Note that conflicted and non-conflicted
nodes together form the set of all nodes impacted by edit scripts in
Δ3).</p>
        <p>
          Now, let us briefly present the merging scenarios (cf. usual merge
options, especially mine-conflict and theirs-conflict, in tools like
SubVersion [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]) using diffs and input versions.
4.1.2
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>Deterministic Merge Scenarios</title>
        <p>A large majority of current versioning models provide three
common merge scenarios that consider the resolution of possible
conflicts. Recall that in most cases, this resolution is manual, that
is, it requires user involvement. Let Tm be the outcome of the
merge of T1 and T2. We formalize the possible merge scenarios as
follows.</p>
        <p>1. First, one would like to perform the merge based on T1 and
by updating this with the non-conflicted edits from Δ2. For this</p>
        <p>C
case, we have Tm = [T1]Δ2−Δ .</p>
        <p>2. The second scenario is symmetric to the first one: it considers
as a base version T2 and fetches the non-conflicted edits from Δ1.
C
For this case, we set Tm = [T2]Δ1−Δ .</p>
        <p>3. Finally, the last case maps to the update of the common
version Ta with the non-conflicted edits in Δ3, that is, one would like
to reject all the conflicting edits in the merge outcome. For this</p>
        <p>C
case, we set Tm = [Ta]Δ3−Δ .</p>
        <p>It is straightforward to show that when ΔC = 0/, then we obtain
the same content for the three merge scenarios. This observation is
inherent to the computation of the edit scripts and the definition of
the merge outcome in each scenario. Observe that we do not deal
with the (intuitive and naive) merge case where the user corrects
the conflicting parts with new inputs. However, this case can be
simply treated by first choosing one the three outcome above and
then by performing updates over this.
4.2</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Merging Uncertain Multi-Version XML</title>
      <p>We now introduce our abstraction of the merge operation
(covering at least the set up of the merge scenarios above) within the
uncertain multi-version XML document model.</p>
      <p>For sure, an uncertain context induces an inherent uncertain merge;
involved versions and diffs come with uncertainties. Let Tmv =
(G , Ω) be an uncertain multi-version XML document with n staged
version control events. In addition, we consider Tcmv = (G , Pc) as
the probabilistic XML encoding of Tmv. Recall again that each
version of Tmv is identified with a particular event in G , the one
representing the tail of the branch of G leading to this version. We
reason on events instead of full versions since these are here
uncertain and can be defined in an arbitrary manner using events. This
section introduces the formalism of the merge operation over any
uncertain multi-version XML document and the mapping algorithm
over its probabilistic XML encoding.
4.2.1</p>
      <p>With the help of the triple (e1, e2, e′), we refer in our setting with
uncertainty to a merge operation as MRGe1,e2,e′ where e1 and e2
point to the two versions to be merged and e′ is a new event
assessing the amount of uncertainty in the merge operation. We evaluate
the semantics of such a merge operation over Tmv with uncertainty
as follows.</p>
      <p>MRGe1,e2,e′ (Tmv) := (G ∪ ({e′}, {(e1, e′), (e2, e′)}), Ω′).</p>
      <p>On the one hand, this evaluation inserts a new event and two
edges in the version space G . On the other hand, it generates a
new distribution Ω′ which represents an extension of Ω with new
possible versions and event sets. Let Ae1 and Ae2 be the set of all
strict ancestor events in G of e1 and e2 respectively. We denote
the common set by As = Ae1 ∩ Ae2 . For all subset F ∈ 2V ∪{e′},
formally we set:
• if e′ 6∈ F : Ω′(F ) := Ω(F );
• if {e1, e2, e′} ⊆ F : Ω′(F ) := Ω(F \ {e′});
• if {e1, e′} ⊆ F and e2 6∈ F : Ω′(F ) := [Ω((F \ {e′}) \
(Ae2 \ As))]Δ2−ΔC ;
• if {e2, e′} ⊆ F and e1 6∈ F : Ω′(F ) := [Ω((F \ {e′}) \
(Ae1 \ As))]Δ1−ΔC ;
• if {e1, e2} ∩ F = 0/ and e′ ∈ F : Ω′(F ) := [Ω((F \ {e′}) \
((Ae1 \ As) ∪ (Ae2 \ As)))]Δ3−ΔC ;</p>
      <p>We consider the aforementioned edit scripts as all obtained via
the diff3 process sketched in Section 4.1.1. For each involved case,
the diff3 is executed on the (uncertain) arbitrary versions T1 =
Ω((F \ {e′} ∩ Ae1 ) ∪ {e1}) and T2 = Ω((F \ {e′} ∩ Ae2 ) ∪ {e2}),
and Ta = Ω(F \ {e′} ∩ As) where F is the subset of events in
V ∪ {e′} considered as valid.</p>
      <p>EXAMPLE 4.1. Figure 2 describes the process of merging two
possible versions, denoted by T1 and T2, from Figure 1 given their
common base Ta. In our proposal, this operation is simply
encompassed with the merge specified over events e3 and e4 which point
to the two input versions. On the left-hand side of the example,
we provide the versions T1, T2 and Ta together with edit scripts
{u2, u4} and {u3} that led to them from the base Ta. Typically, we
view these scripts as given by diff functions outlined in Section 4.1.1
based on full versions. The right-hand side in Figure 2 explains the
process of merging T1 and T2 (with the merge event e′ evaluating
the uncertainty in the merge) as follows: (i) First, all the edits in the
scripts above coming with no conflicts, i.e., here only u4 are
validated for building the part of the merge (seen as an intermediate
outcome) that is certain with the existence of e′; (ii) Then,
generating the set of possible merge items by enumerating the different
possibilities with the conflicting edits u2 and u3. The two initial
possible results are obtained by propagating respectively u2 and u3
given the intermediate outcome. Such a propagation will give in the
first case a merged version that only contains the sub-tree s2, and
in the second case a merged version with the sub-tree s1
(including nodes p1 and p′1) in addition. Concretely, our merge approach
will compute the same merged documents by first considering the
input versions T1 and T2, and then by updating these with the edits
without conflicts respectively from {u3} and {u2, u4}. Finally, the
last possible content for the merge is obtained by discarding all the
conflicting edits and by combining the concurrent nodes in the base
version with the intermediate result.</p>
      <p>The uncertain merging operation as formalized above remains
however intractable since it requires to evaluate every possible
version for computing the overall merge result. Below, we propose a
more convenient way to do this merge.
4.2.2</p>
      <sec id="sec-6-1">
        <title>Merging over Probabilistic XML Encoding</title>
        <p>We efficiently present the semantics of the merge operation in
Tcmv as Algorithm 1, namely mergePrXML. Prior to a deeper
description of the proposed algorithm, we start by introducing the
notion of conflicted nodes in the PrXMLfie probabilistic encoding
given the merge of events e1 and e2.</p>
        <p>The history of edits over any specific node in Pcis encoded with
its attached formula. We base on this for detecting the conflicted
nodes. Let us set the following valuations of events in G : (i) νs
setting the events in As to true and the revision variables of all
other events to false; (ii) ν1 assigning a true value to the events in
Ae1 ∪ {e1} and a false value to the revision variables of the other
events and finally; (iii) ν2 setting the events in Ae2 ∪ {e2} to true
and all the revision variables in the remaining events to false.</p>
        <p>We first introduce the lineage of an uncertain node in the PrXMLfie
p-document.</p>
        <p>DEFINITION 4.1. (Node lineage) The lineage formula of a given
node x ∈ Pc, denoted by fie↑(x), is the propositional formula
resulting from the conjunction of the formula of this node x with the
formulas attached to all its ancestor nodes in Pc.</p>
        <p>Instead of its formula5, the lineage of a given node in the
pdocument encodes the entire history of edits, starting from the
initial event, over the path leading to this node. Given that, we can
approach the conflicted nodes in the p-document using their
lineage formulas as follows.</p>
        <p>DEFINITION 4.2. (Conflicted node) Under the general syntax
Tcmv, we say that a given x in Pcis a conflicted node with respect to
the merge implying the events e1 and e2 when its lineage satisfies
the following conditions:
1. fie↑(x) |= νs;
2. fie↑(x) 6|= ν1 (or fie↑(x) 6|= ν2) and;
3. ∃y ∈ Pc, desc(x, y): fie↑(y) 6|= νs and fie↑(y) |= ν2 (or fie↑(y)
|= ν1) where desc(x, y) means that y is a descendant of the
node x.</p>
        <p>PROPOSITION 4.1. Definition 4.2 is consistent with the
definition of conflicted nodes given in Section 4.1.1.</p>
        <p>PROOF. (Sketch) Let x in Pc be a conflicted node such that
1) fie↑(x) |= νs; 2) fie↑(x) 6|= ν1; 3) fie↑(y) 6|= νs and fie↑(y) |= ν2
with desc(x, y) true. The relation 1) yields x ∈ νs(Pc) which is a
document corresponding to the common lowest ancestor of the
versions ν1(Pc) and ν2(Pc). The relations 2) means that x 6∈ ν1(Pc),
i.e., in the history of edits that gave ν1(Pc) from νs(Pc) there was
at least a deletion u2 over the node x. This is implied by the way
updPrXML() proceeds. Besides that, 3) enables us to write in one
side x ∈ ν2(Pc) since y ∈ ν2(Pc) and on another side y 6∈ νs(Pc).
As a result, in the history of edits that led to ν2(Pc) from νs(Pc)
there was an insertion u4 which added y as a child of x. In other
words, u2 and u4 define two conflicted edits performed on the same
node x.</p>
        <p>A conflicted node in Pc results in conflicting descendants. We
refer to the conflicted set of nodes in Pc according to the merge of
events e1 and e2 as the restriction Pc|C{e1, e2} . Under this, we infer
below the non-conflicted set of nodes.</p>
        <p>DEFINITION 4.3. (Non-conflicted node) For the merge of events
e1 and e2, we define a non-conflicted node x as a node in Pc\
Pc|C{e1, e2} having a formula fie(x) satisfying one of the following
conditions.
5The formula just describes the semantics of edits from the event
where the node was inserted for the fist time.
s2
p2
t2
Pc\ Pc|C{e1, e2} is consistent with one of the following formulas.
1. Vei∈Fs (ei) ∧ ¬ Vej∈(F1∪F2) (e j)
2. Vei∈F1 (ei) ∨ Vej∈F2 (e j)
5. Vei∈F1 (ei)
6. Vei∈F2 (ei)</p>
        <sec id="sec-6-1-1">
          <title>PROOF. The proof relies on Definition 4.3.</title>
          <p>Let us continue this section by first describing mergePrXML, then
by demonstrating its correctness with respect to the abstraction of
the merge operation in Section 4.2.1.</p>
          <p>Input: (G , Pc), e1, e2, e′
Output: Merging Uncertain XML Versions in Tcmv
G := G ∪ ({e′}, {(e1, e′), (e2, e′)});
foreach non-conflicted node x in Pc\ Pc|C{e1, e2} do
replace(fie(x), e1, (e1 ∨ e′));
replace(fie(x), e2, (e2 ∨ e′));
return (G , Pc)</p>
          <p>Algorithm 1: Merge Algorithm (mergePrXML)</p>
          <p>Algorithm 1 considers as inputs the probabilistic encoding (G , Pc)
of an uncertain multi-version XML document Tmv, the events e1
and e2 of G , and the new event e′ modeling both the merge
content items and the amount of uncertainty in these. Given that,
mergePrXML first updates G as specified in Section 4.2.1. Then,
the merge in Pc will result in a slight change in formulas attached
to certain non-conflicting nodes in Pc\ Pc|C{e1, e2} . The function
replace modifies such formulas by substituting all occurrences of
e1 and e2 by (e1 ∨ e′) and (e2 ∨ e′) respectively. The idea is that
each possible merge outcome, which occurs when e′ is valuated to
true regardless of the valuation of the other events, must come with
at least the non-conflicted nodes from Pc seen as valid with e1 and
e2. The remaining non-conflicted nodes, whose existence are
independent of e1 and e2, will depend uniquely on the valuation of their
ancestor events in each given valid event set including e′. At least,
the validity of a conflicting node in a merge result relies on the
probability of e1 and e2 when the event is e′ certain. If e′, together
with e1, are only valuated to true, we say that e1 is more probable
than e2 for the merge; in this case, only conflicted nodes valid with
Ae1 ∪ {e1} are chosen. The converse works in the same manner.
Any conflicted node will be rejected with a valuation setting e′ to
true and the revision variables in both e1 and e2 to false.</p>
          <p>Assume an uncertain multi-version XML document Tmv = (G , Ω)
and the corresponding probabilistic XML encoding Tcmv = (G , Pc).
In addition, let us define J.K as the semantics operator which,
applied on Tcmv, yields its correct semantics JTcmvK = (G , JPcK) such
that G is the same as in Tmv and JPcK defines the same
probability distribution over a subset of documents in D than Ω. Given a
merge operation MRGe1,e2,e′ , we now show the main result of this
paper:</p>
          <p>PROPOSITION 4.3. The definition of Algorithm 1 is correct with
respect to the semantics of the merge operation over the uncertain
multi-version XML document. In other words, the following
diagram commutes:
mergePrXML
(e1, e2, e′)</p>
          <p>Tcmv
Tcm′v</p>
          <p>J.K
J.K</p>
          <p>JTcmvK
JTcm′vK</p>
          <p>MRGe1,e2,e′</p>
        </sec>
        <sec id="sec-6-1-2">
          <title>PROOF. Assume:</title>
          <p>(</p>
          <p>MRGe1,e2,e′ (JTcmvK) = (G ′, Ω′)</p>
          <p>Tcm′v = (G ′, Pc′) and JTcm′vK = (G ′, Ω′′)</p>
          <p>Seeing that we reach the same version space using mergePrXML
is trivial. Now, we have to show that to Ω′ will correspond JPc′K;
that is, Ω′ = Ω′′. Given each set F ⊆ V ′, five scenarios must be
checked for this equality.</p>
          <p>1. For each subset F such that e′ 6∈ F , we have Ω′(F ) = Ω(F ).
By definition, Ω(F ) = ν (Pc) where ν is a valuation setting the
special revision variable in e′ to false and the other events to an
arbitrary value. Abstracting out the formulas, we can claim that
Pc ∼ Pc′ regarding mergePrXML. Since e′ 6|= ν , the result of the
evaluation of ν over (e1 ∨ e′) and (e2 ∨ e′) (or their negation) only
depends on the truth values of e1 and e2 respectively. Thus by
replacing in formulas of Pc′ all occurrences of (e1 ∨ e′) and (e2 ∨ e′)
by e1 and e2 respectively, we are sure to build a p-document Pc′′
with ν (Pc′) = ν (Pc′′). But by the definition of mergePrXML, Pc′′
is exactly Pc. As a result, we obtain ν (Pc) = ν (Pc′). Knowing
beforehand that Ω′′(F ) = ν (Pc′), we can state that Ω′(F ) = Ω′′(F )
for any F ⊆ V ′ \ {e′}.</p>
          <p>2. For each subset F such that {e1, e2, e′} ∩ F 6= 0/, we have
Ω′(F ) = Ω(F \ {e′}). Let ν be a valuation setting all the events in
F \ {e′} to true, the revision variable in e′ to an arbitrary value and
the revision variables in the remaining events to false. Since e′ does
not occur in formulas in Pc, we can write Ω(F \ {e′}) = ν (Pc) for
sure. At this step, we resort to the logical consequences (e1 |=
ν ) ⇒ ((e1 ∨ e′) |= ν ) and (e2 |= ν ) ⇒ ((e2 ∨ e′) |= ν ) regardless
of the truth-value of the event e′. In the same way, (¬e1 6|= ν ) ⇒
(¬(e1 ∨ e′) 6|= ν ) and (¬e2 6|= ν ) ⇒ (¬(e2 ∨ e′) 6|= ν ). Therefore, by
substituting in formulas of Pc′ all occurrences of (e1 ∨ e′) and (e2 ∨
e′) by e1 and e2 respectively, we obtain the old p-document Pcwith
ν (Pc′) = ν (Pc) given the semantics of mergePrXML. Moreover,
Ω′(F ) = ν (Pc) because Ω′(F ) = Ω(F \ {e′}) and Ω(F \ {e′}) =
ν (Pc). So by inference, we can demonstrate that Ω′(F ) = Ω′′(F )
using the relations Ω′(F ) = ν (Pc), ν (Pc) = ν (Pc′) and ν (Pc′) =
Ω′′(F ) 6.</p>
          <p>3. For each subset F such that {e1, e′} ∩ F 6= 0/ and e2 6∈ F ,
we have Ω′(F ) = [Ω((F \ {e′}) \ (Ae2 \ As))]Δ2−ΔC . Let F1+ =
((F \ {e′}) \ (Ae2 \ As)) be the set of valid events excepting the
valid ancestor events of e2 in F ∩ (Ae2 \ As). For the valuation ν
setting all the events in F1+ to true and the revision variables in
the remaining events to false, Scenarios 1 and 2 enable us to write
ν (Pc) = Ω(F +</p>
          <p>1 ) and ν (Pc) = ν (Pc′). Now, we just need to show
that [ν (Pc′)]Δ2−ΔC = ν ′(Pc′) (ν ′ being the valuation that sets all
the events in F to true and the revision variables in all others to
false) to obtain the expected proof, that is, Ω′(F ) = Ω′′(F ). Let
us set Δ = Δ2 − ΔC . We distinguish the two following cases.
(a) For the class of nodes x in ν (Pc′) unmodified by Δ. That is,
x ∈ ν (Pc′) and x ∈ [ν (Pc′)]Δ. For each node x in this class, fie↑(x)
in P ′ requires the trueness of events in F + since x ∈ ν (Pc′).</p>
          <p>c 1
Moreover, by the definition of Δ, x may be either a conflicted node
or its existence is independent of the values of events in (F ∩ (Ae2 \
As)) ∪ {e2}). If x is a conflicted node, it is intuitive to see that
fie↑(x) in Pc′ is satisfied if events in F + are all set to true and
not if only those in (F ∩ Ae2 ) are set to 1true (cf. Definition 4.2.1
and updPrXML). In the another case, fie↑(x) is only function of the
values of events in F1+ for the set F (typically when fie↑(x) is
Expression 5 in Lemma 4.1 with F1 = F1+ ∩ ((Ae1 \ As) ∪ {e1})).
In both cases, we can state that fie↑(x) |= ν ′ since ν ′ similarly to ν
sets all the events in F1+ to true. As a result, we have x ∈ ν ′(Pc′)
when x belongs to this class.
(b) For the class of nodes x ∈ ν (Pc′) handled with Δ. Let x 6∈
[ν (Pc′)]Δ, that is, there is an operation u in Δ that removes x from
ν (Pc′). By the construction of Δ, it is easy to show that fie(x)
in Pc maps to Expression 3 in Lemma 4.1 with Fs = F1+ ∩ As,
F1 = F1+ ∩((Ae1 \As)∪{e1}) and F2 = (F ∩(Ae2 \As))∪{e2}.
In Pc′, this formula fie(x) is just updated across Algorithm 1 by
replacing the events e1 (the disjunction) in the first member and e2 in
the second one respectively by (e1 ∨ e′) and (e2 ∨ e′). Since ν ′ sets
all the events in F to true, therefore the first member of fie(x) will
be valuated to true while the second member will be valuated to
false. As a consequence, clearly we can state that fie(x) 6|= ν ′, i.e.,
x 6∈ ν ′(Pc′). In summary, we proven that for each node x in ν (Pc′)
deleted with Δ, x is also not chosen by ν ′ in ν ′(Pc′). Note that the
case where x does not occur in ν (Pc) is trivial and it corresponds
to a scenario in which fie(x) maps to Expression 1 in Lemma 4.1
with Fs = F1+ ∩ As, F1 = F1+ ∩ ((Ae1 \ As) ∪ {e1}) and F2 =
(F ∩ (Ae2 \ As)) ∪ {e2}. Now, let x 6∈ ν (Pc′) and x ∈ [ν (Pc′)]Δ.
That is, there is an insertion u in Δ that adds the node x as a child
of a node y in ν (Pc′). Let F2+ = (F \ {e′} ∩ Ae2 ) be the set of
all valid ancestor events of e2 in F . By the definition of Δ, we can
state that the formula fie(x) of x in Pcmaps to Expression 4 or 6 in
Lemma 4.1 with Fs = F1+ ∩ As, F1 = F1+ ∩ ((Ae1 \ As) ∪ {e1})
and F2 = (F ∩ (Ae2 \ As)) ∪ {e2}. If fie(x) is Expression 4, this
formula in Pc′ is updated by Algorithm 1 which replaces e1 in the
first member (the conjunction) and e2 in the second member by
(e1 ∨ e′) and (e2 ∨ e′) respectively. Since ν ′ sets all the events in F
to true, the first member of fie(x) will be valuated to false. As for
the second member, it will be valuated to true under ν ′ because all
the events in F2 are set to true and (e2 ∨ e′) |= ν ′. As a result, ν ′
6The last relation is due by the fact that the events e1 and e2 are
both valuated to true.
will select x in ν ′(Pc′) since fie(x) is such that fie(x) |= ν ′.
Otherwise, if fie(x) is Expression 6, it is updated in Pc′ by mergePrXML
which substitutes e2 by (e2 ∨ e′). Given that it is straightforward to
prove that fie(x) |= ν ′ since all the events in F2 are set to true and
(e2 ∨ e′) |= ν ′.</p>
          <p>Scenarios (a) and (b) demonstrate that when x ∈ [ν (Pc′)]Δ, then
x ∈ ν ′(Pc′). Similarly, the converse can be reached. We
conclude [ν (Pc′)]Δ = ν ′(Pc′) which joint with ν (Pc′) = Ω(F1+) and
ν ′(Pc′) = Ω′′(F ) yield [Ω(F1+)]Δ = Ω′′(F ). At last, the result
Ω′(F ) = Ω′′(F ) relies on Ω(F ) = [Ω(F1+)]Δ.</p>
          <p>4. For each subset F such that {e2, e′} ∩ F 6= 0/ and e1 6∈ F , we
have Ω′(F ) = [Ω((F \ {e′}) ∩ (Ae2 ∪ {e2}))]Δ1−ΔC . This scenario
is entirely symmetric to Scenario 3.</p>
          <p>5. For each subset F such that {e1, e2} ∩ F = 0/ and e′ ∈ F , we
have Ω′(F ) = [Ω((F \ {e′}) \ ((Ae1 \ As) ∪ (Ae2 \ As)))]Δ3−ΔC .
Let set F ′ = (F \ {e′}) \ ((Ae1 \ As) ∪ (Ae2 \ As)). Given a
valuation ν setting all the events in F ′ to true and the revision
variables in all other events to false, we can write Ω(F ′) = ν (Pc) and
ν (Pc) = ν (Pc′) according to Scenarios 1 and 2. Similarly to
Scenario 3, we have now to demonstrate that [ν (Pc′)]Δ3−ΔC = ν ′(Pc′)
(where ν ′ is the valuation setting all the events in F to true and the
revision variables of all the remaining events to false) to obtain that
[Ω(F ′)]Δ3−ΔC = ν ′(Pc′). This result will be sufficient for stating
that Ω′(F ) = Ω′′(F ) since by definition Ω′(F ) = [Ω(F ′)]Δ3−ΔC
and ν ′(Pc′) = Ω′′(F ). Let us consider Δ = Δ3 − ΔC . For the
proof, the intuition here is to see that by implementation Δ can be
rewritten as (Δ1 − ΔC ) ∪ (Δ2 − ΔC ) on one side. So, if we
arrive to show that for all the nodes x, y such that x ∈ [ν (Pc′)]Δ1−ΔC</p>
          <p>C
and y ∈ [ν (Pc′)]Δ2−Δ , then x ∈ ν ′(Pc′) and y ∈ ν ′(Pc′), we can
trivially deduct that for each node x ∈ [ν (Pc)]Δ, then x ∈ ν ′(Pc′).
These relations can be proven in the same spirit as in Scenario 3 by
just noting that:
(a) For the set of unmodified nodes x in ν (Pc′) with Δ, fie(x) in
Pc is compatible to Expression 1 in Lemma 4.1 with F1 =
F ′ ∩As, F1 = (F ∩(Ae1 \As))∪{e1}, and F1 = (F ∩(Ae2 \
As)) ∪ {e2}.
(b) For the set of nodes x in ν (Pc′) handled with Δ. If the operation
comes from (Δ1 − ΔC ), fie(x) is compatible to Expression 3 or
5 in case of insertions and this formula maps to Expression 4
for deletions. Concerning operations in (Δ2 − ΔC ), fie(x) is
compatible to Expression 4 or 6 for insertions and to
Expression 3 for deletions.</p>
          <p>Furthermore, the inclusion in the opposite direction, that is for
each node x ∈ ν ′(Pc′), then x ∈ [ν (Pc′)]Δ, is obvious for nodes
unchanged by Δ. For the other nodes, we only need to verify that
they are correctly handled by Δ in ν (Pc′) depending whether it
corresponds to an addition or a deletion in this document. Following
that, we can state that Ω′(F ) = Ω′′(F ) holds for each subset F
of events in V that contains e′ but not e1 and e2.</p>
        </sec>
        <sec id="sec-6-1-3">
          <title>We conclude by showing efficiency of mergePrXML.</title>
          <p>PROPOSITION 4.4. mergePrXML performs the merge over the
encoding of any uncertain multi-version XML document in time
proportional to the size of the formulas of nodes impacted by the
updates in merged branches.</p>
          <p>PROOF. (Sketch) The intuition behind the time complexity is
that mergePrXML results in a constant-time update of DAG G , firstly.
Secondly, by viewing Pc as an amortized hash table, the
algorithm retrieves any node impacted by a (non-conflicting) update in
constant time. Finally, the replace method only depends on the
lengths of formulas.
5.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>CONCLUSION</title>
      <p>
        We presented in this paper a merge operation, as well as an
efficient mapping algorithm, that supplements our uncertain
multiversion XML framework presented in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The merging
mechanism proposed covers common deterministic XML merge
scenarios while managing uncertain data. Example applications include
the merging of documents in uncertain version control platforms
such as Wikipedia.
      </p>
    </sec>
    <sec id="sec-8">
      <title>ACKNOWLEDGEMENTS</title>
      <p>This work was partially supported by the Île-de-France regional
DROD project, and the French government under the STIC-Asia
program, CCIPX project. We would like to thank the anonymous
reviewers for their valuable suggestions on improving this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Abdessalem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Ba</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Senellart</surname>
          </string-name>
          .
          <article-title>A probabilistic XML merging tool</article-title>
          .
          <source>In Proc. EDBT</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Abdessalem</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Senellart</surname>
          </string-name>
          .
          <article-title>Uncertain version control in open collaborative editing of tree-structured documents</article-title>
          .
          <source>In Proc. DocEng</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Cobena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Abdessalem</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hinnach</surname>
          </string-name>
          .
          <article-title>A comparative study for XML change detection</article-title>
          .
          <source>In BDA</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Collins-Sussman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. W.</given-names>
            <surname>Fitzpatrick</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Pilato. Version Control with Subversion. O'Reilly Media</surname>
          </string-name>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Estublier</surname>
          </string-name>
          .
          <article-title>Software configuration management: A Roadmap</article-title>
          .
          <source>In Proc. ICSE</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Fontaine</surname>
          </string-name>
          .
          <article-title>Merging XML files: A new approach providing intelligent merge of XML data sets</article-title>
          .
          <source>In Proc. XML Europe</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Khanna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kunal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Pierce</surname>
          </string-name>
          .
          <article-title>A formal investigation of Diff3</article-title>
          .
          <source>In Proc. FSTTCS</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nutt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Senellart</surname>
          </string-name>
          .
          <article-title>Updating Probabilistic XML</article-title>
          .
          <source>In Proc. Updates in XML</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kimelfeld</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Senellart</surname>
          </string-name>
          .
          <article-title>Probabilistic XML: Models and Complexity</article-title>
          .
          <source>In Advances in Probabilistic Databases for Uncertain Information Management</source>
          . Springer-Verlag,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lindholm</surname>
          </string-name>
          .
          <article-title>A three-way merge for XML documents</article-title>
          .
          <source>In Proc. DocEng</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          , W. Liu,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hunter</surname>
          </string-name>
          , and
          <string-name>
            <surname>W. Zhang.</surname>
          </string-name>
          <article-title>An XML Based Framework for Merging Incomplete and Inconsistent Statistical Information from Clinical Trials</article-title>
          . In Z. Ma and L. Yan, editors,
          <source>Software Computing in XML Data Management</source>
          . Springer-Verlag,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Peters</surname>
          </string-name>
          .
          <article-title>Change detection in XML trees: a survey</article-title>
          .
          <source>In TSIT Conference</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Suzuki</surname>
          </string-name>
          .
          <article-title>A Structural Merging Algorithm for XML Documents</article-title>
          .
          <source>In Proc. ICWI</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>