<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data fusion with source authority and multiple truth (Discussion Paper)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <email>name.surnameg@polimi.it</email>
          <aff>Politecnico di Milano, Italy</aff>
        </contrib>
      </contrib-group>
      <abstract>
        <p>The abundance of data available on the Web makes it more and more probable to find that different sources contain (partially or completely) different values for the same item. Data Fusion is the problem of discovering the true values of a data item when two entities representing it have been found and their values differ. Recent studies have shown that, when we rely only on majority voting to find the true value of an object, results may be wrong for up to 30% of the data items, since false values spread very easily because data sources frequently copy from one another. Therefore, the problem must be solved by assessing the quality of the sources and giving more importance to the values coming from trusted sources. State-of-the-art Data Fusion systems define source trustworthiness on the basis of the accuracy of the provided values and of the dependence on other sources. In this paper we propose an improved algorithm for Data Fusion that extends existing methods based on accuracy and correlation between sources by also taking into account source authority, defined on the basis of the knowledge of which sources copy from which ones. Our method has been designed to work well also in the multi-truth case, that is, when a data item can have multiple true values. Preliminary experimental results on a multi-truth real-world dataset show that our algorithm outperforms previous state-of-the-art approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The massive use of user-generated content, the Internet of Things and the tendency to transform every real-world interaction into digital data have led to the problem of how to make sense of the huge mass of data available nowadays. In this context, not only can a single source store a previously unimaginable amount of data, but the number of sources that can provide information relevant to a query also increases dramatically, even in very specific contexts.</p>
      <p>
        With all these conflicting data available on the Web, discovering their true values is of primary importance. The solution to this problem is Data Fusion, where the true value of each data item is decided. Redundancy per se is not enough, since it has been shown in [<xref ref-type="bibr" rid="ref3">3</xref>] that, if we rely only on majority vote, we can get wrong results in up to 30% of the cases. In order to get more accurate results we propose a Bayesian approach able to evaluate source quality.
      </p>
      <p>
        Data fusion algorithms can be divided into two sub-classes: single-truth and multi-truth, the latter denoting the case when a data item may have multiple true values. Such scenarios are common in everyday life, where many actors can play in a movie or a book can have many authors, like Alice's book "Foundations of Databases" [<xref ref-type="bibr" rid="ref10">10</xref>] by Serge Abiteboul, Rick Hull and Victor Vianu. We decided to design our model to work also in the multi-truth case.
      </p>
      <p>Copyright © 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. SEBD 2019, June 16-19, 2019, Castiglione della Pescaia, Italy.</p>
      <p>
        Currently, many single-truth data fusion algorithms exist in the literature, and a few of them exploit Bayesian inference to estimate the veracity of each value and the trustworthiness of the sources. TruthFinder [<xref ref-type="bibr" rid="ref8">8</xref>] applies Bayesian analysis to compute the probability of a value being true, conditioned on the observation of the values provided by the sources. Accu [<xref ref-type="bibr" rid="ref4">4</xref>] applies a Bayesian iterative approach to compute the veracity of values, assuming a uniform distribution of the false values for each data item and source independence. These two assumptions have been relaxed by PopAccu [<xref ref-type="bibr" rid="ref9">9</xref>] and AccuCopy [<xref ref-type="bibr" rid="ref1">1</xref>], respectively.
      </p>
      <p>
        Less attention has been devoted to studying the problem of multi-truth finding: to our knowledge, only three algorithms try to solve it. MBM [<xref ref-type="bibr" rid="ref6">6</xref>] approaches multi-truth data fusion with a model that focuses on mappings and relations between sources and sets of provided values, also introducing a copy-detection phase to discover dependencies between sources. DART [<xref ref-type="bibr" rid="ref5">5</xref>] computes, for each source, a domain expertise score relative to the domains of the input data. This score is used in a Bayesian inference process to model source trustworthiness and value confidence; sources are assumed to be independent. LTM [<xref ref-type="bibr" rid="ref7">7</xref>] exploits probabilistic graphical models to find all the true values claimed for each data item.
      </p>
      <p>State-of-the-art Data Fusion systems define source trustworthiness based on the accuracy of the provided values and on the dependence on other sources. In this paper we propose an improved algorithm for Data Fusion. Our method extends existing methods based on accuracy and correlation between sources by also taking into consideration the authority of the sources. Authoritative sources are defined as the ones that have been copied by many sources: the key idea is that, when source administrators decide to copy data, they will choose the sources that they perceive as most trustworthy.</p>
      <p>To summarize, in this paper we make the following contributions:
– We present a new formula for domain-aware copy detection, with the goal of determining the probability that source Si copies from source Sj data items belonging to a specific domain. Our copy detection process exploits the domain expertise of the sources and can also assign different probabilities to the two directions of copying.
– An urgent need of the truth discovery process is to determine which sources we can trust. We present a fully unsupervised algorithm that can assign an authority score to each source for each domain. This process is based on the natural habit of choosing, when copying a missing value, the source that provides the correct value with the highest probability, in other words, the most authoritative one.
– We present an improved algorithm for assessing the veracity of values in a multi-truth discovery process, exploiting source authority in copy detection and positively rewarding sources according to their authority.</p>
      <p>In Section 2 we present preliminary information, Section 3 provides the details of our approach, and Section 4 shows the experimental results.</p>
    </sec>
    <sec id="sec-2">
      <title>Background and Preliminaries</title>
      <p>We now present in more detail two methods that have been of great importance for our work.</p>
      <p>DART. This algorithm exploits an iterative domain-aware Bayesian approach to perform multi-truth discovery over a dataset composed starting from different sources. Its key intuition is that, in general, a source may have a different quality of data for different domains. For each source, they define the domain expertise score ed(s), measuring the source's experience in a given domain, and assign a confidence cs^o(v) to each value v provided by a source s, reflecting how much s is convinced that the value v is (part of) the correct value(s) for object o.</p>
      <p>The veracity σo(v) of value v for object o is the probability that v is a true value of o, which is better estimated at each iteration of the discovery process. The goal of the DART algorithm is to evaluate the probability that a value v is true given the observation of the claimed data Ψ(o) (i.e. P(v | Ψ(o))). Being P(Ψ(o) | v) and P(Ψ(o) | v̄) the probabilities of having the observation Ψ(o) when v is true or false respectively, Bayesian inference can be used to express P(v | Ψ(o)) as shown below:</p>
      <p>P(v | Ψ(o)) = P(Ψ(o) | v) P(v) / P(Ψ(o)) = P(Ψ(o) | v) σo(v) / [P(Ψ(o) | v) σo(v) + P(Ψ(o) | v̄) (1 − σo(v))]   (1)</p>
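      <p>As a minimal illustration of the update in Eq. 1 (a sketch in our own notation, not the paper's implementation; all argument names are ours):</p>

```python
def veracity_posterior(p_obs_given_true: float,
                       p_obs_given_false: float,
                       prior: float) -> float:
    """Bayesian update of Eq. 1: P(v given Psi(o)).

    p_obs_given_true  plays the role of P(Psi(o) when v is true)
    p_obs_given_false plays the role of P(Psi(o) when v is false)
    prior             plays the role of the current veracity sigma_o(v)
    """
    num = p_obs_given_true * prior
    den = num + p_obs_given_false * (1.0 - prior)
    return num / den if den > 0 else 0.0
```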
      <p>Our main criticism of DART is the assumption that sources are independent, which is a clear oversimplification of the real world. We will explain how we have relaxed this assumption in the following section.</p>
      <p>MBM is a Bayesian algorithm for multi-truth finding that also takes into consideration the problem of source dependence. It computes, for each source and set of values, an independence score based on the values provided by all the sources. The independence score is then used to discredit, in the voting phase, sources that do not provide their values independently.</p>
      <p>Our criticisms of MBM are the assumption that there is no mutual copying between sources in the whole dataset, and the fact that the algorithm is not able to distinguish the direction of copying. In the following section we will describe how we have relaxed these assumptions.</p>
      <p>Table 1 describes the notation that will be used in the following sections.</p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>We now present ADAM (Authority Domain Aware Multi-truth data fusion), a method based on Bayesian inference and source authority that iteratively refines the probability that a provided value for a data item is true.</p>
      <sec id="sec-3-1">
        <title>Copy detection</title>
        <p>Starting from [<xref ref-type="bibr" rid="ref6">6</xref>], we have devised a domain-aware copy detection algorithm that can assign different probabilities to the two directions of copying. This model works at domain granularity, therefore it can more accurately approximate the real-world behaviour of correlated copying [<xref ref-type="bibr" rid="ref2">2</xref>].</p>
        <p>Scope. Given an object o and two sources si and sj, we denote by Ψioj the observation of the common values cij(o) for a common object o ∈ Θij^d in domain d provided by the two sources si and sj.</p>
        <p>Table 1. Notation
O(s): Set of all objects provided by source s
Od(s): Set of objects in domain d provided by source s
Vs(o): Set of all values claimed for object o by source s
Vs̄(o): Set of all values claimed for object o by sources ≠ s
So^d(v): Sources that provide value v for object o in domain d
So^d(v̄): Sources that don't provide value v for object o in domain d
ed(s): Expertise of source s in domain d
σo(v): Veracity of value v for object o
τrec^d(s): Recall of source s in domain d
τsp^d(s): Specificity of source s in domain d
cs^o(v): Confidence score of value v of object o related to source s
si → sj: Source i is copying at object level from source j
si ⊥ sj: Sources i and j are independent at object level
si →d sj: Source i is copying from source j for domain d
Θij^d: Set of common objects in domain d between sources i and j
cij(o) =: c: Values provided by both sources i and j for object o
Ψioj =: Ψc: Observation of c
Ψ(o): Observation of the values provided for object o
Ad(s): Authority of source s in domain d</p>
        <p>Assumptions. In our copy detection algorithm we assume that there is no mutual copying at domain level, i.e., if source s1 copies from source s2 regarding domain d̄, then s2 can copy from s1 only values for objects in domains d̃ ≠ d̄; we also assume that two sources can only be either independent or copiers.</p>
        <p>Object copying. For each pair of sources si, sj, after we have defined the truth probability of the group of values in c as the probability that all the values are correct (Eq. 2), we can compute the likelihood of Ψc in the different cases of source dependence and truthfulness of c. Similarly to [<xref ref-type="bibr" rid="ref6">6</xref>], we state that if si has copied from sj, or the other way round, then the two sources provide the same common values c, no matter the veracity of c (Eq. 3).</p>
        <p>σ(c) = ∏_{v∈c} σo(v)   (2)</p>
        <p>P(Ψc | si → sj, c true) = P(Ψc | sj → si, c true) = 1
P(Ψc | si → sj, c false) = P(Ψc | sj → si, c false) = 1   (3)</p>
        <p>Eqs. 4 and 5 define the probabilities that both sources provide the same group of values c independently of each other, in the two cases that c is true and false.</p>
        <p>P(Ψc | s1 ⊥ s2, c true) = τrec(s1) τrec(s2) [1 − τsp(s1)] [1 − τsp(s2)]   (4)
P(Ψc | s1 ⊥ s2, c false) = τsp(s1) τsp(s2) [1 − τrec(s1)] [1 − τrec(s2)]   (5)</p>
        <p>Bayesian model. If we apply a Bayesian inference approach we can now compute the probability of two sources being dependent or independent, and in the first case we can also determine which of the two is the copier.</p>
        <p>With Y = {si → sj, sj → si, si ⊥ sj} we define the three possible outcomes.</p>
        <p>P(y | Ψc) = P(Ψc | y) P(y) / Σ_{y′∈Y} P(Ψc | y′) P(y′)
= P(y) [P(Ψc | y, c true) σ(c) + P(Ψc | y, c false) (1 − σ(c))] / Σ_{y′∈Y} P(y′) [P(Ψc | y′, c true) σ(c) + P(Ψc | y′, c false) (1 − σ(c))]   (6)</p>
        <p>We now have to find a way to estimate the prior probabilities of the Bayesian model: P(si → sj), P(sj → si) and P(sj ⊥ si), that is, all the different configurations of object copying between sources si and sj. We define them as the probabilities of the two sources being copiers or independent in the domain of the object we are considering, as defined in Eq. 11. For ease of notation we apply the following definitions, recalling that d is the domain of Θij^d ∋ o, where o is the object of c that we are analyzing.</p>
        <p>P(si → sj) =: λij^d,   P(sj → si) =: λji^d,   P(sj ⊥ si) = 1 − λij^d − λji^d   (7)</p>
        <p>and replace Eqs. 7, 2, 3, 4 and 5 into Eq. 6, with the following result:</p>
        <p>P(si → sj | Ψc) = λij^d / [λij^d + λji^d + (1 − λij^d − λji^d) Pu]   (8)</p>
        <p>where</p>
        <p>Pu := σ(c) [τrec(si) τrec(sj) (1 − τsp(si)) (1 − τsp(sj))] + (1 − σ(c)) [τsp(si) τsp(sj) (1 − τrec(si)) (1 − τrec(sj))]   (9)</p>
Non-shared values. With Eq. 8 we have expressed the probability that a source
si has copied from another source sj their common values c for object o. We
now have to take into consideration other possible non-in-common values to
opportunely compute the probability that c were really copied. We have chosen
to scale the copy probability by the Jaccard similarity of the two sets of values
of o claimed by the two sources si and sj , as shown in Eq. 10.</p>
        <p>Jij (o) = Jji(o) =</p>
        <p>Vsi (o) \ Vsj (o)</p>
        <p>Vsi (o) [ Vsj (o)
Domain-level copying. We can use the concept of copying an object o to de ne
the act of copying with respect to a domain d as de ned in Eq. 11.</p>
        <p>P si !d sj idj := Po2 idj P (si !d sj j c) Jij (o) (11)
ij
Initialization. Since in the initialization phase we have no prior knowledge of
idj , we decided to exploit the fact that sources with high expertise in domain
d are less likely to be copiers for domain d and that sources with low expertise
in d tend to copy from sources with higher expertise in d. These ideas can be
summarized in the initialization expressed in Eq. 12.</p>
        <p>idj = 1 ed (si) ed (sj ) 8si; sj 2 S ^ si 6= sj (12)</p>
        <p>ad(sj ) :=</p>
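        <p>The copy detection step of Eqs. 8-11 can be sketched in Python as follows (an illustrative rendering in our own notation, not the authors' code; lam_ij and lam_ji stand for the priors λij^d and λji^d):</p>

```python
def p_u(sigma_c, rec_i, rec_j, sp_i, sp_j):
    """Eq. 9: likelihood that si and sj provide c independently."""
    return (sigma_c * rec_i * rec_j * (1 - sp_i) * (1 - sp_j)
            + (1 - sigma_c) * sp_i * sp_j * (1 - rec_i) * (1 - rec_j))

def copy_posterior(lam_ij, lam_ji, pu):
    """Eq. 8: P(si copies from sj, given the observation of c)."""
    return lam_ij / (lam_ij + lam_ji + (1.0 - lam_ij - lam_ji) * pu)

def jaccard(vals_i, vals_j):
    """Eq. 10: overlap of the value sets claimed for the same object."""
    union = vals_i.union(vals_j)
    return len(vals_i.intersection(vals_j)) / len(union) if union else 0.0

def domain_copy_prob(per_object):
    """Eq. 11: lambda_ij^d as the Jaccard-scaled average over the
    common objects; per_object is a list of (posterior, jaccard) pairs."""
    if not per_object:
        return 0.0
    return sum(p * j for p, j in per_object) / len(per_object)
```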
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Source authority</title>
        <p>The key idea used to define the authority of a source in a specific domain with respect to the outcomes of the copy detection process is that, if many sources copy some values from the same source sa, it is because sa is considered authoritative and more trustworthy. For each source sj ∈ S, we define Cd(sj) in Eq. 13 as the set of all the sources that copy from source sj with probability above a given threshold η:</p>
        <p>Cd(sj) := { si ∈ S | P(si →d sj | λij^d) > η }   (13)</p>
        <p>Qualitatively, the unadjusted authority score of source s in domain d is how much source s is copied in d w.r.t. how much all sources are copied in d (Eq. 14).</p>
        <p>ad(sj) := Σ_{si∈Cd(sj)} P(si →d sj | λij^d) / Σ_{sk∈S} Σ_{sl∈Cd(sk)} P(sl →d sk | λkl^d)   (14)</p>
        <p>Note that in general the cardinality of S (i.e. the number of sources) is high, and the parameter η should not be set too close to 1, so as to better exploit the variety of outcomes of the copy detection process. This configuration leads to ad(s) ≪ 1. We can accordingly apply a linear conversion to ad(s) in order to map it onto the interval [0, 1]. We denote this new score as Ad(s), the authority of source s in domain d, computed as:</p>
        <p>Ad(s) := (ad(s) − ad^min) / (ad^max − ad^min)   (15)</p>
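        <p>A minimal sketch of the authority computation of Eqs. 13-15 (illustrative only; the dictionary-based encoding and the function name are our own assumptions):</p>

```python
def authority_scores(copy_prob, eta=0.2):
    """Eqs. 13-15 for one domain d.

    copy_prob: dict mapping (si, sj) to P(si copies from sj in d).
    Returns the normalized authority A_d(s) for every source.
    """
    sources = {s for pair in copy_prob for s in pair}
    # Eqs. 13-14: unadjusted authority = how much each source is copied,
    # counting only copy probabilities above the threshold eta.
    raw = {s: 0.0 for s in sources}
    for (si, sj), p in copy_prob.items():
        if p > eta:
            raw[sj] += p
    total = sum(raw.values())
    a = {s: (raw[s] / total if total else 0.0) for s in sources}
    # Eq. 15: linear rescaling of a_d(s) onto [0, 1].
    lo, hi = min(a.values()), max(a.values())
    return {s: (a[s] - lo) / (hi - lo) if hi > lo else 0.0 for s in a}
```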
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Veracity</title>
        <p>We have extended the DART Bayesian inference model in order to exploit the authority score of each source. Our key idea is to positively reward sources according to their authority, which is achieved with Eqs. 16 and 17, respectively.</p>
        <p>P(Ψ(o) | v) = ∏_{s∈So^d(v)} τrec^d(s)^(ed(s)·cs^o(v)+Ad(s)) · ∏_{s∈So^d(v̄)} (1 − τrec^d(s))^(ed(s)·cs^o(v)+Ad(s))   (16)</p>
        <p>P(Ψ(o) | v̄) = ∏_{s∈So^d(v)} (1 − τsp^d(s))^(ed(s)·cs^o(v)+Ad(s)) · ∏_{s∈So^d(v̄)} τsp^d(s)^(ed(s)·cs^o(v)+Ad(s))   (17)</p>
        <p>In a multi-truth context, precision cannot be the only metric for source trustworthiness [<xref ref-type="bibr" rid="ref7">7</xref>]: we should rather use recall and specificity. Source recall is the probability that true values are claimed as true (Eq. 18), while source specificity is the probability that false values are claimed as false (Eq. 19).</p>
        <p>τrec^d(s) = Σ_{o∈Od(s)} Σ_{v∈Vs(o)} σo(v) / Σ_{o∈Od(s)} |Vs(o)|   (18)</p>
        <p>τsp^d(s) = Σ_{o∈Od(s)} Σ_{v′∈Vs̄(o)} (1 − σo(v′)) / Σ_{o∈Od(s)} |Vs̄(o)|   (19)</p>
        <p>At each iteration of the algorithm the veracity scores of the values are refined; this leads to a better estimation of copy detection and source authority, which in turn improves the values' veracity at the next iteration. The algorithm stops iterating when the updates of all veracities are smaller than a given threshold. The output of the algorithm is, for each object o in the dataset, the set of values whose veracity is greater than or equal to a given threshold θ.</p>
        <p>We now present the results of an experimental comparison between our algorithm ADAM and the original DART in different configurations of the input data.</p>
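        <p>The structure of Eq. 16, with each source's vote weighted by the authority-boosted exponent ed(s)·cs^o(v)+Ad(s), can be sketched as follows (an illustrative rendering, not the authors' code; the per-source dictionary encoding is our assumption):</p>

```python
from math import prod

def likelihood_given_true(providers, non_providers):
    """Structure of Eq. 16: P(observation of o, when v is true).

    providers / non_providers: lists of dicts with keys
    'rec' (tau_rec^d), 'exp' (e_d(s)), 'conf' (c_s^o(v)), 'auth' (A_d(s)).
    Raising each factor to e_d(s)*c_s^o(v) + A_d(s) rewards
    authoritative sources with a stronger vote.
    """
    return (prod(s['rec'] ** (s['exp'] * s['conf'] + s['auth'])
                 for s in providers)
            * prod((1 - s['rec']) ** (s['exp'] * s['conf'] + s['auth'])
                   for s in non_providers))
```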
        <sec id="sec-3-3-1">
          <title>Dataset</title>
          <p>We have used as input data a subset of the same book dataset that has been used for the evaluation of the DART algorithm, kindly made available by Xueling Lin, one of the authors of [<xref ref-type="bibr" rid="ref5">5</xref>]. Our goal was to discover the correct values of the multi-truth parameter authors, using the category attribute to clusterize books into domains.</p>
          <p>For our experiments, we have been able to use a subset of this dataset matching another validated and trustworthy dataset, considered as golden truth for the book-authors binding. The dataset used in our experiments is composed of 90,867 tuples from 2,680 sources and 1,958 books, spanning all the 18 domains (i.e. categories of book genres) of the original dataset.</p>
          <p>Our algorithm depends on several parameters; Table 2 reports the value used for each of them. When an indication was present in [<xref ref-type="bibr" rid="ref5">5</xref>], we used the same provided value to ensure the comparability of the two algorithms.</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>Results</title>
          <p>Table 2. Parameters
α 1.5
0
0.1
η 0.2
θ 0.5
¯ 0.5
τ̄rec 0.8
τ̄sp 0.9</p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>Results</title>
        <p>
          We have developed in Python 3.7 both the implementation of DART (following as precisely as possible the guidelines expressed in [<xref ref-type="bibr" rid="ref5">5</xref>]) and our extension ADAM. Even though our interest was in determining the impact of our extensions on DART performances, we have also developed a simple version of MajorityVote as a baseline comparison, transferring the classic single-truth voting system to the multi-truth context by considering true all the values of object o that have been voted by at least 60% of the sources that provide a value for o.
        </p>
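        <p>The multi-truth MajorityVote baseline just described can be sketched as follows (a minimal illustration; the claims encoding is our assumption):</p>

```python
from collections import defaultdict

def majority_vote(claims, threshold=0.6):
    """Multi-truth MajorityVote: a value is accepted for object o when it
    is voted by at least `threshold` of the sources providing a value for o.

    claims: iterable of (source, obj, value) tuples.
    Returns: dict mapping obj to its set of accepted values.
    """
    voters = defaultdict(set)  # obj -> sources providing any value for it
    votes = defaultdict(lambda: defaultdict(set))  # obj -> value -> voters
    for source, obj, value in claims:
        voters[obj].add(source)
        votes[obj][value].add(source)
    return {obj: {v for v, ss in votes[obj].items()
                  if len(ss) >= threshold * len(voters[obj])}
            for obj in voters}
```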
        <p>ADAM achieves a higher F1 score than DART in 76% of the runs. Moreover, in our experiments ADAM required strictly fewer iterations before convergence than DART in 65% of the runs; in some cases the number of iterations required was less than a half. At first sight this faster convergence might seem to be due only to the increment of Ad(s) in the exponents of Eqs. 16 and 17, but a more precise analysis shows that Ad(s) ≠ 0 only for a small fraction of the sources, correctly modeling the desired meaning of authority, which by definition should be related to only a small subset of objects.</p>
        <p>We have run 37 comparisons between DART, ADAM and MajorityVote, using the same input data for the three algorithms at each run and considering inputs regarding both single and multiple domains. In this section we particularly focus on a subset of 10 runs, reporting in Table 3 the metrics of DART and ADAM for those runs; finally, in Table 4 we aggregate the results of all 37 runs, reporting the averaged metrics of MajorityVote, DART and ADAM.</p>
        <p>Table 3. Domain, Records, |D|, |O|, |S|</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5 Conclusions</title>
      <p>We presented ADAM, an improved algorithm for multi-truth data fusion. A quicker termination and better results confirm that our idea of rewarding authoritative sources has led to an increase in the algorithm's performance and accuracy.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Dong</surname>
          </string-name>
          , Xin Luna and
          <string-name>
            <surname>Berti-Equille</surname>
          </string-name>
          ,
          <article-title>Laure and Srivastava, Divesh: Truth Discovery and Copying Detection in a Dynamic World</article-title>
          . VLDB (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Blanco</surname>
          </string-name>
          ,
          <article-title>Lorenzo and Crescenzi, Valter and Merialdo, Paolo and Papotti, Paolo: Probabilistic Models to Reconcile Complex Data from Inaccurate Data Sources</article-title>
          .
          <source>Advanced Information Systems Eng</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Xian and Dong, Xin Luna and Lyons, Kenneth and Meng, Weiyi and Srivastava, Divesh: Truth Finding on the Deep Web: Is the Problem Solved?</article-title>
          <source>CoRR</source> (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dong</surname>
          </string-name>
          , Xin Luna and
          <string-name>
            <surname>Berti-Equille</surname>
          </string-name>
          ,
          <article-title>Laure and Srivastava, Divesh: Integrating Conflicting Data: The Role of Source Dependence</article-title>
          .
          <source>VLDB</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Xueling and Chen, Lei: Domain-aware Multi-truth Discovery from Conflicting Sources</article-title>
          .
          <source>VLDB</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Wang</surname>
            , Xianzhi and Sheng, Quan
            <given-names>Z.</given-names>
          </string-name>
          and
          <article-title>Fang, Xiu Susie and Yao, Lina and Xu, Xiaofei and Li, Xue: An Integrated Bayesian Approach for Effective Multi-Truth Discovery</article-title>
          .
          <source>CIKM</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bo</surname>
          </string-name>
          ,
          <article-title>Zhao and Benjamin, Rubinstein and Jim, Gemmell and Jiawei, Han: A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration</article-title>
          .
          <source>CoRR</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Xiaoxin</surname>
            , Yin and Jiawei, Han and
            <given-names>Philip</given-names>
          </string-name>
          , Yu:
          <article-title>Truth Discovery with Multiple Conflicting Information Providers on the Web</article-title>
          .
          <source>TKDE</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dong</surname>
          </string-name>
          ,
          <article-title>Xin Luna and Saha, Barna and Srivastava, Divesh. Less is more: Selecting sources wisely for integration</article-title>
          .
          <source>VLDB</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Abiteboul</surname>
          </string-name>
          , Serge and Hull, Richard and Vianu, Victor:
          <article-title>Foundations of databases: the logical level</article-title>
          . Addison-Wesley
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>