<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Consensus-based Techniques for Range-task Resolution in Crowdsourcing Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lorenzo Genta</string-name>
          <email>genta@di.unimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alfio Ferrara</string-name>
          <email>ferrara@di.unimi.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Montanelli</string-name>
          <email>montanelli@di.unimi.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Informatica, Università degli Studi di Milano</institution>
          ,
          <addr-line>Via Comelico 39, 20135 - Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Informatica, Università degli Studi di Milano</institution>
          ,
          <addr-line>Via Comelico 39, 20135 - Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Dipartimento di Informatica, Università degli Studi di Milano</institution>
          ,
          <addr-line>Via Comelico 39, 20135 - Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In crowdsourcing, a range task is a type of creation task where only free answers belonging to the numeric domain are accepted/possible. In this paper, we present the median-on-agreement (ma) techniques based on statistical and consensus-based mechanisms for determining the result of range tasks. The ma techniques are characterized by i) the distinction between the group of workers that agree (i.e., workers in the consensus) on the task result from the group that disagrees, and ii) the calculation of the final task answer through a median-based mechanism where only answers of workers in the consensus are considered.</p>
      </abstract>
      <kwd-group>
        <kwd>crowdsourcing</kwd>
        <kwd>consensus evaluation</kwd>
        <kwd>range task management</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        In recent years, crowdsourcing systems have gained
growing popularity as powerful solutions for addressing the
execution of complex, time-consuming activities where the
contribution of human workers can be decisive and the use
of automatic procedures is not completely effective, such as
collaborative filtering and web-resource tagging.
Usually, in this kind of system, crowd workers are involved
in decision tasks where they are called to select the most
appropriate answer among a set of predefined alternatives
(e.g., [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). In a conventional scenario, multiple workers
participate in the execution of a task, thus multiple answers
are collected and the final result is derived by assessing the
level of agreement between the different answers and by
deciding whether a consensus has been reached [
        <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
        ]. The use of
crowdsourcing systems is now also being proposed for the
resolution of so-called creation tasks, in which the task
answer can be any kind of worker-generated content, for
example a free-text answer, a drawing, or another
visual/multimedia artifact. This task type enables the worker
to express her/his creativity, thus enabling crowdsourcing to
become a mechanism for collaborative knowledge creation.
However, in creation tasks, the problem of choosing the final
task result among all the available worker answers is even
more challenging than for decision tasks, especially when
the task question is intrinsically subjective and a factual
answer is neither possible nor appropriate (e.g., a labeling task in
which the worker is called to provide a featuring keyword
for a group of web images).
      </p>
      <p>2017, Copyright is with the authors. Published in the Workshop
Proceedings of the EDBT/ICDT 2017 Joint Conference (March 21, 2017, Venice,
Italy) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is
permitted under the terms of the Creative Commons license CC-by-nc-nd
4.0.</p>
      <p>
        In this paper, we focus on range tasks, namely a type
of creation task where only free answers belonging to the
numeric domain are accepted/possible [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We propose the
median-on-agreement (ma) techniques based on statistical
and consensus-based mechanisms. In particular, the ma
techniques are conceived to address range task resolution
when multiple crowd workers are involved in the execution
of each task. Each worker autonomously and independently
executes a task, thus a number of different answers is
produced. Based on these answers, the ma techniques allow i)
to distinguish the group of workers that agree (i.e.,
workers in the consensus) on the task result from the group that
disagrees, and ii) to calculate the final task answer through
a median-based mechanism where only answers of workers
in the consensus are considered. The application of the ma
techniques to the Argo crowdsourcing system is presented as
well as experimental results against the main state-of-the-art
approaches for range task resolution.
      </p>
      <p>The paper is organized as follows. In Section 2, we
illustrate motivations and related work. The ma techniques are
presented in Section 3. In Section 4, the application of ma
to Argo is discussed. In Section 5, experimental results on
a real crowdsourcing case-study are presented. Concluding
remarks are provided in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. MOTIVATING SCENARIO</title>
      <p>
        Consider the scenario described in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] where the use of
a crowdsourcing approach is proposed for estimating the
amount of calories in a meal. In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], a task is
characterized by a picture of a dish and a worker receiving a task to
execute is asked to insert a numeric value corresponding to
her/his calorie estimation based on the given picture.
      </p>
      <p>
        This is an example of a range task, in that a worker
receiving a task to execute can only provide a free numeric
answer, namely an integer or decimal value, based on her/his
personal point of view, knowledge, perception, and
expertise. This means that no predefined options/suggestions are
available and each worker is called to independently and
autonomously provide her/his own task answer. Moreover,
the real amount of calories in a dish (i.e., in a task) is not
available/known and only a collective answer is possible [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
This means that crowdsourcing has the goal of providing a
result that represents the so-called “wisdom of the crowd”,
in which the reliability of a task result is determined by its
credibility: the higher the consensus among workers on an
answer, the higher the reliability of that answer.
      </p>
      <p>
        An intuitive and popular solution for range task resolution
is to employ a mean-based approach in which multiple
workers are involved in the execution of each task and the
arithmetic mean of the whole set of worker answers is provided
as the final result [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The main drawbacks of a mean-based
approach are illustrated by Francis Galton in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] where the use
of the arithmetic mean for computing the result of a range task
is deprecated, since it
<disp-quote><p>would give a voting power to “cranks” in
proportion to their crankiness. One absurdly large or
small estimate would leave a greater impress on
the result than one of reasonable amount, and
the more an estimate diverges from the bulk of
the rest, the more influence would it exert.</p></disp-quote>
      </p>
      <p>In other words, the numeric answer of a single worker that
diverges (i.e., it is very different) from the other
more-or-less equivalent worker answers has a strong influence on the
final task result. This means that a single worker can
determine her/his own impact on the task result independently
of her/his trustworthiness. This is especially true when
the group of workers involved in a task execution is small
(i.e., 5-10 workers per group) and malicious or inaccurate
workers can be involved, as usually occurs in real systems.</p>
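      <p>A small numeric illustration of this remark, using made-up answers that are not data from the paper: a single extreme answer drags the arithmetic mean far away from the bulk of the answers, while the median does not depend on how extreme that answer is.</p>
      <preformat><![CDATA[
```python
import statistics

# Hypothetical answers (km) from a small group; the last one is a "crank".
answers = [340, 350, 355, 360, 10_000_000]

mean_all = statistics.fmean(answers)     # dominated by the extreme answer
median_all = statistics.median(answers)  # middle value of the sorted answers

print(f"mean   = {mean_all:.1f}")
print(f"median = {median_all}")
```
]]></preformat>
      <p>With five workers, the one extreme answer alone moves the mean above two million kilometers, whereas the median stays at 355.</p>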
      <p>
        Further work on the resolution of range tasks is presented
in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This contribution is in the field of QoE (Quality of
Experience), where workers are asked to provide an
evaluation of their experience with a service (e.g., web browsing,
phone call, TV broadcast). The authors propose a
technique called CrowdMOS (i.e., Crowdsourcing Mean Opinion
Score) based on the analysis of the distribution of the answers
provided by workers. The high subjectivity/uncertainty of the
considered tasks motivates the use of a random-effects model
for determining the task result. However, only random
variables based on a normal distribution (i.e., a symmetric
distribution) can be used for representing errors, thus other
statistical distributions are not supported.
      </p>
      <p>In the following, we propose consensus-based techniques
for managing range task resolution based on two main
contributions. First, use of the median value (instead of the
arithmetic mean) to determine the task result which is
representative of the multiple answers collected from the
involved workers. Second, use of consensus as a mechanism
for distinguishing workers that agree on the task result from
workers that disagree and represent a sort of outlier position.</p>
    </sec>
    <sec id="sec-4">
      <title>3. THE MEDIAN-ON-AGREEMENT TECHNIQUES</title>
      <p>Consider a range task T assigned to a group of workers
G = {w<sub>1</sub>, …, w<sub>n</sub>} providing a set of answers A = {a<sub>1</sub>, …, a<sub>n</sub>},
where a<sub>k</sub> ∈ A is the numeric answer provided by the worker
w<sub>k</sub> ∈ G. Range task resolution according to the ma
techniques is articulated in two main steps, described
in the following: identification of the
support group and definition of the final task result.</p>
      <p>Identification of the support group. We call G<sub>CA1</sub> ⊆ G
the support group of G, namely the group of workers that
agree on the task result. Two workers agree on the task
result when they provide similar numeric answers,
meaning that the values they provide are close to each other in
comparison with the overall range of the answers A provided
by all the workers in G. We call A<sub>CA1</sub> ⊆ A the set of task
answers provided by the workers in G<sub>CA1</sub>. Consider the
median value m<sub>A</sub> of all the provided worker answers A. The
group G<sub>CA1</sub> is progressively built by including workers that
provided an answer close to m<sub>A</sub>, namely:</p>
      <list list-type="order">
        <list-item><p>Compute the median m<sub>A</sub> over the whole set of worker
answers A and define G<sub>CA1</sub> = ∅, A<sub>CA1</sub> = ∅.</p></list-item>
        <list-item><p>Select the worker answer a<sub>k</sub> ∈ A, not yet in A<sub>CA1</sub>, which is nearest to
m<sub>A</sub>. Insert a<sub>k</sub> in A<sub>CA1</sub> and insert the worker w<sub>k</sub> in
the support group G<sub>CA1</sub>.</p></list-item>
        <list-item><p>The coefficient of variation c<sub>v</sub> is exploited to decide
whether an answer a<sub>k</sub> ∈ A is near enough to m<sub>A</sub> to
be included in G<sub>CA1</sub>. To this end, c<sub>v</sub> is calculated
over the set of answers in A<sub>CA1</sub>:</p>
          <disp-formula><tex-math><![CDATA[c_v(A_{CA1}) = \frac{\sqrt{\frac{1}{|A_{CA1}|}\sum_{i=1}^{|A_{CA1}|}\left(a_i - \mu_{A_{CA1}}\right)^{2}}}{\mu_{A_{CA1}}}]]></tex-math></disp-formula>
          <p>where |A<sub>CA1</sub>| is the number of answers in A<sub>CA1</sub>, a<sub>i</sub>
represents the i-th worker answer in A<sub>CA1</sub>, and μ<sub>A<sub>CA1</sub></sub>
represents the arithmetic mean of the answers in A<sub>CA1</sub>.</p></list-item>
        <list-item><p>The insertion of workers in G<sub>CA1</sub> is repeated as long as
the coefficient of variation over the answers A<sub>CA1</sub> remains
lower than a threshold th<sub>cv</sub> (i.e., go back to step 2
if c<sub>v</sub>(A<sub>CA1</sub>) &lt; th<sub>cv</sub>). Otherwise, remove the
last-inserted item from G<sub>CA1</sub> and A<sub>CA1</sub> and continue with
the next step.</p></list-item>
        <list-item><p>Create the set G<sub>CA2</sub> = G \ G<sub>CA1</sub> containing the
workers that are not in the support group. Analogously,
the set A<sub>CA2</sub> = A \ A<sub>CA1</sub> is created as well.</p></list-item>
      </list>
      <p>Definition of the final task result. The final task result
Ā is defined as the median value calculated over the set of
worker answers A<sub>CA1</sub>, namely Ā = m<sub>A<sub>CA1</sub></sub>.</p>
      <p>Example. Consider a task T<sub>1</sub> where workers are asked to
guess the distance in kilometers between the two Italian cities Caserta and
Siena (the real distance is 352 km). Consider
the following set of worker answers: A = {300, 300, 301, 301,
350, 351, 351, 351, 351, 400, 408, 408, 450, 500, 600, 600,
600, 650, 700, 1500}. The median value over the whole set of
worker answers is m<sub>A</sub> = 404. According to ma, we consider a
threshold for the coefficient of variation th<sub>cv</sub> = 0.15 and we
identify the support group G<sub>CA1</sub> shown in Figure 1. The
median value of the answers provided
by workers in this support group is returned as the final task
result: Ā = m<sub>A<sub>CA1</sub></sub> = 351.</p>
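      <p>The two steps and the example above can be sketched in a few lines of Python. This is a minimal sketch, not the Argo implementation: the function names are illustrative, ties in the distance from the median are broken by input order, the population standard deviation is used in the coefficient of variation, and the fallback for a zero mean is an assumption.</p>
      <preformat><![CDATA[
```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation of `values` divided by their mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        # c_v is undefined for a zero mean; this fallback is an assumption.
        return 0.0 if all(v == 0 for v in values) else float("inf")
    return statistics.pstdev(values) / abs(mean)


def median_on_agreement(answers, th_cv):
    """Return (task_result, support_answers, other_answers)."""
    if not answers or th_cv <= 0:
        raise ValueError("need at least one answer and a positive threshold")
    m_a = statistics.median(answers)
    # Step 2 ordering: nearest to the overall median first. Ties keep input
    # order because sorted() is stable (tie-breaking is an assumption).
    ranked = sorted(answers, key=lambda a: abs(a - m_a))
    support = []
    for answer in ranked:
        support.append(answer)
        # Step 4: stop once the threshold is reached, undoing the last insert.
        if coefficient_of_variation(support) >= th_cv:
            support.pop()
            break
    others = ranked[len(support):]
    return statistics.median(support), support, others


answers = [300, 300, 301, 301, 350, 351, 351, 351, 351, 400,
           408, 408, 450, 500, 600, 600, 600, 650, 700, 1500]
result, support_answers, other_answers = median_on_agreement(answers, th_cv=0.15)
print(result)                # 351.0
print(len(support_answers))  # 12
```
]]></preformat>
      <p>On the example data, the sketch stops when the first answer equal to 300 would push the coefficient of variation above 0.15, leaving twelve answers (from 301 to 500) in the support set.</p>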
    </sec>
    <sec id="sec-5">
      <title>4. APPLICATION TO THE ARGO SYSTEM</title>
      <p>
        The ma techniques have been implemented in the Argo
crowdsourcing platform (http://island.ricerca.di.unimi.it/projects/argo/, in Italian). In Argo, range task resolution is
enforced through consensus-based evaluation techniques and
trustworthiness-based worker management by relying on our
experience and research results in this field [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Consensus-based evaluation of range tasks. For
consensus evaluation, Argo employs a weighted-voting
mechanism called supermajority, where the answer of a worker
w<sub>k</sub> has a weight corresponding to her/his trustworthiness.
Supermajority is based on the verification of two different
constraints called quorum-constraint (q) and
balance-of-power constraint (bop). The q-constraint verifies that the
task result Ā is supported by a group of workers G<sub>CA1</sub> with
enough weight (i.e., trustworthiness) to satisfy a given
quorum q ∈ [0.51, 1]. The bop-constraint verifies that a
single worker cannot shift the majority from one answer to
another just by changing her/his own task answer [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This
means that the support group G<sub>CA1</sub> still satisfies the
q-constraint even if a worker is shifted from G<sub>CA1</sub> to G<sub>CA2</sub>.
A task is committed on the task result Ā when the
supermajority constraints are satisfied (i.e., consensus is verified).
Conversely, when the supermajority constraints are not
satisfied, the task remains uncommitted. In this case, the
task should be re-executed or considered as failed.
      </p>
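      <p>One possible reading of the two supermajority constraints is sketched below. It is a sketch under stated assumptions, not the Argo code: the weight of the support group is taken as its share of the total trustworthiness of the group, and the bop-constraint is checked by moving each support-group worker, one at a time, to the other group. The function names are illustrative.</p>
      <preformat><![CDATA[
```python
def satisfies_quorum(support_trust, other_trust, q):
    """q-constraint: trust share of the support group reaches the quorum."""
    total = sum(support_trust) + sum(other_trust)
    return total > 0 and sum(support_trust) / total >= q


def supermajority(support_trust, other_trust, q=0.51):
    """Return True when both the q- and the bop-constraint hold."""
    support_trust = list(support_trust)
    other_trust = list(other_trust)
    if not satisfies_quorum(support_trust, other_trust, q):
        return False
    # bop-constraint: the quorum must survive the shift of any single worker
    # from the support group to the other group.
    for i, trust in enumerate(support_trust):
        remaining = support_trust[:i] + support_trust[i + 1:]
        if not satisfies_quorum(remaining, other_trust + [trust], q):
            return False
    return True


# 20 workers with equal trustworthiness 0.5 (hypothetical split).
print(supermajority([0.5] * 12, [0.5] * 8))  # True: 0.60, and 0.55 after a shift
print(supermajority([0.5] * 11, [0.5] * 9))  # False: 0.55, but 0.50 after a shift
```
]]></preformat>
      <p>The second call shows why the bop-constraint is stricter than the quorum alone: a support group that barely exceeds the quorum loses it as soon as one worker changes side.</p>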
      <p>Trustworthiness-based worker management. The
Argo system aims at taking into account not only the mere
effort workers spend in executing tasks, but also the quality
of that effort. A worker W is characterized by a
worker score σ<sub>W</sub> and a worker trustworthiness τ<sub>W</sub>.</p>
      <p>The worker score σ<sub>W</sub> represents the worker revenue,
composed of i) a salary, the payment the worker receives each
time she/he executes a task, regardless of the consensus
verification, and ii) an award, a bonus the worker receives each
time she/he contributes to committing a task.</p>
      <p>The worker trustworthiness τ<sub>W</sub> ∈ [0, 1] is defined to
capture the worker's ability to foster task commitment and
it is based on the worker's history in executing tasks. At the
beginning of the crowdsourcing activities (time t = 0), the
worker trustworthiness τ<sub>W</sub> is set to an initial value τ<sub>W</sub><sup>0</sup> = τ<sup>0</sup>.
Each time a task T is committed (time t + 1), the
trustworthiness of each worker W ∈ G is updated. In particular, the
worker trustworthiness increases (i.e., τ<sub>W</sub><sup>t+1</sup> &gt; τ<sub>W</sub><sup>t</sup>) when the
worker belongs to the support group (i.e., W ∈ G<sub>CA1</sub>), thus
confirming her/his ability to foster task commitment in the
last-executed task T. Conversely, the worker
trustworthiness decreases when the worker is not in the support
group (i.e., W ∉ G<sub>CA1</sub>).</p>
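      <p>The bookkeeping described above can be illustrated with the following sketch. The text only states the direction of the trustworthiness change, so the fixed step used here is purely hypothetical, as are the class and function names; the award is assumed to go to the workers in the support group of a committed task. Only the salary/award distinction and the [0, 1] range come from the text.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Worker:
    trust: float = 0.5  # initial trustworthiness (a system parameter)
    score: float = 0.0  # accumulated revenue


def record_execution(worker, salary=0.1):
    """Salary: paid for every executed task, regardless of consensus."""
    worker.score += salary


def record_commit(worker, in_support_group, award=1.0, step=0.05):
    """Update a worker after a task commit; `step` is a hypothetical amount."""
    if in_support_group:
        worker.score += award  # assumed: award goes to support-group workers
        worker.trust = min(1.0, worker.trust + step)
    else:
        worker.trust = max(0.0, worker.trust - step)


w = Worker()
record_execution(w)
record_commit(w, in_support_group=True)
print(round(w.score, 2), round(w.trust, 2))
```
]]></preformat>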
      <p>For evaluation of the proposed ma techniques, we consider
the geo-dis case-study for crowdsourcing the geographic
distance between pairs of Italian cities.</p>
      <p>The experiment was executed by relying on the Argo
prototype. We collected a dataset of 120 Italian cities with
their geographic coordinates extracted from the FreeBase
(http://www.freebase.com) open repository. We built a set
of 634 tasks, each one asking for the distance between a pair
of different cities. The experimentation on geo-dis was
conducted with a crowd of 585 workers selected from a class of
master-degree students (the average worker age is 21).
For task resolution, we asked the workers to rely on their
personal knowledge and we set the allowed time to perform
a task to a maximum of 15 minutes. In the experimentation,
the Argo prototype was configured as follows: i) initial
worker trustworthiness τ<sup>0</sup> = 0.5; ii) group size s<sub>G</sub> = 20; iii)
quorum value q = 0.51; iv) the worker salary is s = 0.1 and
the worker award is a = 1.</p>
      <p>Evaluation is based on two different experiments over the
geo-dis case-study. The first experiment presents a
comparison of the ma techniques implemented in the Argo
system (maArgo) against other state-of-the-art techniques for
range task resolution. The second experiment is performed
to evaluate the crowdsourcing cost of the ma techniques by
measuring the number of committed/uncommitted tasks.</p>
      <p>Comparison against state-of-the-art techniques. We
compare maArgo against the following competitor techniques:</p>
      <p>
        Overall arithmetic mean O. This method refers to the
classical approach proposed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] where the result of a task
T is given by computing the arithmetic mean over all the
obtained answers.
      </p>
      <p>
        Outlier-cleaned arithmetic mean - Standard Deviation 2SD.
This method consists in applying a classical outlier removal
technique based on the standard deviation (2SD) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to the
set of answers of a task T. After removal of the outliers,
the arithmetic mean is finally computed over the remaining
answers.
      </p>
      <p>
        Outlier-cleaned arithmetic mean - Median Rule MR. This
method consists in applying a more recent outlier removal
technique based on the median rule [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to the set of answers
of a task T. After removal of the outliers, the arithmetic
mean is finally computed over the remaining answers.
      </p>
      <p>Overall median mO. This method consists in
computing the result of a task T as the median value of all the
provided answers. To the best of our knowledge, no state-of-the-art
technique is based on the median value.
However, we compare maArgo against mO since this is the natural
baseline for our ma techniques.</p>
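      <p>The four competitor techniques can be sketched as follows. The details are assumptions rather than the exact procedures of the cited works: the 2SD filter uses the sample standard deviation, and the median rule is written as "keep the answers within k times the interquartile range of the median" with k = 2.3, the constant commonly associated with this rule. Function names are illustrative, and the input is the example answer set from Section 3.</p>
      <preformat><![CDATA[
```python
import statistics


def overall_mean(answers):
    """O: arithmetic mean over all the answers."""
    return statistics.fmean(answers)


def mean_after_2sd(answers):
    """2SD: mean of the answers within two standard deviations of the mean."""
    mu = statistics.fmean(answers)
    sd = statistics.stdev(answers)  # sample standard deviation (assumption)
    kept = [a for a in answers if abs(a - mu) <= 2 * sd]
    return statistics.fmean(kept)


def mean_after_median_rule(answers, k=2.3):
    """MR: mean of the answers within k * IQR of the median (k assumed)."""
    q1, _, q3 = statistics.quantiles(answers, n=4)
    med = statistics.median(answers)
    kept = [a for a in answers if abs(a - med) <= k * (q3 - q1)]
    return statistics.fmean(kept)


def overall_median(answers):
    """mO: median over all the answers."""
    return statistics.median(answers)


answers = [300, 300, 301, 301, 350, 351, 351, 351, 351, 400,
           408, 408, 450, 500, 600, 600, 600, 650, 700, 1500]
for name, method in [("O", overall_mean), ("2SD", mean_after_2sd),
                     ("MR", mean_after_median_rule), ("mO", overall_median)]:
    print(f"{name:>3}: {method(answers):.1f}")
```
]]></preformat>
      <p>On this answer set both outlier filters discard only the answer 1500, so 2SD and MR return the same mean, which still lies well above the real distance of 352 km because the remaining answers are skewed to the right.</p>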
      <p>In the evaluation, we consider maArgo under three
configurations characterized by different thresholds for the
coefficient of variation th<sub>cv</sub>. Results are evaluated through
the average-error and the average-error with outlier-removal
mechanisms. In the average-error mechanism, for each task T,
the evaluation considers the error between the distance
estimation in the crowdsourcing result Ā and the real distance
between the two cities contained in T. The average error ε
is calculated as:</p>
      <disp-formula><tex-math><![CDATA[\epsilon = \frac{\sum_{i=1}^{n} \left| \bar{A}_i - R_i \right|}{|T|}]]></tex-math></disp-formula>
      <p>
        where n = |T| is the overall number of tasks, Ā<sub>i</sub> is the
crowdsourcing result of the task T<sub>i</sub>, and R<sub>i</sub> is the real distance
between the pair of cities in the task T<sub>i</sub>, calculated as the
geodesic distance. In the average-error with outlier-removal,
the error evaluation follows the same approach as the ε
calculation, but outliers are removed according to the conventional
criterion based on the standard deviation (2SD) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
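      <p>The plain average error (without outlier removal) amounts to a mean absolute error over the tasks. A minimal sketch, with an illustrative function name and made-up values for the second task:</p>
      <preformat><![CDATA[
```python
def average_error(results, real_distances):
    """Mean absolute difference between task results and real distances."""
    if not results or len(results) != len(real_distances):
        raise ValueError("need one real distance per task result")
    total = sum(abs(a - r) for a, r in zip(results, real_distances))
    return total / len(results)


# Two tasks: the Caserta-Siena example (351 vs. 352 km) and a made-up one.
print(average_error([351, 500], [352, 480]))  # 10.5
```
]]></preformat>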
      <p>The results of this experiment are presented in Table 1.</p>
      <p>[Table 1: rows O, 2SD, MR, mO, maArgo (th<sub>cv</sub> = 0.25), maArgo (th<sub>cv</sub> = 0.15), maArgo (th<sub>cv</sub> = 0.05)]</p>
      <p>The first consideration concerns the
result of the O technique. The fact that the obtained average
error is so high is mainly due to the presence of malicious
workers in a very high number of groups. These malicious
workers gave completely wrong answers (e.g., 10 million
kilometers as the distance between Rome and Milan) that have
a very serious impact on the task result when the arithmetic
mean is considered and outlier removal is not performed.
We note that the median-based techniques (i.e., mO and
maArgo) provide better results than the techniques based
on the standard deviation. We argue that this is due to
the assumption of a symmetric distribution used in O, 2SD,
and MR, which is usually false (see, for example,
the task presented in Figure 1). As a general remark, we
observe that the median-based solutions provide better
results than mean-based techniques even without the
outlier-removal phase. By considering the maArgo results with the
different thresholds on the coefficient of variation, we note
that the lower the threshold th<sub>cv</sub>, the lower the average
error ε. This means that a more restrictive mechanism for
determining the support group G<sub>CA1</sub> increases the accuracy
of the obtained results.</p>
      <p>Analysis of task commitment. We observed that
a low value of th<sub>cv</sub> produces a low average error ε.
However, a low value of th<sub>cv</sub> also produces a
high number of uncommitted tasks, and thus high expenses
for crowdsourcing execution. For this reason, in this
experiment, we analyze the number of committed tasks when
different thresholds on the coefficient of variation are
considered. To this end, we define the commitment ratio c as
follows:</p>
      <disp-formula><tex-math><![CDATA[c = \frac{N_c}{N_c + N_u}]]></tex-math></disp-formula>
      <p>where N<sub>c</sub> is the number of committed tasks and N<sub>u</sub> is the
number of uncommitted tasks.</p>
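      <p>As a quick check of the definition, the ratio can be computed as below; the function name and the task counts are hypothetical and are not taken from the experiment.</p>
      <preformat><![CDATA[
```python
def commitment_ratio(committed, uncommitted):
    """Share of committed tasks over all executed tasks."""
    total = committed + uncommitted
    if total == 0:
        raise ValueError("no tasks were executed")
    return committed / total


print(commitment_ratio(9, 1))  # 0.9
```
]]></preformat>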
      <p>The commitment ratios for different coefficient-of-variation
thresholds th<sub>cv</sub> are presented in Table 2. We note that the
lower the coefficient-of-variation threshold, the lower
the commitment ratio. This behavior is motivated by
the fact that the lower the threshold, the
more restrictive the mechanism for determining the support
group G<sub>CA1</sub>, so that fewer support groups gather enough
weight to satisfy the supermajority constraints. As a result, it is important to configure
the crowdsourcing execution by tuning the threshold th<sub>cv</sub>
with the goal of setting the desired tradeoff between accuracy
of results and commitment ratio. In the geo-dis case study,
the threshold value th<sub>cv</sub> = 0.15 provides the best tradeoff
between accuracy (i.e., almost twice the value on accuracy with
respect to the other threshold values) and commitment ratio
(i.e., c ≈ 90%).</p>
    </sec>
    <sec id="sec-6">
      <title>6. CONCLUDING REMARKS</title>
      <p>In this paper, we presented the ma techniques for range
task resolution in crowdsourcing systems. Application to
the Argo system as well as experimental results on a real
case-study are provided to show the contribution of the
proposed solution with respect to the state of the art. Ongoing
work is focused on the so-called task routing problem, with
the goal of specifying a family of configuration patterns for
dynamically choosing the most appropriate group of workers
for the assignment of a given task,
based on worker expertise and knowledge.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bozzon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brambilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ceri</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Mauri</surname>
          </string-name>
          .
          <article-title>Reactive Crowdsourcing</article-title>
          .
          <source>In Proc. of the 22nd Int. World Wide Web Conference (WWW 2013)</source>
          , pages
          <fpage>153</fpage>
          –
          <lpage>164</lpage>
          , Rio de Janeiro, Brazil,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Carling</surname>
          </string-name>
          .
          <article-title>Resistant Outlier Rules and the Non-Gaussian Case</article-title>
          .
          <source>Computational Statistics &amp; Data Analysis</source>
          ,
          <volume>33</volume>
          (
          <issue>3</issue>
          ):
          <fpage>249</fpage>
          –
          <lpage>258</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Castano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Genta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Montanelli</surname>
          </string-name>
          .
          <article-title>Combining Crowd Consensus and User Trustworthiness for Managing Collective Tasks</article-title>
          .
          <source>Future Generation Computer Systems</source>
          ,
          <volume>54</volume>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Galton</surname>
          </string-name>
          .
          <article-title>One Vote, One Value</article-title>
          .
          <source>Nature</source>
          ,
          <volume>75</volume>
          :
          <fpage>414</fpage>
          ,
          <year>1907</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T. W.</given-names>
            <surname>Malone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Laubacher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Dellarocas</surname>
          </string-name>
          .
          <article-title>The Collective Intelligence Genome</article-title>
          .
          <source>IEEE Engineering Management Review</source>
          ,
          <volume>38</volume>
          (
          <issue>3</issue>
          ),
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Noronha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hysen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. Z.</given-names>
            <surname>Gajos</surname>
          </string-name>
          .
          <article-title>Platemate: Crowdsourcing Nutritional Analysis from Food Photographs</article-title>
          .
          <source>In Proc. of the 24th symposium on User Interface Software and Technology</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>12</lpage>
          , Santa Barbara, CA, USA,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F. P.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A. F.</given-names>
            <surname>Florêncio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Seltzer</surname>
          </string-name>
          .
          <article-title>CROWDMOS: An Approach for Crowdsourcing Mean Opinion Score Studies</article-title>
          .
          <source>In Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing</source>
          , pages
          <fpage>2416</fpage>
          –
          <lpage>2419</lpage>
          , Prague, Czech Republic,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Seo</surname>
          </string-name>
          .
          <article-title>A Review and Comparison of Methods for Detecting Outliers in Univariate Data Sets</article-title>
          .
          <source>PhD thesis</source>
          , University of Pittsburgh, Pennsylvania, USA,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rampalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          .
          <article-title>Chimera: Large-scale Classification Using Machine Learning, Rules, and Crowdsourcing</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          ,
          <volume>7</volume>
          (
          <issue>13</issue>
          ),
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Surowiecki</surname>
          </string-name>
          .
          <source>The Wisdom of Crowds</source>
          .
          <publisher-name>Random House LLC</publisher-name>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>