<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mechanical Cheat: Spamming Schemes and Adversarial Techniques on Crowdsourcing Platforms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Djellel Eddine Difallah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianluca Demartini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philippe Cudré-Mauroux</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>eXascale Infolab, University of Fribourg</institution>
          ,
          <addr-line>Switzerland</addr-line>
          <email>firstname.lastname@unifr.ch</email>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>Crowdsourcing is becoming a valuable method for companies and researchers to complete scores of micro-tasks by means of open calls on dedicated online platforms. Crowdsourcing results remain unreliable, however, as those platforms neither convey much information about the workers' identity nor ensure the quality of the work done. Instead, it is the responsibility of the requester to filter out bad workers and poorly accomplished tasks, and to aggregate worker results in order to obtain a final outcome. In this paper, we first review techniques currently used to detect spammers and malicious workers, whether they are bots or humans randomly or semi-randomly completing tasks; then, we describe the limitations of existing techniques by proposing approaches that individuals, or groups of individuals, could use to attack a task on existing crowdsourcing platforms. We focus on crowdsourcing relevance judgements for search results as a concrete application of our techniques.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H.3.3 [Information Storage And Retrieval]:
Information Search and Retrieval; H.3.4 [Information Storage
And Retrieval]: Systems and Software</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>Crowdsourcing is the process of indirectly employing
anonymous people over the internet, often for a nominal amount
of money, to complete concise tasks (called micro-tasks) that
are typically too complex for today's computers but
relatively simple for humans. Examples of such micro-tasks
include image annotation, relevance judgement, sentiment
analysis, language translation, etc. Currently,
crowdsourcing platforms like Amazon Mechanical Turk
(AMT, https://www.mturk.com/mturk/welcome) allow
requesters to create tasks in the form of web pages, decide
on how much to pay per task, and restrict participants by
declaring filters on acceptance rate, country, etc. Once the
tasks are completed, the requester gets back results in the
form of raw files from which they are supposed to filter out
bad answers, and decide whether or not to pay for each
given answer. One particular appeal of crowdsourcing is to
complete large collections of tasks that a requester cannot
do by himself in a reasonable amount of time; hence, going
through all the responses manually is also not practical.</p>
      <p>
        Crowdsourcing calls attract different categories of
potential workers; a study of the demographics showed that
workers are spread over country clusters (mostly in India and
the USA), age, and occupation. The incentives for completing
a task vary per task type and per individual. For more
than 50% of the Indian workers, for instance, crowdsourcing
is a primary or secondary source of income (see
http://hdl.handle.net/2451/29585). This obviously
negatively influences the effort and time taken to complete
tasks on the crowdsourcing platform, since many requesters
do not use tight quality control schemes (previous research
has shown that it is better to systematically pay for all
completed tasks rather than to risk not paying honest workers
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]).
      </p>
      <p>The focus of this paper is the increasing adoption of
crowdsourcing in a context where the workers' incentive is solely
monetary. Skimming through the tasks and hastily or
randomly filling out web forms is the simplest form of
crowdsourcing treachery. Knowledgeable individuals can go much
further and create automated programs that employ
advanced methods to complete tasks. We can, for example,
imagine organized groups sharing information to complete
collections of tasks faster. In the following, we show that,
in the absence of strict control and monitoring mechanisms
on the crowdsourcing platforms, requesters are reduced
to relying on manual labor, artificial intelligence, or statistical
methods to filter out potentially erroneous responses.</p>
      <p>Our final claim is that:</p>
      <p>Current crowdsourcing quality control techniques
are insufficient to counter organized groups of
workers who maliciously aim at gaining money
while disregarding the quality of their completed tasks.</p>
    </sec>
    <sec id="sec-3">
      <title>2. MOTIVATION</title>
      <p>
        Despite many recent research efforts dedicated to result
filtering and cheater detection, quality control methods
remain largely bound by the following factors: 1) task type,
2) time, 3) cost, and 4) participants. First, we believe that
current techniques can hardly be generalized to any type
of task, especially subjective types of tasks like review
writing and text translation. Second, in the absence of a
reputation system (recently, AMT introduced the concept of
Masters, that is, workers who have proved to perform well
on certain task types; still, it remains out of the requester's
control to decide who is a quality worker at this stage),
participants remain anonymous and little
information is communicated to the requesters; it is
therefore difficult to conclude that an experiment is repeatable
and to assess how many dishonest workers were involved.
Cost is also a sensitive variable: it has been shown that
paying a higher reward for a task does not lead to higher
quality but only to lower completion time (see, for example,
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). In an experiment we previously conducted [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] where we
manually judged all worker answers, we observed that more
than 75% of Indian workers achieved a precision of 50% or
less. Figure 1a shows the precision of the workers against the
number of tasks they performed. Out of the 4088 submitted
tasks, only one worker achieved a precision of 0.85 while
answering 288 tasks. Do the available quality control schemes
respond to the behavior of the current worker population?
In the following we describe adversarial techniques,
providing examples based on the relevance judgement scenario:
documents are shown to the worker together with a keyword
query; the worker has to judge the relevance of each document
(possibly on multiple levels) with respect to the query.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. ANTI-ADVERSARIAL TECHNIQUES</title>
      <p>We can distinguish two categories of crowdsourcing
anti-adversarial schemes: a priori cheater dissuasion and a
posteriori quality control. In the following, we briefly summarize
some of the common techniques from both perspectives, and
discuss some of their advantages and drawbacks.</p>
      <sec id="sec-4-1">
        <title>3.1 Task Design</title>
        <p>
          Task design is the sole responsibility of the requester. At a
minimum, it consists of choosing the right way to formulate
the task and the right incentives. A study on the impact of
incentives was recently conducted [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and partly concluded
that crowdsourcing platforms favor monetary incentives
over social ones. The study also hypothesized that
explicit worker conditioning (e.g., informing the worker that
disagreement with other workers on the same task will be
punished) on top of quality control can lead to better quality
results. In [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], Kittur et al. stressed the importance of
task formulation and of having verifiable results with two
variants of a given task formulated differently; along the
same line, [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] observed that cheaters are less attracted to
"novel tasks that involve creativity and abstract thinking".
Incentives and sophisticated task formulation form a good
barrier against cheaters, but constitute a burden for the requester,
who only needs the task to be done.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3.2 Task Repetition and Aggregation</title>
      <p>
        For many tasks, the result of aggregating multiple
answers from non-expert workers (the so-called wisdom of the
crowds) can be compared to the results of more expensive
expert workers, as shown in several evaluations of crowdsourced
relevance judgement and other labeling experiments [
        <xref ref-type="bibr" rid="ref15 ref16 ref3">16, 15, 3</xref>
        ].
In general, and in the presence of noisy answers, the same
task is offered multiple times to different workers; once all
the tasks are completed, the requester decides which answers
to pick and how to aggregate them. The aggregation of
the final results is a well-studied topic; the most
straightforward approach in this context is to proceed with a majority
decision (e.g., [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]). The authors of [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] formalized the
majority decision approach and proposed the use of a control
group that double-checks the answers of a prior run.
      </p>
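      <p>To make the baseline concrete, here is a minimal Python sketch of
majority-decision aggregation over repeated answers (our illustration;
function and label names are hypothetical and not taken from any cited
system):</p>
      <preformat preformat-type="code">
from collections import Counter

def majority_decision(answers):
    """Aggregate repeated answers for one task by majority vote.

    answers: labels submitted by different workers for the same task,
             e.g. ['relevant', 'relevant', 'not_relevant'].
    Returns (winning_label, support), where support is the fraction
    of workers who agreed on the winning label.
    """
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)

# Three workers judged the same query/document pair:
label, support = majority_decision(['relevant', 'relevant', 'not_relevant'])
print(label, support)  # relevant 0.666...
      </preformat>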
      <p>The primary goal of task repetition is to diversify the
output by asking different workers, which is desired and even
required for many task types; it comes, however, at the
price of multiplying the cost by the number of repetitions.</p>
    </sec>
    <sec id="sec-6">
      <title>3.3 Test Questions</title>
      <p>
        As part of the recommendations in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], Kittur et al. also
suggest formulating verifiable tasks. Cheaters and
non-serious workers will likely start tasks without reading the
requester's directions, and then click randomly, most likely
generating wrong answers. In this case, a simple screening
process can be used [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]: the requester designates the first K tasks as a
qualification test that the worker has to pass first. Moreover,
in our recent work ZenCrowd [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] we applied a posteriori
continuous testing, where for every 10 tasks we tested worker
answers against a gold standard set. Other test
methodologies can be used throughout the experiments, e.g., classic
anti-spamming techniques like CAPTCHAs to filter out
automatic answers, for which the requester does not have to
create a test set of questions.
      </p>
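      <p>A minimal sketch of such continuous testing follows (our
illustration with hypothetical names; the actual ZenCrowd pipeline is
described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). One hidden gold question is interleaved into every
block of ten tasks, and a worker's accuracy on gold questions is
tracked separately:</p>
      <preformat preformat-type="code">
import random

def build_task_stream(real_tasks, gold_tasks, test_every=10):
    """Interleave one hidden gold-standard question into every
    block of `test_every` regular tasks."""
    stream = []
    for i, task in enumerate(real_tasks):
        stream.append(task)
        if (i + 1) % test_every == 0:
            stream.append(random.choice(gold_tasks))
    return stream

def worker_accuracy(worker_answers, gold_answers):
    """Fraction of gold questions the worker answered correctly.
    Both arguments map a question id to a label."""
    hits = sum(1 for q, truth in gold_answers.items()
               if worker_answers.get(q) == truth)
    return hits / max(1, len(gold_answers))
      </preformat>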
      <p>Test questions are powerful traps for cheaters and
spammers, especially when they cannot be differentiated from
regular tasks. This comes at a cost: for large collections of
tasks, a bigger gold standard set is needed to prevent
workers from spotting recurrent questions. Moreover, test
questions should be selected carefully so that a) they do not trick
honest workers and b) they are not easy for robots to answer.</p>
    </sec>
    <sec id="sec-7">
      <title>3.4 Machine Learning Filtering</title>
      <p>
        The distribution of workers versus the number of tasks they
perform is usually characterized by a power-law
distribution [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]: many workers do few tasks and few workers
do many tasks. Aggregating the results in such a context
(e.g., with a majority decision scheme) treats each task in
isolation, as the judgement is self-contained
within the task. Using machine
learning algorithms [
        <xref ref-type="bibr" rid="ref10 ref17 ref4 ref5">17, 5, 10, 4</xref>
        ] allows one to carry over some
knowledge about the workers across tasks. In our ZenCrowd
entity-linking system [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we started with a learning phase
that labels workers with a confidence score used to decide how to
weigh each worker's answers, and then used a probabilistic
network to propagate and update scores across workers and
tasks.
      </p>
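      <p>The following simplified sketch illustrates the general idea of
confidence-weighted aggregation with iterative score updates (our
illustration only; it is a plain agreement-based loop, not the
probabilistic network used in ZenCrowd):</p>
      <preformat preformat-type="code">
def weighted_vote(task_answers, confidence):
    """Aggregate one task's answers, weighting each worker's label
    by that worker's current confidence score (default 0.5)."""
    scores = {}
    for worker, label in task_answers.items():
        w = confidence.get(worker, 0.5)
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

def update_confidence(all_tasks, confidence, rounds=5):
    """Re-estimate each worker's confidence as the rate at which the
    worker agrees with the current weighted consensus, and iterate.
    all_tasks maps a task id to a dict of worker answers."""
    for _ in range(rounds):
        consensus = {t: weighted_vote(a, confidence)
                     for t, a in all_tasks.items()}
        agree, total = {}, {}
        for t, answers in all_tasks.items():
            for worker, label in answers.items():
                total[worker] = total.get(worker, 0) + 1
                if label == consensus[t]:
                    agree[worker] = agree.get(worker, 0) + 1
        confidence = {w: agree.get(w, 0) / total[w] for w in total}
    return confidence
      </preformat>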
    </sec>
    <sec id="sec-8">
      <title>3.5 Current AMT Techniques</title>
      <p>Requesters on AMT can already benefit from some
basic aggregation and anti-spamming features provided by the
Amazon platform. A requester on AMT is not able to
directly assign a task to a specific worker; he can only publish
the task on the "market", and typically the first worker who
agrees to pick the task will do it. In addition, the platform
allows requesters to add a few constraints when the task is
published, in order to help avoid low-quality and
malicious workers. First, requesters can filter workers based
on their previous acceptance rate: if on previously
submitted tasks a worker had an acceptance rate lower than, for
example, 95%, then he is not allowed to accept such tasks.
Additionally, requesters can add filters on worker location
(at the granularity of the country) and on the number of tasks
performed so far, and add a qualification test before the task is
assigned. After a task has been completed, requesters can
always decide not to pay for poorly performed tasks and
can even report bad workers to AMT, which may lead to
the worker account being suspended. Naturally, this can
only happen after poor results and/or malicious workers are
detected. Such simple features can be sufficient against simple
adversarial techniques, but not against organized group attacks,
as explained in the next section.</p>
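      <p>These platform-side constraints amount to simple predicates over
worker metadata. A schematic sketch (ours; the field names are
hypothetical and this is independent of the actual AMT API):</p>
      <preformat preformat-type="code">
def worker_allowed(worker, min_approval=0.95,
                   allowed_countries=('US', 'IN'), min_tasks_done=100):
    """Mimic the requester-side filters described above: previous
    approval rate, country, and number of tasks performed so far.
    `worker` is a dict of platform-provided metadata."""
    return (worker['approval_rate'] >= min_approval
            and worker['country'] in allowed_countries
            and worker['tasks_done'] >= min_tasks_done)
      </preformat>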
    </sec>
    <sec id="sec-9">
      <title>4. ADVERSARIAL TECHNIQUES</title>
      <p>This section presents an overview of possible attacks on
crowdsourcing platforms that we envision. We define a
dishonest answer in a crowdsourcing context as an answer that has
been either: i) posted randomly, ii) generated artificially, or
iii) duplicated from another source. We differentiate
spamming attacks by the level of collaboration used to generate
dishonest answers. Individual attackers try to proceed as
fast as possible to earn additional money but are more likely
to run into easy test traps; group attacks are more organized
and exploit the repeatability of a task to build knowledge,
and hence become more difficult to detect. In the following,
and without loss of generality, we use the scenario where
workers are asked to judge the relevance of search results
given a keyword query.</p>
    </sec>
    <sec id="sec-10">
      <title>4.1 Individual Attack</title>
      <sec id="sec-10-1">
        <title>4.1.1 Random Answers</title>
        <p>[Figure 2: tasks available in the crowdsourcing platform
(T1–T7) feeding a distributed map of question/answers shared by the
attackers.]</p>
        <p>When a worker has spent some time trying to solve a
task and realizes that he is not able to provide a good answer,
it is more likely that a random answer will be given than
that the task will be returned. Moreover, malicious workers
will quickly and randomly answer in order to obtain the
reward fast. In our previous experiment [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] we observed
that 10% of the $0.01 tasks were completed in less than 5
seconds. As we can see in Figure 1b, quickly completed tasks
provide lower-quality work. Therefore, completion time is a
strong indication of a malicious random answer.
        </p>
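        <p>Completion time can be turned into a simple post hoc flag; a
sketch (ours, with an illustrative threshold mirroring the 5-second
observation above):</p>
        <preformat preformat-type="code">
def looks_random(accept_ts, submit_ts, min_seconds=5.0):
    """Flag an answer submitted implausibly fast.

    accept_ts and submit_ts are timestamps in seconds; answers
    completed in under `min_seconds` are suspicious, following the
    observation that very quick completions correlate with low quality.
    """
    return min_seconds > (submit_ts - accept_ts)
        </preformat>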
        <p>Random answers jeopardize tasks designed with
monetary incentives and no test questions: workers prefer
providing a random answer to collect the money rather than
skipping the task. However, depending on the number of
random answers in the collected results, they can be
filtered out by task repetition.</p>
      </sec>
      <sec id="sec-10-2">
        <title>4.1.2 Automated Answers</title>
        <p>Spammers create generic malicious programs (bots)
capable of registering for tasks and submitting answers
either randomly or with minimal artificial reasoning.</p>
        <p>Test questions constitute a good trap for this attack.
However, bots can massively attack a task and thus increase their
chances of passing the qualification test questions.</p>
      </sec>
      <sec id="sec-10-3">
        <title>4.1.3 Semi-Automated Answers</title>
        <p>We identify semi-automated answers as bots specifically
designed for a given task type (e.g., relevance judgement).
Spammers can use pre-existing packages and tailor their attacks
to a given context. In our use case, the spammers can
create a bot that opens all the links, parses the corresponding
HTML content, and attempts to complete the relevance
judgement task whenever possible. Alternatively, it can run the
query in a search engine and identify which of the proposed links
is ranked first. If the bot is not sure about its answer,
it can even ask a human, for a given ratio of questions, to
increase its answer accuracy, or return low-confidence HITs
to preserve its approval rate.</p>
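        <p>A schematic of such a bot for the relevance judgement use case
(entirely illustrative; "search" stands for any web search API and is
an assumed callable, not real code against a specific engine):</p>
        <preformat preformat-type="code">
def judge_with_search_engine(query, candidate_links, search):
    """Semi-automated relevance judgement: run the task's query
    through a search engine and pick whichever candidate link the
    engine ranks highest; abstain when no candidate appears."""
    ranked = search(query)          # assumed: returns URLs, best first
    for url in ranked:
        if url in candidate_links:
            return url              # confident automated answer
    return None                     # low confidence: skip or ask a human
        </preformat>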
        <p>Such semi-automated approaches can considerably improve
the time/reward ratio of dishonest workers, and they target
task collections with easy-to-answer test questions.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>4.2 Group Attack</title>
      <p>By a group attack we mean a group of individuals or bots
focusing on the same batch of tasks. Such a group can use a
distributed dictionary of questions and answers: depending
on the attack, answers (e.g., query-result pairs) are recorded
in a dictionary called the Shared Question Answer Dictionary
(SQAD), which is shared among the group (see Figure 2).</p>
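      <p>The SQAD itself can be as simple as a shared key-value store
mapping a normalized question to the group's agreed answer; a sketch
(ours; in a real attack this would be a networked store shared by all
colluding workers or bots):</p>
      <preformat preformat-type="code">
class SQAD:
    """Shared Question Answer Dictionary: maps a normalized question
    (here, the keyword query) to the answer the group agreed on."""

    def __init__(self):
        self._answers = {}

    @staticmethod
    def _key(query):
        # Normalize so that trivially reworded duplicates still match.
        return ' '.join(query.lower().split())

    def record(self, query, answer):
        self._answers[self._key(query)] = answer

    def lookup(self, query):
        return self._answers.get(self._key(query))

sqad = SQAD()
sqad.record('capital of France', 'http://example.org/paris')
# A second attacker meeting the same task replays the stored answer:
assert sqad.lookup('Capital  of  FRANCE') == 'http://example.org/paris'
      </preformat>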
      <sec id="sec-11-1">
        <title>4.2.1 Agree on Answers</title>
        <p>Typically, the requester would expect some agreement on
the answers received for a given task; it is therefore
recommended to shuffle the order of the questions and their
answers to prevent an all-agree-on-first strategy. In our use
case, the attackers could use the following strategy: the first
worker who sees a task selects a link randomly, and then
an automated system creates an entry in the SQAD with
the query string and the chosen link. If the same task is
encountered again, the automated system will highlight or
automatically submit the stored answer.</p>
        <p>This attack makes majority-vote filtering ineffective, and it
may even lead valid answers to be discarded.</p>
      </sec>
      <sec id="sec-11-2">
        <title>4.2.2 Answer Sharing</title>
        <p>Following Section 3.3, we can categorize test questions and
their respective attacks into:</p>
        <p>Gold standard: The requesters can only input a limited
number of these questions and often will require
redundancy. Spammers can exploit this weakness by having
all the workers agree on honestly answering some
questions and submitting their answers to a SQAD; the
answers to test questions thus get shared as well.</p>
        <p>Turing test questions: Such questions (e.g., CAPTCHAs)
are widely used to stop bots. They can also be
generated indefinitely, which makes them impossible to track
with a SQAD. However, since only humans can pass these tests,
it is sufficient that the task be recognized as a test
requiring full human attention; the remainder of the
task can still be completed automatically.</p>
      </sec>
      <sec id="sec-11-3">
        <title>4.2.3 Artificial Clones</title>
        <p>This attack is similar to answer sharing, with the
difference that the malicious worker acts alone: he honestly
answers questions and stores them, then spawns
automated programs that duplicate the spammer's behavior by
reading his answers. If a program encounters an unseen
question, it can either skip the question (if allowed by the
platform), answer randomly, or ask a human.</p>
        <p>This attack presents a challenge for all the anti-adversarial
schemes we are aware of, as the bots merely replicate an
"honest" answer, which is a gain for the spammer and
a loss for the requester.</p>
      </sec>
    </sec>
    <sec id="sec-12">
      <title>5. CONCLUSIONS AND FUTURE WORK</title>
      <p>The crowdsourcing market is flourishing, and it is strongly
based on financial incentives. Because of this, it may attract
more and more cheaters and thus give rise to novel cheating
schemes. We could not find any hard evidence about the
amount of spam on crowdsourcing platforms; nevertheless, we
expect that adversarial approaches will become more
advanced as the popularity of crowdsourcing rises.</p>
      <p>In this paper we gave an overview of adversarial crowdsourcing
mechanisms and showed that many current quality
control mechanisms fail to detect well-organized
spammers. Based on this overview, we claim that,
in the process of evaluating spam-filtering schemes, the usual
methodology applied (often based on self-designed experiments)
is not adequate for real crowdsourcing environments where
organized groups of malicious workers are present.</p>
      <p>
        Such reasons motivate the need for further studies in the
area of spam detection and quality control on
crowdsourcing platforms, which will be the focus of our future work.
Specifically, there is a need for new benchmarks on which
to evaluate and compare existing and novel spam detection
techniques for crowdsourcing platforms. Moreover, a study
of how much this problem affects the quality of crowdsourced
tasks in a real-world, large-scale setting is necessary.
Existing research has started to identify which tasks attract more
cheaters and which task features have to be controlled (e.g.,
a high reward as well as a simple task design attract more
malicious workers [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). The conclusion is therefore that it is
better to discourage cheaters than to invest resources
in a posteriori filtering. With respect to post-filtering, the
requester can use information like assignment time,
submission time, feedback, etc. to classify an answer as spam.
Systems for worker analytics that help gather and share data in
real time about the tasks in progress and about the workers may
help in identifying malicious behaviors [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Reward
mechanisms other than financial ones should also be taken
into account [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
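      <p>As a toy illustration of such post-filtering over assignment
time, submission time, and feedback, consider the following score
(ours; the features come from the discussion above, but the weights
and threshold are illustrative, not from any published model):</p>
      <preformat preformat-type="code">
def spam_score(answer):
    """Score an answer's likelihood of being spam from simple
    requester-visible features; flag answers scoring above 0.5."""
    work_seconds = answer['submit_ts'] - answer['assign_ts']
    score = 0.0
    if 5.0 > work_seconds:                # implausibly fast completion
        score += 0.6
    if answer.get('feedback', '') == '':  # no free-text feedback given
        score += 0.2
    if answer.get('disagrees_with_majority', False):
        score += 0.2
    return score
      </preformat>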
    </sec>
    <sec id="sec-13">
      <title>6. ACKNOWLEDGMENTS</title>
      <p>This work was supported by the Swiss National Science
Foundation under grant number PP00P2 128459.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.</given-names>
            <surname>Alonso</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          .
          <article-title>Design and implementation of relevance assessments using crowdsourcing</article-title>
          .
          <source>In ECIR</source>
          , pages
          <fpage>153</fpage>
          –
          <lpage>164</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Alonso</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lease</surname>
          </string-name>
          .
          <article-title>Crowdsourcing 101: putting the WSDM of crowds to work for you</article-title>
          .
          <source>In WSDM</source>
          , pages 1–2
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>O.</given-names>
            <surname>Alonso</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizzaro</surname>
          </string-name>
          .
          <article-title>Can we get rid of TREC assessors? Using Mechanical Turk for relevance assessment</article-title>
          .
          <source>In SIGIR 2009 Workshop on The Future of IR Evaluation</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Difallah</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Cudré-Mauroux</surname>
          </string-name>
          .
          <article-title>ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking</article-title>
          .
          <source>In WWW</source>
          <year>2012</year>
          , Lyon, France,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Donmez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Carbonell</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schneider</surname>
          </string-name>
          .
          <article-title>A probabilistic framework to learn from multiple annotators with time-varying accuracy</article-title>
          .
          <source>In SDM'10</source>
          , pages
          <fpage>826</fpage>
          –
          <lpage>837</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Downs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Holbrook</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sheng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L. F.</given-names>
            <surname>Cranor</surname>
          </string-name>
          .
          <article-title>Are your participants gaming the system?: screening mechanical turk workers</article-title>
          .
          <source>In CHI</source>
          , pages
          <fpage>2399</fpage>
          –
          <lpage>2402</lpage>
          , New York, NY, USA,
          <year>2010</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Eickhoff</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. P.</given-names>
            <surname>de Vries</surname>
          </string-name>
          .
          <article-title>Increasing cheat robustness of crowdsourcing tasks</article-title>
          .
          <source>Information Retrieval</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Heymann</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          .
          <article-title>Turkalytics: analytics for human computation</article-title>
          .
          <source>In WWW</source>
          , pages
          <fpage>477</fpage>
          –
          <lpage>486</lpage>
          , New York, NY, USA,
          <year>2011</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hirth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hoßfeld</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Tran-Gia</surname>
          </string-name>
          .
          <article-title>Cost-Optimal Validation Mechanisms and Cheat-Detection for Crowdsourcing Platforms</article-title>
          .
          <source>In Workshop on Future Internet and Next Generation Networks (FINGNet)</source>
          , Seoul, Korea,
          <year>June 2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Ipeirotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Provost</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Quality management on amazon mechanical turk</article-title>
          .
          <source>In Proceedings of the ACM SIGKDD Workshop on Human Computation, HCOMP '10</source>
          , pages
          <fpage>64</fpage>
          –
          <lpage>67</lpage>
          , New York, NY, USA,
          <year>2010</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Jurca</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Faltings</surname>
          </string-name>
          .
          <article-title>Mechanisms for making crowds truthful</article-title>
          .
          <source>J. Artif. Int. Res.</source>
          ,
          <volume>34</volume>
          :
          <fpage>209</fpage>
          –
          <lpage>253</lpage>
          , March
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kittur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Chi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Suh</surname>
          </string-name>
          .
          <article-title>Crowdsourcing user studies with mechanical turk</article-title>
          .
          <source>In Proc. CHI 2008</source>
          , ACM Press, pages
          <fpage>453</fpage>
          –
          <lpage>456</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Edmonds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Hester</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Biewald</surname>
          </string-name>
          .
          <article-title>Ensuring quality in crowdsourced search relevance evaluation: The effects of training question distribution</article-title>
          .
          <source>In SIGIR Workshop on Crowdsourcing for Search Evaluation</source>
          , pages
          <fpage>21</fpage>
          –
          <lpage>26</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Shaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Horton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Designing incentives for inexpert human raters</article-title>
          .
          <source>In CSCW</source>
          , pages
          <fpage>275</fpage>
          –
          <lpage>284</lpage>
          , New York, NY, USA,
          <year>2011</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Sheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Provost</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Ipeirotis</surname>
          </string-name>
          .
          <article-title>Get another label? improving data quality and data mining using multiple, noisy labelers</article-title>
          .
          <source>In KDD</source>
          , pages
          <fpage>614</fpage>
          –
          <lpage>622</lpage>
          , New York, NY, USA,
          <year>2008</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Snow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>O'Connor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          .
          <article-title>Cheap and fast – but is it good?: evaluating non-expert annotations for natural language tasks</article-title>
          .
          <source>In EMNLP</source>
          , pages
          <fpage>254</fpage>
          –
          <lpage>263</lpage>
          , Stroudsburg, PA, USA,
          <year>2008</year>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Whitehill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ruvolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-f.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bergsma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Movellan</surname>
          </string-name>
          .
          <article-title>Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise</article-title>
          .
          <source>In NIPS</source>
          , pages
          <fpage>2035</fpage>
          –
          <lpage>2043</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>