DBCrowd 2013: First VLDB Workshop on Databases and Crowdsourcing


         Condition-Task-Store: A Declarative Abstraction for
             Microtask-based Complex Crowdsourcing

                  Kenji Gonnokami                            Atsuyuki Morishima                          Hiroyuki Kitagawa
                 University of Tsukuba                        University of Tsukuba                      University of Tsukuba
           s1320687@u.tsukuba.ac.jp mori@slis.tsukuba.ac.jp                                         kitagawa@cs.tsukuba.ac.jp


ABSTRACT                                                                          programming model for minimizing the cost of re-running
Microtasks have been widely adopted by many crowdsourc-                           programs. Recently, the declarative approach to the develop-
ing platforms as a unit for human computation. Recently,                          ment of crowdsourcing systems has emerged because declar-
tools to support programmers to implement complex crowd-                          ative abstractions have an affinity toward crowdsourcing ap-
sourcing applications with microtasks have been proposed.                         plications. Declarative descriptions do not impose unneces-
One approach is to provide a library of functions that can                        sary timing constraints, and we can adopt many well-known
be called by programs written in imperative programming                           optimization techniques. For example, there are proposals
languages. Another approach is to allow SQL queries to                            that use SQL-like languages to describe data-centric crowd-
invoke microtasks. The former approach provides large ex-                         sourcing systems [7] [8] [13]. However, they provide limited
pressive power, while the latter allows declarative descrip-                      expressive power (see Section 4).
tions with limited expressive power. This paper proposes                             In this paper, we offer the following two key contributions:
the Condition-Task-Store (CTS) abstraction, which is an al-                       (1) Alternative approach to declarative crowdsourc-
ternative declarative approach to implement complex data-                         ing. We first introduce the Condition-Task-Store (CTS)
centric crowdsourcing with microtasks. The CTS abstrac-                           abstraction, which is an alternative declarative approach for
tion is unique in that it has all the following features: (1)                     implementing complex data-centric crowdsourcing with mi-
it naturally extends the task template adopted by many                            crotasks. The CTS abstraction describes a crowdsourcing
crowdsourcing platforms to define microtasks, (2) it allows                       system as a set of CTS rules, each of which is a natural
declarative descriptions of crowdsourcing systems, and (3)                        extension of task template adopted by many crowdsourc-
it has large expressive power.                                                    ing platforms to define microtasks. Therefore, programmers
                                                                                  who are familiar with task templates can easily write simple
1.    INTRODUCTION                                                                programs with the CTS abstraction.
  As computer network technologies evolved, crowdsourc-                              The CTS abstraction is unique in that it has all the follow-
ing became popular in many application domains. Software                          ing features: (1) it naturally extends the task template, (2)
systems that take the crowdsourcing approach are called                           it allows declarative descriptions of crowdsourcing systems,
crowdsourcing systems [2].                                                        and (3) it has large expressive power.
  Crowdsourcing systems are often constructed on crowd-                           (2) Novel criterion for the expressive power of lan-
sourcing platforms, which provide fundamental functions                           guages. Next, we discuss the expressive power of the CTS
for implementing crowdsourcing systems. For example, the                          abstraction. We introduce a novel criterion to measure the
Amazon Mechanical Turk (MTurk) [3] is a crowdsourcing                             expressive power of programming languages for crowdsourc-
platform that provides a market in which workers perform                          ing; the criterion focuses on the class of interactions with
microtasks (called HITs in MTurk) with a small payment                            humans we can implement with the language. The crite-
amount per task. Crowdsourcing platforms often provide                            rion is important for the following two reasons. First, com-
APIs for requesters to register microtasks in the platforms.                      plex crowdsourcing often requires various types of interac-
  Because there are frequent patterns appearing in pro-                           tions with humans. For example, one of such interactions
grams for crowdsourcing systems, software tools have been                         of crowdsourcing is the iterative collaboration [11], which is
developed to support the implementation of complex crowd-                         not necessarily supported by every existing framework. Sec-
sourcing systems. For example, Turkit [4] provides a library                      ond, the class of interactions is closely related to the class of
of functions to define and call tasks from general-purpose                        games in game theory: because human behavior is affected
programming languages and introduces the crash-and-rerun                          by the incentives and rules defined by the game structure,
                                                                                  the class represents the size of mechanism design space, i.e.,
                                                                                  the set of possible mechanisms we can implement to ex-
                                                                                  ploit the wisdom of the crowd. In fact, the change of game
                                                                                  structure affects the quality of the data produced by crowd-
                                                                                  sourcing systems [5]. Our examples in Sections 3 and 4 also
                                                                                  suggest how game structure is important in crowdsourcing.
                                                                                  Then, we theoretically show that our CTS abstraction is not
Copyright c 2013 for the individual papers by the papers’ authors. Copying
                                                                                  only Turing complete, but can also support a wide range of
permitted for private and academic purposes. This volume is published and         game structures.
copyrighted by its editors.


                                                                             1
                                                                             20
                  DBCrowd 2013: First VLDB Workshop on Databases and Crowdsourcing


                                                                        <QuestionForm xmlns="http://mechanicalturk.
   The remainder of the paper is organized as follows. Sec-
                                                                         amazonaws.com/AWSMechanicalTurkDataSchemas/
tion 2 explains related work. Section 3 introduces the CTS               2005-10-01/QuestionForm.xsd">
abstraction. Section 4 discusses its expressive power. Sec-               <Question>
tion 5 explains a prototype to support the software develop-                <QuestionIdentifier>1</QuestionIdentifier>
ment using the CTS abstraction. Section 6 is the summary.                   <QuestionContent>
                                                                               <Text>How many movies have you seen this month?</Text>
2.   RELATED WORK                                                           </QuestionContent>
   Many approaches to support the development of complex                    <AnswerSpecification>
crowdsourcing systems have been proposed. They differ                          <FreeTextAnswer/>
from one another in the abstraction they use to describe                    </AnswerSpecification>
                                                                          </Question>
crowdsourcing systems.                                                  </QuestionForm>
(1) Imperative programming languages. TurKit [4]
provides a function library for implementing crowdsourcing,                    Figure 1: Example of a task template
which can be used via codes written in imperative program-               In this section, we first explain task templates. Then, we
ming languages. It supports the crash-and-rerun model to               show examples to give an intuitive explanation of the CTS
avoid re-performing costly operations.                                 abstraction. Finally, we explain formal definitions.
(2) MapReduce-like abstraction. CrowdForge [14] is
a MapReduce-like framework for describing complex tasks                3.1    Task Templates
on MTurk. It models a crowdsourcing system as a set of                    The task template is a popular form for defining and reg-
tasks to implement partition, map, and reduce functions.               istering microtasks into crowdsourcing platforms. Figure 1
As we discuss in Section 4.3, the expressive power of the              shows an example of a task template in XML format for
CTS abstraction is larger than that of CrowdForge.                     microtasks (HITs) of MTurk, which asks a worker to enter
(3) Control/data flows. CrowdLang [11] is a language                   how many movies he or she watched in a month. The es-
for describing crowdsourcing systems in terms of basic op-             sential components of a task template are the question to
erators, data items, and control flow constructs. Currently,           be shown (QuestionContent) and the type of the values to
it seems that CrowdLang is used as a language for writing              be received by workers (AnswerSpecification). Task tem-
crowdsourcing systems at a high-abstraction level and that             plates can contain variables (called placeholders) with which
it does not provide the means to describe the details required         we can define many microtasks that are based on the same
to directly execute the code.                                          template but differ in the values bound to the variables.
(4) Rule-based abstraction. The CTS abstraction is not                    Requesters first write task templates to define and insert
the first rule-based abstraction. CyLog [9] is a Datalog-like,         microtasks into the task pool. Then, workers perform the
rule-based language for describing crowdsourcing systems.              tasks that exist in the task pool.
A weakness of CyLog is that it requires programmers to be
familiar with programming by Horn clauses even for simple              3.2    Overview of the CTS Abstraction
crowdsourcing. In contrast, the core component of the CTS                 In the CTS abstraction, a crowdsourcing system is de-
abstraction is a task template. Therefore, although the CTS            scribed by a set of CTS rules. We assume that there exists
abstraction borrows concepts from logic-based languages,               a relational database. CTS rules read and write data to and
programmers can start with a set of simple task templates              from the database.
and then naturally proceed to more complex crowdsourcing.                 A CTS rule is the fundamental building block of the CTS
(5) SQL-like abstraction. CrowdDB [8], Qurk [7], and                   abstraction. We first give a simple example and next show
Deco [13] use SQL to describe crowdsourcing systems. They              another example that requires more than one CTS rule.
propose novel query processing and optimization schemes                Example 1. A Simple Crowdsourcing System
and we believe that some of the proposed techniques can be               We use the task shown in Figure 2 to ask workers to tag
applied to the CTS abstraction. As shown in Section 4.3,               books. The details are as follows:
the expressive power of the CTS abstraction is larger.
   To our knowledge, this paper is the first to investigate the          • The database has the Book(bid, title, author) rela-
CTS abstraction. The abstraction models crowdsourcing                       tion to store book information.
systems as a set of CTS rules, each of which is similar to a             • For each book stored in the Book relation, tags are given
task template. Technically, a CTS rule can be implemented                   by three workers.
by combining two ECA rules [12]: one generates microtasks
and the other stores results in the database (and can be                 • The result is stored in the Tag(bid, tag) relation.
implemented by using imperative languages). The CTS ab-                  • Workers are paid 10 (cents or any currency) per task, if
straction provides a higher-level, user-friendly abstraction                any other workers entered the same tag.
designed for describing crowdsourcing systems, which has a
well-defined semantics and proven large expressive power.                The crowdsourcing system can be described only by the
   Our discussion on the expressive power is related to game           CTS rule in Figure 3. A CTS rule consists of three parts:
theory. Recently, the literature on algorithmic game the-              condition, task, and store. We explain each part below.
ory has addressed various aspects involving both algorithms            The condition part: We write the condition to generate
and games, such as complexities of computing equilibrium               and insert a task into the task pool. The condition specifies
of games [15]. To our knowledge, our paper is the first to             what tuples need to exist in the database for generating a
discuss classes of games that can be implemented by ab-                task. For example, the condition in Figure 3 states that a
stractions for crowdsourcing.                                          task is generated for each tuple existing in the Book relation.
                                                                       The task part: We write a task template that contains a
3.   THE CTS ABSTRACTION                                               question to be posed to workers. The question can contain


                                                                  2
                                                                  21
                  DBCrowd 2013: First VLDB Workshop on Databases and Crowdsourcing


                                                                            Figure 4: Example of a microtask: Task 1
          Figure 2: Example of a microtask
 Condition    Book(bid, title, author)
 Task         Question   Please tag the book "$title"
                         written by "$author"
              Generator  Entry(desc:"Tag", var:tag,                         Figure 5: Example of a microtask: Task 2
                         type:text)
              Count      3
              Payoff     PayIf(count(Tag(bid, tag))>=2, 10)                                       Task1:
 Store        Tag(bid, tag)                                            Condition
             Figure 3: Example of a CTS rule                           Task         Question  Please enter the name of a good
                                                                                              restaurant
variables (such as $author and $title) bound to values in                           Generator Entry(desc:"Restaurant Name", var:
                                                                                              rname, type: text)
the condition (i.e., the variables are replaced with values
                                                                                    Count
each time the task is generated). The task part also spec-                          Payoff    PayIf(avg(Rating(rname, value))>3, 10)
ifies additional information, including the variables and its          Store        Restaurant(rname)
associated types, to store entered values.                                                        Task2:
   Because there are frequently occurring patterns that ap-            Condition     Restaurant(rname)
pear in task template specifications, we provide template              Task          Question   Please rate the restaurant
                                                                                                ‘‘$rname’’ on a 5-point scale.
generators that allow users to describe task templates with-
                                                                                     Generator Choice(var: value, type: int,
out specifying implementation details such as HTML tags.                                        items: [1, 2, 3, 4, 5])
The task part in Figure 3 states that we use the Entry tem-                          Count      3
plate generator with the following parameters: (1) the in-                           Payoff     Pay(10)
put field is labeled as “Tag,” (2) the entered value is to             Store         Rating(rname, value)
be stored in the tag variable, and (3) the type of tag is
                                                                                Figure 6: CTS rules for Example 2.
text. Count is the number of tasks to be generated for the
same value. Payoff describes how much is paid to workers
per task. For example, PayIf(count(Tag(bid, tag)) >=                     The CTS rule for Task 1 has no condition: thus, the task is
2, 10) states that 10 cents will be paid if there are other           unconditionally generated. Unless the count number is spec-
workers that entered the same tag to the same book.                   ified, the number of generated tasks is determined as follows:
The store part: We specify the relations and attributes               (1) if the key of the predicate (rname of Restaurant(rname))
in which the entered values will be stored. The store part            is bound to values by the condition, the task is generated
in Figure 3 states that we use the Tag relation to store bid          only once for each case in which the condition is satisfied; (2)
and the entered tag.                                       2          otherwise the same task is repeatedly generated everytime
   As the example above suggests, a CTS rule is a natural             the task is performed and removed from the task pool.
extension of the widely-used task template. Because we can               The condition of the CTS rule for Task 2 states that it
omit the condition and task parts, a CTS rule can be used             generates a task for each tuple stored in Restaurant relation.
to describe the following four types of processing.                   Thus, the task is generated for each restaurant entered in
    1. Generate a task when the condition is satisfied, and           Task 1. Workers receive 10 cents for performing a task. 2
       store the result into the database.                            Discussion. As the examples above suggest, the descrip-
                                                                      tion is declarative, and each rule is invoked when its condi-
  2. If the condition part is omitted, generate a task with           tion is satisfied. Compared to the code written in imperative
     no condition, and store the result into the database.            programming languages, CTS rules naturally describe the
  3. If the task part is omitted, compute and store values            parallel and asynchronous processing of computation involv-
     into the database when the condition is satisfied.               ing human workers. On the other hand, the CTS abstraction
  4. If both of the condition and task parts are omitted,             is more expressive than the declarative query languages that
     store values into the database with no condition.                do not support the transitive closure, because it essentially
                                                                      supports a loop with a dynamic condition check [1].
Example 2. More Complex Crowdsourcing.                                   An important point is that the incentive structure plays
  We consider a crowdsourcing system to rate restaurants,             a critical role to appropriately exploit the wisdom of crowd
in which workers perform the following two types of tasks:            and (at least theoretically) ensure data quality. Because the
Task 1: Enter names of restaurants (Figure 4). A 10 cent              incentive structure and rules involving multiple humans can
     payment is paid if the average rating by others is higher        be modeled as games, we can use game theory to discuss
     than 3.                                                          their behaviors. For example, with a simple game-theoretic
Task 2: Enter an evaluation rating (1 to 5) for the given             analysis, the incentive structure of Example 1 guarantees
      restaurant (Figure 5). Three workers perform this task          that rational workers enter tags that others can easily come
      for each restaurant, and they receive 10 cents per task.        up with. Similarly, the incentive structure of Example 2
                                                                      guarantees that rational workers for Task 1 enter the names
  We assume that the results of Task 1 are stored in
                                                                      of restaurants that are likely to receive high ratings.
Restaurant(rname), and that those of Task 2 are stored
in Rating(rname, value). Then, Figure 6 shows CTS rules
for Tasks 1 and 2.                                                    3.3      Formal Definition

                                                                 3
                                                                 22
                      DBCrowd 2013: First VLDB Workshop on Databases and Crowdsourcing


  This section first defines the CTS rules and then explains                             which the condition holds, the same task is generated
the syntactic sugar for the concise description of rules.                                N times and N tuples are inserted into the relation.
                                                                                         This is implemented by adding the count attribute
3.3.1 Definition of CTS Rules                                                            to the schema of the relation in the store part, and
Definition 1. A program of the CTS abstraction is a                                      copying the CTS rule N times in the program.
set of CTS rules, each of which is modeled as a triple                                •Payoff specifies a function to compute payoff values
Ri = (Ci , Ti , Si ), running over a relational database schema.                        given to workers. If a function is specified, rules to
Here, Ti is a task template, Ci is the condition to generate                            update Worker(pid, payoff) are automatically gen-
a task using Ti , and Si is the description of how to store                             erated and added to the set of rules. The Crapid sys-
the result in the database. The database contains a pre-                                tem provides pre-defined payoff functions.
defined relation with the schema Worker(pid, payoff) to
store information on workers and the accumulated values of                         3.4    Evaluation Model and Semantics
the payoffs they received so far.                                                    Given a description d of the CTS abstraction, the evalu-
  • Ci is a sequence P1 (x11 , . . . , x1n1 ), . . . , Pm (xm1 , . . . , xmnm )    ation of d on instance ins of the database is performed by
     of zero or more atoms. Some of the atoms can be arith-                        evaluating each CTS rule in a bottom-up way on ins. More
     metic atoms, such as x11 = 3.                                                 precisely,
  • Ti is either a triple (q, x̄, ȳ) or null. Here, q is a question                  • For each CTS rule (Ci , Ti , Si ) ∈ d, check if Ci is satis-
     for workers, x̄ is a sequence of variables bound by Ci ,                           fied with ins in the following way: for each atom ak in
     and ȳ is a sequence of variables to store the results of                          Ci (from left to right), check if there exists any tuple
     performing the task.                                                               to bind variables in ak to the values that are consis-
                                                                                        tent with the values bound to the variables appeared
  • Si is a sequence Q1 (y11 , . . . , y1n1 ), . . . , Qm (ym1 , . . . , ymnm )         in a0 . . . ak−1 .
     of one or more atoms, where each yjk is any of a vari-
     able bound by Ci , a variable that appears in ȳ, or a                           • For every combination V of values that satisfies all the
     constant. Each atom can be followed by either /update                              atoms in Ci , perform one of the following:
     or /delete.                                                           2               – If Ti 6= null, replace variables in Ti with values in
                                                                                             V and insert the task into the task pool.
3.3.2 Syntactic Sugar                                                                      – If Ti = null, create new tuples using values in V
  We introduce the following variety of syntactic sugars to                                  for the atoms that appear in Si . Then, perform
make the rule description concise.                                                           one of the following: insert the tuples into ins,
(1) Attributes of atoms. The notation of atoms, which                                        update ins with the tuples (when the atom is fol-
appear in the condition and store part of CTS rules, are                                     lowed by /update), or delete the tuples from ins
essentially the same as those of Prolog and Datalog. A                                       (when followed by /delete).
key difference is that they explicitly specify attribute names                        • If a worker completes the generated task, (1) create
for their parameters. Each parameter is specified in the                                new tuples for the atoms that appear in Si , using val-
form attribute:variable or attribute:constant. For example,                             ues in V and the entered values for the task, then (2)
Restaurant(name:x, zip:305) is an atom.                                                 apply the insertion, delete, or update operation to ins
  There are cases in which parameters and attributes can                                with the new tuples.
be omitted in each atom, as described below.
                                                                                      • If ins is updated, check if there exist rules for which Ci
   • We can omit a parameter if the rule does not con-                                  is satisfied with the new ins. If such rules exist, process
     sume the value of the bounded variable. For example,                               them. Terminate the process if we cannot find such a
     Restaurant(name:x) is an atom that omits zip.                                      rule. If there are multiple rules that can be executed
   • We can omit an attribute name if the attribute name                                at the same time, the rule that appears earlier in the
     is the same as the variable name. For example,                                     code is evaluated first.
     Restaurant(name:name, zip:y) can be represented
     as Restaurant(name, zip:y) because the attribute                                 For example, assume that we have the CTS rule shown in
     and the variable have the same name.                                          Figure 7. We first check if there exists a tuple that matches
                                                                                   Image(i, size, type:"photo") in the database. Assume
(2) Task part. Because we have frequently occuring pat-                            that the Image relation has tuple t = (img98, 100, photo).
terns in the description, we introduce task-template gener-                        Then, i and size are bound to img98 and 100, respectively.
ators and the following three fields for the task part.                            Next, check if there exists a tuple that matches Large(size)
   •Generator is the name of a task-template generator, as-                        with size=100. If exists, we replace $i in the task template
     sociated with its parameters. For example, Entry and                          with img98 and insert the task into the task pool to ask
     Choice in Figure 6 are task-template generators. They                         workers to choose the category for the image img98. If the
     generate actual task templates based on the given pa-                         task is performed, the result of performing the task is stored
     rameters and the sentence written in the Question                             into LargePhoto, and the payoff attribute of the Worker re-
     field. These allow users to define task templates with-                       lation is incremented by 1.
     out specifying implementation details (such as HTML                              Given a set d of CTS rules, the semantics of d is defined
     tags). The Crapid system (Section 5) implements var-                          as a set of rational consequences of d, in a similar way as
     ious task-template generators.                                                the semantics of CyLog codes [9]. A rational consequence
                                                                                   of d is a set of facts that are derived from the rules and
   •Count specifies the number of generated tasks. Let N
                                                                                   the equilibrium [16] of the games, i.e., the states reached by
     be the number specified in the field. For every case in
                                                                                   rational workers.

                                                                              4
                                                                              23
                  DBCrowd 2013: First VLDB Workshop on Databases and Crowdsourcing


 Condition    Image(i, size, type:"photo"), Large(size)                by new st, and move the head to pos+dir. Rule 3 extends
 Task         Question   Please choose a category of $i.               the tape when the head reaches a position that the head
              Generator Choice(var: category,
                         type:string, items:[landscape,
                                                                       never visited before. We need Rule 3 because Rule 2 always
                         portrait, animal, food, others])              requires that Tape(pos, sym) exists. We also need a rule
              Count      1                                             to stop the machine if it reaches the halting states H ⊆ K.
              Payoff     pay(1)                                        The rule is obvious and omitted due to the space limitation.
 Store        LargePhoto(i, category)                                    Defining the CTS rules to implement a Turing machine
      Figure 7: Two atoms in the condition part                        proves that the CTS abstraction is Turing complete.       2

                            Rule 1
                                                                       4.2    Expressive Power in Terms of Games
 Condition                                                               This section proposes to use the game concept as a mea-
 Task                                                                  sure of the expressive power of programming languages for
 Store       TuringMachine(id:1, st:s, head:0)                         crowdsourcing, because the class of games that the language
                          Rule 2                                       can implement affects the way in which the implemented sys-
             TuringMachine(id, st, head),
             Tape(pos:head, sym),                                      tem can exploit the power of the wisdom of crowd. First, we
 Condition                                                             enumerate several classes of games and show the relationship
             Rule(st, sym, new st, new sym, dir),
             new pos = pos + dir                                       among the classes.
 Task
 Store
             TuringMachine(id, st:new st, head:new pos)/update,        Definition 2. G1 is a class of games that satisfy the follow-
             Tape(pos, sym:new sym)/update                             ing conditions:
                          Rule 3
 Condition   TuringMachine(id, head)                                      1. Every input from a human is not affected by the inputs
 Task                                                                        from others, and
 Store       Tape(pos:head)/update
                                                                         2. The payoffs are computed by a primitive recursive
Figure 8: CTS rules implementing a Turing machine                           function of the input values.                  2
4.    EXPRESSIVE POWER                                                   An example of a game in G1 is one that asks humans to
                                                                       enter tags for a given image without telling them what tags
  This section discusses the expressive power of the CTS ab-
                                                                       are entered by other humans. Payoffs are defined for each
straction. First, we show that the CTS abstraction is Turing
                                                                       combination of worker inputs. For example, workers receive
complete. Then, we introduce a criterion to measure the ex-
                                                                       payoffs when they enter the same tag. Games in G1 are
pressive power of programming languages, which focuses on
                                                                       called simultaneous games in game theory.
the class of games the language can implement. Finally, we
compare the expressive power of the CTS abstraction with               Definition 3. GN is a class of programs in which (1) N (> 0)
those of other frameworks.                                             is known in advance; (2) each game has at most N -phases of
                                                                       interactions, each of which asks a worker to enter data; (3) at
4.1      Turing Completeness                                           each i-th phase workers are shown some information based
Theorem 1. The CTS abstraction is Turing complete.                     on what was entered in the first to the i − 1-th phases; and
                                                                       (4) payoffs are computed by a primitive recursive function
Proof Outline. Figure 8 is a set of CTS rules that im-
                                                                       of the entered values.                                      2
plements any Turing machine. Formally, a Turing machine
consists of a quintuple (K, Σ, δ, s, H) where K is a finite set          Each game in GN has at most N phases, each of which
of states, Σ is an alphabet, s ∈ K is the initial state, H ∈ K         asks a human to enter data considering some information
is the set of halting states, and δ is the transition function         computed from data in the previous phases. For example,
[6]. Intuitively, we need the following three components to            assume that we want to divide a set of cakes into two groups
implement a turing machine.                                            whose total prices are equivallent to each other. Then, a
                                                                       program that (1) asks one worker to divide the cakes into
Element 1. Memory of the machine’s inner state
                                                                       two groups at the first phase, then (2) asks another worker
Element 2. Head reading and writing information stored in              to choose one group, and finally (3) gives to each worker the
           the tape                                                    total price of cakes in his group, belongs to GN (with N =2).
Element 3. The long tape in infinitum.                                 Note that the game guarantees that the prices of the two
                                                                       groups become the same if workers are rational; it exploits
  In Figure 8, TuringMachine(id, st, head) implements                  the power of human intelligence to compute how they can
Elements 1 and 2. Here, st records the current state whose             make two groups whose prices are equivalent to each other.
domain is K, and head stores the position of the head.                 From the definition, every game in G1 belongs to GN .
Tape(pos, sym) implements Element 3, in which each tuple
                                                                       Definition 4. G∗ is a class of programs in which each pro-
(p, s) states that symbol s (whose domain is Σ) is written at
                                                                       gram (1) executes a sequence of interactions with workers,
position p of the tape. Rule(st, sym, new st, new sym,
                                                                       with the sequence being generated by a primitive recursive
dir) stores the rules δ to read and write symbols on the
                                                                       function, (2) shows workers at each interaction the informa-
tape and move the head.
                                                                       tion computed by a function whose parameters are taken
  Rule 1 initializes a Turing machine (the initial state is s).
                                                                       from the results of past interactions, and (3) payoffs are
Rule 2 states how the head moves and writes symbols onto
                                                                       computed by a primitive recursive function of the entered
the tape according to the rules stored in Rule(st, sym,
                                                                       values.                                                   2
new st, new sym, dir). It states that if the inner state is
st and the symbol at the current position of the head is                 An example of a game in G∗ is to ask workers to write a
sym, write new sym at the position, update the inner state             paragraph that explains a given keyword. Workers update


                                                                  5
                                                                  24
                  DBCrowd 2013: First VLDB Workshop on Databases and Crowdsourcing


                Table 1: Expressive power                              6.   SUMMARY
 Abstraction /Framework        Turing complete   Class of games          This paper introduced the CTS abstraction, a declarative
 MTurk alone                   N                 ⊆ G1                  approach for implementing complex crowdsourcing with mi-
 CrowdForge                    N                 ⊆ GN
 CrowdDB/Deco/Qurk             N                 ⊆ GN                  crotasks. The paper also introduced a novel criterion to
 CTS Abstraction               Y                 G∗                    measure the expressive power of programming languages for
 Imperative programming lan-   Y                 G∗                    crowdsourcing by focusing on the class of games we can im-
 guages with the MTurk API
                                                                       plement with the language. The class represents the size of
                                                                       mechanism design space, i.e., the set of possible mechanisms
the paragraph until the paragraph is satisfied by the major-
                                                                       we can implement to exploit the wisdom of the crowd. The
ity of the crowd. When the paragraph is completed, every
                                                                       CTS abstraction is unique in that it has all the following
writer (worker) who contributed to the paragraph receives a
                                                                       features: (1) it naturally extends the task template adopted
payoff, which is computed by dividing a fixed total payment
                                                                       by many crowdsourcing platforms, (2) it allows declarative
by the number of the contributors.
                                                                       description, and (3) it has large expressive power.
Theorem 2. GN is a proper subset of G∗ .                                 Future work includes the development of various rewriting
                                                                       techniques for the CTS abstraction. For example, we plan
Proof. For every g, g ∈ GN ⇒ g ∈ G∗ and for any given i,
                                                                       to adapt various optimization techniques for crowdsourcing
there exists g ∈ G∗ s.t. g has i+1 input phases and therefore
                                                                       systems [7] [8] [13] into our context.
g 6∈ GN .                                                  2
                                                                       Acknowledgements. The authors are grateful to Prof.
Theorem 3. Assume that we allow Turing machines to                     Shigeo Matsubara of Kyoto university for his helpful com-
interact with humans at any step of its execution. Let M be            ments, and to the contributors of Crowd4U, whose names
the set of all such machines. M can implement any g ∈ G∗ .             are partially listed at http://crowd4u.org. This research was
                                                                       partially supported by PRESTO from the Japan Science and
Proof. The sequence of interactions with workers that can              Technology Agency, and by the Grant-in-Aid for Scientific
be generated by a primitive recursive function can be imple-           Research (#25240012) from MEXT, Japan.
mented by some m ∈ M . The information shown and the
payoffs can be computed if m is a Turing machine.         2            7.   REFERENCES
   Note that being Turing complete is not a sufficient con-             [1] S. Abiteboul, R. Hull, V. Vianu: Foundations of
                                                                            Databases. Addison-Wesley 1995.
dition to be able to implement G∗ , because the power of                [2] A. Doan, R. Ramakrishnan, A. Y. Halevy. “Crowdsourcing
interactions with humans does not matter for a language to                  systems on the World-Wide Web. Commun.ACM. 2011,
be Turing complete. G∗ contains indefinite-length sequential                vol. 54, no. 4, p. 86-96.
games (in game theory) that can be expressed with Turing                [3] Amazon Mechanical Turk, https://www.mturk.com/.
machines that are able to interact with humans at any step              [4] G. Little, L. B. Chilton, M. Goldman, R. C. Miller.
                                                                            “Turkit: human computation algorithms on mechanical
of its execution.
                                                                            turk”. Proc. UIST (2010), ACM, New York, 57-66.
   From the definition of the CTS rules, the following holds.           [5] S. Jain, D. C. Parkes. “A game-theoretic analysis of the
                                                                            ESP game”. ACM Trans. Economics and Comput. 1(1): 3
Theorem 4.       The CTS abstraction can implement any
                                                                            (2013)
games in G∗ .                                        2                  [6] H. R. Lewis, C. H. Papadimitriou. “Elements of the theory
4.3   Comparison of Expressive Powers                                       of computation.” Prentice-Hall, Englewood Cliffs, New
                                                                            Jersey, 1981
   Table 1 compares the expressive powers of different ab-              [7] A. Marcus, E. Wu, D. Karger, S. Madden, and R. Miller.
stractions and frameworks. Games implemented by manu-                       “Human-powered sorts and joins”. In VLDB, 2012.
ally registering HITs to MTurk are contained in G1 , whereas            [8] M. J. Franklin., D. Kossmann, T. Kraska, S. Ramesh, R.
code written in programming languages that uses MTurk                       Xin. “CrowdDB: answering queries with crowdsourcing”.
                                                                            SIGMOD Conference. 2011, p. 61-72.
APIs can implement games in G∗ . CrowdForge can imple-                  [9] A. Morishima, N. Shinagawa, S. Mochizuki. “The Power of
ment a part of the games in GN , while its expressive power                 Integrated Abstraction for Data-centric Human/Machine
is not larger than GN , because it is not Turing complete.                  Computations”. VLDS2011, pp. 5-9.
In particular, games implemented by a combination of par-              [10] A. Morishima, N. Shinagawa, T. Mitsuishi, H. Aoki, S.
tition, map, and reduce are contained in GN with N = 3.                     Fukusumi. “CyLog/Crowd4U: A Declarative Platform for
                                                                            Complex Data-centric Crowdsourcing”. PVLDB 5(12):
Similarly, the class of games that CrowdDB, Qurk, and Deco                  1918-1921 (2012).
can implement is not larger than GN .                                  [11] P. Minder, A. Bernstein. “CrowdLang: A Programming
                                                                            Language for the Systematic Exploration of Human
5.    PROTOTYPE SYSTEM                                                      Computation Systems”. SocInfo 2012: 124-137
   We implemented the Crapid system, a prototype system                [12] N. W. Paton, O. Di’az. “Active Database Systems”. ACM
                                                                            Comput. Surv. 31(1): 63-103 (1999)
to develop crowdsourcing systems using the CTS abstrac-                [13] H. Park, R. Pang, A. G. Parameswaran, H. Garcia-Molina,
tion. Crapid takes as input a set of CTS rules and outputs                  N. Polyzotis, J. Widom. “An overview of the deco system:
executable code. Crapid supports various task-template                      data model and query language; query processing and
generators (i.e., Entry, Choice, Decision, and Comparisons                  optimization”. SIGMOD Record 41(4): 22-27 (2012)
as of May 2013) and payoff functions to help users easily              [14] A. Kittur, B. Smus, S. Khamkar, R. E. Kraut.
                                                                            “CrowdForge: crowdsourcing complex work”. UIST 2011:
define microtasks in CTS rules. Crapid provides a form-                     43-52
based user interface. When a user chooses a task-template              [15] T. Roughgarden. “Algorithmic game theory”. Commun.
generator, Crapid provides an appropriate set of selection                  ACM 53(7): 78-86 (2010)
boxes and drop-down menus to specify CTS rules. The out-               [16] F. Vega-Redondo. Economics and Theory of Games,
put code is executable on Crowd4U [10], a crowdsourcing                     Cambridge University Press, 2003.
platform deployed at universities.

                                                                  6
                                                                  25