<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Palm-tree Index: Indexing with the crowd</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ahmed R. Mahmood</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eduard Dragut</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Saleh Basalamah, Umm Al-Qura University</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Purdue University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Purdue University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <fpage>2</fpage>
      <lpage>7</lpage>
      <abstract>
        <p>Crowdsourcing services allow employing human intelligence in tasks that are difficult to accomplish with computers, such as image tagging and data collection. At a relatively low monetary cost, and through web interfaces such as Amazon's Mechanical Turk (AMT), humans can act as computational operators in large systems. Recent work has been conducted to build database management systems that can harness the crowd's power in database operators, such as sort, join, count, etc. However, the fundamental problem of indexing within crowdsourced databases has not been studied. In this paper, we study the problem of tree-based indexing within crowd-enabled databases. We investigate the effect of various tree and crowdsourcing parameters on the quality of index operations. We propose new algorithms for index search, insert, and update.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Crowdsourcing is the practice of solving large problems by
dividing them into smaller tasks, each of which is then solved by
humans from an online community. The tasks are usually difficult
to perform automatically with a computer because they require
human intelligence. Typical crowdsourcing tasks involve labelling,
ranking, data cleaning, data filtering, data collection, and entity
matching (e.g., see [
        <xref ref-type="bibr" rid="ref11 ref14 ref15 ref2 ref5 ref6">11, 6, 2, 15, 5, 14</xref>
        ]). Websites, e.g., Amazon’s
Mechanical Turk (MTurk) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], provide an infrastructure for
organizations to submit numerous micro-tasks and collect the results
after they are fulfilled by human workers recruited at those
websites. In a typical crowdsourced application, tasks are replicated
and answered by multiple people to avoid single-user errors.
Each person answering a task incurs a (monetary) cost. Thus, a
challenging problem for crowdsourcing is that of optimizing the
number of tasks while maintaining the accuracy of the results.
      </p>
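The replication-and-aggregation idea above can be sketched in a few lines of Python. This is an illustrative sketch only; the function name and the tie-breaking rule are our own assumptions, not part of any crowdsourcing platform's API.

```python
from collections import Counter

def aggregate_majority(responses):
    """Aggregate the replicated answers to one task by majority
    vote. `responses` holds one answer per worker; ties go to the
    first answer seen (an assumed tie-breaking rule)."""
    counts = Counter(responses)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(responses)

# Three workers answer the same comparison task; one errs, but
# replication absorbs the single-user error.
answer, agreement = aggregate_majority(["slot-2", "slot-2", "slot-3"])
```

Each replica costs money, which is exactly why the number of replicas per task must be optimized against the desired accuracy.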
      <p>Consider the following scenario where a crowdsourcing
tree-based index can be useful. Assume that a car repair shop wants
to provide an online service that allows users to submit pictures of
their damaged cars to get an estimate of the repair cost. The
repair shop maintains images of previously repaired cars together with
their actual repair costs. A good repair estimate is the actual
repair cost of a previously repaired car in a similar condition. Building
a tree index on the images of previously repaired cars, sorted by
their repair cost, allows performing crowdsourcing-based similarity
queries on the index. These queries help users find cars in
similar conditions and hence anticipate a similar repair cost for their
own damaged car. The key to the success of this scenario is that
the cost of repair is directly proportional to the condition of the car.
The same idea can be applied when estimating the selling price of a
used car, the cleaning cost of rental rooms, the placement of a new
soccer player in player rankings, or the ranking of new scientific
publications.</p>
      <p>∗This work was partially supported by the National Science
Foundation under Grants III-1117766 and IIS-0964639.</p>
      <p>Copyright © 2013 for the individual papers by the papers’ authors. Copying
permitted for private and academic purposes. This volume is published and
copyrighted by its editors.</p>
      <p>In this paper, we introduce the palm-tree index, a
crowdsourcing tree-based index. We call our proposed index the
palm-tree index because real palm trees also need humans
to move the pollen from the male trees to the female
trees in order to produce fruits (dates). The palm-tree index aims at
indexing keys based on properties that need human intelligence, e.g., to
compare images against each other in order to descend and
navigate the index. Crowdsourcing-based tree indexing is useful for
(1) ordering a set of keys based on a subjective property,
(2) performing index nested-loops joins, and (3) answering range
queries, among other typical uses of an index. The problem of
crowdsourcing-based indexing is challenging because of the
following issues. First, crowd-based comparisons are subjective and
prone to error. Hence, we need to find strategies that give the
most accurate and consistent results. Second, comparison tasks
performed by the crowd incur a (monetary) cost. Hence, optimizing
the number of tasks while preserving accuracy is very important.</p>
      <p>The contributions of this paper are summarized as follows:</p>
      <p>• We introduce the problem of crowdsourcing tree-based
indexing as a technique for ordering a set of items with
“subjective” keys.</p>
      <p>• We present a taxonomy for crowdsourcing tree-based
indexes.</p>
      <p>• We introduce the palm-tree index, a crowdsourced index,
along with several accompanying index traversal algorithms,
and study the quality of the results retrieved using the
palm-tree index.</p>
      <p>The rest of this paper is organized as follows. Section 2
introduces the taxonomy for crowdsourcing tree-based indexes.
Section 3 presents notations used throughout the paper. Section 4
presents the palm-tree index and its search algorithms. Section 5
analyzes the proposed algorithms. Preliminary experimental results
are given in Section 6. Related work is presented in Section 7.
Section 8 contains concluding remarks.</p>
    </sec>
    <sec id="sec-2">
      <title>PROBLEM TAXONOMY</title>
      <p>
        In this section, we introduce a taxonomy for crowdsourcing
tree-based indexes. This taxonomy can be seen as an extension of the
one presented in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which addresses only worker motivation, task
replication, and task assignment. Our taxonomy adds concepts
related to tree indexing (e.g., tree height) and to crowdsourcing in
general (e.g., worker experience). Figure 1 depicts the taxonomy.
      </p>
      <p>[Figure 1: Taxonomy of crowdsourcing tree-based indexing.
Dimensions: tree height (balanced, unbalanced); number of
dimensions (one, multi); worker motivation (volunteer, rewarded);
task replication (single, replicated); task assignment (user select,
server assign); task submission (batch, sequential, hybrid); and
cost bounds (unlimited, limited).]</p>
      <p>Tree height It specifies whether the tree is balanced or not. The
proposed palm-tree index is a balanced B+-tree-like
structure in order to obtain a predictable query cost. More detail about query
cost is given in Section 3.</p>
      <p>Number of dimensions An index can be either one-dimensional
or multi-dimensional. In this paper, we consider
one-dimensional indexing.</p>
      <p>
        Worker motivation In some situations, workers may be willing
to perform tasks for free, e.g., see [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. However, in most
instances workers seek rewards for their services. This raises
several issues, e.g., budget estimation and the presence of
low-quality work due to workers solving tasks quickly to
obtain more money. In this paper, we consider rewarded
workers.
      </p>
      <p>
        Worker experience A simple model to describe workers is to
assume that all workers have equal experience, e.g., as in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
An alternative and viable model is to differentiate workers
based on their level of expertise, e.g., as in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In the
palm-tree index, workers are assumed to have indistinguishable,
i.e., equal, expertise. We plan to extend our analysis to the
general case where workers have variable experience. One
special type of worker is the malicious worker. A malicious
worker aims at degrading output quality by providing
intentionally faulty and misleading answers. Coping with
malicious workers is not addressed in this paper.
      </p>
      <p>Task assignment There are two cases to consider: (1) the worker
chooses the tasks to work on, or (2) the tasks are
automatically assigned to the worker without the worker's interference
in the selection. This dimension is particularly important
when workers are of variable expertise. For example, it is
important to avoid assigning difficult tasks to inexperienced
workers. In this paper, we assume Case 2, above.</p>
      <p>Task replication In a single-task, no-replication model, workers
are trusted to deliver high-quality answers. In many other
situations, workers’ responses incur error. The replicated
task model allows aggregating multiple workers’ opinions to
increase the quality of the results. In this paper, we assume
the replicated model.</p>
      <p>Task submission model A task submission model describes how
replications of the same task are submitted to workers. One
alternative is to submit all task replicas in a single batch. This
technique can reduce the response time to get the final results,
as task replicas are solved by multiple workers concurrently.
Nonetheless, one may end up submitting more tasks than
needed. In order to reduce the cost needed for simple tasks,
the replicas of the same task can be submitted sequentially. In
the sequential model, a replica of a task is submitted only
when the outcome of the previous replicas does not produce
results of acceptable quality. The sequential
technique increases the task response time, as replicas wait for
each other. A hybrid submission model is a middle ground
between the two alternatives: small-sized
batches of the same task are submitted sequentially. In this
paper, we only consider the batch task submission model,
i.e., replica tasks are submitted in parallel to the
designated workers at the same time.</p>
      <p>Cost bounds Having an unlimited cost per worker is a rare
occurrence. It may occur when the quality of the result is of major
importance. Typically, costs for tasks are limited, and
optimizing the overall task error within the cost bounds is of major
importance. In this paper, we consider the limited-cost case.</p>
    </sec>
    <sec id="sec-3">
      <title>PRELIMINARIES AND INDEX MODEL</title>
    </sec>
    <sec id="sec-4">
      <title>Problem definition</title>
      <p>Let S be a set of N items and q be a query item. The keys of
these items are subjective or imprecise (e.g., the item is the photo
of a damaged car and the key is the cost of repair for this car). We
need to address the following issues:</p>
      <p>• Order the items in S according to their keys.
• Construct (build) an index on S.
• Submit query q on S to the crowd.
• Aggregate the crowd's responses to query q to produce a final
result.</p>
    </sec>
    <sec id="sec-5">
      <title>Index model</title>
      <p>The structure of the palm-tree index is composed of two
components: the index manager and a B+-tree. Figure 2 gives the main
components of the palm-tree index. We distinguish between two
main concepts in the palm-tree: jobs and tasks. A job is the unit
of work submitted by a palm-tree user to perform either a query or
an insertion on the indexed data. A task, or HIT (Human Intelligence
Task), is the smallest unit of work assigned to a worker. Typically,
a job involves generating multiple tasks.</p>
      <sec id="sec-5-1">
        <title>The B+-tree Component</title>
        <p>
          The palm-tree index is built on top of a B+-tree index. Each
node in the B+-tree has n ordered keys and represents one task.
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] observes that human beings are capable of performing n-ary
operations with ease. Therefore, we give a worker a sorted list
A of n items, e.g., images or videos, along with the query item q,
e.g., a query image or video, and ask the worker to place q in A
based on how q compares to the items in A. For instance, A can be
a list of 5 pictures of cars sorted according to the extent to which
the car is damaged, while q can be the picture of a newly damaged
car. The fanout f of the B+-tree (the maximum allowed number of children
of a node other than the root) naturally captures the maximum
n-ary operation that a human can accomplish with ease. Clearly, a
human being can handle a list A with 5 pictures much more easily than one
with 20. Determining the optimal f for a given problem requires
an initial phase of training. We discuss this issue in more detail in
Section 5.
        </p>
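The per-node task described above can be sketched as follows; the comparator callback stands in for the human worker's judgment, and the function names are hypothetical, used only for illustration.

```python
def placement_task(sorted_items, query, compare):
    """One palm-tree task: place `query` among the n ordered items
    of a node. `compare(query, item)` returns True when `query`
    orders before `item`; in the real system a human worker makes
    this judgment. Returns the child slot 0..n to descend into."""
    for slot, item in enumerate(sorted_items):
        if compare(query, item):
            return slot
    return len(sorted_items)

# With fanout f = 5 a node holds at most 4 keys, e.g. repair costs
# of previously seen cars; the new car is placed among them.
slot = placement_task([100, 300, 700, 900], 450, lambda q, k: q < k)
```

The returned slot selects which child pointer to follow when descending the B+-tree.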
        <p>Construction (building) of the palm-tree depends on the type of
indexed keys. We have two types of keys: quantitative and
qualitative keys.</p>
        <p>DEFINITION 3.1. A quantitative key is a key to be indexed
that has two proportional properties, namely, a subjective property
and an assessed value. For quantitative keys, the palm-tree index
can be constructed by sorting the keys based on their assessed values
and then bulk-loading the index.</p>
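The sort-then-bulk-load construction for quantitative keys might look as follows; the nested-list node representation is a deliberate simplification of real B+-tree nodes, used only to illustrate the bottom-up build.

```python
def bulk_load(items, fanout):
    """Bulk-load a palm-tree from quantitative keys: sort the
    (key, assessed value) pairs by assessed value, chunk them into
    leaves, then build parent levels bottom-up. Nodes are plain
    nested lists, a simplification of real B+-tree nodes."""
    level = sorted(items, key=lambda kv: kv[1])
    level = [level[i:i + fanout] for i in range(0, len(level), fanout)]
    tree = [level]                                   # tree[0] = leaf level
    while len(level) > 1:                            # build upper levels
        level = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        tree.append(level)
    return tree                                      # tree[-1][0] is the root

cars = [("car-a", 700), ("car-b", 100), ("car-c", 300),
        ("car-d", 900), ("car-e", 450)]
tree = bulk_load(cars, fanout=2)
```

No crowd tasks are needed for this construction; human comparisons are required only at query time, or when the keys are qualitative and the tree must be built by successive insertions.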
        <p>DEFINITION 3.2. A qualitative key is a key to be indexed
based on a subjective property only. It does not have any
associated assessed value. For qualitative keys, a palm-tree index can
be constructed only using successive insertions and human-based
comparisons.</p>
        <p>One example of a quantitative key is an image of a damaged car
associated with its actual repair cost. The degree of car damage is the
subjective property of the key, while the repair cost is the
assessed value. An example of a qualitative key is an image of a butterfly
with an ordering based on beauty as a subjective property.
Regular B+-tree splitting/merging algorithms are used during
insertion into/deletion from the B+-tree. For both types of keys, queries
use human intelligence to make the comparisons needed to search the
B+-tree. All comparisons are based on the subjective property only.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Index manager</title>
        <p>The palm-tree index manager is responsible for tree
construction and query processing within the palm-tree. It initiates jobs for
users’ requests, generates tasks for every job, assigns tasks to
workers, and aggregates workers’ responses. We use majority voting to
aggregate responses of workers.</p>
        <p>In the palm-tree index, a task consists of the items (e.g., pictures) in
a node of the B+-tree. A worker is asked to choose the best location
for a query item among these items. Figure 3 gives an example task.</p>
        <p>In this technique, a job is replicated to K workers. A worker
performs the index search by descending the tree from the root to
the leaf based solely on this worker's own decisions or assessments
to find the best match to the query item. All workers' responses and
final results are aggregated only at the leaf level. Algorithm 2 gives
an outline of the leaf-only aggregation strategy.</p>
        <p>Algorithm 2: handle leaf only aggregation(Task t)
if t.level ≠ 0 then // task at a non-leaf level
    tree ← get subtree(t.response, t.tree)
    t ← create task(tree, t.level−1, t.q, t.w)
    submit task(t, handle leaf only aggregation)
end
solved++
if solved == k then // all workers finished
    result ← aggregate all results
    return result
end</p>
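A minimal Python sketch of the leaf-only strategy follows, assuming a dict-based toy tree and a `pick_child` callback that plays the role of a single worker's per-node decision; both are our own simplifications, not the paper's implementation.

```python
from collections import Counter

def leaf_only_search(root, query, workers, pick_child):
    """Leaf-only aggregation (Algorithm 2): each of the K workers
    descends from the root to a leaf using only their own answers,
    and only the K reached leaves are majority-aggregated.
    Nodes are dicts {'keys': [...], 'children': [...] or None},
    a simplified stand-in for the B+-tree component."""
    reached = []
    for w in workers:
        node = root
        while node['children'] is not None:        # descend alone
            node = node['children'][pick_child(w, node, query)]
        reached.append(node)
    winner, _ = Counter(id(n) for n in reached).most_common(1)[0]
    return next(n for n in reached if id(n) == winner)

# A two-level toy tree and a noiseless simulated worker that
# compares true numeric keys.
leaf1 = {'keys': [1, 2], 'children': None}
leaf2 = {'keys': [3, 4], 'children': None}
root = {'keys': [3], 'children': [leaf1, leaf2]}
pick = lambda w, node, q: sum(q >= k for k in node['keys'])
leaf = leaf_only_search(root, 3.5, workers=[1, 2, 3], pick_child=pick)
```

Note that a worker who errs near the root carries that error all the way down; the aggregation at the leaves cannot repair a descent that went into the wrong subtree.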
      </sec>
    </sec>
    <sec id="sec-6">
      <title>All-levels Aggregation Strategy</title>
      <p>An alternative search strategy is to move down the tree level by
level. The path from the root to the final search position is
collectively determined by the crowd as follows. The search starts at
the root, where K<sub>0</sub> tasks are generated for this level. After all
workers complete their tasks at the root level, their answers are
aggregated, say by majority voting, and the child node that corresponds
to the item with the highest vote is selected. Let this node be v. We
generate K<sub>1</sub> new tasks for v. We aggregate the answers from the
K<sub>1</sub> workers and decide which of the children of v to go to next. We
repeat the process until we reach the leaves.</p>
      <p>There are two variations of this search technique: uninformed
and informed workers. In the former, a worker is not aware of
the other workers’ decisions. In the latter, a worker is allowed to
view a summary of the other workers’ responses to the current task.
Algorithm 3 provides an outline of this search procedure.</p>
      <p>Algorithm 3: handle all levels aggregation(Task t)
solved++
if solved == k then // all workers finished at this level
    result ← aggregate all results
    if t.level ≠ 0 then // task at a non-leaf level
        k ← get replications(tree, t.level−1)
        tree ← get subtree(result, t.tree)
        for i = 1 to k do
            w ← choose worker()
            t ← create task(tree, t.level−1, t.q, w)
            submit task(t, handle all levels aggregation)
        end
    end
    return result
end</p>
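The level-by-level descent can be sketched as follows; the `ask_workers` callback stands in for issuing K replicated tasks at a node and is our own simplification of the system's task manager.

```python
from collections import Counter

def all_levels_search(root, query, ask_workers):
    """All-levels aggregation (Algorithm 3): at each node the K
    worker answers are collected and majority-aggregated before
    descending, so a single worker's slip at an upper level is
    usually outvoted. `ask_workers(node, query)` returns the K
    chosen child indices; nodes are dicts
    {'keys': [...], 'children': [...] or None}."""
    node = root
    while node['children'] is not None:
        votes = ask_workers(node, query)               # K replicated tasks
        best = Counter(votes).most_common(1)[0][0]     # majority child
        node = node['children'][best]
    return node

# Three simulated workers; one errs at the root but is outvoted.
leaf1 = {'keys': [1, 2], 'children': None}
leaf2 = {'keys': [3, 4], 'children': None}
root = {'keys': [3], 'children': [leaf1, leaf2]}
leaf = all_levels_search(root, 3.5, ask_workers=lambda n, q: [1, 1, 0])
```

The contrast with the leaf-only sketch is the position of the aggregation step: here the vote happens before each descent, so individual mistakes are corrected level by level.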
    </sec>
    <sec id="sec-7">
      <title>All-levels Aggregation Strategy with Backtracking</title>
      <p>The two search procedures above, namely the leaf-only and
all-levels aggregation strategies, cannot compensate for an error.
Backtracking is one way to correct potentially wrong decisions of the
workers. A priority queue maintains pointers to the unexplored nodes in
the index. Unexplored nodes are ordered in the priority queue
according to their height (or level) in the tree. Nodes at the same
level in the B+-tree index are ordered by the percentage of votes
given to these nodes. To detect bad decisions, workers are progressively shown
the leaf-node bounds of the current sub-tree at hand.
If the workers indicate that the position of the query key falls
outside the range of the keys in the leaf nodes, then the current sub-tree is
abandoned and another node is picked from the top of the priority
queue to resume the search.</p>
      <p>Figure 4 gives an example of the backtracking search strategy. In
Figure 4(a), a palm-tree of fanout 2 is used to index keys 1 through
8. Figure 4(b) illustrates the steps of traversing the tree. The search
starts at the root node A. Node B is selected,
as 60% of the workers vote for this node. The priority queue
keeps track of the other alternatives besides B. In the next step, we show the
range (i.e., the min and max) of the keys in the sub-tree rooted at B,
that is, from 1 to 4. Assume that the workers indicate that the query
key is larger than 4; then we have an error. Hence, we backtrack
to node C and insert node D into the queue. Algorithm 4 outlines the
search steps.</p>
      <p>Algorithm 4: handle backtrack all levels aggregation(Task t)
solved++
if solved == k then // all workers finished at this level
    result ← aggregate all results
    if result outside range of current subtree then
        tree ← pop queue()
    else
        tree ← get subtree(result, t.tree)
    end
    push queue(other voting alternatives)
    if t.level ≠ 0 then // task at a non-leaf level
        k ← get replications(tree, t.level−1)
        for i = 1 to k do
            w ← choose worker()
            t ← create task(tree, t.level−1, t.q, w)
            submit task(t, handle backtrack all levels aggregation)
        end
    end
    return result
end</p>
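A compact sketch of the backtracking idea follows, using Python's `heapq` as the priority queue of unexplored alternatives. The dict node layout and both callbacks (`ask_workers` for the vote shares, `in_range` for the leaf-bound check) are our own simplifications.

```python
import heapq

def backtracking_search(root, query, ask_workers, in_range):
    """All-levels aggregation with backtracking (Algorithm 4):
    runner-up children are kept in a max-priority queue keyed by
    vote share; when the workers report that `query` falls outside
    the leaf-key range of the current subtree, the search resumes
    from the best unexplored node. Nodes are dicts
    {'keys': [...], 'children': [...] or None}."""
    queue, tick = [], 0                                 # (-share, tick, node)
    node = root
    while True:
        while node['children'] is not None:
            shares = ask_workers(node, query)           # vote share per child
            best = max(range(len(shares)), key=shares.__getitem__)
            for i, s in enumerate(shares):              # remember alternatives
                if i != best:
                    tick += 1
                    heapq.heappush(queue, (-s, tick, node['children'][i]))
            node = node['children'][best]
        if in_range(node, query) or not queue:          # accept, or give up
            return node
        node = heapq.heappop(queue)[2]                  # backtrack

# Workers wrongly favour the left subtree (60/40); the range check
# on the reached leaf detects the error and the search backtracks.
leafB = {'keys': [1, 2], 'children': None}
leafC = {'keys': [3, 4], 'children': None}
root = {'keys': [3], 'children': [leafB, leafC]}
leaf = backtracking_search(
    root, 3.5,
    ask_workers=lambda n, q: [0.6, 0.4],
    in_range=lambda n, q: n['keys'][0] <= q <= n['keys'][-1])
```

Unlike the description above, this sketch orders the queue by vote share only; adding the level-first ordering would only change the priority tuple.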
    </sec>
    <sec id="sec-10">
      <title>ANALYSIS</title>
      <p>In this section, we study how to set the parameters of the
palm-tree index: tree order and cost distribution. We also introduce
the main performance metrics of the palm-tree index, namely, error
and cost.</p>
    </sec>
    <sec id="sec-11">
      <title>Performance metrics</title>
      <p>The location of a key is the rank of the key in the ordered list
of keys at the leaf level of the tree. Let q be a query key with a
ground truth location, say loc<sub>truth</sub>, at the leaf level, and let loc<sub>ret</sub> be
the retrieved location using one of the search strategies.</p>
      <p>DEFINITION 5.1. Error E is the distance between the ground
truth location and the retrieved location of the query key:
E = | loc<sub>truth</sub> − loc<sub>ret</sub> |.</p>
      <p>[Figure 4: An example of the backtracking search strategy. (a) A
palm-tree of fanout 2 with root A, internal nodes B, C, and D, and
leaf pointers P1–P8 over keys 1 through 8. (b) The three traversal
steps with the workers' vote percentages (60%/40%, then 70%/30%)
and the priority queue contents at each step: {}, {&lt;C,60%&gt;},
{&lt;D,30%&gt;}.]</p>
      <p>DEFINITION 5.2. Cost C is the total number of tasks needed to
complete a job.</p>
    </sec>
    <sec id="sec-12">
      <title>Tree order and height</title>
      <p>The order (fanout) of a regular B+-tree index is usually
constrained by the size of the disk page. In contrast, in the palm-tree
index, the order of the B+-tree depends on the ability of workers
to process at once (in a single task) a specific number of keys (see
Section 3).</p>
      <p>Notice that increasing the order of the tree increases a worker’s
probability of making a wrong decision at a given node. However,
from a different point of view, in order to get a correct final result,
correct decisions must be made at all levels of the tree. Increasing
the tree order reduces the number of levels that must be correct and
hence reduces the final probability of error.</p>
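This trade-off can be made concrete with a simple independence model. The model below, including the assumed mapping from fanout to per-node accuracy, is our own illustration and not a result from the paper.

```python
import math

def final_success(n_keys, fanout, node_accuracy):
    """P(correct final answer) under a simple independence model:
    a correct decision is needed at every one of the
    h = ceil(log_f N) levels, so P = p(f) ** h. The mapping
    `node_accuracy` from fanout to per-node success probability
    is hypothetical, not a value measured in the paper."""
    height = math.ceil(math.log(n_keys, fanout))
    return node_accuracy(fanout) ** height

# Assume per-node accuracy decays mildly with node size.
acc = lambda f: 0.99 - 0.01 * f
wide = final_success(1000, 10, acc)    # fewer levels, harder nodes
narrow = final_success(1000, 2, acc)   # more levels, easier nodes
```

Which fanout wins depends entirely on how fast per-node accuracy degrades with node size, which is why an initial training phase is needed to pick f.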
    </sec>
    <sec id="sec-13">
      <title>Cost selection</title>
      <p>We adopt majority voting to aggregate the workers' decisions.
Therefore, increasing the number of replicated tasks increases the quality
of the aggregated decisions. However, there is a
saturation point beyond which increasing the number of replicated tasks
would be of little, if any, benefit.</p>
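The saturation effect can be seen from the standard binomial model of majority voting; this is a textbook calculation sketched here for illustration, assuming independent workers and a binary choice.

```python
from math import comb

def majority_accuracy(p, k):
    """Probability that a majority of k independent workers, each
    correct with probability p, gives the correct answer (odd k,
    binary choice). Models the replicated-task voting used here."""
    return sum(comb(k, i) * p ** i * (1 - p) ** (k - i)
               for i in range(k // 2 + 1, k + 1))

# Accuracy grows with k, but the marginal gain per extra (paid)
# replica shrinks: the saturation point discussed above.
gains = [majority_accuracy(0.7, k) for k in (1, 3, 5, 9, 15)]
```

With p = 0.7, going from 1 to 3 workers buys far more accuracy than going from 9 to 15, while the cost grows linearly in k.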
    </sec>
    <sec id="sec-14">
      <title>Cost distribution</title>
      <p>Assume that there is a cost bound, say C, to search the index. An
important issue is how this cost is distributed among tasks, in terms
of the number of replicas per tree level, as tasks are assigned to
workers. In the leaf-only aggregation strategy, every worker has to
perform exactly h tasks, from the root level to the leaf level. Hence,
we can have at most k = ⌊C/h⌋ workers.</p>
      <p>In the all-levels aggregation strategy, we propose two techniques
to distribute tasks among levels, namely even and probabilistic
distribution. In the even distribution technique, every level in the
palm-tree index is assigned ⌊C/h⌋ tasks. Even cost distribution
assumes equal importance of all the levels of the palm-tree index.
However, tree levels are of different importance, e.g., when a wrong
decision is made at the higher levels (closer to the root) of
the palm-tree index, it results in a higher final error. This is
not the case when wrong decisions are made at the lower levels of
the palm-tree index. On the other hand, wrong decisions are more
likely to occur at the lower levels than at the higher levels. The
reason is that the spacing in-between the keys within a node decreases
as we descend the tree.</p>
      <p>To accommodate the effect of a node's level on error severity,
we replicate tasks in a way that is proportional to the expected
distance error per level.</p>
      <p>DEFINITION 5.3. The probability of a distance d error at level l
(P<sub>dl</sub>) is the probability of deviating d branches from the ground
truth branch at level l.</p>
      <p>We estimate the final error that would result from a distance d
error at level l to be d × f<sup>l−1</sup>. For example, a distance 1 error at
level 1 (i.e., the leaf level) deviates the final result from the correct
one by distance 1. A distance 1 error at level h (i.e., the root level)
deviates the final result by about the number of leaf keys in a child subtree
of the root node, that is, f<sup>h−1</sup>. We calculate the expected distance
error at level l (EE<sub>l</sub>) to be Σ<sub>d</sub> d × P<sub>dl</sub> × f<sup>l−1</sup>.</p>
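The expected-error computation and the proportional (probabilistic) cost split can be sketched as follows; the per-level error model used in the example is hypothetical, chosen only to make the arithmetic visible.

```python
def expected_level_error(p_dl, fanout, level):
    """EE_l = sum over d of d * P_dl * f**(l-1), following the
    definition above. `p_dl` maps a branch distance d to its
    probability at this level (a hypothetical error model)."""
    return sum(d * p * fanout ** (level - 1) for d, p in p_dl.items())

def proportional_replicas(cost, errors):
    """Split a total cost bound over the levels in proportion to
    each level's expected distance error."""
    total = sum(errors)
    return [round(cost * e / total) for e in errors]

# Hypothetical model: only distance-1 errors, probability 0.2,
# identical at every level of a fanout-4 tree of height 3.
errors = [expected_level_error({1: 0.2}, fanout=4, level=l)
          for l in (1, 2, 3)]
replicas = proportional_replicas(42, errors)
```

Because a root-level slip costs f<sup>h−1</sup> leaf positions, the proportional split concentrates replicas near the root even though, per the discussion above, raw mistakes are more frequent near the leaves.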
      <p>Techniques for cost distribution for level-by-level voting with
backtracking are left for future investigation.</p>
    </sec>
    <sec id="sec-15">
      <title>EXPERIMENTAL EVALUATION</title>
      <p>In order to test the palm-tree index and its search strategies, we
implement a web site to submit tasks to workers. This helps in
further studying the problem under controlled settings. Two datasets
are used in the experiments. The first dataset is a set of 200 square
images of different sizes. The second dataset is a set of 1300 images
of used cars with desired selling prices. A custom-built web crawler
is used to collect the cars dataset from a website of used-car ads.</p>
      <p>[Figure 5: Mean error of the search strategies on (a) the squares
dataset and (b) the cars dataset.]</p>
      <p>Figure 5 shows that the error is lower on the squares dataset
than on the cars dataset. The reason is that it is easier for ordinary users
to compare images of squares based on their sizes than it is to
compare cars based on their prices. It is also clear that all-levels
aggregation outperforms leaf-only aggregation on both datasets. The
reason is that in the leaf-only aggregation strategy, if a worker makes a
wrong decision at a higher level of the tree, this error affects more
significantly his/her answers at the lower levels of the tree.</p>
      <p>In the all-levels aggregation, if a worker makes a wrong
decision, then the other workers can usually compensate for it (we use the
majority principle). It is also evident from Figure 5 that increasing
the number of workers reduces the error rate.</p>
      <p>Figure 6 gives the mean cost needed to perform a query on the
palm-tree. The cost of the search algorithms is higher on the cars
dataset than on the squares dataset. The reason is that the size (and
accordingly the tree height) of the cars dataset is larger than that of
the squares dataset. It is evident that increasing the number of
replications increases the cost needed. It is also evident that increasing
the tree fanout generally reduces the cost, as the height of the tree
decreases. One technique used to reduce the cost is to allow
workers to stop at non-leaf levels if they agree that a key at a non-leaf
node matches the search key.</p>
      <p>We are in the process of implementing more extensive
experiments that will include larger crowds and other search
strategies (i.e., informed all-levels aggregation and all-levels aggregation
with backtracking). We plan to study the effect of the cost distribution
alternatives (even and expected distance errors). We also plan to
study the quality of sorts and joins using the palm-tree index.</p>
    </sec>
    <sec id="sec-16">
      <title>RELATED WORK</title>
      <p>
        Lately, there has been an increasing interest in crowdsourcing.
Several crowd-powered database systems, e.g., [
        <xref ref-type="bibr" rid="ref11 ref12 ref15 ref6">6, 15, 12, 11</xref>
        ], have
been proposed that use human intelligence to perform database
operations. These operations mainly focus on sorts and joins, e.g., [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
counting, e.g., [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and max-finding, e.g., [
        <xref ref-type="bibr" rid="ref16 ref7">7, 16</xref>
        ]. Other works
focus on algorithms to answer top-K queries, e.g., [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], skyline queries,
e.g., [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and spatial queries, e.g., [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        The problem of optimizing the number of questions needed to
find a set of target nodes within a general directed acyclic graph
(DAG) using yes/no questions is studied in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. However, the B+-tree
within the palm-tree index possesses more specific properties
than a general DAG (i.e., a balanced height and multiple ordered
keys within nodes) that require more specific strategies for tree
search and task aggregation (Section 4) than in the case
where yes/no questions are used. To our knowledge, this is the first
work to address the problem of indexing with the crowd.
      </p>
    </sec>
    <sec id="sec-17">
      <title>CONCLUSION</title>
      <p>In this paper, we study the problem of crowdsourcing-based
indexing. We motivate the problem with several application
scenarios. We propose a taxonomy of the problem and highlight its
main challenges. We propose techniques for index construction
and querying. We intend to complete this study with an extensive
analysis and experimental evaluation.</p>
      <p>Our current work focuses on one-dimensional indexing. We plan
to study other variations of crowdsourced indexing, such as
multidimensional indexing, spatial and spatio-temporal indexing, and
fuzzy indexing.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <article-title>The Amazon MTurk website</article-title>
          . https://www.mturk.com,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bozzon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brambilla</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ceri</surname>
          </string-name>
          .
          <article-title>Answering search queries with CrowdSearcher</article-title>
          .
          <source>In WWW</source>
          , pages
          <fpage>1009</fpage>
          -
          <lpage>1018</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bozzon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brambilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ceri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Vesci.</surname>
          </string-name>
          <article-title>Choosing the right crowd: expert finding in social networks</article-title>
          .
          <source>In EDBT</source>
          , pages
          <fpage>637</fpage>
          -
          <lpage>648</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Davidson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khanna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Milo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Roy</surname>
          </string-name>
          .
          <article-title>Using the crowd for top-k and group-by queries</article-title>
          .
          <source>In ICDT</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Difallah</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Cudré-Mauroux</surname>
          </string-name>
          .
          <article-title>ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking</article-title>
          .
          <source>In WWW</source>
          , pages
          <fpage>469</fpage>
          -
          <lpage>478</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Franklin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kossmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kraska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Xin</surname>
          </string-name>
          .
          <article-title>CrowdDB: answering queries with crowdsourcing</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <fpage>61</fpage>
          -
          <lpage>72</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Parameswaran</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          .
          <article-title>So who won?: dynamic max discovery with the crowd</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <fpage>385</fpage>
          -
          <lpage>396</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kazemi</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Shahabi</surname>
          </string-name>
          .
          <article-title>Geocrowd: enabling query answering with spatial crowdsourcing</article-title>
          .
          <source>In Proceedings of the 20th International Conference on Advances in Geographic Information Systems</source>
          , pages
          <fpage>189</fpage>
          -
          <lpage>198</lpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lofi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>El Maarry</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.-T.</given-names>
            <surname>Balke</surname>
          </string-name>
          .
          <article-title>Skyline queries in crowd-enabled databases</article-title>
          .
          <source>In EDBT</source>
          , pages
          <fpage>465</fpage>
          -
          <lpage>476</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Marcus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Miller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Oh</surname>
          </string-name>
          .
          <article-title>Counting with the crowd</article-title>
          .
          <source>In PVLDB</source>
          , pages
          <fpage>109</fpage>
          -
          <lpage>120</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Marcus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madden</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>Human-powered sorts and joins</article-title>
          .
          <source>VLDB Endow.</source>
          ,
          <volume>5</volume>
          :
          <fpage>13</fpage>
          -
          <lpage>24</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Marcus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Karger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madden</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>Crowdsourced databases: Query processing with people</article-title>
          .
          <source>CIDR</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Parameswaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Sarma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Polyzotis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <article-title>Human-assisted graph search: it's okay to ask questions</article-title>
          .
          <source>VLDB Endow.</source>
          ,
          <volume>4</volume>
          :
          <fpage>267</fpage>
          -
          <lpage>278</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Parameswaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Polyzotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <article-title>Crowdscreen: algorithms for filtering data with humans</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <fpage>361</fpage>
          -
          <lpage>372</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Parameswaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Polyzotis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <article-title>Deco: A system for declarative crowdsourcing</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>5</volume>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Venetis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Polyzotis</surname>
          </string-name>
          .
          <article-title>Max algorithms in crowdsourcing environments</article-title>
          .
          <source>In Proceedings of the 21st international conference on World Wide Web, WWW '12</source>
          , pages
          <fpage>989</fpage>
          -
          <lpage>998</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>