DBCrowd 2013: First VLDB Workshop on Databases and Crowdsourcing

The Palm-tree Index: Indexing with the crowd∗

Ahmed R. Mahmood (Purdue University), amahmoo@purdue.edu
Walid G. Aref (Purdue University), aref@cs.purdue.edu
Eduard Dragut (Purdue University), edragut@purdue.edu
Saleh Basalamah (Umm Al-Qura University), smbasalamah@uqu.edu.sa

∗This work was partially supported by the National Science Foundation under Grants III-1117766 and IIS-0964639.

ABSTRACT

Crowdsourcing services allow employing human intelligence in tasks that are difficult to accomplish with computers, such as image tagging and data collection. At a relatively low monetary cost, and through web interfaces such as Amazon's Mechanical Turk (AMT), humans can act as computational operators in large systems. Recent work has been conducted to build database management systems that can harness crowd power in database operators such as sort, join, and count. However, the fundamental problem of indexing within crowdsourced databases has not been studied. In this paper, we study the problem of tree-based indexing within crowd-enabled databases. We investigate the effect of various tree and crowdsourcing parameters on the quality of index operations, and we propose new algorithms for index search, insert, and update.

1. INTRODUCTION

Crowdsourcing is the practice of solving large problems by dividing them into smaller tasks, each of which is then solved by humans from an online community. The tasks are usually difficult to perform automatically by a computer, as they require human intelligence. Typical crowdsourcing tasks involve labelling, ranking, data cleaning, data filtering, data collection, and entity matching (e.g., see [11, 6, 2, 15, 5, 14]). Websites, e.g., Amazon's Mechanical Turk (MTurk) [1], provide an infrastructure for organizations to submit numerous micro-tasks and collect their results after they are fulfilled by human workers recruited at those websites. In a typical crowdsourced application, tasks are replicated and are answered by multiple people to avoid single-user errors. Each person answering a task incurs a (monetary) cost. Thus, a challenging problem for crowdsourcing is that of optimizing the number of tasks while maintaining the accuracy of the results.

Consider the following scenario where a crowdsourcing tree-based index can be useful. Assume that a car repair shop wants to provide an online service that allows users to submit pictures of their damaged cars to get an estimate of the repair cost. The repair shop maintains images of previously repaired cars associated with their actual repair costs. A good repair estimate would be the actual repair cost of a previously repaired car in a similar condition. Building a tree index over the images of previously repaired cars, sorted by their repair cost, allows performing crowdsourcing-based similarity queries on the index. These queries help users find cars in similar conditions and hence anticipate a similar repair cost for their own damaged car. The key to the success of this scenario is that the cost of repair is directly proportional to the condition of the car. The same idea can be applied when estimating the selling price of a used car, the cleaning cost of rental rooms, the placement of a new soccer player in player rankings, or the ranking of new scientific publications.

In this paper, we introduce the palm-tree index, a crowdsourcing tree-based index. We call our proposed index the palm-tree index because real palm trees also need humans, to move the pollen from a male tree to the female trees in order to produce fruits (dates). The palm-tree index aims at indexing keys based on properties that need human intelligence, e.g., comparing images against each other in order to descend and navigate the index. Crowdsourcing-based tree indexing is useful for (1) ordering a set of keys based on a subjective property, (2) performing index nested-loops joins, and (3) answering range queries, among other typical uses of an index. The problem of crowdsourcing-based indexing is challenging because of the following issues. First, crowd-based comparisons are subjective and are prone to error. Hence, we need to find strategies that give the most accurate and consistent results. Second, comparison tasks performed by the crowd incur a (monetary) cost. Hence, optimizing the number of tasks while preserving accuracy is very important.
The contributions of this paper are summarized as follows:

• We introduce the problem of crowdsourcing tree-based indexing as a technique for ordering a set of items with "subjective" keys.
• We present a taxonomy for crowdsourcing tree-based indexes.
• We introduce the palm-tree index, a crowdsourced index, along with several accompanying index traversal algorithms.
• We study the quality of the results retrieved using the palm-tree index.

The rest of this paper is organized as follows. Section 2 introduces the taxonomy for crowdsourcing tree-based indexes. Section 3 presents notations used throughout the paper. Section 4 presents the palm-tree index and its search algorithms. Section 5 analyzes the proposed algorithms. Preliminary experimental results are given in Section 6. Related work is presented in Section 7. Section 8 contains concluding remarks.

2. PROBLEM TAXONOMY

In this section, we introduce a taxonomy for crowdsourcing tree-based indexes. This taxonomy can be seen as an extension of the one presented in [8], which addresses only worker motivation, task replication, and task assignment. Our taxonomy adds concepts related to tree indexing (e.g., tree height) and to crowdsourcing in general (e.g., worker experience). Figure 1 depicts the taxonomy.

[Figure 1: A taxonomy of crowdsourcing tree-based indexing. Dimensions: tree height (balanced, unbalanced); number of dimensions (one, multi); worker motivation (volunteer, rewarded); worker experience (equal, variable, malicious); task assignment (user select, server assign); task replication (single, replicated); task submission (batch, sequential, hybrid); cost bounds (unlimited, limited). The concepts utilized in this paper are shown in boldface.]

Tree height: It specifies whether the tree is balanced or not. The proposed palm-tree index is a balanced B+-tree-like structure, so as to obtain predictable query cost. More detail about query cost is given in Section 3.

Number of dimensions: An index can be either one-dimensional or multi-dimensional. In this paper, we consider one-dimensional indexing.

Worker motivation: In some situations, workers may be willing to perform tasks for free, e.g., see [8]. However, in most instances workers seek rewards for their services. This raises several issues, e.g., budget estimation and the presence of low-quality work due to workers solving tasks quickly to obtain more money. In this paper, we consider rewarded workers.

Worker experience: A simple model to describe workers is to assume that all workers have equal experience, e.g., as in [11]. An alternative and viable model is to differentiate workers based on their level of expertise, e.g., as in [3]. In the palm-tree index, workers are assumed to have indistinguishable, i.e., equal, expertise. We plan to extend our analysis to the general case where workers have variable experience. One special type of worker is the malicious worker, who aims at degrading output quality by providing intentionally faulty and misleading answers. Coping with malicious workers is not addressed in this paper.

Task assignment: There are two cases to consider: (1) the worker chooses the tasks to work on, or (2) the tasks are automatically assigned to the worker without the worker's interference in the selection. This dimension is particularly important when workers are of variable expertise. For example, it is important to avoid assigning difficult tasks to inexperienced workers. In this paper, we assume Case 2, above.

Task replication: In a single-task, no-replication model, workers are trusted to deliver high-quality answers. In many other situations, workers' responses incur error. The replicated task model allows aggregating multiple workers' opinions to increase the quality of the results. In this paper, we assume the replicated model.

Task submission model: A task submission model describes how replications of the same task are submitted to workers. One alternative is to submit all task replicas in a single batch. This technique can reduce the response time to get the final results, as task replicas are solved by multiple workers concurrently. Nonetheless, one may end up submitting more tasks than needed. In order to reduce the cost needed for simple tasks, the replicas of the same task can be submitted sequentially. In the sequential model, a replica of a task is submitted only when the outcome of the previous replicas does not produce results of acceptable quality. The sequential technique increases the task response time, as replicas wait for each other. A hybrid submission model is a middle ground between both alternatives: small-sized batches of the same task are submitted sequentially. In this paper, we only consider the batch task submission model, i.e., replica tasks are submitted in parallel to the designated workers at the same time.

Cost bounds: Having unlimited cost per worker is a rare occurrence. It may occur when the quality of the result is of major importance. Typically, the cost for tasks is limited, and optimizing the overall task error within cost bounds is of major importance. In this paper, we consider the limited-cost case.
3. PRELIMINARIES AND INDEX MODEL

3.1 Problem definition

Let S be a set of N items and q be a query item. The keys of these items are subjective or imprecise (e.g., the item is the photo of a damaged car and the key is the cost of repair for this car). We need to address the following issues:

• Order the items in S according to their keys.
• Construct (build) an index on S.
• Submit query q on S to the crowd.
• Aggregate the crowd's responses to query q to produce a final result.

3.2 Index model

The structure of the palm-tree index is composed of two components: the index manager and a B+-tree. Figure 2 gives the main components of the palm-tree index. We distinguish between two main concepts in the palm-tree: jobs and tasks. A job is the unit of work submitted by a palm-tree user to perform either a query or an insertion on the indexed data. A task, or HIT (Human Intelligence Task), is the smallest unit of work assigned to a worker. Typically, a job involves generating multiple tasks, as illustrated in the sketch below.

[Figure 2: Palm-tree index model.]
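To make the job/task distinction concrete, the following is a minimal sketch in Python (our choice; the paper does not prescribe an implementation language). All names (Task, Job, answer) are illustrative assumptions: a task packages one node's n sorted items with the query item, and a worker's answer is a slot index between 0 and n, i.e., one of the n + 1 possible placements of the query among the node's items.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Task:
        """One HIT: place the query item among a node's n sorted items."""
        node_items: List[str]    # e.g., URLs of the n images in one B+-tree node
        query_item: str          # e.g., URL of the query image
        answer: int = -1         # worker's slot, 0..n; -1 means not yet solved

    @dataclass
    class Job:
        """Unit of work submitted by a palm-tree user: a query or an insert."""
        query_item: str
        tasks: List[Task] = field(default_factory=list)  # tasks spawned so far

Under this reading, a search job spawns k replicated tasks for every node it visits; how the replicated answers are combined is described by the index manager in Section 3.2.2.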
3.2.1 The B+-tree Component

The palm-tree index is built on top of a B+-tree index. Each node in the B+-tree has n ordered keys and represents one task. [11] observes that human beings are capable of performing n-ary operations with ease. Therefore, we give a worker a sorted list A of n items, e.g., images or videos, along with the query item q, e.g., a query image or video, and ask the worker to place q in A based on how q compares to the items in A. For instance, A can be a list of 5 pictures of cars sorted according to the extent to which each car is damaged, while q can be the picture of a newly damaged car. The fanout f of the B+-tree (the maximum allowed number of children of a node other than the root) naturally captures the maximum arity of the operations that a human can accomplish with ease. Clearly, a human being can handle a list A with 5 pictures much more easily than one with 20. Determining the optimal f for a given problem requires an initial phase of training. We discuss this issue in more detail in Section 5.

Construction (building) of the palm-tree depends on the type of the indexed keys. We have two types of keys: quantitative and qualitative keys.

DEFINITION 3.1. A quantitative key is a key to be indexed that has two proportional properties, namely, a subjective property and an assessed value. For quantitative keys, the palm-tree index can be constructed by sorting the keys based on their assessed values and then bulk-loading the index.

DEFINITION 3.2. A qualitative key is a key to be indexed based on a subjective property only. It does not have any associated assessed value. For qualitative keys, a palm-tree index can be constructed only using successive insertions and human-based comparisons.

One example of a quantitative key is an image of a damaged car associated with its actual repair cost. The degree of car damage is the subjective property of the key, while the repair cost is its assessed value. An example of a qualitative key is an image of a butterfly, with an ordering based on beauty as the subjective property. Regular B+-tree splitting/merging algorithms are used during insertion into/deletion from the B+-tree. For both types of keys, queries use human intelligence to make the comparisons needed to search the B+-tree. All comparisons are based on the subjective property only.

Table 1: Palm-tree index operations

Operation | Quantitative keys                  | Qualitative keys
Query     | Crowd-search                       | Crowd-search
Insert    | B+-tree insert                     | Crowd-search then B+-tree insert
Build     | Bulk loading or successive inserts | Successive inserts
Delete    | Query then B+-tree delete          | Query then B+-tree delete
Update    | Delete then insert                 | Delete then insert

3.2.2 Index manager

The palm-tree index manager is responsible for tree construction and query processing within the palm-tree. It initiates jobs for users' requests, generates tasks for every job, assigns tasks to workers, and aggregates workers' responses. We use majority voting to aggregate the responses of workers. In the palm-tree index, a task consists of the items (e.g., pictures) in a node of the B+-tree. A worker is asked to choose the best location for a query item among these items. Figure 3 gives an example task for estimating the repair cost of a car. The image on top represents the query image, while the images below are the items in a node of the tree. The worker assesses where the query key fits among the current node's keys.

[Figure 3: Sample task for choosing the best branch to estimate the car repair cost. The query image is shown on top; below it are the node's keys k1, k2, k3, ordered from less expensive to more expensive.]
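Since Section 3.2.2 aggregates replicated answers by majority voting, a minimal sketch of that step could look as follows, assuming the answers are the slot indices defined earlier and assuming ties break toward the smaller slot (the paper does not specify a tie-breaking rule):

    from collections import Counter
    from typing import List

    def majority_vote(slots: List[int]) -> int:
        """Return the slot chosen by the largest number of workers.
        Ties break toward the smaller slot index (an assumption)."""
        counts = Counter(slots)
        return max(counts.items(), key=lambda kv: (kv[1], -kv[0]))[0]

    # Example: five workers place the query at slots 2, 2, 1, 2, 3 -> slot 2 wins.
    assert majority_vote([2, 2, 1, 2, 3]) == 2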
4. ALGORITHMS

Table 1 describes how the build, insert, update, delete, and query operations are performed in the palm-tree index. Regular B+-tree algorithms (e.g., node splitting during insertion) are performed directly and do not involve the crowd. We focus on the crowd-search operation (i.e., querying the palm-tree index using the crowd) because the other operations depend on it.

The original search algorithm of the B+-tree index is not directly applicable to the crowdsourced palm-tree index. The palm-tree search algorithm depends on the way the palm-tree is traversed and the way the workers' answers are aggregated. We identify three search strategies for the palm-tree search algorithm. All three search strategies share the following two steps. The first step is root task generation, which generates tasks at the root level and is common to all three strategies. The second step is the response handler, which generates tasks at the lower levels of the tree and varies according to the three strategies. Algorithm 1 describes the root task generation step.

Algorithm 1: generate_root_tasks(Tree tree, Key q)
  h ← tree.height                  // tasks are generated at the root level
  k ← get_replications(tree, h)
  solved ← 0                       // zero tasks solved so far
  for i = 1 to k do
    w ← choose_worker()
    t ← create_task(tree, h, q, w)
    submit_task(t, response_handler)
  end

4.1 Leaf-Only Aggregation Strategy

In this technique, a job is replicated to K workers. A worker performs the index search by descending the tree from the root to a leaf, based solely on this worker's own decisions or assessments, to find the best match for the query item. All workers' responses and final results are aggregated only at the leaf level. Algorithm 2 gives an outline of the leaf-only aggregation strategy.

Algorithm 2: handle_leaf_only_aggregation(Task t)
  if t.level ≠ 0 then              // task at non-leaf level
    tree ← get_subtree(t.response, t.tree)
    t ← create_task(tree, t.level − 1, t.q, t.w)
    submit_task(t, handle_leaf_only_aggregation)
  else
    solved++
    if solved == k then            // all workers finished
      result ← aggregate_all_results()
      return result
    end
  end
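For intuition, a single worker's pass in the leaf-only strategy is an ordinary root-to-leaf descent driven by that worker's placements alone; only the K reached leaves are aggregated afterwards. A sketch, assuming a node type with a children list (empty at leaves) and an ask(node, q) callback standing in for task submission and response collection:

    def leaf_only_descent(root, q, ask):
        """One worker's root-to-leaf pass in the leaf-only strategy.
        ask(node, q) returns the slot this worker chose for q at node.
        The K leaves reached by the K workers are aggregated at the end."""
        node = root
        while node.children:            # non-leaf: follow the chosen branch
            slot = ask(node, q)         # one task for this worker at this node
            node = node.children[slot]  # slot i leads to the i-th child
        return node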
4.2 All-levels Aggregation Strategy

An alternative search strategy is to move down the tree level by level. The path from the root to the final search position is collectively determined by the crowd as follows. The search is started at the root, where K_0 tasks are generated at this level. After all workers complete their tasks at the root level, their answers are aggregated, say by majority voting, and the child node that corresponds to the item with the highest vote is selected. Let this node be v. We generate K_1 new tasks for v. We aggregate the answers from the K_1 workers and decide which of the children of v to go to next. We repeat the process until we reach the leaves.

There are two variations of this search technique: uninformed and informed workers. In the former, a worker is not aware of the other workers' decisions. In the latter, a worker is allowed to view a summary of the other workers' responses to the current task. Algorithm 3 provides an outline of this search procedure.

Algorithm 3: handle_all_levels_aggregation(Task t)
  solved++
  if solved == k then              // all workers finished at this level
    result ← aggregate_all_results()
    if t.level ≠ 0 then            // task at non-leaf level
      tree ← get_subtree(result, t.tree)
      k ← get_replications(tree, t.level − 1)
      for i = 1 to k do
        w ← choose_worker()
        t ← create_task(tree, t.level − 1, t.q, w)
        submit_task(t, handle_all_levels_aggregation)
      end
    else
      return result
    end
  end

4.3 All-levels Aggregation Strategy with Backtracking

The two search procedures above, namely the leaf-only and all-levels aggregation strategies, cannot compensate for an error. Backtracking is one way to correct potentially wrong decisions of the workers. A priority queue of pointers to the unexplored nodes in the index is maintained. Unexplored nodes are ordered in the priority queue according to their height (or level) in the tree. Nodes at the same level of the B+-tree index are ordered by the percentage of votes given to these nodes. To detect bad decisions, workers are progressively shown the leaf-node bounds of the current sub-tree at hand. If the workers indicate that the position of the query key falls outside the range of the keys in the leaf nodes, then the current sub-tree is abandoned and another node is picked from the top of the priority queue to resume the search.

Figure 4 gives an example of the backtracking search strategy. In Figure 4(a), a palm-tree of fanout 2 is used to index keys 1 through 8. Figure 4(b) illustrates the steps of traversing the tree. The search starts at the root node A. Node B is at the top of the priority queue, as 60% of the workers indicate this node. The priority queue keeps track of other alternatives besides B. In the next step, we show the range (i.e., the min and max) of the keys in the sub-tree rooted at B, that is, from 1 to 4. Assume that the workers indicate that the query key is larger than 4; then we have an error. Hence, we backtrack to node C and insert node D in the queue. Algorithm 4 outlines the search steps.

[Figure 4: Backtracking all-levels aggregation strategy. (a) A palm-tree of fanout 2 indexing keys 1 through 8, with 60% of the root-level votes for node B and 40% for node C. (b) The contents of the priority queue at each step of the traversal.]

Algorithm 4: handle_backtrack_all_levels_aggregation(Task t)
  solved++
  if solved == k then              // all workers finished at this level
    result ← aggregate_all_results()
    if result is outside the range of the current subtree then
      tree ← pop_queue()
    else
      tree ← get_subtree(result, t.tree)
    end
    push_queue(other voting alternatives)
    if t.level ≠ 0 then            // task at non-leaf level
      k ← get_replications(tree, t.level − 1)
      for i = 1 to k do
        w ← choose_worker()
        t ← create_task(tree, t.level − 1, t.q, w)
        submit_task(t, handle_backtrack_all_levels_aggregation)
      end
    else
      return result
    end
  end
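The bookkeeping behind Algorithm 4 can be sketched with a priority queue keyed by (level, vote share), matching the ordering described above: higher levels first, and better-voted nodes first within a level. Python's heapq is a min-heap, so both keys are negated. The node.level and node.children attributes and the vote_for and in_range callbacks are assumed interfaces; task generation and vote collection are elided.

    import heapq

    def backtracking_search(root, q, vote_for, in_range):
        """Sketch of the all-levels strategy with backtracking.
        vote_for(node, q) -> (winner_slot, runner_ups), where runner_ups
            is a list of (slot, vote_share) pairs for the other branches.
        in_range(node, q) -> False when workers say q falls outside the
            leaf-key bounds of the subtree rooted at node."""
        pq = []                                # unexplored alternatives, best first
        node = root
        while node.children:                   # still above the leaf level
            winner, runner_ups = vote_for(node, q)
            for slot, share in runner_ups:     # remember the other branches
                child = node.children[slot]
                heapq.heappush(pq, (-child.level, -share, id(child), child))
            node = node.children[winner]
            if not in_range(node, q) and pq:   # bad decision detected: backtrack
                node = heapq.heappop(pq)[-1]   # resume from the best alternative
        return node                            # leaf where q is finally placed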
5. ANALYSIS

In this section, we study how to set the parameters of the palm-tree index: tree order and cost distribution. We also introduce the main performance metrics of the palm-tree index, namely error and cost.

5.1 Performance metrics

The location of a key is the rank of the key in the ordered list of keys at the leaf level of the tree. Let q be a query key with a ground-truth location, say loc_truth, at the leaf level, and let loc_ret be the location retrieved using one of the search strategies.

DEFINITION 5.1. Error E is the distance between the ground-truth location and the retrieved location of the query key: E = |loc_truth − loc_ret|.

DEFINITION 5.2. Cost C is the total number of tasks needed to complete a job.

5.2 Tree order and height

The order (fanout) of a regular B+-tree index is usually constrained by the size of the disk page. In contrast, in the palm-tree index, the order of the B+-tree depends on the number of keys that workers are able to process at once, in a single task (see Section 3).

Notice that increasing the order of the tree increases a worker's probability of making a wrong decision at a given node. However, from a different point of view, in order to get a correct final result, correct decisions must be made at all levels of the tree. Increasing the tree order reduces the number of levels required to be correct, and hence reduces the final probability of error.

5.3 Cost selection

We adopt majority voting to aggregate workers' decisions. Therefore, increasing the number of replicated tasks increases the quality of the aggregated decisions. However, there is a saturation point beyond which increasing the number of replicated tasks is of little, if any, benefit.
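The saturation point in Section 5.3 can be made concrete. If each of k replicated workers independently picks the correct branch with probability p, the majority is correct with probability Σ_{i > k/2} C(k, i) p^i (1 − p)^(k−i). A sketch under these simplifying assumptions (independent, equally accurate workers; odd k, so there are no ties):

    from math import comb

    def majority_accuracy(p: float, k: int) -> float:
        """Probability that the majority of k independent workers is correct,
        given per-worker accuracy p (k odd, so a strict majority exists)."""
        return sum(comb(k, i) * p**i * (1 - p)**(k - i)
                   for i in range(k // 2 + 1, k + 1))

    # With p = 0.7: k=1 -> 0.700, k=3 -> 0.784, k=5 -> 0.837,
    # k=9 -> 0.901, k=15 -> 0.950: the gain per added replica shrinks quickly.
    for k in (1, 3, 5, 9, 15):
        print(k, round(majority_accuracy(0.7, k), 3))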
5.4 Cost distribution

Assume that there is a cost bound, say C, to search the index. An important issue is how this cost is distributed among tasks, in terms of the number of replicas per tree level, as tasks are assigned to workers. In the leaf-only aggregation strategy, every worker has to perform exactly h tasks, from the root level to the leaf level. Hence, we can have at most k = ⌊C/h⌋ workers.

In the all-levels aggregation strategy, we propose two techniques to distribute tasks among levels, namely even and probabilistic distribution. In the even distribution technique, every level in the palm-tree index is assigned ⌊C/h⌋ tasks. Even cost distribution assumes equal importance for all the levels of the palm-tree index. However, tree levels are of different importance: when a wrong decision is made at the higher levels (closer to the root level) of the palm-tree index, it results in a higher final error. This is not the case when wrong decisions are made at the lower levels of the palm-tree index. On the other hand, wrong decisions are more likely to occur at the lower levels than at the higher levels. The reason is that the spacing in-between the keys within a node decreases as we descend the tree. To accommodate the effect of a node's level on error severity, we replicate tasks in a way that is proportional to the expected distance error per level, as sketched below.

DEFINITION 5.3. The probability of a distance-d error at level l (P_dl) is the probability of deviating d branches from the ground-truth branch at level l.

We estimate the final error that would result from a distance-d error at level l to be d × f^(l−1). For example, a distance-1 error at level 1 (i.e., the leaf level) deviates the final result from the correct one by distance 1. A distance-1 error at level h (i.e., the root level) deviates the final result by about the number of leaf keys in a child subtree of the root node, that is, f^(h−1). We calculate the expected distance error at level l (EE_l) to be EE_l = Σ_d (d × P_dl) × f^(l−1).

Techniques for cost distribution for level-by-level voting with backtracking are left for future investigation.
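The two allocations in Section 5.4 can be sketched as follows. Even allocation gives every level ⌊C/h⌋ replicas, while the error-proportional allocation weights level l by its expected distance error EE_l, computed elsewhere and passed in by the caller. Rounding down, which may leave a few replicas of the budget unspent, is our assumption:

    from typing import List

    def even_allocation(C: int, h: int) -> List[int]:
        """floor(C/h) task replicas for each of the h tree levels."""
        return [C // h] * h

    def proportional_allocation(C: int, expected_errors: List[float]) -> List[int]:
        """Distribute a budget of C replicas across levels in proportion
        to each level's expected distance error EE_l (leaf level first)."""
        total = sum(expected_errors)
        return [int(C * e / total) for e in expected_errors]

    # Example: budget C = 30 over h = 3 levels.
    print(even_allocation(30, 3))                        # [10, 10, 10]
    print(proportional_allocation(30, [1.0, 2.0, 3.0]))  # [5, 10, 15]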
6. EXPERIMENTAL EVALUATION

In order to test the palm-tree index and its search strategies, we implement a web site to submit tasks to workers. This helps in further studying the problem under controlled settings. Two datasets are used in the experiments. The first dataset is a set of 200 images of squares of different sizes. The second dataset is a set of 1300 images of used cars with desired selling prices. A custom-built web crawler is used to collect the dataset of cars from a website of used-car ads.

Figures 5 and 6 give the experimental results of the mean error rate and mean cost for the uninformed all-levels aggregation and leaf-only aggregation search strategies. The parameters of the palm-tree index are set as follows: the fanout ranges from 2 to 4, and task replication is set to 3 and 5.

[Figure 5: Mean error while changing tree fanout. (a) Squares dataset; (b) Cars dataset.]

Figure 5 gives the mean error rates for the two search algorithms when applied to the cars and squares datasets. The error rate for either algorithm is much higher on the cars dataset than on the squares dataset. The reason is that it is easier for ordinary users to compare images of squares based on their sizes than it is to compare cars based on their prices. It is also clear that all-levels aggregation outperforms leaf-only aggregation on both datasets. The reason is that in the leaf-only aggregation strategy, if a worker makes a wrong decision at a higher level of the tree, this error significantly affects his/her answers at the lower levels of the tree. In the all-levels aggregation, if a worker makes a wrong decision, then the other workers can usually compensate for it (we use the majority principle). It is also evident from Figure 5 that increasing the number of workers reduces the error rate.

[Figure 6: Mean cost while changing tree fanout. (a) Squares dataset; (b) Cars dataset.]

Figure 6 gives the mean cost needed to perform a query on the palm-tree. The cost of the search algorithms is higher on the cars dataset than on the squares dataset. The reason is that the size (and accordingly the tree height) of the cars dataset is larger than that of the squares dataset. It is evident that increasing the number of replications increases the cost. It is also evident that increasing the tree fanout generally reduces the cost, as the height of the tree decreases. One technique used to reduce the cost is to allow workers to stop at non-leaf levels if they agree that a key at a non-leaf node matches the search key.

We are in the process of implementing more extensive experiments that will include larger crowds and other search strategies (i.e., informed all-levels aggregation and all-levels aggregation with backtracking). We plan to study the effect of the cost distribution alternatives (even and expected-distance-error-based). We also plan to study the quality of sorts and joins using the palm-tree index.

7. RELATED WORK

Lately, there has been an increasing interest in crowdsourcing. Several crowd-powered database systems, e.g., [6, 15, 12, 11], have been proposed to use human intelligence to perform database operations. These operations mainly focus on sorts and joins, e.g., [11], counting, e.g., [10], and max-finding, e.g., [7, 16]. Other works focus on algorithms to answer top-k queries, e.g., [4], skyline queries, e.g., [9], and spatial queries, e.g., [8].

The problem of optimizing the number of questions needed to find a set of target nodes within a general directed acyclic graph (DAG) using yes/no questions is studied in [13]. However, the B+-tree within the palm-tree index possesses more specific properties than a general DAG (i.e., balanced height and multiple ordered keys within nodes) that require more specific strategies for tree search and task aggregation (Section 4) than in the case when yes/no questions are used. To our knowledge, this is the first work to address the problem of indexing with the crowd.

8. CONCLUSION

In this paper, we study the problem of crowdsourcing-based indexing. We motivate the problem with several application scenarios. We propose a taxonomy of the problem and highlight its main challenges. We propose techniques for index construction and querying. We intend to complete this study with an extensive analysis and experimental evaluation. Our current work focuses on one-dimensional indexing. We plan to study other variations of crowdsourced indexing, such as multidimensional indexing, spatial and spatio-temporal indexing, and fuzzy indexing.

9. REFERENCES

[1] The Amazon Mechanical Turk website. https://www.mturk.com, 2013.
[2] A. Bozzon, M. Brambilla, and S. Ceri. Answering search queries with CrowdSearcher. In WWW, pages 1009–1018, 2012.
[3] A. Bozzon, M. Brambilla, S. Ceri, M. Silvestri, and G. Vesci. Choosing the right crowd: expert finding in social networks. In EDBT, pages 637–648, 2013.
[4] S. B. Davidson, S. Khanna, T. Milo, and S. Roy. Using the crowd for top-k and group-by queries. In ICDT, 2013.
[5] G. Demartini, D. E. Difallah, and P. Cudré-Mauroux. ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In WWW, pages 469–478, 2012.
[6] M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin. CrowdDB: answering queries with crowdsourcing. In SIGMOD, pages 61–72, 2011.
[7] S. Guo, A. G. Parameswaran, and H. Garcia-Molina. So who won?: dynamic max discovery with the crowd. In SIGMOD, pages 385–396, 2012.
[8] L. Kazemi and C. Shahabi. GeoCrowd: enabling query answering with spatial crowdsourcing. In ACM SIGSPATIAL GIS, pages 189–198, 2012.
[9] C. Lofi, K. El Maarry, and W.-T. Balke. Skyline queries in crowd-enabled databases. In EDBT, pages 465–476, 2013.
[10] A. Marcus, D. Karger, S. Madden, R. Miller, and S. Oh. Counting with the crowd. PVLDB, pages 109–120, 2013.
[11] A. Marcus, E. Wu, D. Karger, S. Madden, and R. Miller. Human-powered sorts and joins. PVLDB, 5:13–24, 2011.
[12] A. Marcus, E. Wu, D. R. Karger, S. Madden, and R. C. Miller. Crowdsourced databases: query processing with people. In CIDR, 2011.
[13] A. Parameswaran, A. D. Sarma, H. Garcia-Molina, N. Polyzotis, and J. Widom. Human-assisted graph search: it's okay to ask questions. PVLDB, 4:267–278, 2011.
[14] A. G. Parameswaran, H. Garcia-Molina, H. Park, N. Polyzotis, A. Ramesh, and J. Widom. CrowdScreen: algorithms for filtering data with humans. In SIGMOD, pages 361–372, 2012.
[15] H. Park, R. Pang, A. G. Parameswaran, H. Garcia-Molina, N. Polyzotis, and J. Widom. Deco: a system for declarative crowdsourcing. PVLDB, 5, 2012.
[16] P. Venetis, H. Garcia-Molina, K. Huang, and N. Polyzotis. Max algorithms in crowdsourcing environments. In WWW, pages 989–998, 2012.