DBCrowd 2013: First VLDB Workshop on Databases and Crowdsourcing

Crowdsourcing to Mobile Users: A Study of the Role of Platforms and Tasks

Vincenzo Della Mea, Eddy Maddalena, Stefano Mizzaro
Department of Mathematics and Computer Science, University of Udine, Udine, Italy
vincenzo.dellamea@uniud.it, eddy.maddalena@uniud.it, mizzaro@uniud.it

ABSTRACT

We study whether the tasks currently proposed on crowdsourcing platforms are adequate to mobile devices. We aim at understanding both (i) which crowdsourcing platforms, among the existing ones, are more adequate to mobile devices, and (ii) which kinds of tasks are more adequate to mobile devices. Results of a user study hint that: some crowdsourcing platforms seem more adequate to mobile devices than others; some inadequacy issues seem rather superficial and can be resolved by a better task design; some kinds of tasks are more adequate than others; and there might be some unexpected opportunities with mobile devices.

Categories and Subject Descriptors

H.4.m [Information systems applications]: Miscellaneous

General Terms

Experimentation, Measurement.

Keywords

Crowdsourcing, mobile devices.

1. INTRODUCTION AND AIMS

Among the phenomena that are acquiring increasing importance in the information technology landscape, two are the subjects of this paper: (i) crowdsourcing, and (ii) mobile devices and applications.

Crowdsourcing, i.e., the outsourcing of tasks typically performed by a few experts to a large crowd as an open call, has been shown to be reasonably effective in many cases, like Wikipedia, the chess match of Kasparov against the world in 1999, and several others (see, e.g., [4] or even http://en.wikipedia.org/wiki/Crowdsourcing). Several crowdsourcing platforms (Amazon Mechanical Turk being probably the best known) have also appeared on the Web: they allow requesters to post the tasks they want to crowdsource and workers to perform those tasks for a small reward (usually a few cents).

Meanwhile, mobile devices (phones, smartphones, tablets, and in the near future glasses, watches, and so on) have become ubiquitous and are used to access the Web. According to several statistics, in the next few years there will be more Web accesses by mobile devices than by classical desktop/laptop computers (see, e.g., [6]).

In this paper we study the intersection of mobile and crowdsourcing. We aim at understanding whether the tasks currently proposed on crowdsourcing platforms are adequate to mobile devices. By "adequate" we mean that they can be performed effectively by using a mobile device in place of a desktop/laptop computer. We specifically seek to answer two research questions:

Q1 Which crowdsourcing platforms, among the existing ones, are more adequate to mobile devices?

Q2 Which kinds of tasks are more adequate to mobile devices?

Besides the above mentioned statistics on increasing mobile usage, this research is also justified by the fact that today people quite often access the Web on their mobile phones for short periods of time, for example while commuting to work by train or underground, while waiting for a bus or for a friend, while in a car (and not driving), while standing in a queue, etc. In other terms, there is plenty of human workforce available in bursts of a few minutes (or seconds), and this kind of workforce seems perfect for the crowdsourcing scenario, where the tasks are usually short and the reward is usually low. Moreover, some crowdsourcing tasks could be more adequate to a mobile scenario than to a classical desktop one: for example, taking pictures of some point of interest (like a monument, a painting, or a billboard), describing a real-life scene, or even recording movements, destinations, and trajectories in an urban traffic setting.
However, to fruitfully exploit this workforce, it is necessary that the platforms are adequate and the tasks are feasible. This consideration also underlies our choice of focusing on the worker side and neglecting the requester part.

The paper is structured as follows. In Section 2 we briefly survey the related work on mobile and crowdsourcing, trying to focus on the research involving both aspects. In Sections 3 and 4 we describe two experiments aiming at answering the two research questions above. In Section 5 we draw conclusions and sketch future developments.

2. RELATED WORK

Although commercial crowdsourcing platforms seem designed with a desktop/laptop user in mind, there has already been some work on the idea of having workers use mobile devices. We briefly survey it in this section.

Musthag and Ganesan [7] focus on the mobile micro-task market and present some statistics on mobile workers' behavior. The mCrowd platform [11] is an iPhone-based mobile crowdsourcing platform that enables mobile users to act as both requesters and workers, and focuses on tasks like geolocation-aware image collection, road traffic monitoring, etc., that exploit the rich array of sensors available on iPhones. Eagle [2] describes txteagle, a mobile crowdsourcing marketplace used in Kenya and Rwanda for tasks like translations, polls, and transcriptions.

Location-based distribution of tasks to mobile workers is proposed in [1], where some design criteria for mobile crowdsourcing platforms are also presented and discussed. A similar approach, focused on the specific domain of news reporting, is presented in [9]: SMS messages are used for location-based task assignment for crowdsourcing news.

Narula and colleagues [8] focus on low-end mobile devices and present MobileWorks, a platform for OCR tasks specifically aimed at users from the developing world. Experimental results demonstrate a high rate of task completion (120 per hour) and a high accuracy (99%). A similar approach is presented in [3], where the mClerk system is described. Some experimental results again witness the feasibility of the approach, and the viral diffusion of the system among workers is also discussed.

As a different approach, the CrowdSearch system, an image search service for mobile phones that relies on Amazon Mechanical Turk, is presented in [10]. It is interesting because, although it does not exploit a mobile crowd, it is an example of exploiting a crowd in (almost) real time.

3. EXPERIMENT 1

3.1 Aims

The first experiment aims to verify the suitability of existing crowdsourcing platforms for mobile devices (see question Q1 in Section 1). We asked the participants to estimate the difficulty of performing a task on both a mobile device and a desktop/laptop computer.

3.2 Participants

Sixteen participants were involved in the experiment. All of them were Italian students, aged between 16 and 30. We required a good knowledge of English and familiarity with computers and smartphones. Participants were randomly subdivided into 4 groups (U1, U2, U3, U4), each one containing four participants.

3.3 Data

We selected four among the most popular crowdsourcing platforms (see Table 1). We downloaded some randomly selected tasks from these platforms, for a total of 2717 tasks (the exact number for each platform is shown in the last column of Table 1). The download was performed in October and November 2012. The downloaded tasks are among those that can be performed by any worker, i.e., without any qualification. These are not huge samples: for example, on mTurk one can count hundreds of thousands of tasks available per month [5]. However, the samples are not negligible either, since they amount to around 1%-5%. For each task we extracted: identifier, title, required proof, remuneration, time needed, requester identifier, and description (a sketch of this record is given below, after the examples). The task collection is available upon request.

id      Platform name           URL                # of tasks
mTurk   Amazon Mechanical Turk  mturk.com          1154
micW    Micro Workers           microworkers.com   1302
minW    Minute Workers          minuteworkers.com  86
shortT  Short Task              shorttask.com      175

Table 1: Platforms

Three examples of tasks in our collection are (errors included):

• Task example 1:
  1. Go to http://goo.gl/Dlzk
  2. Click the link to go to the download
  3. Complete a survey/offer on Sharecash and download the file
  4. Send proof

• Task example 2:
  1. Go to http://OneDollarRiches.com/5737
  2. Click on Join Now button
  3. Invest 1 dollar by logging in into your Alertpay account
  4. After that enter you personal details and login.
  5. Join and finish signing up
  While Sign up use same e-mail of your Alertpay account. because when u make ur refferaf there 1$ sing up go direct into ur alterpay account.

• Task example 3: Find the details for this Restaurant
  - For this restaurant below, enter the details below
  - You must confirm that the restaurant is still open
  - Include the full address, e.g. http://www.thecheesecakefactory.com
  - Do not include URLs to city guides and listings like Citysearch
  Restaurant: Akasha Organics 160 North Main St. Ketchum
  Fill in the text fields with this information: Still open, Restaurant name, Website Address, Phone number, Street Address, City, State, Zip code.
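As an aside, the per-task record extracted above can be represented by a simple data structure (a minimal Python sketch; the field names are ours and do not correspond to any platform API):

    from dataclasses import dataclass

    @dataclass
    class Task:
        """One downloaded task, with the fields extracted in Section 3.3."""
        task_id: str          # platform-specific identifier
        title: str
        required_proof: str   # what the worker must submit as evidence
        remuneration: float   # reward, usually a few cents
        time_needed: int      # allotted time in seconds (assumed unit)
        requester_id: str
        description: str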
3.4 Methods

We randomly extracted 48 tasks, 12 from each platform, and divided them into 4 groups (T1, T2, T3, T4). Each group contains 12 tasks (3 tasks from each of the 4 platforms). Task group Ti was assigned to user group Ui (e.g., task group T1 was assigned to user group U1). We developed a web application to show each participant the group of 12 tasks assigned to his/her user group (see Figure 1). By using this application, each participant recorded two estimates of difficulty for each task, one for a desktop and one for a mobile device (see the bottom part of the figure). Tasks were presented in random order and participants did not know from which platform the tasks were extracted.

Difficulty was expressed on a seven-point scale ranging from trivial to impossible. For each task we therefore obtained 4 estimates (from the participants in the same group). We then converted the labels into the [0..6] range and calculated the average of the difficulty estimates, as sketched below.
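For concreteness, the conversion and averaging step can be expressed as follows (a minimal Python illustration with names of our own choosing; the five intermediate labels are assumed, since only the endpoints of the scale are named above):

    # Map the seven-point verbal scale onto 0..6 and average the four
    # per-task estimates, as described in Section 3.4.
    SCALE = ["trivial", "very easy", "easy", "medium",   # intermediate labels
             "hard", "very hard", "impossible"]          # are assumed here

    def average_difficulty(labels):
        """Average difficulty of one task on one device (four verbal estimates)."""
        return sum(SCALE.index(label) for label in labels) / len(labels)

    # Example: average_difficulty(["easy", "medium", "easy", "hard"]) -> 2.75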
Figure 1: The interface used in the first experiment (translated into English)

3.5 Results

Figure 2 shows the average estimated difficulty, on desktop and mobile, for each platform. Tasks from mTurk are estimated as slightly more difficult than those from MicroWorkers, MinuteWorkers, and ShortTask.

[Figure 2: Estimated difficulty (bar chart; difficulty on the y-axis, platforms mTurk, micW, minW, shortT on the x-axis; paired bars for desktop and mobile)]

The difference of the difficulty estimates between desktop and mobile is shown in Figure 3: difficulty estimates are consistently higher on mobile devices, both in absolute terms and as a percentage of the desktop difficulty.

[Figure 3: Mobile-desktop difference of estimated difficulty, as absolute difference (bars on the left) and as a percentage of the desktop difficulty (right)]

By manually analyzing the task collection we realized that some tasks are inadequate to mobile devices for a few typical reasons:

• too long a description;
• technical obstacles like scrolling problems, unsupported audio formats and/or plugins, pages with Adobe Flash, etc.;
• use of the frame attribute in HTML pages;
• bad layout on a small-resolution display;
• need for a high-power CPU.

Some of these issues seem due to the task content, while others depend on how the Web interface is realized. Many of them seem rather superficial and can be overcome by a better task design and/or better user interfaces; a simple check along these lines is sketched below.
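As an illustration only (this checker is ours, not part of the study), the more mechanical of the reasons above could be detected automatically from a task's description and HTML page:

    import re

    # Hypothetical heuristics for flagging mobile-unfriendly tasks, based on
    # the reasons listed in Section 3.5; the threshold is an arbitrary example.
    MAX_DESCRIPTION_LENGTH = 1000  # characters

    def mobile_unfriendly_reasons(description, html):
        """Return the list of reasons why a task may be hard on mobile."""
        reasons = []
        if len(description) > MAX_DESCRIPTION_LENGTH:
            reasons.append("too long description")
        if re.search(r"<(frame|frameset|iframe)\b", html, re.IGNORECASE):
            reasons.append("uses frames")
        if re.search(r"\.swf\b|shockwave-flash", html, re.IGNORECASE):
            reasons.append("requires Adobe Flash")
        return reasons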
4. EXPERIMENT 2

4.1 Aims

The aim of the second experiment is to identify which kinds of tasks are more adequate for mobile devices (see question Q2 in Section 1). We therefore now focus on task features, not on platforms. Also, instead of asking participants for estimates, we required them to actually perform the tasks on both desktop and mobile devices, and we measured the time spent on each task. Participants used two prototype platforms that we built ad hoc for the experiment: one for desktop devices, using Google Web Toolkit, and the other specifically made for mobile devices, by means of an Android application. Figure 4 shows the resulting user interfaces.

Figure 4: The interface used in the second experiment: desktop (left) and mobile (right)

4.2 Participants and Data

The 16 participants (the same as in the previous experiment) were subdivided into 4 groups labeled U1, U2, U3, U4.

To identify the kinds of tasks in a somewhat objective way, we relied on the task categories usually requested in crowdsourcing marketplaces. More in detail, we started from the 11 categories suggested by Amazon Mechanical Turk when creating a new task (see https://requester.mturk.com/create/projects/new): Categorization, Data Collection, Moderation of an Image, Sentiment, Survey, Survey Link, Tagging of an Image, Transcription from A/V, Transcription from an Image, Writing, and Other. To obtain a manageable number of categories in our experiment, we excluded 5 Mechanical Turk categories: Data Collection, Survey and Survey Link (considered somehow similar to Sentiment), Transcription from A/V (to avoid technical issues on mobile devices), and Other. We therefore selected the 6 task categories shown in Table 2. Then we created 4 new tasks for each category, for a total of 24 tasks, and grouped them into four task groups (labeled Ta, Tb, Tc, Td), each group containing six tasks, one from each category.

Id   Category                     Description
Cat  Content categorization       Some images are proposed to the worker, who is required to assign each of them to the correct category.
Mod  Moderation of an image       The worker is required to flag adult content pictures that are inappropriate for children.
Sen  Sentiment                    Some sentences are proposed to the worker, who is required to record his agreement by means of a Likert scale.
ImT  Image tagging                Some images are proposed to the worker, who is required to tag each of them with keywords.
Tra  Transcription from an image  The worker is required to extract and write the textual content from a picture.
Wri  Writing                      The worker is required to write a short text about a specific topic.

Table 2: Task categories

Using artificial tasks (i.e., tasks created by ourselves) allowed us to remove any platform bias and the issues discussed at the end of Section 3.5, which might have affected the results. Also, their classification was easier (sometimes it is not clear how to classify real tasks). Finally, this allowed us to create task descriptions written in Italian, thus removing any language issue from the experiment (all participants were Italian native speakers). The created tasks are in all respects similar to real tasks.

4.3 Methods

We took the usual special care to avoid any order and learning bias. Each participant performed 6 tasks (one for each of the categories in Table 2) on the desktop platform and 6 other tasks (again, one for each category) on the mobile one. His/her tasks were selected from two task groups, depending on the user group the participant was assigned to. To further avoid bias, participants in each group alternately started from desktop or from mobile. Therefore, each participant performed a total of 12 different tasks, half on desktop and half on mobile, and each task was performed by 8 participants in two user groups, half of whom performed it on mobile and half on desktop.

Statistics were calculated as follows: first, the average time needed for task completion was calculated for each task, separately for mobile and desktop performance (i.e., averaged over 4 subjects each); then, category averages were calculated from the task averages, again separately for mobile and desktop devices. A sketch of this two-level averaging is given below.
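For clarity, the two-level averaging just described can be sketched as follows (a minimal Python illustration with our own names; it is not the actual analysis code):

    from statistics import mean

    # timings[(task_id, device)] is the list of completion times in seconds
    # recorded for one task on one device (4 participants each, Section 4.3).
    def category_average(timings, category_tasks, device):
        """Average completion time for a category on one device:
        first average each task over its participants, then average
        the per-task means over the tasks of the category."""
        return mean(mean(timings[(task, device)]) for task in category_tasks)

    # Example: category_average(timings, ["wri1", "wri2", "wri3", "wri4"], "mobile")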
4.4 Results

Figure 5 shows the average time to complete a task, for each category and on both mobile and desktop devices. Figure 6 shows the differences in average time to complete.

[Figure 5: Average time to complete for each task category on both mobile and desktop devices (bar chart; time in seconds on the y-axis, categories Cat, Mod, Sen, ImT, Tra, Wri on the x-axis)]

[Figure 6: Mobile-desktop differences in average time to complete for each task category, as absolute time (bars on the left) and as a percentage (right)]

Some tasks are quicker: Cat, Mod, and Sen required less than one minute on average, on both desktop and mobile. ImT and Tra are a bit longer, between one and two minutes on average, and Wri is even longer. As expected, all tasks are faster on desktop, with the only exception of Wri: there, the participants autonomously decided to use the voice-to-text functionality when on mobile, and this turned out to be quicker than writing with a keyboard (although we did not investigate the quality of the transcription). As highlighted in Figure 6, ImT and Tra show a higher mobile-desktop difference, both in absolute time and in percentage, probably because they require entering multiple texts in several fields, a cumbersome activity when carried out on mobile.

Looking at the percentage differences in Figure 6, one can notice that the small absolute difference of Cat is actually quite high in percentage: since Cat tasks are quite short (as can be seen in Figure 5), even a small absolute difference is important in percentage terms. Conversely, looking at the two rightmost bars, the percentage difference for Wri looks smaller than the absolute time difference; this is again due to the average length of the Wri tasks, which is quite high (see Figure 5). Still, the improvement on mobile is substantial, being around 20%.

5. CONCLUSIONS AND FUTURE WORK

The work described in this paper is a first exploration of the opportunities and challenges of outsourcing tasks to a mobile crowd. Results provide preliminary evidence of the inadequacy of current crowdsourcing platforms for mobile devices, even if task complexity would be adequate for being carried out in mobile scenarios. More in detail, results are fourfold:

• Experiment 1 results show that, according to user perception of difficulty, some crowdsourcing platforms might be slightly more adequate to mobile devices than others.

• Some inadequacy issues seem rather superficial and can be resolved by a better task or interface design.

• Experiment 2 shows that tasks of different kinds, as defined by mTurk categories, might present different difficulties when carried out on desktop or on mobile devices. This might hint at a first specialization of task assignment, although examining the features of easy and difficult tasks might provide a better ad-hoc specialization, perhaps even independent of the kind of task.

• Experiment 2 also confirms that mobile devices might offer some unexpected opportunities, like the voice-to-text solution, unexpected (by us) and autonomously adopted by participants.

We carried out two separate experiments, although sharing subjects, in order to study two different aspects of mobile crowdsourcing: crowdsourcing platform effects and task category effects. The experiments are preliminary and the results are not final, but this is consistent with our aims, which were to begin to study the general issue of mobile crowdsourcing. This exploratory attitude is also a motivation for having the two experiments performed with different methodologies (asking the participants for an estimate of difficulty, and having participants perform the actual tasks). Of course, these experiments, or similar ones, could have been run by means of some crowdsourcing platform themselves. We preferred a more traditional approach and started with classical user studies, but we do plan to do that in the future.

To further develop this work, other experiments can be imagined. For example, the same experiments described here could be repeated in real-world scenarios (on the train, on the road, in school rooms, or in crowded places) to obtain more realistic results. It is also feasible to imagine an extended crowdsourcing platform that, on the basis of the context of a worker (time, date, geolocation, habits and preferences, mobile device sensors, etc.), automatically filters and selects tasks tailored for that specific context; a sketch of such a filter is given below.
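Purely as an illustration of this idea (no such platform exists in the paper; the context fields and selection rules are invented examples), a context-based task filter might look like:

    # Hypothetical sketch of context-aware task filtering; task and context
    # are plain dictionaries here, not a real platform API.
    def filter_tasks(tasks, context):
        """Keep only the tasks suited to the worker's current context."""
        selected = []
        for task in tasks:
            # Long tasks do not fit short idle-time bursts (e.g., commuting).
            if task["expected_seconds"] > context["available_seconds"]:
                continue
            # Location-bound tasks (e.g., photographing a monument) make
            # sense only if the worker's position is known.
            if task.get("needs_location") and not context.get("gps_available"):
                continue
            # Audio tasks are unsuitable in noisy public places.
            if task.get("needs_audio") and context.get("noisy_environment"):
                continue
            selected.append(task)
        return selected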
6. REFERENCES

[1] F. Alt, A. S. Shirazi, A. Schmidt, U. Kramer, and Z. Nawaz. Location-based crowdsourcing: extending crowdsourcing to the real world. In Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries, NordiCHI '10, pages 13-22, New York, NY, USA, 2010. ACM.

[2] N. Eagle. txteagle: Mobile crowdsourcing. In Proceedings of the 3rd International Conference on Internationalization, Design and Global Development: Held as Part of HCI International 2009, IDGD '09, pages 447-456, Berlin, Heidelberg, 2009. Springer-Verlag.

[3] A. Gupta, W. Thies, E. Cutrell, and R. Balakrishnan. mClerk: enabling mobile crowdsourcing in developing regions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '12, pages 1843-1852, New York, NY, USA, 2012. ACM.

[4] J. Howe. Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business. Random House Inc., 2008.

[5] P. G. Ipeirotis. Analyzing the Amazon Mechanical Turk marketplace. XRDS, 17(2):16-21, Dec. 2010.

[6] M. Meeker and L. Wu. Internet Trends, D11 Conference (the annual Internet Trends Report), 2013. http://www.slideshare.net/kleinerperkins/kpcb-internet-trends-2013.

[7] M. Musthag and D. Ganesan. Labor dynamics in a mobile micro-task market. In W. E. Mackay, S. A. Brewster, and S. Bødker, editors, CHI, pages 641-650. ACM, 2013.

[8] P. Narula, P. Gutheim, D. Rolnitzky, A. Kulkarni, and B. Hartmann. MobileWorks: A mobile crowdsourcing platform for workers at the bottom of the pyramid. In Proc. HCOMP '11, 2011.

[9] H. Väätäjä, T. Vainio, E. Sirkkunen, and K. Salo. Crowdsourced news reporting: supporting news content creation with mobile phones. In Proceedings of the 13th International Conference on Human Computer Interaction with Mobile Devices and Services, MobileHCI '11, pages 435-444, New York, NY, USA, 2011. ACM.

[10] T. Yan, V. Kumar, and D. Ganesan. CrowdSearch: exploiting crowds for accurate real-time image search on mobile phones. In MobiSys '10: Proceedings of the 8th International Conference on Mobile Systems, Applications and Services, pages 77-90. ACM Press, 2010.

[11] T. Yan, M. Marzilli, R. Holmes, D. Ganesan, and M. Corner. mCrowd: a platform for mobile crowdsourcing. In Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems, SenSys '09, pages 347-348, New York, NY, USA, 2009. ACM.