=Paper=
{{Paper
|id=None
|storemode=property
|title=A Model-Driven Approach for Crowdsourcing Search
|pdfUrl=https://ceur-ws.org/Vol-842/crowdsearch-brambilla.pdf
|volume=Vol-842
|dblpUrl=https://dblp.org/rec/conf/www/BozzonBM12
}}
==A Model-Driven Approach for Crowdsourcing Search==
Alessandro Bozzon, Marco Brambilla, Andrea Mauri
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy
{name.surname}@polimi.it

ABSTRACT

Even though search systems are very efficient in retrieving world-wide information, they cannot capture some peculiar aspects and features of user needs, such as subjective opinions and recommendations, or information that requires local or domain-specific expertise. In this kind of scenario, the human opinion provided by an expert or knowledgeable user can be more useful than any factual information retrieved by a search engine.

In this paper we propose a model-driven approach for the specification of crowd-search tasks, i.e., activities where real people, in real time, take part in the generalized search process that involves search engines. In particular, we define two models: the "Query Task Model", representing the meta-model of the query that is submitted to the crowd and of the associated answers; and the "User Interaction Model", which shows how the user can interact with the query model to fulfill her needs. Our solution allows for a top-down design approach, from the crowd-search task design down to the crowd answering system design. Our approach also grants automatic code generation, thus leading to quick prototyping of search applications based on human responses collected over social networking or crowdsourcing platforms.

[Figure 1: Overview of our approach. A MacroTask Description (BPMN) is refined by an M2M transformation into a MicroTask Description (BPMN), which a further M2M transformation maps to a User Interaction Model (WebML); M2T transformations then produce either a stand-alone application or an application embedded in a social network or crowdsourcing platform.]

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—Search Process

Keywords: crowdsourcing, social network, model-driven development.

1. INTRODUCTION

While search systems are superior machines for retrieving world-wide information, people tend to put more trust in people than in automated responses. That is why users often seek opinions collected from friends and expert or local communities before taking an informed decision about significant issues. Other users' opinions can ultimately determine our decisions. While in the past people could rely on opinions given by close friends on local or general topics, the change in the social connections of our society makes users increasingly rely on online social interaction to complete and validate the results of their search activities. People often search for human help in between canonical web search steps: they first query a search system, then they ask for an opinion on the result, and maybe they also ask for suggestions on the query terms. We define this trend as crowd-searching.

In current Web systems, the crowd-search activity, i.e., looking for opinions from friends or experts, is detached from the original search process, and is often carried out through different social networking platforms and technologies. Moreover, people manage different applications, different virtual identities, and possibly different devices: they send emails, ask on Twitter, Facebook, or other social networks, or ask friends and people they know.

Recent works (see Section 4) on crowd-based search focus on simple and atomic tasks, while crowd-sourced search involves a wide range of scenarios, from trivial decisions, like choosing where to go for dinner, to more serious matters like organizing a trip or even buying a house. Thus the user needs a way to manage and control the whole process, from the creation of the query, through the selection of the target, to the gathering of the results.
In this paper we propose a model-driven, platform-independent approach to the design of Web applications that support crowd-sourced search. We define a top-down design approach, sketched in Figure 1, which applies model-driven engineering (MDE) techniques to the specification of the crowd-sourced information collection task, its splitting and refinement, and its mapping to the Web user interaction specification. The approach starts from the task description and applies model-to-model transformations to build the detailed task definitions (described, e.g., in BPMN) and then the platform-independent user interaction model (described in the domain-specific language WebML [3, 9]). The final application is then automatically generated by means of a model-to-text code generation transformation.

The main ingredients of our contribution are: 1) a meta-model of the crowd-sourced question; and 2) the models of the user interfaces needed for defining the questions and for responding. In this short paper we focus on the aspects related to the model-driven design of the crowd-search user interactions, spanning from the question definition to the engagement and dissemination, and ending with the response submission and collection. We consider the task refinement and redesign problem as outside the scope of this short work.

The paper is organized as follows: Section 2 and Section 3 respectively describe our search task meta-model and user interaction model; Section 4 summarizes the related works in both the crowdsourcing and the model-driven fields; finally, Section 5 concludes.
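Before moving to the models, the following minimal Python sketch gives a flavor of how the transformation chain of Figure 1 composes. It is a toy illustration under our own assumptions (dict-based models and invented function names), not the actual BPMN/WebML/WebRatio machinery.

<syntaxhighlight lang="python">
# Toy sketch of the Figure 1 pipeline: macro-task -> micro-tasks -> UI model
# -> generated application. All names and the dict encoding are hypothetical;
# the real approach manipulates BPMN and WebML models via WebRatio.

def macro_to_micro(macro_task: dict) -> list[dict]:
    """M2M: refine a coarse task into micro-task descriptions, one per step
    of the chosen crowd-interaction pattern (e.g., find-fix-verify)."""
    steps = macro_task.get("pattern", ["answer"])
    return [{"step": s, "question": macro_task["question"]} for s in steps]

def micro_to_ui(micro_task: dict) -> dict:
    """M2M: derive a coarse user interaction model, to be refined by hand."""
    return {"pages": ["ResponderDashboard", "Details"], "task": micro_task}

def generate_app(ui_model: dict, platform: str = "facebook") -> str:
    """M2T: emit a (toy) textual artifact for the chosen deployment platform."""
    return f"<app platform='{platform}' pages='{','.join(ui_model['pages'])}'/>"

macro = {"question": "Suggest good restaurants in Milan",
         "pattern": ["find", "verify"]}
apps = [generate_app(micro_to_ui(m)) for m in macro_to_micro(macro)]
</syntaxhighlight>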
2. TASK MODEL

The starting point of our MDE approach is the query task model. Figure 2 shows the query task meta-model to which every query task should conform. The main element is the Query submitted by a User. The Query is defined by a Question, written in natural language, and a list of CrowdObjects, i.e., information structured according to a given schema. (To ease the discussion, we assume that information is structured in relations; however, other formats, e.g., semi-structured or graph-based, are also suitable.)

[Figure 2: The query task meta-model. Its entities are Query (question: String, type: String, open: boolean), CrowdObject (type: String), Schema (name: String), Field (name: String, type: String), FieldInstance (value: String), Relation (type: String), and User (user: String, password: String, email: String). Queries are linked to Input and Output CrowdObjects and, through the Asker, Responder, and Answer associations, to Users; Relations connect CrowdObjects via From/To and Incoming/Outgoing roles.]

A question includes a set of Input CrowdObjects, i.e., a set of data in the user's question upon which the responder can apply his response. For example, if the user wants to collect opinions about some restaurants in Lyon, the Input CrowdObject instances comprise the restaurants subject to the comparison. The input objects can be either inserted manually by the user at query creation time or extracted from a previous (canonical or crowd-based) search step. The model of these objects is defined by the Schema element. Input objects are not mandatory for the creation of a query, as a user can create an open question; however, we always assume the presence of a Schema.

The type of the query defines how a user can answer the question. Query types have been classified in a taxonomy [6] comprising, among others, the following task types:

• Like: the user answers the query by voting ("liking") one or more of the query inputs;
• Comment: the user answers the query by writing a comment on one or more of the query inputs;
• Add: the user answers the query by adding one or more new instances of Output CrowdObject.

Finally, the Query is also related to a set of Output CrowdObjects, representing the answers to the question submitted by the crowd.

Users of a crowd-search task can be classified into two categories: askers and responders. The former is the user who uses the platform and creates questions to be submitted to the crowd, while the latter is a user involved in the query answering process through the social network or crowdsourcing platform.
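To complement Figure 2, here is a minimal Python rendering of the meta-model and of the task-type taxonomy as data classes. The class and attribute names mirror the figure (the Relation element is discussed next), but the Python encoding itself is our illustrative assumption, not part of the paper's tooling; the password attribute of User is omitted for brevity.

<syntaxhighlight lang="python">
from dataclasses import dataclass, field
from enum import Enum

class QueryType(Enum):
    LIKE = "like"        # vote for one or more input objects
    COMMENT = "comment"  # comment on one or more input objects
    ADD = "add"          # contribute new Output CrowdObjects

@dataclass
class Field:
    name: str
    type: str

@dataclass
class Schema:
    name: str
    fields: list[Field] = field(default_factory=list)

@dataclass
class CrowdObject:
    schema: Schema
    values: dict[str, str] = field(default_factory=dict)  # FieldInstances

@dataclass
class User:
    user: str
    email: str

@dataclass
class Relation:
    type: str              # "input-input" or "output-input"
    source: CrowdObject    # From/Outgoing side
    target: CrowdObject    # To/Incoming side

@dataclass
class Query:
    question: str
    type: QueryType
    open: bool                        # open questions carry no inputs
    asker: User
    schema: Schema                    # a Schema is always assumed present
    inputs: list[CrowdObject] = field(default_factory=list)
    outputs: list[CrowdObject] = field(default_factory=list)
    responders: list[User] = field(default_factory=list)
</syntaxhighlight>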
Relation represents associations that can exist between CrowdObjects. These relations can be either Input-Input relations or Output-Input relations. They are created when a query is split into sub-queries, and they depend on the kind of splitting pattern that is applied. Indeed, starting from the design of the coarse-grained task, one can refine its description by structuring its activities according to known crowd-interaction patterns (e.g., find-fix-verify [5], map-reduce [11], or the Turkomatic guidelines [12]).

Input-Input relations occur when the initial set of inputs is partitioned across different instances of the same query, to reduce the workload of each responder. For example, if the original query asked the responder to order one hundred restaurants, it can be useful to split the task into subtasks of ten restaurants each, to be assigned to different responders. In this case the task performed by each responder is the same, but it is applied to different sets of objects. The initial set is therefore partitioned into the different query instances according to some strategy (e.g., uniformly, or according to some properties of the input instances). The inputs of the new query instances are thus mapped to the inputs of the original query, following, e.g., a map-reduce pattern [11].

Output-Input relations occur when the task requested by the author of the query is complex or difficult, or when the result requires some kind of validation, so that the task must be organized into a sequence of subtasks [5]. In this case the query is composed of several heterogeneous tasks, and each user performs a particular one (e.g., according to find-fix-verify or a similar pattern [5]). Hence, the output of the first subtask becomes the input of another query subtask, and so on, thus generating an output-input mapping.
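Using the hypothetical classes sketched above, the two relation kinds can be illustrated as two splitting helpers: a uniform Input-Input partition (the hundred-restaurants example) and an Output-Input chain in the spirit of find-fix-verify [5]. Both functions, and the collect_answers stub standing in for an actual crowd round, are our own assumptions.

<syntaxhighlight lang="python">
def split_uniform(query: Query, chunk_size: int = 10) -> list[Query]:
    """Input-Input: partition one query's inputs across identical sub-queries,
    so ordering 100 restaurants becomes ten orderings of 10 restaurants each."""
    return [Query(question=query.question, type=query.type, open=False,
                  asker=query.asker, schema=query.schema,
                  inputs=query.inputs[i:i + chunk_size])
            for i in range(0, len(query.inputs), chunk_size)]

def collect_answers(query: Query) -> list[CrowdObject]:
    """Stub for one crowd round; the real system gathers Output CrowdObjects
    from the deployment platform. Here we simply echo the inputs."""
    return query.inputs

def chain(stages: list[Query], seed: list[CrowdObject]) -> list[CrowdObject]:
    """Output-Input: each subtask's outputs become the next subtask's inputs,
    as in a find-fix-verify sequence."""
    current = seed
    for stage in stages:
        stage.inputs = current            # the output-input mapping
        current = collect_answers(stage)  # answers from the crowd
    return current
</syntaxhighlight>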
3. USER INTERACTION MODEL

The user interaction model describes the interface and navigation aspects of the crowd-search application. Starting from the query task model, possibly split into a complex pattern of microtasks, a model transformation can produce a coarse user interaction model, which in turn can be manually refined by the designer. The user interaction must cover three fundamental phases of the crowd-search process:

• the submission of the question (performed by the asker);
• the collection of the responses (performed by the responders);
• the analysis of the results (available to the asker for getting insights).

At the current stage, our research has identified the interaction patterns relevant for each phase, considering the various options of deployment platform, task type, and macro-task splitting pattern. For space reasons, in this section we report one possible outcome of the user interaction design, in the case of a simple query task deployed on the Facebook social networking platform. We describe the phases of query creation and query answering, according to the WebML notation [3].

3.1 Query creation

[Figure 3: User interaction model for creating a query.]

Figure 3 shows the user interaction model for creating and submitting a query, according to the WebML notation. In the Create Query page, the user specifies the textual question (e.g., "What's the best museum to visit in Milan?") and sets the query type (e.g., "Like", "Add", and so on). The user can also choose the "open question" type, thus assuming that no input items are needed for the responder to select or like. In both cases, a Query instance is created and its type is set. If the query does not have inputs, the user is brought directly to the Responder Selection page.

If, on the other hand, the user chooses to build a structured question with inputs, he is redirected to the Define Schema page, where the asker can create a schema for the inputs by assigning a general name to the input type and by defining its attributes in terms of name and type. By submitting the form, the application creates a new instance of the Schema entity and its associated Fields. The asker is then brought to the Add Instance page, where he can add input objects following the schema previously defined. The specified Input instances are created and linked to the query.

Finally, in the Responder Selection page the asker can select the responders to the query: the list of possible responders is retrieved from the social network or crowdsourcing platform (in this example, the GetFriend component collects the friends from the Facebook platform). The user can select the responders through the "Friends" multi-selection list in the page. Eventually, after viewing a preview of the created question, the user can post the query on the social platform.

[Figure 4: Rendering of the Web pages implementing the query creation phase, as generated by WebRatio.]

Figure 4 shows a compressed view of the web pages produced from the user interaction model of Figure 3, thanks to the code generation facilities of WebRatio. The structure and content of the pages can be easily recognized and mapped to the corresponding model elements. In this particular example the user wants to know some good restaurants in Milan. Hence she defines the question "Can you suggest me some good restaurants in Milan?" and selects the "Add" query type. Then she creates the schema to which the input instances of the question must conform. In the instance list she adds a restaurant she already knows. Finally, she selects the recipients of the question from the list of her friends extracted from Facebook.
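The Figure 3 model itself is a WebML diagram, but the data flow behind its pages can be summarized with the classes sketched in Section 2. In this sketch, get_friends and post_query are hypothetical stand-ins for the platform-specific components (such as the GetFriend Facebook component); none of these calls belong to a real API.

<syntaxhighlight lang="python">
def get_friends(asker: User) -> list[User]:
    """Stand-in for the GetFriend component querying, e.g., Facebook."""
    return [User("friend1", "f1@example.org"),
            User("friend2", "f2@example.org")]

def post_query(query: Query) -> None:
    """Stand-in for publishing the query on the target social platform."""
    print(f"Posted '{query.question}' to {len(query.responders)} responders")

def create_query(asker: User) -> Query:
    # Create Query page: question text, query type, open/structured choice.
    q = Query(question="Can you suggest me some good restaurants in Milan?",
              type=QueryType.ADD, open=False, asker=asker,
              schema=Schema(name="Restaurant"))
    # Define Schema page: name the input type and define its attributes.
    q.schema.fields = [Field("name", "String"), Field("address", "String")]
    # Add Instance page: seed the question with input objects the asker knows.
    q.inputs.append(CrowdObject(schema=q.schema,
                                values={"name": "Trattoria Milanese"}))
    # Responder Selection page: pick recipients from the platform's contacts
    # (a multi-selection list in the generated UI), then post after preview.
    q.responders = get_friends(asker)
    post_query(q)
    return q
</syntaxhighlight>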
3.2 Answering to a query

[Figure 5: Hypertext model for answering to a query.]

Figure 5 depicts the WebML model for the query answering activity, performed by a responder based on the query structure defined by the asker. When accessing the application through the Responder Dashboard page, the responder is presented with a list of questions to answer. By clicking on a question, he is brought to the Details page, where he can provide his answer. The page shows the question text, plus the set of defined input instances (the Input component in the Details page).

Depending on the type of the question (defined by the asker during the query creation phase), different concrete user interfaces can be shown. In the case of a "Like" question, the responder simply selects the preferred instances in the Input list; as a consequence, a set of Output objects is created, corresponding to the "likes" of the user. In the case of "Comment" or "Add" questions, the user is shown a form to respectively write a comment or add a new instance to the list. In the case of "Comment" questions, an Output object with the comment schema (i.e., a single textual field) is created. The "Add" case is more noteworthy, as the Output objects present a schema equivalent to the Input ones, so that the new object instances are added to the list of input objects of the query.

[Figure 6: Rendering of the query answering Web page, as generated by WebRatio.]

Figure 6 shows the compressed view of the Details page built from the user interaction model of Figure 5. Continuing the previous example, in this page the responder can add additional restaurants he knows.
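The type-dependent behavior of the Details page reduces to a small dispatch over the query type, again using the hypothetical classes from Section 2: each branch creates Output objects as described above, and the Add branch also feeds the new instance back into the query's input list.

<syntaxhighlight lang="python">
COMMENT_SCHEMA = Schema(name="Comment", fields=[Field("text", "String")])

def answer(query: Query, responder: User, payload) -> None:
    """Sketch of the answer-collection logic behind the Details page.
    Responder attribution (the Answer link in Figure 2) is omitted here."""
    if query.type is QueryType.LIKE:
        # payload: the subset of input objects the responder "liked";
        # one Output object is recorded per like.
        query.outputs.extend(payload)
    elif query.type is QueryType.COMMENT:
        # payload: free text; the Output uses the single-field comment schema.
        query.outputs.append(
            CrowdObject(schema=COMMENT_SCHEMA, values={"text": payload}))
    elif query.type is QueryType.ADD:
        # payload: field values for a new object with the *input* schema;
        # the new instance also joins the query's input list.
        obj = CrowdObject(schema=query.schema, values=payload)
        query.outputs.append(obj)
        query.inputs.append(obj)
</syntaxhighlight>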
4. RELATED WORKS

This work falls into the broad field of human computation, i.e., the discipline that aims to use human knowledge to fulfill tasks that are difficult or even impossible for a machine. For example, human computation studies have addressed the use of the crowd's knowledge for image recognition [15], for answering ambiguous queries [6], and for refining incomplete data [10, 14]. The platforms most widely adopted for exploiting human knowledge and skills are based on crowdsourcing (the most prominent example being Amazon Mechanical Turk [1]). However, other ways of collecting human intelligence can be exploited, such as social networks.

Very important aspects of crowd-based search are the quality of the results and the response time; therefore, several works have addressed the problem of understanding how design features (for example, the cost of the task) impact these metrics [13, 4].

The novel aspects of our approach with respect to the existing works include: independence from the crowdsourcing platform (in particular, we allow a social network or a crowdsourcing marketplace of choice to be exploited indifferently); model-driven design of tasks and user interactions; a model-transformation-based approach that partly automates the generation of some models, thus reducing the cost of designing new applications; and the possibility of manually or automatically choosing the responders to a query task. Our work can be seen as an extended social question answering approach (as applied in Quora and other well-known platforms), where the asker has greater flexibility in defining and sharing his questions. Our work addresses the problem of defining crowdsourcing tasks at the modeling level, while existing approaches and tools typically adopt a programming approach to the problem (e.g., see TurkIt [2]).

Our work is based on general-purpose model-driven techniques and on our previous work on Web application design [9] and on mapping business processes to user interaction models [8], as well as on the preliminary results presented in the CrowdSearcher approach [6]. From the implementation perspective, we rely on the WebRatio tool suite [7], which provides code generation facilities for WebML models.

5. CONCLUSIONS AND FUTURE WORKS

In this paper we presented a model-driven approach for crowdsourcing responses to questions. We defined a meta-model of the query tasks and a user interaction model for building and answering queries. We apply model-driven techniques to the design of the various aspects of the query tasks and to the transformations among them.

Ongoing activities address the problems of task splitting and automatic model transformations, so as to implement a model-driven approach to the design of the tasks that considers the structured crowdsourcing patterns identified in the literature. In the future we plan to extend the coverage of the deployment to several social and crowdsourcing platforms, and to integrate the potential responder bases of several platforms at a time.

ACKNOWLEDGMENTS

This research is partially supported by the Search Computing (SeCo) project, funded by the European Research Council under the IDEAS Advanced Grants program; by the Cubrik project, an IP funded within the EC 7th Framework Programme; and by the BPM4People SME Capacities project. We thank all the projects' contributors.

6. REFERENCES

[1] Amazon Mechanical Turk. https://www.mturk.com.
[2] TurkIt. http://groups.csail.mit.edu/uid/turkit/.
[3] WebML. http://www.webml.org.
[4] D. Ariely, U. Gneezy, G. Loewenstein, and N. Mazar. Large stakes and big mistakes. Review of Economic Studies, 75:1–19, 2009.
[5] M. S. Bernstein, G. Little, R. C. Miller, B. Hartmann, M. S. Ackerman, D. R. Karger, D. Crowell, and K. Panovich. Soylent: a word processor with a crowd inside. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology, UIST '10, pages 313–322, New York, NY, USA, 2010. ACM.
[6] A. Bozzon, M. Brambilla, and S. Ceri. Answering search queries with CrowdSearcher. In Proceedings of the World Wide Web Conference (WWW 2012), in print, 2012.
[7] M. Brambilla, S. Butti, and P. Fraternali. WebRatio BPM: a tool for designing and deploying business processes on the web. In B. Benatallah, F. Casati, G. Kappel, and G. Rossi, editors, ICWE, volume 6189 of Lecture Notes in Computer Science, pages 415–429. Springer, 2010.
[8] M. Brambilla, S. Ceri, P. Fraternali, and I. Manolescu. Process modeling in web applications. ACM Transactions on Software Engineering and Methodology, 15(4):360–409, 2006.
[9] S. Ceri, P. Fraternali, A. Bongio, M. Brambilla, S. Comai, and M. Matera. Designing Data-Intensive Web Applications. Morgan Kaufmann, USA, 2003.
[10] M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin. CrowdDB: answering queries with crowdsourcing. In Proceedings of the 2011 International Conference on Management of Data, SIGMOD '11, pages 61–72, New York, NY, USA, June 2011. ACM.
[11] A. Kittur, B. Smus, and R. Kraut. CrowdForge: crowdsourcing complex work. In Proceedings of the 2011 Annual Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA '11, pages 1801–1806, New York, NY, USA, 2011. ACM.
[12] A. P. Kulkarni, M. Can, and B. Hartmann. Turkomatic: automatic recursive task and workflow design for Mechanical Turk. In Proceedings of the 2011 Annual Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA '11, pages 2053–2058, New York, NY, USA, 2011. ACM.
[13] W. Mason and D. J. Watts. Financial incentives and the "performance of crowds". In Proceedings of the ACM SIGKDD Workshop on Human Computation, HCOMP '09, pages 77–85, New York, NY, USA, 2009. ACM.
[14] A. Parameswaran and N. Polyzotis. Answering queries using humans, algorithms and databases. In Conference on Innovative Data Systems Research (CIDR 2011). Stanford InfoLab, January 2011.
[15] T. Yan, V. Kumar, and D. Ganesan. CrowdSearch: exploiting crowds for accurate real-time image search on mobile phones. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, MobiSys '10, pages 77–90, New York, NY, USA, 2010. ACM.