        A Model-Driven Approach for Crowdsourcing Search

                                  Alessandro Bozzon, Marco Brambilla, Andrea Mauri
                              Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy
                                                       {name.surname}@polimi.it




ABSTRACT
Even though search systems are very efficient at retrieving world-wide information, they cannot capture some peculiar aspects of user needs, such as subjective opinions and recommendations, or information that requires local or domain-specific expertise. In such scenarios, the human opinion provided by an expert or knowledgeable user can be more useful than any factual information retrieved by a search engine.
  In this paper we propose a model-driven approach for the specification of crowd-search tasks, i.e., activities where real people take part, in real time, in a generalized search process that also involves search engines. In particular, we define two models: the "Query Task Model", representing the meta-model of the query that is submitted to the crowd and of the associated answers; and the "User Interaction Model", which describes how the user interacts with the query model to fulfill her needs. Our solution allows for a top-down design approach, from the crowd-search task design down to the crowd answering system design. Our approach also enables automatic code generation, thus leading to quick prototyping of search applications based on human responses collected over social networking or crowdsourcing platforms.

Figure 1: Overview of our approach. (Transformation chain: MacroTask Description (BPMN) --M2M transformation--> MicroTask Description (BPMN) --M2M transformation--> User Interaction Model (WebML) --M2T transformations--> stand-alone application, or application embedded in a social network or crowdsourcing platform.)
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—Search Process

Keywords
crowdsourcing, social network, model-driven development.

1. INTRODUCTION
  While search systems are superior machines for retrieving world-wide information, people tend to put more trust in other people than in automated responses. That is why users often seek opinions from friends and expert or local communities when taking an informed decision about significant issues. Other users' opinions can ultimately determine our decisions. While in the past people could rely on the opinions of close friends on local or general topics, the change in the social connections of our society makes users increasingly rely on online social interaction to complete and validate the results of their search activities. People often search for human help in between canonical web search steps: they first query a search system, then they ask for an opinion on the results, and maybe they also ask for suggestions on the query terms. We define this trend as crowd-searching.
  In current Web systems, the crowd-search activity, i.e., looking for opinions from friends or experts, is detached from the original search process and is often carried out through different social networking platforms and technologies. Moreover, people manage different applications, different virtual identities, and possibly also different devices: they send emails, ask on Twitter, Facebook, or other social networks, or ask friends and people they know.
  Recent works on crowd-based search (see Section 4) focus on simple, atomic tasks, while crowd-sourced search involves a wide range of scenarios, from trivial decisions, like choosing where to go for dinner, to more consequential ones, like organizing a trip or even buying a house. Thus the user needs a way to manage and control the whole process, from the creation of the query, to the selection of the target, to the gathering of the results.
  In this paper we propose a model-driven, platform-independent approach to design Web applications that support crowd-sourced search. We define a top-down design approach, sketched in Figure 1, which applies model-driven engineering (MDE) techniques for the specification of the crowd-sourced information collection task, its splitting and refinement, and its mapping to the Web user interaction specification.
The approach starts from the task description and applies model-to-model transformations to build the detailed task definitions (described, e.g., in BPMN) and then the platform-independent user interaction model (described in the domain-specific language WebML [3, 9]). Finally, the application itself is automatically generated by means of a model-to-text code generation transformation.
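To make this transformation chain concrete, the following TypeScript sketch models the three steps as typed functions. It is only an illustration of the chain's shape under our own naming (MacroTask, MicroTask, InteractionModel, splitTask, toUI, codegen are all hypothetical); the actual transformations operate on BPMN and WebML models within WebRatio.

```typescript
// Illustrative sketch of the MDE chain in Figure 1; all names are hypothetical.
interface MacroTask { name: string; description: string }  // coarse-grained BPMN task
interface MicroTask { name: string }                       // refined BPMN task
interface InteractionModel { pages: string[] }             // WebML-like hypertext model

type M2M<A, B> = (model: A) => B;    // model-to-model transformation
type M2T<A> = (model: A) => string;  // model-to-text (code generation)

// M2M: split a macrotask into microtasks (placeholder splitting strategy).
const splitTask: M2M<MacroTask, MicroTask[]> = (t) => [{ name: `${t.name}-1` }];
// M2M: map microtasks onto a platform-independent user interaction model.
const toUI: M2M<MicroTask[], InteractionModel> = (ts) =>
  ({ pages: ts.map((t) => `${t.name} page`) });
// M2T: generate the final (stand-alone or embedded) application.
const codegen: M2T<InteractionModel> = (m) =>
  m.pages.map((p) => `<page name="${p}"/>`).join("\n");

// Full chain: MacroTask -> MicroTasks -> UI model -> application code.
const appCode = codegen(toUI(splitTask({ name: "AskCrowd", description: "..." })));
```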
  The main ingredients of our contribution are: 1) a meta-model of the crowd-sourced question; and 2) the models of the user interfaces needed for defining the questions and for responding. In this short paper we focus on the aspects related to the model-driven design of the crowd-search user interactions, spanning from the question definition to the engagement and dissemination, and ending with the response submission and collection. We consider the task refinement and redesign problem outside the scope of this short work.
  The paper is organized as follows. Section 2 and Section 3 respectively describe our search task meta-model and our user interaction model; Section 4 summarizes the related works in both the crowdsourcing and the model-driven fields; finally, Section 5 concludes.

2. TASK MODEL
  The starting point of our MDE approach is the query task model. Figure 2 shows the query task meta-model to which every query task should conform. The main element is the Query submitted by a User. The Query is defined by a Question, written in natural language, and a list of CrowdObjects, i.e., information structured according to a given schema. (To ease the discussion, we assume that information is structured in relations; however, other formats, e.g., semi-structured or graph-based, are also suitable.)

Figure 2: The query task meta-model. (Entities: Query (question, type, open), User (user, password, email), CrowdObject with Input and Output roles, Schema, Field, FieldInstance, Answer, and Relation (type) with Incoming/Outgoing ends; Users are related to Queries as Askers or Responders.)

  A question includes a set of Input CrowdObjects, i.e., a set of data in the user's question upon which the responder can apply his response. For example, if the user wants to collect opinions about some restaurants in Lyon, the Input CrowdObject instances comprise the restaurants subject to comparison. The input objects can either be inserted manually by the user at query creation time or extracted from a previous (canonical or crowd-based) search step. The model of these objects is defined by the Schema element. Input objects are not mandatory for the creation of a query, as a user can create an open question. However, we always assume the presence of a Schema.
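As a reading aid, the following TypeScript sketch renders the core of the meta-model of Figure 2 as plain data types. Attribute names follow the figure; the rendering itself is our illustrative approximation, not an artifact produced by the approach.

```typescript
// Illustrative TypeScript rendering of the query task meta-model (Figure 2).
interface Field { name: string; type: string }
interface Schema { name: string; fields: Field[] }     // model of the CrowdObjects
interface FieldInstance { field: Field; value: string }
interface CrowdObject { schema: Schema; values: FieldInstance[] }

interface User { user: string; password: string; email: string }

interface Query {
  question: string;          // natural-language question text
  type: string;              // e.g. "Like", "Comment", "Add" (see the taxonomy below)
  open: boolean;             // open questions carry no input instances
  asker: User;               // the user who creates the query
  responders: User[];        // the users the query is submitted to
  inputs: CrowdObject[];     // optional inputs; a Schema is always assumed
  outputs: CrowdObject[];    // the answers submitted by the crowd
}
```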
  The type of the query defines how a user can answer the question. Query types have been classified in a taxonomy [6] comprising, among others, the following task types:

  • Like: the user answers the query by voting for ("liking") one or more of the query inputs;

  • Comment: the user answers the query by writing a comment on one or more of the query inputs;

  • Add: the user answers the query by adding one or more new instances of Output CrowdObjects.

  Finally, the Query is also related to a set of Output CrowdObjects, representing the answers to the question submitted by the crowd.
  Users of a crowd-search task can be classified into two categories: askers and responders. The former is the user using the platform and creating questions to be submitted to the crowd, while the latter is a user involved in the query answering process through the social network or crowdsourcing platform.
  Relation represents associations that can exist between CrowdObjects. These relations can be either Input-Input or Output-Input relations. They are created when a query is split into sub-queries, and depend on the kind of splitting pattern that is applied. Indeed, starting from the design of the coarse-grained task, one can refine its description by structuring its activities according to known crowd-interaction patterns (e.g., find-fix-verify [5], map-reduce [11], or the Turkomatic guidelines [12]).
  Input-Input relations occur when the initial set of inputs is partitioned across different instances of the same query, to reduce the workload of each responder. For example, if the original query asked the responder to rank one hundred restaurants, it can be useful to split the task into subtasks of ten restaurants each, to be assigned to different responders. In this case the task performed by each responder is the same, but it is applied to different sets of objects. The initial input set is therefore partitioned across the different query instances according to some strategy (e.g., uniformly, or based on properties of the input instances). The inputs of the new query instances are thus mapped to the inputs of the original query, e.g., according to a map-reduce pattern [11].
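A minimal sketch of such a split, reusing the illustrative types above; the uniform chunking strategy shown here is an assumption, just one of the possible partitioning strategies mentioned in the text.

```typescript
// Input-Input splitting: partition the inputs of a query across several
// instances of the same query (uniform chunking, for illustration only).
function splitByInputs(q: Query, chunkSize: number): Query[] {
  const chunks: CrowdObject[][] = [];
  for (let i = 0; i < q.inputs.length; i += chunkSize) {
    chunks.push(q.inputs.slice(i, i + chunkSize));
  }
  // Each sub-query keeps the same question and type but a smaller input set;
  // the mapping from sub-query inputs to original inputs is the Input-Input relation.
  return chunks.map((inputs) => ({ ...q, inputs, outputs: [] }));
}
// E.g., one hundred restaurants with chunkSize = 10 yield ten sub-queries.
```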
  Output-Input relations occur when the task requested by the author of the query is complex or difficult, or when the result requires some kind of validation, so that the task must be organized into a sequence of subtasks [5]. In this case the query is composed of several heterogeneous tasks, and each user performs a particular one (e.g., according to a find-fix-verify or similar pattern [5]). Hence, the output of the first subtask becomes the input of the next query subtask, and so on, thus generating an output-input mapping.
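A sketch of this chaining under the same illustrative types; the run callback, which stands in for the crowd actually performing a subtask, is an assumption of the sketch.

```typescript
// Output-Input splitting: the outputs of one subtask become the inputs of the
// next one, as in a find-fix-verify pipeline [5] (illustrative sketch).
interface SubtaskSpec {
  question: string;
  type: string;                      // e.g. "Add" (find), "Comment" (fix), ...
  run: (q: Query) => CrowdObject[];  // the crowd's work on this subtask
}

function chainSubtasks(asker: User, initial: CrowdObject[],
                       steps: SubtaskSpec[]): CrowdObject[] {
  let inputs = initial;
  for (const s of steps) {
    const sub: Query = { question: s.question, type: s.type, open: false,
                         asker, responders: [], inputs, outputs: [] };
    inputs = s.run(sub);  // Output-Input relation: outputs feed the next subtask
  }
  return inputs;          // the outputs of the last subtask
}
```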
3. USER INTERACTION MODEL
  The user interaction model describes the interface and navigation aspects of the crowd-search application. Starting from the query task model, possibly split according to a complex pattern of microtasks, a model transformation produces a coarse user interaction model, which in turn can be manually refined by the designer. The user interaction must cover three fundamental phases of the crowd-search process:

  • the submission of the question (performed by the asker);

  • the collection of the responses (performed by the responders);

  • the analysis of the results (available to the asker for getting insights).

  At the current stage, our research has identified the interaction patterns relevant for each phase, considering the various options of deployment platform, task type, and macrotask splitting pattern. For space reasons, in this section we report one possible outcome of the user interaction design, for a simple query task deployed on the Facebook social networking platform. We describe the phases of query creation and query answering, according to the WebML notation [3].

3.1 Query creation
  Figure 3 shows the user interaction model for creating and submitting a query, according to the WebML notation. In the Create Query page, the user specifies the textual question (e.g., "What's the best museum to visit in Milan?") and sets the query type (e.g., "Like", "Add", and so on). The user can also choose the type "open question", in which case no input items are needed for the responder to select, like, and so on. In both cases, a Query instance is created and its type is set. If the query does not have inputs, the user is directly brought to the Responder Selection page.

Figure 3: User interaction model for creating a query.

  If, on the other hand, the user chooses to build a structured question with inputs, he is redirected to the Define Schema page, where the asker can create a schema for the inputs by assigning a general name to the input type and by defining its attributes in terms of name and type. By submitting the form, the application creates a new instance of the Schema entity and its associated Fields. The asker is then brought to the Add Instance page, where he can add input objects following the previously defined schema. The specified Input instances are created and linked to the query.
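The branching just described can be summarized as follows, again over the illustrative types of Section 2. The page names in the comments mirror the model of Figure 3; the function itself, including the example Restaurant schema and instance, is only a sketch.

```typescript
// Illustrative sketch of the query creation flow of Figure 3.
function createQuery(asker: User, question: string, type: string, open: boolean): Query {
  const q: Query = { question, type, open, asker, responders: [], inputs: [], outputs: [] };
  if (!open) {
    // "Define Schema" page: name the input type and define its attributes.
    const schema: Schema = {
      name: "Restaurant",  // hypothetical example schema
      fields: [{ name: "name", type: "String" }, { name: "city", type: "String" }],
    };
    // "Add Instance" page: add input objects conforming to the schema.
    q.inputs.push({ schema, values: [
      { field: schema.fields[0], value: "Trattoria Milanese" },  // hypothetical instance
      { field: schema.fields[1], value: "Milan" },
    ]});
  }
  // "Responder Selection" page: pick responders from the platform, then post.
  return q;
}
```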
  Finally, in the Responder Selection page the asker can select the responders to the query: the list of possible responders is retrieved from the social network or crowdsourcing platform (in this example, the GetFriend component collects the friends from the Facebook platform). The user can select the responders through the "Friends" multi-selection list in the page. Eventually, after viewing a preview of the created question, the user can post the query on the social platform.
  Figure 4 shows a compressed view of the web pages that are produced from the user interaction model of Figure 3, thanks to the code generation facilities of WebRatio. The structure and content of the pages can be easily recognized and mapped to the corresponding model elements. In this particular example the user wants to know some good restaurants in Milan. Hence she defines the question "Can you suggest me some good restaurants in Milan?" and selects the "Add" query type. Then she creates the schema to which the input instances of the question must conform. In the instance list she adds a restaurant she already knows. Finally, she selects the recipients of the question from the list of her friends extracted from Facebook.

Figure 4: Rendering of the Web pages implementing the query creation phase, as generated by WebRatio.
3.2 Answering a query
  Figure 5 depicts the WebML model for the query answering activity, performed by a responder based on the query structure defined by the asker. When accessing the application through the Responder Dashboard page, the responder is presented with a list of questions to answer. By clicking on a question, he is brought to the Details page, where he can provide his answer. The page shows the question text, plus the set of defined input instances (the Input component in the Details page).

Figure 5: Hypertext model for answering a query.
  Depending on the type of the question (as defined by the asker during the query creation phase), different concrete user interfaces can be shown: in the case of a "Like" question, the responder simply selects the preferred instances in the Input list; as a consequence, a set of Output objects is created, corresponding to the "likes" of the user.
  In the case of a "Comment" or "Add" question, the user is shown a form to write a comment or to add a new instance to the list, respectively. For "Comment" questions, an Output object with the comment schema (i.e., a single textual field) is created. The "Add" case is more noteworthy, as the Output objects present a schema equivalent to that of the Inputs, so that the new object instances can be added to the list of input objects of the query.
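This type-dependent behavior can be read as a dispatch on the query type, sketched below with the illustrative types used so far; the single-field comment schema follows the description above, while the function signature is an assumption.

```typescript
// Illustrative dispatch on the query type when a responder submits an answer.
function submitAnswer(q: Query, liked: CrowdObject[],
                      text?: string, added?: CrowdObject): void {
  switch (q.type) {
    case "Like":        // one Output object per liked Input instance
      q.outputs.push(...liked);
      break;
    case "Comment": {   // Output with the comment schema: a single textual field
      const comment: Schema = { name: "Comment",
                                fields: [{ name: "text", type: "String" }] };
      q.outputs.push({ schema: comment,
                       values: [{ field: comment.fields[0], value: text ?? "" }] });
      break;
    }
    case "Add":         // Output shares the Input schema and extends the input list
      if (added) { q.outputs.push(added); q.inputs.push(added); }
      break;
  }
}
```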
  Figure 6 shows the compressed view of the Details page built from the user interaction model described in Figure 5. Continuing the previous example, in this page the responder can add further restaurants he knows.

Figure 6: Rendering of the query answering Web page, as generated by WebRatio.

4. RELATED WORKS
  This work falls into the broad field of human computation, i.e., the discipline that aims to use human knowledge to fulfill tasks that are difficult or even impossible for a machine. For example, human computation studies have addressed the use of the crowd's knowledge for image recognition [15], for answering ambiguous queries [6], and for refining incomplete data [10, 14]. The platforms most widely adopted for exploiting human knowledge and skills are based on crowdsourcing, the most prominent example being Amazon Mechanical Turk [1]. However, other means of collecting human intelligence can be exploited, such as social networks.
  Very important aspects of crowd-based search are the quality of the results and the response time; therefore, several works have addressed the problem of understanding how design features (for example, the cost of the task) impact these result metrics [13, 4].
  The novel aspects of our approach with respect to existing works include: independence from the crowdsourcing platform (in particular, we can exploit indifferently a social network or a crowdsourcing marketplace of choice); model-driven design of tasks and user interactions; a model-transformation-based approach that partly automates the generation of some models, thus reducing the cost of designing new applications; and the possibility of manually or automatically choosing the responders of a query task. Our work can be seen as an extended social question answering approach (as applied in Quora and other well-known platforms), where the asker has greater flexibility in defining and sharing his questions. Our work addresses the problem of defining crowdsourcing tasks at the modeling level, while existing approaches and tools typically adopt a programming approach to the problem (e.g., see TurKit [2]).
  Our work is based on general-purpose model-driven techniques, on our previous work on Web application design [9] and on mapping business processes to user interaction models [8], as well as on the preliminary results presented in the CrowdSearcher approach [6]. From the implementation perspective, we rely on the WebRatio tool suite [7], which provides code generation facilities for WebML models.

5. CONCLUSIONS AND FUTURE WORKS
  In this paper we presented a model-driven approach for crowdsourcing responses to questions. We defined a meta-model of the query tasks and a user interaction model for building and answering queries. We apply model-driven techniques to the design of the various aspects of the query tasks and to the transformations among them.
  Ongoing activities address the problems of task splitting and of automatic model transformations, so as to implement a model-driven approach to the design of the tasks that takes into account the structured crowdsourcing patterns identified in the literature. In the future we plan to extend the deployment coverage to several social and crowdsourcing platforms, and to integrate the bases of potential responders from several platforms at a time.

ACKNOWLEDGMENTS
This research is partially supported by the Search Computing (SeCo) project, funded by the European Research Council under the IDEAS Advanced Grants program; by the Cubrik Project, an IP funded within the EC 7FP; and by the BPM4People SME Capacities project. We thank all the projects' contributors.

6. REFERENCES
[1] Amazon Mechanical Turk. https://www.mturk.com.
[2] TurKit. http://groups.csail.mit.edu/uid/turkit/.
[3] WebML. http://www.webml.org.
[4] D. Ariely, U. Gneezy, G. Loewenstein, and N. Mazar. Large stakes and big mistakes. Review of Economic Studies, 75:1–19, 2009.
[5] M. S. Bernstein, G. Little, R. C. Miller, B. Hartmann, M. S. Ackerman, D. R. Karger, D. Crowell, and K. Panovich. Soylent: a word processor with a crowd inside. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology, UIST '10, pages 313–322, New York, NY, USA, 2010. ACM.
[6] A. Bozzon, M. Brambilla, and S. Ceri. Answering search queries with CrowdSearcher. In Proceedings of the World Wide Web Conference (WWW 2012), in print, 2012.
[7] M. Brambilla, S. Butti, and P. Fraternali. WebRatio BPM: a tool for designing and deploying business processes on the Web. In B. Benatallah, F. Casati, G. Kappel, and G. Rossi, editors, ICWE, volume 6189 of Lecture Notes in Computer Science, pages 415–429. Springer, 2010.
[8] M. Brambilla, S. Ceri, P. Fraternali, and I. Manolescu. Process modeling in Web applications. ACM Trans. Softw. Eng. Methodol., 15(4):360–409, 2006.
[9] S. Ceri, P. Fraternali, A. Bongio, M. Brambilla, S. Comai, and M. Matera. Designing Data-Intensive Web Applications. Morgan Kaufmann, USA, 2003.
[10] M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin. CrowdDB: answering queries with crowdsourcing. In Proceedings of the 2011 International Conference on Management of Data, SIGMOD '11, pages 61–72, New York, NY, USA, 2011. ACM.
[11] A. Kittur, B. Smus, and R. Kraut. CrowdForge: crowdsourcing complex work. In Proceedings of the 2011 Annual Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA '11, pages 1801–1806, New York, NY, USA, 2011. ACM.
[12] A. P. Kulkarni, M. Can, and B. Hartmann. Turkomatic: automatic recursive task and workflow design for Mechanical Turk. In Proceedings of the 2011 Annual Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA '11, pages 2053–2058, New York, NY, USA, 2011. ACM.
[13] W. Mason and D. J. Watts. Financial incentives and the "performance of crowds". In Proceedings of the ACM SIGKDD Workshop on Human Computation, HCOMP '09, pages 77–85, New York, NY, USA, 2009. ACM.
[14] A. Parameswaran and N. Polyzotis. Answering queries using humans, algorithms and databases. In Conference on Innovative Data Systems Research (CIDR 2011). Stanford InfoLab, January 2011.
[15] T. Yan, V. Kumar, and D. Ganesan. CrowdSearch: exploiting crowds for accurate real-time image search on mobile phones. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, MobiSys '10, pages 77–90, New York, NY, USA, 2010. ACM.