CrowdSearch: Crowdsourcing Web Search
[A WWW 2012 Workshop]

Piero Fraternali, Stefano Ceri
Dipartimento di Elettronica e Informazione
Politecnico di Milano, Italy
name.surname@polimi.it

Ricardo Baeza-Yates
Yahoo! Research, Spain
rbaeza@acm.org

Fausto Giunchiglia
University of Trento, Italy
fausto@dit.unitn.it

1.    GOALS AND MOTIVATIONS
Link analysis, which has shaped Web search technology over the last decade, can be seen as a massive mining of crowd-sourced reputation associated with pages. With the exponential increase of social engagement, link analysis is now complemented by other kinds of crowd-generated information, such as multimedia content, recommendations, tweets and tags, and each person can ask for information or advice from dedicated sites. With the growth of online presence, we expect questions to be directly routed to informed crowds. At the same time, many kinds of tasks - either directly used for search or indirectly used for enriching content to make it more searchable - are explicitly crowd-sourced, possibly in the format of games. Many such tasks can be used to craft information, e.g. by naming and tagging data objects and by solving representational ambiguities and conflicts, thereby enhancing the scope of searchable objects. Thus, social engagement is empowering and reshaping the search of Web information.

CrowdSearch is targeted at enabling, promoting and understanding individual and social participation in search. It addresses important research questions, such as: How can search paradigms make use of social participation? Will keyword-based search seamlessly adapt to social search, or will new models of interaction emerge instead? Should social interaction be stimulated by curiosity, games, friendship or other incentives? Is there a "crowdsearching etiquette" to be used when engaging friend or expert communities? Should new sources of information be socially scouted? Which mechanisms may be used to improve or reshape search results based upon social ranking? How do social ranking models compare to advertising? Will social interaction solve the problems of data integration? What is the role of semantics, and can it help CrowdSearch?

The workshop aims at gathering researchers from different fields to debate the various concepts, approaches, architectural choices and technical solutions for opening information search to the active participation of human beings. The key idea is that human beings should be actively involved in different stages of the search, and their actions should be composed and intermixed with those of computers to get the best possible search results.

1.1    Topics of interest
The topics of interest for the CrowdSearch workshop include:

   • Large-scale knowledge discovery, content enrichment and quality assessment with the support of humans and communities.

   • Models for task crowdsourcing and game creation for information augmentation, integration, extraction, classification, and retrieval.

   • Software models, architectures, and tools for combining information management with human and social computations.

   • Throughput, processing time, and results quality optimization of queries that involve both data and human sources.

   • Incentive mechanisms for engaging users in tasks and games, either individually or cooperatively within social networks.

   • Techniques for identifying and mitigating spam and abuse in crowd search tasks.

   • Approaches for measuring the effectiveness and quality of human and social applications for information retrieval and their empirical assessment.

   • Human and social computation in multimedia content processing for search.

   • Use cases and applications of human-assisted information retrieval.

   • Role of crowd search in "big data" applications.

   • User models and human factors in task design for crowdsourced search applications, e.g., cognitive bias, bounded rationality, understanding the boundaries between search questions and spam, etc.

Copyright © 2012 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.
CrowdSearch 2012 workshop at WWW 2012, Lyon, France
2.    PROGRAM HIGHLIGHTS
The workshop has gathered ten research papers, which have been organized in three workshop sessions.

2.1   Crowdsearching on textual and linked data
The first session deals with crowdsearching on textual and linked data.

The paper by Ali Khodaei and Cyrus Shahabi, Social-Textual Search and Ranking, focuses on how to improve the effectiveness of web search by utilizing social data available from users, their actions and their underlying social network, an approach defined as social-textual (socio-textual) search. The authors show how social aspects can be effectively integrated into textual search engines and propose a new social relevance ranking based on several parameters, including the relationships between users, the importance of each user, and the actions users perform on web documents (objects). The proposed social ranking is combined with conventional textual relevance ranking and evaluated with experiments based on data from the online radio website last.fm.

Elena Simperl, Maribel Acosta and Barry Norton, in the paper A semantically enabled architecture for crowdsourced Linked Data management, propose a semantically enabled architecture for crowdsourced data management systems which uses formal representations of tasks and data to automatically design and optimize the operation and outcomes of human computation projects. The architecture is applied to the context of Linked Data management to address specific challenges of Linked Data query processing, such as identity resolution and ontological classification. Starting from a motivational scenario, they explain how query-processing tasks can be decomposed and translated into MTurk projects using a semantic approach.

In the position paper Exploiting Twitter as a Social Channel for Human Computation, Ernesto Diaz-Aviles, Ricardo Kawase and Wolfgang Nejdl propose a novel decentralized architecture that exploits the Twitter social network as a communication channel for harnessing human computation. Their framework provides individuals and organizations with the necessary infrastructure for human computation, facilitating human task submission, assignment and aggregation. The paper also presents a proof of concept and explores the feasibility of the proposed approach in the light of several use cases.

2.2   Methods and Tools for CrowdSearching
The second session focuses on methods and tools for crowdsearching.

In the paper Human Computation Must Be Reproducible, Praveen Paritosh argues that in the social and behavioral sciences, when humans are used as measuring instruments, reproducibility guides the design and evaluation of experiments, and that the results of human computation, which has similar properties, must be reproducible in order to be informative. Additionally, he discusses the requirements of validity or utility of results, which depend on reproducibility. Reproducibility has implications for the design of tasks and instructions, as well as for the communication of the results.

The paper Mechanical Cheat: Spamming Schemes and Adversarial Techniques on Crowdsourcing Platforms, by Djellel Eddine Difallah, Gianluca Demartini, and Philippe Cudré-Mauroux, reviews techniques currently used to detect spammers and malicious workers in crowdsourcing platforms, whether they are bots or humans randomly or semi-randomly completing tasks; the authors then describe the limitations of existing techniques by proposing approaches that individuals, or groups of individuals, could use to attack a task on existing crowdsourcing platforms. They focus on crowdsourcing relevance judgements for search results as a concrete application of their proposed techniques.

Marco Brambilla, Alessandro Bozzon and Andrea Mauri, in the short paper A Model-Driven Approach for Crowdsourcing Search, propose a model-driven approach for the specification of crowd-search tasks. In particular, they define two models: the Query Task Model, representing the meta-model of the query that is submitted to the crowd and the associated answers; and the User Interaction Model, which shows how the user can interact with the query model to fulfill her needs. This approach allows for a top-down design, from the crowd-search task design down to the crowd answering system design, and enables automatic code generation, thus leading to quick prototyping of search applications based on human responses collected over social networking or crowdsourcing platforms.

2.3   Crowdsourcing for Multimedia Applications
The third session addresses the specificities of crowdsourcing for multimedia applications.

Masataka Goto, Jun Ogata, Kazuyoshi Yoshii, Hiromasa Fujihara, Matthias Mauch and Tomoyasu Nakano, in the paper PodCastle and Songle: Crowdsourcing-Based Web Services for Retrieval and Browsing of Speech and Music Content, describe two web services, PodCastle and Songle, that collect voluntary contributions by anonymous users in order to improve the experiences of users listening to speech and music content available on the web. These services use automatic speech-recognition and music-understanding technologies to provide content analysis results, such as full-text speech transcriptions and music scene descriptions, that let users enjoy content-based multimedia retrieval and active browsing of speech and music signals without relying on metadata. When automatic content analysis is used, however, errors are inevitable. PodCastle and Songle therefore provide an efficient error correction interface that lets users easily correct errors by selecting from a list of candidate alternatives.

In the paper A Framework for Crowdsourced Multimedia Processing and Querying, Alessandro Bozzon, Ilio Catallo, Eleonora Ciceri, Piero Fraternali, Davide Martinenghi and Marco Tagliasacchi introduce a conceptual and architectural framework for addressing the design, execution and verification of tasks by a crowd of performers. The proposed framework is substantiated by an ongoing application to a problem of trademark logo detection in video collections. Preliminary results show that the contribution of crowds can improve the recall of state-of-the-art traditional algorithms, with no loss in terms of precision. However, task-to-executor matching, as expected, has an important influence on task performance.

Christopher G. Harris, in the paper An Evaluation of Search Strategies for User-Generated Video Content, examines user-generated content (UGC) search strategies on YouTube using video requests from several knowledge markets, such as Yahoo! Answers. He compares crowdsourcing and student search efforts to YouTube's own search interface, applies these strategies to different types of information needs, ranging from easy to difficult, evaluates the findings using two different assessment methods, and discusses how the relative time and financial costs of these three search strategies affect the results.

Finally, the paper Discovering User Perceptions of Semantic Similarity in Near-duplicate Multimedia Files by Raynor Vliegendhart, Martha Larson and Johan Pouwelse addresses the problem of discovering new notions of user-perceived similarity between near-duplicate multimedia files, with a focus on file-sharing, since in this setting users have a well-developed understanding of the available content, but what constitutes a near-duplicate is nonetheless nontrivial. An experiment elicited judgments of semantic similarity by implementing triadic elicitation as a crowdsourcing task on Amazon Mechanical Turk. The judgments are categorized into 44 different dimensions of semantic similarity perceived by users. These discovered dimensions can be used for clustering items in search result lists.

3.   ACKNOWLEDGMENTS
The CrowdSearch workshop is sponsored by the CUBRIK Integrating Project of the 7th Framework Program of the EU. We wish to thank all the members of the Program Committee, who contributed to selecting an attractive program, and the invited speakers Donald Kossmann and Sihem Amer-Yahia.