<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Classical vs. Crowdsourcing Surveys for Eliciting Geographic Relevance Criteria</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stefano De Sabbata</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Omar Alonso</string-name>
          <email>omar.alonso@microsoft.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Mizzaro</string-name>
          <email>mizzaro@uniud.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Microsoft Corp. 1065 La Avenida, Mountain View CA</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Udine Via delle Scienze 206</institution>
          ,
          <addr-line>33100 Udine</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Zurich-Irchel Winterthurerstrasse 190</institution>
          ,
          <addr-line>CH-8057 Zurich</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Geographic relevance aims to assess the relevance of physical entities (e.g., shops and museums) in geographic space for a mobile user in a given context, thereby shifting the focus from the digital world (the realm of classical information retrieval) to the physical world. We study the elicitation of geographic relevance criteria by means of both a classical survey and an Amazon Mechanical Turk (a crowdsourcing platform) survey. This allows us to obtain three results: first, we gather a set of criteria and their relative importance; second, we gain a first insight into the differences between geographic relevance and classical relevance as commonly understood in the IR field; and third, we draw some considerations on the agreement, regarding the importance of specific criteria, among the participants in the classical and the crowdsourcing surveys.</p>
      </abstract>
      <kwd-group>
        <kwd>Relevance</kwd>
        <kwd>Crowdsourcing</kwd>
        <kwd>Amazon Mechanical Turk</kwd>
        <kwd>SurveyMonkey</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The elicitation of relevance criteria dates back to the 1990s, if not earlier [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
Although such criteria seemed quite well established at that time [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the issue
has recently been studied again [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This is probably due to the Web, which on the
one hand provides novel search services that might entail a different notion of
relevance, and on the other hand allows more convenient methods for preparing
surveys involving several participants.
      </p>
      <p>In this short paper, we concentrate on Geographic Relevance (GR), a recent
area of Information Retrieval (IR), and we discuss the elicitation of relevance
criteria by means of:
– SurveyMonkey (SM, www.surveymonkey.com), a Web service that allows the
preparation of an online survey whose participants are then invited by email,
and
– Amazon Mechanical Turk (AMT, www.mturk.com), a crowdsourcing
platform that allows one to outsource specific tasks to the crowd for a small amount
of money.</p>
      <p>The aim of this research is threefold:
– to find suitable GR criteria, which might be different from the classical
relevance criteria;
– to gain a first insight into the difference between GR and the classical concept
of relevance in the IR field;
– to understand if AMT provides reliable results, or at least if those results
agree with the SM ones, which are obtained in a more classical way.</p>
      <p>
        AMT quality and reliability are important issues [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]: there is no guarantee
that AMT workers provide reliable answers and that they carry out their task in
a reliable way; for example, workers might cheat to quickly gain money. This is
even more critical as crowdsourcing is emerging as a widespread alternative for
relevance evaluations.
      </p>
      <p>In the following, we first define GR (Section 2) and discuss crowdsourcing
and AMT (Section 3); then we present the experimental study and its results
(Section 4), and we finally summarize the main findings (Section 5).</p>
    </sec>
    <sec id="sec-2">
      <title>Geographic Relevance Criteria</title>
      <p>
        The basic idea of GR is to assess the relevance of physical entities (e.g., shops
and museums) in geographic space for a mobile user in a given context [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This
definition implies a shift from the informational world (the focus of
IR, which is devoted to retrieving information from unstructured digital document
collections) to the physical world. In other words, the aim of GR is to apply the
principles and concepts developed in the field of IR not only in the informational
world, but also in the physical world [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>GR is different from Geographic Information Retrieval because the latter
still focuses on digital entities. The aim of Geographic Information Retrieval is
to retrieve geographic information from digital documents, or to find relevant
digital documents that can satisfy a user's need for geographic information.
GR uses digital entities (e.g., the objects in a collection within a Geographic
Information System, or documents, or images, etc.) as means to estimate the
relevance of the physical entities they refer to, rather than aiming to evaluate
the relevance of the digital entities themselves.</p>
      <p>
        In shifting the focus from the digital world to the physical world, a first
question is whether the criteria of relevance developed in IR [
        <xref ref-type="bibr" rid="ref1 ref2 ref7">7, 2, 1</xref>
        ] can be
applied to assess GR. A second question is whether other criteria are needed in
order to fully understand the relevance of a physical entity.
      </p>
      <p>[Table 1: criteria of GR, grouped under headings including Properties, Geography, Information, and Presentation.]</p>
      <p>
        We ground our study
on the set of criteria of GR proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; these criteria are listed in Table 1.
We do not have the space here to discuss these criteria in detail; a comprehensive
description of each criterion, together with a more in-depth analysis, is
provided in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Crowdsourcing</title>
      <p>Crowdsourcing has emerged as a feasible alternative for relevance evaluation
because it brings the flexibility of the editorial approach at a larger scale.</p>
      <p>AMT is an example of a crowdsourcing platform: it is an Internet service that
gives developers the ability to include human intelligence as a core component of
their applications. Developers use a web services API to submit tasks, approve
completed tasks, and incorporate the answers into their software applications. To
the application, the transaction looks very much like any remote procedure call:
the application sends the request, and the service returns the results. People (the
"crowd") come to the web site looking for tasks and receive payment for their
completed work. In addition to the API, there is also the option to interact using
a dashboard that includes several useful features for prototyping experiments.
Participation by large numbers of online users from all over the world is
increasing, which provides a diverse sample.</p>
      <p>The individual or organization who has work to be performed is known as
the requester. A person who wants to sign up to perform work is described in
the system as a worker.</p>
      <p>
        One issue with AMT and similar crowdsourcing platforms is quality [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]: there
is no guarantee that the workers provide correct answers and that they carry out
their task in a reliable way. For example, workers might cheat to quickly gain
money. One of the aims of this paper is to compare a survey carried out by means
of AMT with a similar one carried out by more classical means, such as SM.
      </p>
      <p>[Figure 1: two of the three questions as framed in SMs and AMTs1.]</p>
      <p>1. Considering a place that fits your needs by its category (e.g., a restaurant, if you
want to go out for dinner), which other criteria would you take into account?
– A place that offers just the services you need is more relevant than a place
that also offers other services.
– A place that offers all the services you need is more relevant than a place that
offers just some of them.
– A place that was previously unknown to you is more relevant than an
already known place.</p>
      <p>2. Considering a place that fits your needs, do you take into account the following
criteria related to the presented information and the way it is presented (for example
on your mobile device) to judge its relevance?
– The more information available about a place, the higher is the relevance of
the place.
– The more accurate the information about a place, the higher is the relevance
of the place.
– The more current, recent, timely, up-to-date the information about a place,
the higher is the relevance of the place.
– The more dynamic, active or interactive the presentation of information, the
higher is the relevance of the presented place.
– The more the information about a place is presented in a certain format or
style, or offers output in a way that is helpful, desirable, or preferable, the
higher is its relevance.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <sec id="sec-4-1">
        <title>Experimental design</title>
        <p>We selected a subset of the criteria listed in Table 1: the 14 criteria in italics.
We chose many of the geographic criteria, leaving out spatial proximity and
temporal proximity (we took into account spatio-temporal proximity, which
combines both), and association rule (which is difficult to explain and can be
misunderstood if not explained in detail). We selected two or three criteria from
each of the other groups, choosing those that are easiest to explain in a few words and,
probably, the most intuitive.</p>
        <p>Towards the aims stated in Section 1, we ran 3 experiments:
– An SM survey (referred to as SMs) sent by email to researchers and students
in IR and similar subjects.
– A first AMT survey (AMTs1) obtained by simplifying the SM survey and
by focussing on some items only.
– A second AMT survey (AMTs2) obtained, after the responses to AMTs1, by
fine-tuning the language to tailor it to the AMT environment, where workers
usually are not keen to spend much time on a task.</p>
        <p>The questions were asked in an indirect way: for example, we did not ask
literally whether "spatio-temporal proximity is an important GR criterion"; rather,
we asked whether "it is important to take into account whether the place (or
a related event) will be available at the time you will be able to reach it (e.g.,
whether you can reach the shop before it closes)." The questionnaire included a
total of 14 items, arranged into three main questions.</p>
        <p>[Figure 2: the same two questions as framed in AMTs2.]</p>
        <p>1. Given a place in the right category (e.g., a restaurant, if you want to go out for
dinner), which other criteria would you take into account?
– A place that offers just the services you need is more relevant than a place
that also provides other services.
– A place that offers all the services you need is more relevant than a place that
provides just some of them.
– A place that was previously unknown to you is more relevant than an
already known place.</p>
        <p>2. Considering a place that fits your needs, do you take into account the following
criteria to judge its relevance?
– The more information available about a place, the higher is the relevance
of the place.
– The more accurate the information about a place, the higher is the relevance
of the place.
– The more current, recent, timely, up-to-date the information about a
place, the higher is the relevance of the place.
– The more dynamic, active or interactive the presentation of
information, the higher is the relevance of the presented place.
– The more the information about a place is presented in a certain format
or style, or offers output in a way that is helpful, desirable, or
preferable, the higher is its relevance.</p>
        <p>Figure 1 shows two of the three questions (each one grouping some items) as
framed in SMs and AMTs1. In SMs, a first page was dedicated to the criteria
not related to geographic concepts (e.g., novelty), whereas a second page was
dedicated to the geography-related criteria. The same items were used in
AMTs1, where the 3 questions were all presented on one page. Figure 2 shows
the same items as framed in AMTs2, where we slightly modified the questions
(but not the items, which were almost identical to those in SMs and AMTs1<sup>4</sup>), each one
presented on a separate page. Participants assessed each item on a 7-point Likert
scale from "1 - Strongly disagree" to "7 - Strongly agree" (all the scale values appear
on the ordinal axis in Figure 3).</p>
      </sec>
      <sec id="sec-4-2">
        <title>Results</title>
        <p>The number of participants in the three cases is similar: SMs got 53 participants,
AMTs1 43, and AMTs2 42 (we discarded two outliers from each AMT survey
since they were far too quick). The collected demographics show that participants
in SMs were familiar with digital maps (71% use them at least several times a
week), mobile maps (51% use them on their mobile), and online yellow pages
(only 30% of the participants have never used them). We did not collect
demographic data for AMT (we plan to do that in future experiments). We paid $0.15
to each AMT worker. The total cost for both AMT experiments was $16.</p>
        <p><sup>4</sup> The only differences, as shown in the figures, are the change of "offer" into "provide"
and the use of boldface to highlight some terms.</p>
        <p>The Kolmogorov-Smirnov normality test indicated that the responses are not
normally distributed, so we treated the variables as ordinal. Figure 3 shows the
median importance of the single criteria in the three surveys.</p>
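        <p>As a minimal sketch of how per-survey medians of ordinal ratings can be computed, the following Python snippet uses only the standard library. The ratings and the helper name median_by_survey are hypothetical illustrations; they are not the data or the code of this study.</p>
        <preformat>
```python
from statistics import median

# Hypothetical 7-point Likert ratings (1 = strongly disagree, 7 = strongly agree)
# for a single criterion. These are NOT the study's data.
responses = {
    "SMs": [5, 6, 4, 7, 5, 6, 3],
    "AMTs1": [6, 7, 6, 5, 7, 6, 7],
    "AMTs2": [7, 6, 6, 7, 5, 6, 6],
}


def median_by_survey(data):
    """Return the median rating of each survey."""
    return {survey: median(ratings) for survey, ratings in data.items()}


medians = median_by_survey(responses)
print(medians)
```
        </preformat>
        <p>With an even number of ratings, statistics.median averages the two middle values; statistics.median_low could be used instead if the result should always be one of the scale values.</p>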
        <p>By analyzing the relative importance of the criteria, three groups can be
singled out: a first one including the three leftmost criteria (coverage,
spatio-temporal proximity, and currency), whose importance seems very high according
to all three surveys; a second group including the central seven criteria, whose
importance is tangible, but somewhat lower than that of the first group; and
a final group of the four rightmost criteria, whose importance seems rather low
and more inconsistent among the three surveys.</p>
        <p>Turning to the agreement among the participants in the three surveys, we
can note first that SMs median values are generally lower than those of AMTs1 and AMTs2. Also,
agreement differs from criterion to criterion, as confirmed by a Mann-Whitney test:
– a highly significant (p &lt; 0.01) difference was found between SMs and
AMTs1, and also between SMs and AMTs2, for the criteria availability,
accuracy, dynamism, and presentation quality;
– a highly significant (p &lt; 0.01) difference was found between SMs and
AMTs1 for the criterion hierarchy, and between SMs and AMTs2 for the
criterion visibility;
– a significant (p &lt; 0.05) difference was found between SMs and AMTs1 for
the criteria currency and visibility, and between SMs and AMTs2 for the
criterion co-location;
– no statistically significant difference was found between AMTs1 and
AMTs2 for any criterion.</p>
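        <p>For readers unfamiliar with the Mann-Whitney test, the following Python sketch shows one common way to compute it for two independent groups of ordinal ratings. It assumes the two-sided normal approximation with tie correction and no continuity correction, which may differ from the variant used for the results reported here. The function name mann_whitney_u and the ratings are hypothetical.</p>
        <preformat>
```python
from collections import Counter
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    counts = Counter(list(x) + list(y))
    # Give every distinct value the average of the ranks it occupies (midranks).
    rank_of, next_rank = {}, 1
    for value in sorted(counts):
        ties = counts[value]
        rank_of[value] = next_rank + (ties - 1) / 2
        next_rank += ties
    rank_sum_x = sum(rank_of[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)
    # Standard deviation of U under the null hypothesis, corrected for ties.
    n = n1 + n2
    tie_term = sum(t**3 - t for t in counts.values())
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:  # all observations identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / sigma
    return u, erfc(abs(z) / sqrt(2))


# Hypothetical ratings for one criterion in two surveys (not the study's data).
sm_ratings = [3, 4, 4, 5, 5, 5, 6, 4, 3, 5]
amt_ratings = [6, 7, 6, 5, 7, 6, 7, 6, 5, 7]
u_stat, p_value = mann_whitney_u(sm_ratings, amt_ratings)
print(f"U = {u_stat}, p = {p_value:.4f}")
```
        </preformat>
        <p>The normal approximation is only rough for very small samples; an established implementation such as scipy.stats.mannwhitneyu is a reasonable alternative where SciPy is available.</p>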
        <p>Besides differences in quality per se, there are other characteristics that may
influence the choice of system for conducting surveys. We present the most
important aspects in Table 2.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>Overall, the results hint that:
– The most important GR criteria seem to be coverage, spatio-temporal
proximity, and currency.
– SM and AMT surveys provide slightly different results.
– The differences mainly concern the importance of four criteria (availability,
accuracy, dynamism, and presentation quality).
– None of these four criteria are in the Geography set (see Table 1).
This last point is perhaps surprising, since one would expect that the
heterogeneous background and cultural differences of the international AMT population
would particularly affect the elicitation of geographic criteria. However, in our
experiments disagreement was mainly on classical relevance criteria.</p>
      <p>One further point worth remarking is that the average quality of AMT workers'
answers was good, as shown by the good level of agreement with SM, although
we did not require qualified workers, as would have been possible in AMT.</p>
      <p>Finally, as future work, we are considering a more "visual" survey, with more
images or scenarios, rather than the pure text used in this work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>O.</given-names>
            <surname>Alonso</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizzaro</surname>
          </string-name>
          .
          <article-title>Relevance criteria for e-commerce: a crowdsourcing-based experimental analysis</article-title>
          .
          <source>In SIGIR '09: Proceedings of the 32nd international ACM SIGIR</source>
          , pages
          <volume>760</volume>
          –
          <fpage>761</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Barry</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Schamber</surname>
          </string-name>
          .
          <article-title>Users' criteria for relevance evaluation: A cross-situational comparison</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>34</volume>
          (
          <issue>2-3</issue>
          ):
          <volume>219</volume>
          –
          <fpage>236</fpage>
          , May
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>P.</given-names>
            <surname>Coppola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. D.</given-names>
            <surname>Mea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Gaspero</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizzaro</surname>
          </string-name>
          .
          <article-title>The concept of relevance in mobile and ubiquitous information access</article-title>
          .
          <source>In Mobile HCI Workshop on Mobile and Ubiquitous Information Access</source>
          , volume
          <volume>2954</volume>
          <source>of LNCS</source>
          , pages
          <volume>1</volume>
          –
          <fpage>10</fpage>
          . Springer,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. S. De Sabbata.
          <article-title>Criteria of geographic relevance</article-title>
          .
          <source>In 6th Int'l Conf. on Geographic Information Science</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. S. De Sabbata and
          <string-name>
            <given-names>T.</given-names>
            <surname>Reichenbacher</surname>
          </string-name>
          .
          <article-title>Criteria of geographic relevance: an experimental study</article-title>
          .
          <source>International Journal of Geographic Information Science, forthcoming.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>P.</given-names>
            <surname>Marsden</surname>
          </string-name>
          . Crowdsourcing.
          <source>Contagious Magazine</source>
          ,
          <volume>18</volume>
          :
          <fpage>24</fpage>
          –
          <fpage>28</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizzaro</surname>
          </string-name>
          . Relevance:
          <article-title>The whole history</article-title>
          .
          <source>Journal of the American Society for Information Science</source>
          ,
          <volume>48</volume>
          (
          <issue>9</issue>
          ):
          <volume>810</volume>
          –
          <fpage>832</fpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>T.</given-names>
            <surname>Reichenbacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Crease</surname>
          </string-name>
          , and S. De Sabbata.
          <article-title>The concept of geographic relevance</article-title>
          .
          <source>In Proceedings of the 6th Int'l Symposium on LBS &amp; TeleCartography</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>