<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Flexible Querying in Geo-Finder</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gloria Bordogna</string-name>
          <email>gloria.bordogna@idpa.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Psaila</string-name>
          <email>psaila@unibg.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <addr-line>CNR-IDPA - via Pasubio 5, I-24044 Dalmine (BG)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The Geo-Retrieval model</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Università di Bergamo - viale Marconi 5</institution>
          ,
          <addr-line>I-24044 Dalmine (BG)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In Geographic Information Retrieval, evaluating queries that specify both content-based and spatial conditions on document contents requires representing the vagueness and context dependency of spatial conditions, as well as the user's personal preferences. The Geo-Finder system [1] implements a Geo-Retrieval model that evaluates flexible spatial queries combined with content queries. The spatial condition is interpreted as the soft constraint “close” on the user's perceived distance. Two distinct semantics can be used to combine the spatial and the content conditions, <italic>and possibly</italic> and <italic>average</italic>; in both cases the relative weight (preference) of the conditions can be modified.</p>
      </abstract>
      <kwd-group>
        <kwd>Geographic Information Retrieval</kwd>
        <kwd>Fuzzy aggregation operators</kwd>
        <kwd>context dependent spatial query</kwd>
        <kwd>soft constraint</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        An important issue in GIR is the problem of spatial querying [
        <xref ref-type="bibr" rid="ref2 ref3 ref5">2, 5, 3</xref>
        ], understood
as supporting the distinct information needs of users who may access the same
collection for different purposes. To address it, GIR systems must take
users’ preferences into account in order to rank query results by relevance [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        In the Geo-Finder system [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we devised a Geo-Retrieval model for flexibly
querying a GIR system, such that the user expresses the spatial condition through the
“close” soft constraint, adapting the spatial scope to the perceived meaning of
the spatial condition, and expresses preferences on how to combine the content
condition with the spatial condition.
      </p>
      <p>In the spatial condition, the user’s context is modeled as the user’s perceived
distance measure, which modifies the spatial scope of the query.</p>
      <p>Two distinct semantics are provided for flexibly combining the content
condition and the spatial condition: the asymmetric <italic>and possibly</italic> aggregation combines
the mandatory content condition with the optional spatial condition, while the
compensative <italic>average</italic> aggregation linearly combines the two conditions. The relative
weight of the conditions can be specified to achieve personalization.</p>
      <p>
        A fuzzy footprint of a document d, denoted as Foot(d), is a fuzzy set of
geographic coordinates gc = (lat, lon), where lat is the latitude and lon the longitude (both expressed
in degrees), with a membership degree <inline-formula><tex-math>\mu_{Foot(d)}(gc) \in [0, 1]</tex-math></inline-formula> representing the
significance with which the geographic location gc belongs to the geographic focus of
document d:
      </p>
      <p>
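        <disp-formula><tex-math>Foot(d) = \{\langle gc_1, \mu_{Foot(d)}(gc_1)\rangle, \ldots, \langle gc_n, \mu_{Foot(d)}(gc_n)\rangle\}</tex-math></disp-formula>
where each <inline-formula><tex-math>gc_i = (lat_i, lon_i)</tex-math></inline-formula> and its membership degree <inline-formula><tex-math>\mu_{Foot(d)}(gc_i)</tex-math></inline-formula> are
determined by the Geo-Indexing module [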
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
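      <p>As an illustration of the definition above, a fuzzy footprint can be held as a list of coordinate pairs, each with its membership degree. The following is a minimal sketch, not the Geo-Indexing module itself: the names <monospace>SpatialRef</monospace> and <monospace>make_footprint</monospace> are hypothetical, the coordinates are approximate, and the membership degrees are invented for the example.</p>
      <preformat><![CDATA[
```python
from typing import List, NamedTuple


class SpatialRef(NamedTuple):
    """One element of a fuzzy footprint: a coordinate pair plus its membership degree."""
    lat: float         # latitude in degrees
    lon: float         # longitude in degrees
    membership: float  # significance of the location for the document, in [0, 1]


def make_footprint(refs) -> List[SpatialRef]:
    """Build a footprint from (lat, lon, membership) triples, validating each value."""
    footprint = []
    for lat, lon, membership in refs:
        if not -90.0 <= lat <= 90.0 or not -180.0 <= lon <= 180.0:
            raise ValueError(f"coordinates out of range: ({lat}, {lon})")
        if not 0.0 <= membership <= 1.0:
            raise ValueError(f"membership degree out of [0, 1]: {membership}")
        footprint.append(SpatialRef(lat, lon, membership))
    return footprint


# Illustrative values only: approximate coordinates, made-up membership degrees.
foot_d = make_footprint([(45.70, 9.67, 1.0), (45.81, 9.08, 0.6)])
```
]]></preformat>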
      <p>
        A user query q consists of two conditions: a content-based condition,
expressed by a list of content keywords, and a spatial condition, expressed by a
list of geographic names. The spatial condition is interpreted as the requirement
for documents with a geographic reference “close” to the specified place names.
These two conditions are evaluated by specific partial matching functions that
compute two distinct scores in [0, 1]: the Retrieval Status Value w.r.t. the
content, denoted as <inline-formula><tex-math>RSV_{content}(d)</tex-math></inline-formula>, and the Geographic Retrieval Value, denoted as
<inline-formula><tex-math>GRV_{closeness}(d)</tex-math></inline-formula>.
      </p>
      <sec id="sec-1-1">
        <title>GRV closeness(d).</title>
        <p>In Geo-Finder, <inline-formula><tex-math>RSV_{content}(d)</tex-math></inline-formula> is a classical cosine similarity measure, computed
by means of the Lucene library.</p>
        <p>These two scores are finally combined to compute the global Retrieval Status
Value w.r.t. the whole query q, denoted as <inline-formula><tex-math>RSV_q(d)</tex-math></inline-formula>, by applying a suitable
aggregation function. We defined two aggregation functions, one for each of the
two aggregation semantics considered: the asymmetric <italic>and possibly</italic> aggregation
and the compensative <italic>average</italic> aggregation.</p>
        <p>Evaluation of the spatial condition. Given the fuzzy footprint Foot(q) of the
geographic names in the query q, the fuzzy footprints Foot(d) of the documents d
that are likely to satisfy the query are retrieved by accessing the footprint
spatial index. The spatial condition is then evaluated as the user-context-dependent
“closeness” of each document footprint Foot(d) to the query
footprint Foot(q). This is done by a matching function <italic>close</italic>, which models the
concept of “close” as a soft constraint that depends on the user’s context.</p>
        <p>
          The matching function <italic>close</italic> computes a Geographic Retrieval Value,
<inline-formula><tex-math>GRV_{closeness}(d) \in [0, 1]</tex-math></inline-formula>, depending on the closeness of the document footprint
to the query footprint as follows:
        </p>
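        <p><disp-formula><tex-math>GRV_{closeness}(d) = \mu_{close}(Foot(d), Foot(q)) = \max_{i \in Foot(d),\, j \in Foot(q)} qscope\bigl(dist(i, j) \times \min(\mu_{Foot(d)}(i), \mu_{Foot(q)}(j))\bigr)</tex-math></disp-formula>
where <inline-formula><tex-math>\mu_{Foot(d)}(i)</tex-math></inline-formula> and <inline-formula><tex-math>\mu_{Foot(q)}(j)</tex-math></inline-formula> are the membership degrees of the i-th and
j-th fuzzy spatial references <inline-formula><tex-math>gc_i \in Foot(d)</tex-math></inline-formula> and <inline-formula><tex-math>gc_j \in Foot(q)</tex-math></inline-formula>, i.e., the extent to
which a spatial reference represents the geographic focus of the document and
of the query, respectively.</p>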
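        <p>The dist(i, j) function is a great-circle approximation of the actual distance
between the two spherical coordinates <inline-formula><tex-math>gc_i</tex-math></inline-formula> and <inline-formula><tex-math>gc_j</tex-math></inline-formula>.</p>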
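        <p>One common way to compute a great-circle distance is the haversine formula. The sketch below is an assumption for illustration: the text does not state which great-circle formula Geo-Finder uses, and the mean Earth radius of 6371 km is likewise an assumed constant.</p>
        <preformat><![CDATA[
```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def dist(gc_i, gc_j):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees.

    Uses the haversine formula (an assumption; the paper only says
    "great circle approximation").
    """
    lat1, lon1 = (math.radians(v) for v in gc_i)
    lat2, lon2 = (math.radians(v) for v in gc_j)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```
]]></preformat>
        <p>The function is symmetric in its two arguments and returns 0 for identical coordinates, as expected of a distance.</p>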
        <p>The qscope function modifies the geographic distance so as to model the
user-perceived distance as follows:
<disp-formula><tex-math><![CDATA[qscope(x) = \begin{cases} \dfrac{\delta}{x + \delta} & \text{if } x \leq \delta + k \times MaxDist(Foot(q)) \\ 0 & \text{otherwise} \end{cases} \qquad \text{with } \delta \geq 0,\ k > 0]]></tex-math></disp-formula>
where <inline-formula><tex-math>MaxDist(X) = \max_{i,j \in X} dist(i, j)</tex-math></inline-formula> is the maximum geographic distance between
any two geographic places i and j in the footprint X, and can be regarded as the
maximum dispersion of the fuzzy footprint X. It is zero when X contains
a single place. Thus MaxDist(Foot(q)) is the query dispersion: its value
depends on the number of geographic names specified in the query and on the
maximum distance between their geographic coordinates.</p>
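        <p>The qscope and MaxDist definitions can be sketched as follows. This is an illustration under stated assumptions rather than the Geo-Finder code: the dispersion is taken over the query footprint (the "query dispersion" of the text), the distance function is passed in as a parameter, and δ is required to be strictly positive so that the ratio is defined at x = 0.</p>
        <preformat><![CDATA[
```python
from itertools import combinations


def max_dist(footprint, dist):
    """Largest pairwise distance in a footprint; 0.0 when it holds a single place."""
    return max((dist(a, b) for a, b in combinations(footprint, 2)), default=0.0)


def qscope(x, delta, k, max_dist_q):
    """Perceived closeness in [0, 1] for a geographic distance x.

    x, delta and max_dist_q share the same unit (km in the text).
    Assumption of this sketch: delta > 0, so delta / (x + delta) is defined at x = 0.
    """
    if delta <= 0 or k <= 0:
        raise ValueError("this sketch requires delta > 0 and k > 0")
    if x <= delta + k * max_dist_q:
        return delta / (x + delta)
    return 0.0  # beyond the cutoff the place is not considered close at all
```
]]></preformat>
        <p>With δ = 50 km, k = 4 and a query dispersion of 40 km, a distance of 50 km maps to a closeness of 0.5, and any distance above 210 km maps to 0.</p>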
        <p>The parameters δ and k make it possible to change the spatial scope of the query. The
parameter δ is the query range; it is useful when the query footprint
consists of a single geographic coordinate pair gc, so that
documents with footprints in the surrounding places are also retrieved. Different values of δ adapt the
evaluation of the spatial condition “close” to the user’s perception, thus modeling
strict or relaxed interpretations of the surroundings of a point. The
higher δ is, the larger the surroundings.</p>
        <p>The parameter k makes it possible to model a tolerance on the geographic
distance between a document fuzzy footprint and the query footprint, so that
places within a distance of k times MaxDist(Foot(q)), i.e.,
k times the maximum dispersion of the query, can be considered close.</p>
        <p>We consider four main query scopes that can be related to the user’s context,
and that are defined in the Geo-Finder system by the following default values of
k and δ. (1) The small scope is defined with k = 5, δ = 3 km; it is useful when
Foot(q) is a street address within a city, or a small city, and we are interested in its
immediate surroundings (in this case, MaxDist(Foot(q)) could vary approximately between
0 and about 10 km): with this setting, one can retrieve documents within a
distance from the query of 3 km to about 50 km. (2) The meso scope is defined
with k = 4, δ = 50 km; in this case, MaxDist(Foot(q)) covers the area of either a
region or a small nation like Belgium. (3) The large scope is defined with k = 3,
δ = 1000 km; in this case, MaxDist(Foot(q)) covers the area of a medium nation
such as France (in this case, MaxDist(Foot(q)) could vary approximately between 0 and a
few thousand kilometers). (4) The full scope is defined with k = 3, δ = 10000
km; in this case, MaxDist(Foot(q)) covers the area of a big nation such as Russia
or of a continent.</p>
        <p>For example, if one specifies a spatial condition with the two geographic
names Bergamo and Como (Como being about 40 km from Bergamo), and the
query scope is meso (i.e., k = 4 and δ = 50 km), the documents with footprints
at a maximum distance of 50 + 4 × 40 = 210 km from the query footprint are retrieved: for
instance, documents with footprints in Milano and in Lugano are both retrieved, while a document
with a footprint in Rome is not.</p>
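        <p>The four default scopes and the resulting cutoff distance, δ plus k times the query dispersion, can be tabulated in a few lines. The dictionary layout and the names <monospace>SCOPES</monospace> and <monospace>cutoff_km</monospace> are hypothetical; only the k and δ values come from the text.</p>
        <preformat><![CDATA[
```python
# Default (k, delta) values of the four query scopes, as listed in the text.
SCOPES = {
    "small": {"k": 5, "delta_km": 3},
    "meso": {"k": 4, "delta_km": 50},
    "large": {"k": 3, "delta_km": 1000},
    "full": {"k": 3, "delta_km": 10000},
}


def cutoff_km(scope, max_dist_q_km):
    """Distance beyond which qscope is 0: delta + k * query dispersion."""
    params = SCOPES[scope]
    return params["delta_km"] + params["k"] * max_dist_q_km


# Bergamo and Como are about 40 km apart, so the meso scope reaches 210 km.
print(cutoff_km("meso", 40))
```
]]></preformat>
        <p>For the small scope, a query dispersion between 0 and 10 km gives a cutoff between 3 km and 53 km, in line with the "3 km to about 50 km" range quoted for that scope.</p>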
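        <p>The Global RSV. Geo-Finder implements two distinct semantics to combine <inline-formula><tex-math>RSV_{content}(d)</tex-math></inline-formula> and <inline-formula><tex-math>GRV_{closeness}(d)</tex-math></inline-formula>.</p>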
      </sec>
      <sec id="sec-1-2">
        <title>RSV content(d) and GRV closeness(d).</title>
        <p>The asymmetric <italic>and possibly</italic> semantics is defined as follows:</p>
        <p><disp-formula><tex-math>RSV_q(d) = RSV_{content}(d)\ \mathit{and\ possibly}_{\alpha}\ GRV_{closeness}(d) = RSV_{content}(d) \times \max\bigl((1 - \alpha),\, GRV_{closeness}(d)\bigr)</tex-math></disp-formula>
The parameter α specifies the user’s preference for the spatial condition w.r.t. the
content condition. When α = 0, the spatial condition is
disregarded when ranking the documents, and the global Retrieval Status
Value is determined solely by the content relevance score <inline-formula><tex-math>RSV_{content}(d)</tex-math></inline-formula>.
When α = 1, the two conditions are both mandatory: the
Geographic Retrieval Value <inline-formula><tex-math>GRV_{closeness}(d)</tex-math></inline-formula> has the same importance as the content
Retrieval Status Value <inline-formula><tex-math>RSV_{content}(d)</tex-math></inline-formula>, and the aggregation reduces to
the product, i.e., the fuzzy AND of the two relevance scores. Intermediate
values of α in (0, 1) yield an asymmetric combination: the value (1 − α)
guarantees a minimum satisfaction level for <inline-formula><tex-math>GRV_{closeness}(d)</tex-math></inline-formula>, so that the spatial
condition becomes optional and the global <inline-formula><tex-math>RSV_q(d)</tex-math></inline-formula> is not penalized too much
when the spatial condition is not satisfied.</p>
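        <p>The <italic>and possibly</italic> aggregation is a one-line computation. The sketch below follows the formula given above; the function name and the range check on α are illustrative additions.</p>
        <preformat><![CDATA[
```python
def and_possibly(rsv_content, grv_closeness, alpha):
    """Asymmetric aggregation: content is mandatory, closeness is optional.

    All three arguments are expected in [0, 1]; (1 - alpha) acts as a floor
    on the contribution of the spatial score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return rsv_content * max(1.0 - alpha, grv_closeness)
```
]]></preformat>
        <p>For a document with a content score of 0.8 and a closeness score of 0.3, α = 0 leaves the content score unchanged at 0.8, α = 0.5 lowers the result to 0.8 × 0.5 = 0.4, and α = 1 gives the plain product 0.8 × 0.3.</p>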
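        <p>With the symmetric <italic>average</italic> semantics, the global RSV is defined as follows:
<disp-formula><tex-math>RSV_q(d) = RSV_{content}(d)\ \mathit{average}_{\alpha}\ GRV_{closeness}(d) = (1 - \alpha) \times RSV_{content}(d) + \alpha \times GRV_{closeness}(d)</tex-math></disp-formula></p>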
        <p>When the preference degree α = 0, the result is determined solely by the
satisfaction of the content condition; conversely, when α = 1, the global RSV is
determined solely by the satisfaction of the spatial condition, and the content-based
condition is irrelevant. Intermediate values of α make it possible to vary the
tradeoff between the influences of the two conditions; in this case, the two conditions
compensate each other, whereas with the <italic>and possibly</italic> semantics the content
condition must be satisfied for a document to be retrieved.</p>
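        <p>The difference between the two semantics shows most clearly on a document that fails the content condition but is spatially close. The sketch below implements both formulas side by side; the function names are illustrative.</p>
        <preformat><![CDATA[
```python
def and_possibly(rsv_content, grv_closeness, alpha):
    """Asymmetric aggregation: a zero content score always yields zero."""
    return rsv_content * max(1.0 - alpha, grv_closeness)


def average(rsv_content, grv_closeness, alpha):
    """Compensative aggregation: a weighted mean of the two scores."""
    return (1.0 - alpha) * rsv_content + alpha * grv_closeness


# A document with no content match (0.0) located exactly at the query place (1.0).
print(and_possibly(0.0, 1.0, 0.5))
print(average(0.0, 1.0, 0.5))
```
]]></preformat>
        <p>With α = 0.5, the average semantics still assigns this document a score of 0.5, because the spatial score compensates for the missing content match, while the <italic>and possibly</italic> semantics assigns it 0.</p>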
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Conclusions</title>
      <p>
        The Geo-Retrieval model described in this paper is implemented in the
Geo-Finder system. In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we extensively presented its features. Furthermore, in
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], some evaluation results are also discussed, showing the improvement of the
Geo-Finder ranking over the Google ranking. The evaluations also showed that the
precision of Geo-Finder improves when the geographic domain of interest is restricted,
thus highlighting the positive role of modeling the user’s context, which determines
the perceived distance when evaluating the spatial query condition.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>G.</given-names>
            <surname>Bordogna</surname>
          </string-name>
          , G. Ghisalberti, and
          <string-name>
            <given-names>G.</given-names>
            <surname>Psaila</surname>
          </string-name>
          .
          <article-title>Geographic information retrieval: Modeling uncertainty of user's context</article-title>
          .
          <source>Fuzzy Sets and Systems.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>G.</given-names>
            <surname>Cai</surname>
          </string-name>
          .
          <article-title>GeoVSM: An integrated retrieval model for geographic information</article-title>
          . In M.J. Egenhofer and
          <string-name>
            <given-names>D.M.</given-names>
            <surname>Mark</surname>
          </string-name>
          (Eds),
          <source>GIScience</source>
          <year>2002</year>
          , LNCS
          <volume>2478</volume>
          , pages
          <fpage>65</fpage>
          -
          <lpage>79</lpage>
          . Springer Verlag,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          .
          <article-title>Indexing implicit locations for geographical information retrieval</article-title>
          .
          In <source>Proceedings of GIR-2006, Int. Conf. on Geographical Inf. Retrieval</source>
          , Seattle, USA,
          August <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>G.</given-names>
            <surname>Mountrakis</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Stefanidis</surname>
          </string-name>
          .
          <article-title>Moving towards personalized geospatial queries</article-title>
          .
          <source>Journal of Geographic Information System</source>
          ,
          <volume>3</volume>
          :
          <fpage>334</fpage>
          -
          <lpage>344</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>R.S.</given-names>
            <surname>Purves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.B.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arampatzis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Finch</surname>
          </string-name>
          , G. Fu,
          <string-name>
            <given-names>H.</given-names>
            <surname>Joho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.K.</given-names>
            <surname>Syed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vaid</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>The design and implementation of SPIRIT: a spatially aware search engine for information retrieval on the internet</article-title>
          .
          <source>International Journal of Geographical Information Science</source>
          ,
          <volume>21</volume>
          (
          <issue>7</issue>
          ):
          <fpage>717</fpage>
          -
          <lpage>745</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>