<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Monolingual Retrieval Experiments with Spatial Restrictions at GeoCLEF 2007</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ralph Kölle</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ben Heuwing</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Mandl</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christa Womser-Hacker</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Information Science, University of Hildesheim</institution>
          ,
          <addr-line>Marienburger Platz 22, D-31141 Hildesheim</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The participation of the University of Hildesheim focused on the monolingual German and English tasks of GeoCLEF 2007. Building on the results of GeoCLEF 2005 and GeoCLEF 2006, we combined the weighting and expansion of geographic named entities (NEs) with blind relevance feedback. This year, we also evaluated an improved model for German named entity recognition (NER).</p>
      </abstract>
      <kwd-group kwd-group-type="general-terms">
        <title>General Terms</title>
        <kwd>Measurement</kwd>
        <kwd>Performance</kwd>
        <kwd>Experimentation</kwd>
      </kwd-group>
      <kwd-group>
        <kwd>Cross-Language Information Retrieval</kwd>
        <kwd>Evaluation</kwd>
        <kwd>Geographic Information Retrieval Systems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Geographic Retrieval System</title>
      <p>The system that we extended for our experiments with (geographic) NEs in geographic information retrieval (GIR) is based on a retrieval system that was applied to ad-hoc retrieval in previous CLEF campaigns [<xref ref-type="bibr" rid="ref2">Gey et al. 2007</xref>]. Apache Lucene is the backbone of the system for stemming, indexing and searching.</p>
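      <p>The retrieval engine itself is Apache Lucene. The following Python sketch is only a toy illustration of the three steps named above (stemming, indexing and searching), not the system's actual code. The suffix-stripping <monospace>stem</monospace> function, the <monospace>InvertedIndex</monospace> class and its tf-idf scoring are simplifying assumptions; they do not reproduce Lucene's analyzers or its ranking formula.</p>
      <preformat preformat-type="python">
```python
import math
import re
from collections import Counter, defaultdict

# Illustrative suffix list (an assumption): a crude stand-in for the
# language-specific stemmers a real Lucene analyzer chain would apply.
SUFFIXES = ("ungen", "ung", "ing", "en", "er", "es", "s", "e")


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    """Lowercase the text, split it into word tokens and stem each token."""
    return [stem(token) for token in re.findall(r"\w+", text.lower())]


class InvertedIndex:
    """Toy inverted index mapping each stem to {doc_id: term frequency}."""

    def __init__(self) -> None:
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id: str, text: str) -> None:
        """Analyze a document and record the frequency of each stem."""
        self.doc_count += 1
        for term, tf in Counter(analyze(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query: str, k: int = 10) -> list[tuple[str, float]]:
        """Rank documents by a simple tf-idf sum over the analyzed query."""
        scores = defaultdict(float)
        for term in analyze(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue  # term never indexed
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        return ranked[:k]


index = InvertedIndex()
index.add("d1", "Floods in northern Germany")
index.add("d2", "Forest fires in Spain")
index.add("d3", "Flooding rivers and floods in Germany")
print(index.search("flooding in Germany"))
```
      </preformat>
      <p>Because the query term "flooding" and the document term "floods" are reduced to the same stem, document d3, which contains both forms, ranks first. A real setup would use proper stemmers for German and English.</p>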
      <p>Named Entity Recognition was carried out with the open source machine learning tool LingPipe3, which
identifies named entities and classifies them into the categories Person, Organization, Location and
Miscellaneous according to a trained statistical model.</p>
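      <p>To illustrate how entities of the category Location could be separated from the other NER categories and then weighted in a query, the sketch below assumes a hypothetical chunk representation (a character span plus a category label). It does not use LingPipe's API, and the hypothetical <monospace>boost</monospace> value of 2.0 is an arbitrary example rather than a setting from the experiments.</p>
      <preformat preformat-type="python">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical NER result: a character span and its category label."""

    start: int
    end: int
    category: str  # "PERSON", "ORGANIZATION", "LOCATION" or "MISC"


def chunk_for(text: str, surface: str, category: str) -> Chunk:
    """Hypothetical helper: chunk for the first occurrence of `surface`."""
    start = text.index(surface)  # raises ValueError if surface is absent
    return Chunk(start, start + len(surface), category)


def geographic_entities(text: str, chunks: list[Chunk]) -> list[str]:
    """Return the distinct LOCATION entities in order of appearance."""
    names = []
    for chunk in sorted(chunks, key=lambda c: c.start):
        name = text[chunk.start:chunk.end]
        if chunk.category == "LOCATION" and name not in names:
            names.append(name)
    return names


def weight_query(terms: list[str], geo_names: list[str], boost: float = 2.0) -> dict[str, float]:
    """Weight geographic entities by a hypothetical boost, other terms by 1.0."""
    return {term: boost if term in geo_names else 1.0 for term in terms}


text = "Floods in Hildesheim and Hanover: the Red Cross sent helpers to Hildesheim."
last = text.rindex("Hildesheim")
chunks = [
    chunk_for(text, "Hildesheim", "LOCATION"),
    chunk_for(text, "Hanover", "LOCATION"),
    chunk_for(text, "Red Cross", "ORGANIZATION"),
    Chunk(last, last + len("Hildesheim"), "LOCATION"),  # repeated mention
]
geo = geographic_entities(text, chunks)
print(geo)
print(weight_query(["floods", "Hildesheim", "Hanover"], geo))
```
      </preformat>
      <p>The organization "Red Cross" is ignored, and the repeated mention of Hildesheim yields a single geographic entity.</p>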
    </sec>
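    <sec id="sec-3">
      <table-wrap>
        <caption>
          <p>Monolingual runs and their topic language</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Run</th>
              <th>Language</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>HiMoDeBase</td>
              <td>German</td>
            </tr>
            <tr>
              <td>HiMoDeNe1</td>
              <td>German</td>
            </tr>
            <tr>
              <td>HiMoDeNe2</td>
              <td>German</td>
            </tr>
            <tr>
              <td>HiMoDeNe2Na</td>
              <td>German</td>
            </tr>
            <tr>
              <td>HiMoDeNe3</td>
              <td>German</td>
            </tr>
            <tr>
              <td>HiMoEnBase</td>
              <td>English</td>
            </tr>
            <tr>
              <td>HiMoEnNe</td>
              <td>English</td>
            </tr>
            <tr>
              <td>HiMoEnNaNe</td>
              <td>English</td>
            </tr>
            <tr>
              <td>HiMoEnNe2</td>
              <td>English</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>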
    <sec id="sec-4">
      <p>After experimenting with the GeoCLEF 2006 data, we submitted runs that differ in their parameters and query processing steps.</p>
      <p>Run descriptions and results measured as Mean Average Precision (MAP) are shown in Table 1 for submitted
monolingual runs and in Table 2 for the corresponding results with the training topics of 2006.
With the training topics of 2006 best results were made expanding the query with 40 geographic terms from the
best 30 documents giving each a relative weight of 0.2 compared to the rest of the query (for German) and using
20 terms from top5 documents with a relative weight of 0.5 for English (Table 2). While in the case of the
English topics this hold true for the submitted runs, for German topics the base run without NER performed best
(Table 1).</p>
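      <p>The expansion parameters described above (a relative weight, a number of top-ranked documents and a number of geographic terms) can be written compactly as weight-documents-terms, e.g. 0.2-30-40 for the best German setting. The sketch below shows one way such a geographic blind relevance feedback step could work. It assumes that the geographic entities of each retrieved document are already known (for example from NER at indexing time), that candidate terms are ranked by how many top documents mention them, and that the relative weight is attached to each added term. These choices are assumptions, not a description of the submitted system.</p>
      <preformat preformat-type="python">
```python
from collections import Counter
from typing import NamedTuple


class ExpansionParams(NamedTuple):
    weight: float  # weight of each added term relative to the original query
    top_docs: int  # number of top-ranked documents assumed to be relevant
    terms: int  # number of geographic terms added to the query


def parse_params(spec: str) -> ExpansionParams:
    """Parse the weight-documents-terms notation, e.g. '0.2-30-40'."""
    weight, docs, terms = spec.split("-")
    return ExpansionParams(float(weight), int(docs), int(terms))


def expand_query(query_terms, ranked_doc_ids, geo_entities, params):
    """Blind relevance feedback restricted to geographic named entities.

    Assumption: candidates are ranked by the number of top documents that
    mention them; ties are broken alphabetically for determinism.
    """
    counts = Counter()
    for doc_id in ranked_doc_ids[: params.top_docs]:
        # a set counts each entity at most once per document
        counts.update(set(geo_entities.get(doc_id, [])))
    weighted = {term: 1.0 for term in query_terms}  # original query terms
    candidates = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    added = 0
    for entity, _ in candidates:
        if added == params.terms:
            break
        if entity not in weighted:
            weighted[entity] = params.weight
            added += 1
    return weighted


params = parse_params("0.5-3-2")
geo = {
    "d1": ["Hamburg", "Elbe", "Hamburg"],
    "d2": ["Elbe", "Dresden"],
    "d3": ["Dresden", "Elbe"],
    "d4": ["Munich"],
}
print(expand_query(["flood", "Germany"], ["d1", "d2", "d3", "d4"], geo, params))
```
      </preformat>
      <p>With the toy setting 0.5-3-2, only the three top documents are inspected, so "Munich" from the fourth document is never considered, and the two most widely mentioned places, "Elbe" and "Dresden", are added with weight 0.5.</p>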
      <p>The worse results for the English topics indicate more difficult topics (concerning our retrieval system) for 2007.
With the German results remaining on almost the same level, the optimised NER-model for German seems to
improve retrieval quality.</p>
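      <p>The comparisons above rest on mean average precision (MAP). The following sketch computes non-interpolated average precision per topic and averages it over all judged topics, in the spirit of the trec_eval tool commonly used in CLEF; the data structures and the toy rankings are illustrative assumptions.</p>
      <preformat preformat-type="python">
```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Non-interpolated average precision for one topic."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # relevant documents that were never retrieved contribute zero
    return precision_sum / len(relevant)


def mean_average_precision(runs: dict[str, list[str]], qrels: dict[str, set[str]]) -> float:
    """Average AP over all judged topics; a topic missing from the run scores 0."""
    total = sum(average_precision(runs.get(topic, []), rel) for topic, rel in qrels.items())
    return total / len(qrels)


runs = {"t1": ["a", "b", "c", "d"], "t2": ["x", "y"]}
qrels = {"t1": {"a", "c"}, "t2": {"y", "z"}}
print(round(mean_average_precision(runs, qrels), 4))  # 0.5417
```
      </preformat>
      <p>Topic t2 scores only 0.25 because one of its two relevant documents is never retrieved, which pulls the mean down.</p>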
      <p>Summing up, we could not find a substantial positive impact of additional geographic information, but the effect
of investment in optimizing the Geo-NE model seems to be positive.
5</p>
      <p>Conclusion and Outlook
Optimised Geo-NE models seem to have positive effect on retrieval quality for monolingual tasks. For future
experiments, we intend to integrate geographic ontologies to expand entities with neighbouring places, villages
and regions. Furthermore we will integrate Wikipedia as translation tool for Geo-NEs to participate in
multilinual tasks of GeoCLEF in the future.</p>
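      <p>As an illustration of the planned ontology-based expansion, the sketch below expands a place name with its neighbours, the villages it contains and its enclosing region, each at a reduced weight. The gazetteer entries, the relation names and the weight of 0.5 are hypothetical examples; no particular geographic ontology is implied.</p>
      <preformat preformat-type="python">
```python
# Hypothetical toy gazetteer (illustrative data, not a real ontology): each
# place lists its enclosing region, its neighbours and the villages it contains.
GAZETTEER = {
    "Hildesheim": {
        "part_of": "Lower Saxony",
        "neighbours": ["Hanover", "Salzgitter"],
        "contains": ["Himmelsthür"],
    },
}


def expand_place(name: str, gazetteer: dict, weight: float = 0.5) -> dict[str, float]:
    """Expand a geographic entity with related places at a reduced weight."""
    expansion = {name: 1.0}  # the original entity keeps full weight
    entry = gazetteer.get(name)
    if entry is None:
        return expansion  # unknown place: no expansion
    related = [*entry.get("neighbours", []), *entry.get("contains", [])]
    if "part_of" in entry:
        related.append(entry["part_of"])
    for place in related:
        expansion.setdefault(place, weight)
    return expansion


print(expand_place("Hildesheim", GAZETTEER))
print(expand_place("Atlantis", GAZETTEER))
```
      </preformat>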
      <p>0.2-30-40 (nm)
0.15-30-60 (nm)
1.0-10-4 (nm)</p>
      <p>0.5-5-20
0.5-5-20
2-10-3</p>
      <p>0.2-30-40 (om)
0.2-30-40 (nm)
0.15-30-60 (nm)
1.0-10-4 (nm)</p>
      <p>0.5-5-20
0.5-5-20
2-10-3</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <person-group person-group-type="author">
            <string-name><surname>Bischoff</surname>, <given-names>Kerstin</given-names></string-name>;
            <string-name><surname>Mandl</surname>, <given-names>Thomas</given-names></string-name>;
            <string-name><surname>Kölle</surname>, <given-names>Ralph</given-names></string-name>;
            <string-name><surname>Womser-Hacker</surname>, <given-names>Christa</given-names></string-name>
          </person-group>
          (<year>2007</year>):
          <article-title>Geographische Bedingungen im Information Retrieval: Neue Ansätze in Systementwicklung und Evaluierung</article-title>.
          In:
          <person-group person-group-type="editor">
            <string-name><surname>Oßwald</surname>, <given-names>Achim</given-names></string-name>;
            <string-name><surname>Stempfhuber</surname>, <given-names>Maximilian</given-names></string-name>;
            <string-name><surname>Wolff</surname>, <given-names>Christian</given-names></string-name>
          </person-group>
          (Eds.):
          <source>Open Innovation - neue Perspektiven im Kontext von Information und Wissen? Proc. 10. Internationales Symposium für Informationswissenschaft (ISI 2007)</source>,
          Köln, 30 May - 1 June 2007. Konstanz: Universitätsverlag [Schriften zur Informationswissenschaft 46] pp. <fpage>15</fpage>-<lpage>26</lpage>.
        </mixed-citation>
      </ref>
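      <ref id="ref2">
        <mixed-citation>
          <person-group person-group-type="author">
            <string-name><surname>Gey</surname>, <given-names>Fredric</given-names></string-name>;
            <string-name><surname>Larson</surname>, <given-names>Ray</given-names></string-name>;
            <string-name><surname>Sanderson</surname>, <given-names>Mark</given-names></string-name>;
            <string-name><surname>Bischoff</surname>, <given-names>Kerstin</given-names></string-name>;
            <string-name><surname>Mandl</surname>, <given-names>Thomas</given-names></string-name>;
            <string-name><surname>Womser-Hacker</surname>, <given-names>Christa</given-names></string-name>;
            <string-name><surname>Santos</surname>, <given-names>Diana</given-names></string-name>;
            <string-name><surname>Rocha</surname>, <given-names>Paulo</given-names></string-name>;
            <string-name><surname>Di Nunzio</surname>, <given-names>Giorgio</given-names></string-name>;
            <string-name><surname>Ferro</surname>, <given-names>Nicola</given-names></string-name>
          </person-group>
          (<year>2007</year>):
          <article-title>GeoCLEF 2006: the CLEF 2006 Cross-Language Geographic Information Retrieval Track Overview</article-title>.
          In:
          <person-group person-group-type="editor">
            <string-name><surname>Peters</surname>, <given-names>Carol</given-names></string-name>
            <etal/>
          </person-group>
          (Eds.):
          <source>7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, Alicante, Spain, Revised Selected Papers</source>.
          Berlin et al.: Springer [Lecture Notes in Computer Science <volume>4730</volume>] pp. <fpage>852</fpage>-<lpage>876</lpage>.
        </mixed-citation>
      </ref>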
    </ref-list>
  </back>
</article>