<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">University of Hagen at GeoCLEF 2005: Using Semantic Networks for Interpreting Geographical Queries</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Johannes</forename><surname>Leveling</surname></persName>
							<email>johannes.leveling@fernuni-hagen.de</email>
							<affiliation key="aff0">
								<orgName type="department">Intelligent Information and Communication Systems (IICS)</orgName>
								<orgName type="institution">University of Hagen (FernUniversität in Hagen)</orgName>
								<address>
									<postCode>58084</postCode>
									<settlement>Hagen</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sven</forename><surname>Hartrumpf</surname></persName>
							<email>sven.hartrumpf@fernuni-hagen.de</email>
							<affiliation key="aff0">
								<orgName type="department">Intelligent Information and Communication Systems (IICS)</orgName>
								<orgName type="institution">University of Hagen (FernUniversität in Hagen)</orgName>
								<address>
									<postCode>58084</postCode>
									<settlement>Hagen</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Dirk</forename><surname>Veiel</surname></persName>
							<email>dirk.veiel@fernuni-hagen.de</email>
							<affiliation key="aff0">
								<orgName type="department">Intelligent Information and Communication Systems (IICS)</orgName>
								<orgName type="institution">University of Hagen (FernUniversität in Hagen)</orgName>
								<address>
									<postCode>58084</postCode>
									<settlement>Hagen</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">University of Hagen at GeoCLEF 2005: Using Semantic Networks for Interpreting Geographical Queries</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E5312EFFB1867D012424A58FB4188E29</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T00:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing-Indexing methods</term>
					<term>Linguistic processing</term>
					<term>H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval-Query formulation</term>
					<term>Search process</term>
					<term>H.3.4 [Information Storage and Retrieval]: Systems and Software-Performance evaluation (efficiency and effectiveness)</term>
					<term>I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods-Semantic networks</term>
					<term>Measurement, Performance, Experimentation</term>
					<term>Geographic information retrieval, Query expansion</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The IICS group at the University of Hagen employs multilayered extended semantic networks for the representation of background knowledge, queries, and documents for geographic information retrieval (GIR). This paper describes our participation in the GeoCLEF task of the CLEF 2005 evaluation campaign (Cross Language Evaluation Forum).</p><p>In our approach, geographical concepts from the query network are expanded with concepts which are semantically connected via topological, directional, and proximity relations. We started with an existing geographical knowledge base represented as a large semantic network and expanded it with concepts automatically extracted from the GEOnet Names Server (GNS). Furthermore, we created concept hypotheses by adding a prefix with regular semantics, for example "Süd"/'South' and "Zentral"/'Central', and integrated the corresponding semantic relations into our geographical knowledge base.</p><p>Several experiments for GIR on German documents have been performed: a baseline corresponding to a traditional information retrieval approach; a variant expanding thematic, temporal, and geographic descriptors from the semantic network representation of the query; and an adaptation of a question answering (QA) algorithm based on semantic networks.</p><p>The second experiment is based on a representation of the natural language description of a topic as a semantic network, which is achieved by a deep linguistic analysis. The semantic network is transformed into an intermediate representation of a database query explicitly representing thematic, temporal, and local restrictions. 
This experiment showed the best performance with respect to mean average precision (MAP): 10.53 percent using the topic title and description or 10.22 percent using title, description, and additional location information.</p><p>The third experiment, adapting a QA algorithm, uses a modified version of the QA system InSicht. The system matches deep semantic representations (semantic networks) of queries or their equivalent or similar variants to semantic networks for document sentences. Because this approach was too strongly oriented towards precision, partitioning a query network was allowed when certain graph topologies exist. For example, local specifications can be split off, so that they can be matched in other sentences of the document under investigation. The geographical knowledge base developed for the other experiments improved the results of this approach, too. To keep answer time low and main memory consumption acceptable, some parameters of the InSicht system had to be adjusted.</p><p>In conclusion, we provide a basic architecture for further experiments in geographic information retrieval based on semantic networks. Future research aims at improving the named entity recognition for toponyms, connecting semantic networks and databases, expanding our geographical knowledge base, and investigating the role of semantic relations in geographic queries.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Geographical Information Retrieval (GIR) is concerned with the retrieval of documents involving the interpretation of geographical knowledge by means of topological, directional, and proximity information. Documents typically contain descriptions of events or static situations that are temporally and/or spatially restricted. For example, consider the phrases "the industrial development after World War II" and "the social security system outside of Scandinavia". Furthermore, many documents contain ambiguous geographic references. There are, for example, more than 30 cities named "Zell" in Germany, and any occurrence of this name in a document can have a different meaning and should be disambiguated from context. In addition, a toponym (a name for a geographic entity) can be referred to with names in different languages or local dialects, with historical names, etc., which will require normalization or translation to enable successful document retrieval. The latter problems are similar to the problems of polysemy and synonymy in traditional information retrieval (IR).</p><p>GeoCLEF is a task of the Cross Language Evaluation Forum offering scientific challenges in the interpretation of geographical information retrieval queries. The queries are targeted at existing CLEF document collections of news stories that include a variety of topics and geographical regions; for German, these are the news articles of "Der Spiegel", "Frankfurter Rundschau", and "Schweizer Depeschenagentur" from 1994 and 1995. The goal of the GeoCLEF task is to find all and only documents that are relevant to a given topic.</p><p>GeoCLEF topics include a short description (title and description), a longer narrative, and location elements consisting of a combination of thematic concepts (DE-concept), spatial relations (DE-spatialrelation), and place names (DE-location). 
For example, the short description of the first topic includes the natural language title "Haifischangriffe vor Australien und Kalifornien"/'Shark attacks off Australia and California'.</p><p>A traditional approach to GIR involves at least the following processing steps to identify geographical entities <ref type="bibr">(Jones et al., 2002)</ref>:</p><p>• named entity recognition (NER), including the detection of geographic names (tagging with named entities, including toponyms);</p><p>• collecting and integrating information from the contexts of named entities;</p><p>• disambiguation of named entities; and</p><p>• grounding the entities (i.e. connecting them to the model) and interpreting coordinates.</p><p>After identifying toponyms in queries and documents, coordinates can be assigned to them. In GIR, assigning a relevance score to a document for a given query typically involves calculating the distance between geographical entities in the query and the document and mapping it to a score. One of the major problems for GIR is the disambiguation of toponyms from semantic context and identifying spatial ambiguity (e.g. 'California' in 'Mexico' and/or 'California' in the 'United States of America'). Table <ref type="table" target="#tab_0">1</ref> shows a comparison of ambiguity in GermaNet<ref type="foot" target="#foot_0">1</ref> <ref type="bibr" target="#b11">(Kunze and Wagner, 2001)</ref> and the GEOnet Names Server data (GNS, described in Section 2.2). As shown in the table, synonymy seems to be a lesser problem for German geographic names (1.08 synonyms per synset vs. 1.45 synonyms per synset in a lexical-semantic net for German), while the role of polysemy (word senses and disambiguation) becomes more important for GIR (1.28 vs. 1.16 word senses per literal).</p><p>Problems that are less often identified and less investigated in research for GIR are:</p><p>• Toponyms in different languages. 
The translation of toponyms plays an important role even for monolingual retrieval when different and external information resources are integrated. In gazetteers, mostly English naming conventions are used.</p><p>• Name variants. The same geographic object can be referenced by endonymic names, exonymic names, and historical names. An endonym is a local name for a geographic entity, for example, "Wien", "Köln", and "Milano". An exonym is a place name in a certain language for a geographic object that lies outside the region where this language has an official status; for example, "Vienna" is the English exonym for "Wien", "Cologne" is the English exonym for "Köln", and "Mailand" is the German exonym for "Milano". Examples of historical names or traditional names are "New Amsterdam" for "New York" and "Colonia Claudia Ara Agrippinensium" or "Cöllen" for "Köln". For GIR, name variants should be conflated.</p><p>• Composite names. Composite names or complex named entities consist of two or more words. Frequently, appositions are considered to be a part of a name. For example, there is no need for the translation of the word "mount" in "Mount Cook" (the German translation is "Mount Cook"), but "Insel" is typically translated in the expression "Insel Sylt"/"island of Sylt". For named entity recognition, rules have to be established for how composite names are normalized. In some composite names, two or more toponyms (geographic names) are employed in reference to a single entity, for example, "Haren (Ems)", "Frankfurt/Oder", or "Freiburg im Breisgau". While additional toponyms in a context allow for a better disambiguation, such composite names require normalization, too.</p><p>• Semantic relations between toponyms and related concepts. In named entity recognition and GIR, semantic relations between toponyms and related concepts are often ignored. 
Concepts related to a toponym such as the language, inhabitants of a place, properties (adjectives), or phrases ("former Yugoslavia") are not considered in geographic tagging. For example, the toponym "Scotland" can be inferred for occurrences of "Scottish", "Scotsman", or "Scottish districts".</p><p>Figure <ref type="figure">1</ref>: Automatically generated semantic network for the description of GeoCLEF topic GC003 ("Finde Berichte von Amnesty International bezüglich der Menschenrechte in Lateinamerika.", 'Amnesty International reports on human rights in Latin America.'). The relations are explained in Table <ref type="table">2</ref>. Note that nodes representing proper names bear a .0 suffix, and subordinating edges (PRED, SUB) are folded below the node name. The imperative verb has already been removed.</p><p>• Temporal changes in toponyms. Geographic concepts undergo temporal changes. For example, the effects of wars (the geographic area of "Poland" during the last centuries) or treaties (e.g. "the EU" refers to a different region after the expansion of the European Union) change what a geographic name represents. This is an indication that temporal and spatial restrictions should not be discussed separately.</p><p>• Metonymic usage. Toponyms are used ambiguously. For example, "Libya" occurs in the news corpus as a reference to the "Libyan government" (as in "Libya stated that ..."). 
Similarly, "British soil" is used as a reference to "Great Britain".</p><p>Currently, there is no practical solution for these problems and their investigation is a long-term issue for GIR. We concentrate on providing a basic architecture for geographic information retrieval with semantic networks, which will be refined later.</p></div>
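<div xmlns="http://www.tei-c.org/ns/1.0"><p>The synonymy and polysemy figures quoted from Table 1 can be recomputed from the raw counts in that table. The following Python sketch assumes that "word senses per literal" is the number of synonyms in synsets divided by the number of unique literals; this reading is not stated in the table caption, but it reproduces the published values. The dictionary layout and the function name are illustrative only.</p>
<p><![CDATA[
```python
# Counts copied from Table 1; the dictionary layout is an illustrative assumption.
TABLE_1 = {
    "GermaNet": {"synsets": 41_777, "synonyms": 60_646, "literals": 52_251},
    "GNS (DE, all)": {"synsets": 95_993, "synonyms": 121_055, "literals": 94_187},
    "GNS (DE, normalized)": {"synsets": 95_993, "synonyms": 103_508, "literals": 80_808},
}


def ambiguity_ratios(counts):
    """Return (synonyms per synset, word senses per literal), rounded to two decimals.

    Assumption: word senses per literal = synonyms in synsets / unique literals.
    """
    synonymy = counts["synonyms"] / counts["synsets"]
    polysemy = counts["synonyms"] / counts["literals"]
    return round(synonymy, 2), round(polysemy, 2)


if __name__ == "__main__":
    for resource, counts in TABLE_1.items():
        print(resource, ambiguity_ratios(counts))
```
]]></p>
<p>Under this assumption, the normalized GNS data yields 1.08 synonyms per synset and 1.28 word senses per literal, compared with 1.45 and 1.16 for GermaNet, which matches the comparison made in the text.</p></div>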
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Interpreting Geographical Queries with Semantic Networks</head><p>The IICS group employs a syntactico-semantic parser (the WOCADI parser, a WOrd ClAss based DIsambiguating parser; <ref type="bibr" target="#b4">Hartrumpf (2003)</ref>) to obtain the representation of queries and documents as semantic networks according to the MultiNet paradigm <ref type="bibr" target="#b7">(Helbig, 2005)</ref>. This approach has been used in experiments for domain-specific IR <ref type="bibr" target="#b15">(Leveling and Hartrumpf, 2005)</ref> as well as in question answering <ref type="bibr" target="#b5">(Hartrumpf, 2005)</ref>. Aside from broadening the application domain of the MultiNet paradigm, its corresponding tools, and its applications, our work for the participation in the GeoCLEF task serves the following purposes:</p><p>1. To identify possible improvements for the NER component in the WOCADI parser.</p><p>2. To improve the connectivity between semantic networks and large resources of structured information (e.g. databases).</p><p>3. To create a larger set of geographical background knowledge by semi-automatic and automatic knowledge extraction from geographic resources.</p><p>4. To investigate the role of semantic relations and their interpretation for GIR.</p><p>We will discuss these points in the following subsections before presenting our results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Improving the NER</head><p>Currently, the NER for WOCADI is based on large lists of names including cities, countries, organizations, products, etc. These name lexica contain more than 230,000 proper names. This approach is suitable for limited domains or for domains where proper names are known in advance. In general, a method to dynamically identify proper nouns for a semantic analysis is needed. A machine learning module is in preparation that creates a hypothesis for a word form while parsing a text with WOCADI.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Connecting Semantic Networks and Databases</head><p>Data from the GEOnet Names Server<ref type="foot" target="#foot_1">2</ref> (GNS<ref type="foot" target="#foot_2">3</ref>, containing approximately 4.0 million entities with 5.5 million names world-wide and 169,407 entities for Germany) was processed to provide a data set for a gazetteer database. The GNS is a valuable resource for geographical information, but the use of this multilingual gazetteer data has so far proved problematic in our setup.</p><p>• Some data may not be present at all (e.g. "the Scottish Trossachs"), so that a geographic interpretation fails for some concepts.</p><p>• Geographic names may be present in the native language, in English, or both. For some concepts, there are no native language forms (the GNS data has an American English background). For example, "The North Sea" is present as a toponym in the database, but "Nordsee" is not. Similarly, there is no entry for "Schottland" in the GNS data, but the English spelling "Scotland" occurs. This means that the GNS data cannot be obtained (and subsequently used) for a non-English monolingual task without an additional translation phase.</p><p>• Geographic objects may cover several areas or may be adjacent to several regions (such as rivers traversing different countries, large forests, seas, or mountains, e.g. "Bodensee"/'Lake Constance', "Rhein"/'Rhine', "Donau"/'Danube', or "Alpen"/'Alps').</p><p>• Relations or modifiers generate name variants which are not covered by a gazetteer ("Süddeutschland"/'South(ern) Germany', 'the southern part of Germany'), because they are subject to interpretation or do not have corresponding coordinates.</p><p>• The data representation may be inconsistent. For example, some rivers (streams) are represented by a set of coordinates (e.g. "Alter Rhein"), some are represented by a single coordinate (e.g. 
"Main") in the GNS data.</p><p>• The gazetteer does not provide sufficient information for a successful disambiguation from context (for example, temporal information is missing).</p><p>• The ontological basis of the GNS is incomplete. For example, church (CH), religious center (CTRR), monastery (MNSTY), mission (MSSN), temple (TMPL), and mosque (MSQE) are defined (among others) in the GNS data as classes of geographic entities that refer to sacral buildings. A cathedral (GeoCLEF topic GC012) is a sacral building as well, but neither a corresponding geographic class nor gazetteer data for cathedrals is provided.</p><p>• The inflection of names is typically not covered in gazetteers. Many names have a special genitive form in German (and English), which the morphology component of the WOCADI parser can analyze. But there are more complicated cases, where parts of a complex name are inflected for grammatical case. For example, the river "(die) Schwarze Elster" has the genitive form "(der) Schwarzen Elster".</p><p>Because of these problems, we see the GNS data as a general source of information, which should be extended by domain-, language-, and application-specific knowledge. Gazetteers and derived knowledge bases share the same problems. Both are always incomplete (see <ref type="bibr" target="#b12">Leidner (2004)</ref> and <ref type="bibr" target="#b0">Fonseca et al. (2002)</ref> for a discussion of problems of gazetteer selection), data in both is not fine-grained or detailed enough for many tasks, and for both, entry points (valid search keys) for access must be known.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Expanding Geographical Background Knowledge</head><p>The IICS created and maintained a large semantic network as a geographical knowledge base for expanding geographical concepts. This knowledge base was automatically extended by generating hypotheses for new geographical concepts and integrating them into the semantic network.</p><p>For all concepts from the existing semantic network ontology, hypotheses are generated for meronymy relations. A hypothetical concept is created by concatenating some prefix with regular semantics in geography with the original concept. Typical examples of prefixes implying meronymy are "Süd-"/'South', "Südost-"/'Southeast', "Zentral-"/'Central', and "Mittel-"/'Middle'. The occurrence frequency of a hypothesis is looked up in the index of base forms for the entire (annotated/tagged) news corpus. Hypothetical concepts with a frequency below a given threshold (three occurrences) were rejected. The resulting relations were integrated into the semantic network containing the background knowledge. These concepts typically do not occur in gazetteers because they are vague and their interpretation depends on context.</p><p>The second approach to expanding the geographic background knowledge at the IICS involved automatically extracting concepts from a database consisting of the GNS data for Germany. The GNS data shares much information with other major resources and services involving geographical information, such as the Getty Thesaurus of Geographic Names<ref type="foot" target="#foot_4">4</ref> (TGN, containing about 1.3 million names) or the Alexandria Digital Library project<ref type="foot" target="#foot_5">5</ref> (ADL Gazetteer, containing about 4.4 million entries). 
Therefore, we concentrated on the GNS data.</p><p>For each GNS gazetteer entry, a set of geographic codes is provided which can be interpreted to form a geographic path to the geographical object. 
For example, the database entry for the city of "Wien"/'Vienna' contains information that the city is located in "Amerika oder Westeuropa"/'America or Western Europe', "Europa"/'Europe', "Österreich"/'Austria', and in the "Bundesland Wien" and that "Vienna" is a name variant of "Wien". This information is post-processed and transformed into a set of semantic relations. We extracted some 20,000 geographical relations for a subset of 27 geographic classes (out of 648 classes defined in the GNS) from the German data. Relations that can be inferred from transitivity or symmetry properties are not explicitly entered into the geographical knowledge base as they are dynamically generated in our experiments.</p></div>
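<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two extension steps described in this section can be sketched as follows. This Python sketch is a simplified illustration and not the implementation used at the IICS: the function names, the encoding of relations as (name, x, y) tuples, the lower-casing of the concept after the prefix, and the shortened path for "Wien" are assumptions made for the example. PARS(x, y) is read as "x is part of y" and EQU(x, y) as a name variant, following the MultiNet relations used in this paper.</p>
<p><![CDATA[
```python
# Prefixes named in the text; THRESHOLD mirrors the stated cut-off of three occurrences.
PREFIXES = ["Süd", "Südost", "Zentral", "Mittel"]
THRESHOLD = 3


def prefix_hypotheses(concepts, base_form_counts, threshold=THRESHOLD):
    """Return PARS(hypothesis, concept) relations for sufficiently frequent compounds.

    base_form_counts is a hypothetical stand-in for the corpus index of base forms.
    """
    relations = []
    for concept in concepts:
        for prefix in PREFIXES:
            hypothesis = prefix + concept.lower()
            # Hypotheses occurring fewer than `threshold` times are rejected.
            if base_form_counts.get(hypothesis, 0) >= threshold:
                relations.append(("PARS", hypothesis, concept))
    return relations


def path_relations(path, variants=()):
    """Turn a geographic path (outermost region first) into PARS and EQU relations."""
    relations = [("PARS", inner, outer) for outer, inner in zip(path, path[1:])]
    relations += [("EQU", variant, path[-1]) for variant in variants]
    return relations


def part_of(relations, part, whole):
    """Check PARS reachability on demand, so transitive edges need not be stored."""
    parents = {}
    for name, x, y in relations:
        if name == "PARS":
            parents.setdefault(x, set()).add(y)
    seen, stack = set(), [part]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent == whole:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


if __name__ == "__main__":
    counts = {"Süddeutschland": 12, "Zentraldeutschland": 1}  # invented frequencies
    print(prefix_hypotheses(["Deutschland"], counts))
    rels = path_relations(["Europa", "Österreich", "Bundesland Wien", "Wien"], ["Vienna"])
    print(rels)
    print(part_of(rels, "Wien", "Europa"))
```
]]></p>
<p>Computing transitive part-whole links at query time, as in part_of, keeps the stored knowledge base small, which corresponds to the decision above not to enter inferable relations explicitly.</p></div>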
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">The Role of Semantic Relations in Geographical Queries</head><p>The MultiNet paradigm offers a rich repertoire of semantic relations and functions. Table <ref type="table">2</ref> shows and briefly describes the most important relations for representing topological, directional, or proximity information. Note that the length of a path of semantic relations between two related concepts may be used to calculate their (thematic or geographic) proximity or distance as well. Figure <ref type="figure" target="#fig_0">2</ref> shows an excerpt from our geographical knowledge base with MultiNet relations. For the moment, the interpretation of the semantic functions is limited because we do not yet use the assigned coordinates from the GNS data.</p><p>Table <ref type="table">2</ref>: Overview of some important MultiNet relations and functions for the interpretation of geographic queries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><figure type="table"><table><row><cell>MultiNet Relation</cell><cell>Description</cell></row><row><cell>ASSOC(x, y)</cell><cell>concepts associated with toponyms and properties corresponding to toponyms (e.g. language, inhabitant, or adjective form)</cell></row><row><cell>ATTCH(x, y)</cell><cell>attachment between objects; y is attached to x</cell></row><row><cell>ATTR(x, y)</cell><cell>attribute of an object; y is an attribute of x</cell></row><row><cell>CIRC(x, y)</cell><cell>situational circumstance (semantically non-restrictive)</cell></row><row><cell>CTXT(x, y)</cell><cell>contextual restriction</cell></row><row><cell>DIRCL(x, y), ORNT(x, y)</cell><cell>direction and orientation of events and situations (e.g. "the flight to Berlin")</cell></row><row><cell>EQU(x, y)</cell><cell>name variants and equivalent names, including endonyms, exonyms, and historical names</cell></row><row><cell>LOC(x, y)</cell><cell>specifying locations; x takes place at y; x is located at y</cell></row><row><cell>PARS(x, y)</cell><cell>meronymy, holonymy (PART-OF); x is part of y</cell></row><row><cell>PRED(x, y)</cell><cell>predication; every z from set x is a y</cell></row><row><cell>SUB(x, y)</cell><cell>subordination (IS-A); x is a y</cell></row><row><cell>SYNO(x, y)</cell><cell>synonyms and near-synonyms</cell></row></table></figure><p>[...] <ref type="bibr">(Hammer et al., 1995-2005)</ref>, which supports a standard relevance ranking (term weighting by tf-idf). Queries (topics) are analyzed with WOCADI to obtain the semantic network representation. The semantic networks are transformed into a Database Independent Query Representation (DIQR) expression. For some experiments (FUHo10tdl and FUHo14tdl), the location elements consisting of the concept, spatial-relation, and place names of a topic are transformed into a corresponding DIQR expression as well. 
Additional concepts (including toponyms) are added to the query formulation by including semantically related concepts. This approach is described in more detail in <ref type="bibr" target="#b13">Leveling (2004)</ref>. The fifth experiment employs a modified question answering approach for GIR and is described in Section 4. The five experiments are characterized in the parameter columns of Table <ref type="table" target="#tab_1">3</ref>. 
It also shows the performance of our experiments with respect to mean average precision (MAP) and number of relevant and retrieved documents.</p><p>The experiments with a query expansion based on additional geographic knowledge show a higher performance than the traditional IR approach with respect to MAP (FUHo10td vs. FUHo14td and FUHo10tdl vs. FUHo14tdl). The performance of the experiment employing traditional IR and the experiment with query expansion may be increased by changing to a database supporting an Okapi BM25 search for the thematic parts of a query.</p></div>
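<div xmlns="http://www.tei-c.org/ns/1.0"><p>Query expansion with semantically related concepts can be illustrated with a small breadth-first search over relation triples. The following Python sketch is only an illustration under stated assumptions and does not reproduce the DIQR transformation: the function name expand, the depth limit, the choice of relations, and the decision to follow PARS edges from a region to its parts are all hypothetical. The number of relation steps serves as a simple distance, in the spirit of the path-length remark in Section 2.4.</p>
<p><![CDATA[
```python
from collections import deque


def expand(term, relations, max_depth=2):
    """Return {related concept: number of relation steps} for a query term.

    Assumptions: EQU and SYNO are followed in both directions; PARS(x, y)
    ("x is part of y") is followed only from the whole y to its part x.
    """
    neighbours = {}
    for name, x, y in relations:
        if name in ("EQU", "SYNO"):
            neighbours.setdefault(x, set()).add(y)
            neighbours.setdefault(y, set()).add(x)
        elif name == "PARS":
            neighbours.setdefault(y, set()).add(x)

    distance = {term: 0}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        if distance[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for other in neighbours.get(node, ()):
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    del distance[term]  # the query term itself is not an expansion
    return distance


if __name__ == "__main__":
    relations = [
        ("EQU", "Vienna", "Wien"),
        ("PARS", "Wien", "Österreich"),
        ("PARS", "Österreich", "Europa"),
    ]
    print(expand("Österreich", relations))
```
]]></p>
<p>With these example relations, a query about "Österreich" would be expanded with "Wien" at distance 1 and "Vienna" at distance 2; the distances could be mapped to term weights, which is left open here.</p></div>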
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">GIR with Deep Sentence Parses</head><p>In addition to the runs described in Section 3, we experimented with an approach based on deep semantic analysis of documents and queries. We tried to turn the InSicht system normally used for question answering <ref type="bibr" target="#b5">(Hartrumpf, 2005)</ref> into a GIR system (here abbreviated as GIR-InSicht). To this end, the following modifications were tried:</p><p>1. generalizing the central matching algorithm, which is sentence-based;</p><p>2. adding geographical background knowledge; and</p><p>3. adjusting parameters for network variation scores and limits for generating query network variants.</p><p>These three areas are explained in more detail in the following paragraphs.</p><p>The base system, InSicht, matches semantic networks derived from a query parse (topic title or topic description<ref type="foot">7</ref>) to document sentence networks one by one (whereas sentence boundaries are ignored in traditional IR). In GIR (as in IR), this approach yields high precision, but low recall because often the information contained in a query is distributed across several sentences in a document. To adjust the matching approach to such situations, the query network is split if certain graph topologies are encountered. The resulting query network parts are viewed as conjunctively connected. The query network can be split at the following semantic relations: CIRC, CTXT, LOC, and TEMP. For example, the LOC edge in Figure <ref type="figure">1</ref> can be deleted leading to two separate semantic networks. One corresponds to "Bericht von Amnesty International über Menschenrechte" ('Amnesty International reports on human rights') and the other to "Lateinamerika" ('Latin America'). 
The greatest positive impact for GeoCLEF comes from splitting at LOC edges.</p><p>The geographical knowledge base described in Section 2.3 is scanned by GIR-InSicht; relations that contain names that do not occur in the document parse results (i.e. the semantic networks of document sentences) are ignored. For simplicity, meronymy edges (PARS) are treated like hyponymy edges so that GIR-InSicht can use all part-whole relations for concept variations (see <ref type="bibr" target="#b5">Hartrumpf (2005)</ref>) in query networks. Without the geographical knowledge base, recall is much lower. Some InSicht parameters had to be adjusted in order to yield more results in GIR-InSicht and/or to keep answer time and RAM consumption acceptable even when working with large background knowledge bases like the one mentioned in Section 2.3. The final steps of InSicht that come after a semantic network match has been found (answer generation and answer selection) can be skipped. Some minor adjustments further reduce run time without losing relevant documents. For example, an apposition for a named entity (like "Hauptstadt" ('capital') for "Sarajevo"/"Sarajewo") leads to a new query network variant only if this combination occurs at least twice in the document collection.</p><p>We evaluated about ten different setups of GIR-InSicht on the GeoCLEF 2005 topics. The setups used different extension combinations from the extensions described above and other extensions, like coreference resolution for documents. The performance differences were often marginal. In some cases, this indicates that a specific extension is irrelevant for GIR; in other cases, the number of topics (23 topics have relevant documents) and the number of relevant documents might be too small to draw any conclusions. However, one can see considerable performance improvements for some extensions, e.g. splitting query networks at LOC edges. 
Larger evaluations are needed to gain more insights about which development directions are most promising for the semantic network matching approach to GIR.</p></div>
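<div xmlns="http://www.tei-c.org/ns/1.0"><p>The splitting of query networks described in this section can be sketched in a few lines. The following Python sketch is a strongly simplified illustration and not the InSicht matching algorithm: networks are reduced to sets of (relation, x, y) edges, a part "matches" a sentence if its nodes and edges are literally contained in it, and the example edges are a hypothetical simplification of the network in Figure 1. Only the list of split relations (CIRC, CTXT, LOC, TEMP) and the conjunctive treatment of the parts are taken from the text.</p>
<p><![CDATA[
```python
SPLIT_RELATIONS = {"CIRC", "CTXT", "LOC", "TEMP"}


def split_network(edges):
    """Delete split edges and return the remaining connected parts as (nodes, edges)."""
    kept = [edge for edge in edges if edge[0] not in SPLIT_RELATIONS]
    nodes = {n for _, a, b in edges for n in (a, b)}
    parent = {n: n for n in nodes}  # union-find parent links

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for _, a, b in kept:
        parent[find(a)] = find(b)

    parts = {}
    for n in nodes:
        parts.setdefault(find(n), (set(), set()))[0].add(n)
    for edge in kept:
        parts[find(edge[1])][1].add(edge)
    return list(parts.values())


def document_matches(query_edges, sentences):
    """True if every query part is contained in at least one sentence network."""
    def contained(part, sentence):
        part_nodes, part_edges = part
        sentence_nodes = {n for _, a, b in sentence for n in (a, b)}
        return part_nodes <= sentence_nodes and part_edges <= set(sentence)

    return all(any(contained(part, sentence) for sentence in sentences)
               for part in split_network(query_edges))


if __name__ == "__main__":
    # Hypothetical simplification of the query network for topic GC003.
    query = [
        ("MCONT", "bericht", "menschenrecht"),
        ("ATTCH", "amnesty international", "bericht"),
        ("LOC", "menschenrecht", "lateinamerika"),
    ]
    sentence_1 = query[:2]
    sentence_2 = [("LOC", "lage", "lateinamerika")]
    print(document_matches(query, [sentence_1, sentence_2]))
```
]]></p>
<p>In this example, no single sentence contains the complete query network, but after deleting the LOC edge the thematic part is found in the first sentence and "lateinamerika" in the second, so the document is accepted. This mirrors the recall gain attributed to splitting at LOC edges.</p></div>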
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and Outlook</head><p>The semantic network representation with MultiNet offers representational means useful for GIR. We successfully employed semantic networks to uniformly represent queries, documents, and geographical background knowledge and to connect to external resources like GNS data. Three different approaches have been investigated: a baseline corresponding to a traditional IR approach; a variant expanding thematic, temporal, and geographic descriptors from the MultiNet representation of the query; and an adaptation of InSicht, a QA algorithm based on semantic networks. The diversity of our approaches looks promising for a combined system.</p><p>Future work also includes completing the topics described in Section 2, namely improving the NER, connecting semantic networks and databases, expanding geographical background knowledge, and investigating the role of semantic relations in geographical queries. It remains to be investigated whether methods that are successful in traditional IR are equally successful for treating polysemy and synonymy for toponyms in GIR.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Excerpt from the MultiNet representation of a geographical knowledge base.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Overview of synonyms and word senses in GermaNet and GNS data for a selected subset of 169,407 geographical entities in Germany (DE). The data normalization consisted of removing all name variants introduced by the transcription of German umlauts (e.g. the name "Koln" is removed if it refers to the same entity as "Köln").</figDesc><table><row><cell>Characteristic</cell><cell cols="3">Resource</cell></row><row><cell></cell><cell>GermaNet</cell><cell>GNS (DE, all)</cell><cell>GNS (DE, normalized)</cell></row><row><cell>synsets total</cell><cell>41,777</cell><cell>95,993</cell><cell>95,993</cell></row><row><cell>synonyms in synsets</cell><cell>60,646</cell><cell>121,055</cell><cell>103,508</cell></row><row><cell>unique literals</cell><cell>52,251</cell><cell>94,187</cell><cell>80,808</cell></row><row><cell>synonyms per synset</cell><cell>1.45</cell><cell>1.26</cell><cell>1.08</cell></row><row><cell>word senses per literal</cell><cell>1.16</cell><cell>1.29</cell><cell>1.28</cell></row></table></figure>
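<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two ratio rows of Table 1 appear to follow from its count rows. The Python sketch below recomputes them under the assumption that "synonyms per synset" is the number of synonyms divided by the number of synsets, and that "word senses per literal" is the number of synonyms divided by the number of unique literals. These formulas are not stated in the paper; they are inferred because they reproduce the tabulated values. The dictionary layout and the function name derived_ratios are illustrative.</p>
<p>
```python
# Counts copied from Table 1; the dictionary layout is an assumption.
TABLE_1_COUNTS = {
    "GermaNet": {"synsets": 41777, "synonyms": 60646, "literals": 52251},
    "GNS (DE, all)": {"synsets": 95993, "synonyms": 121055, "literals": 94187},
    "GNS (DE, normalized)": {"synsets": 95993, "synonyms": 103508, "literals": 80808},
}


def derived_ratios(counts):
    """Return (synonyms per synset, word senses per literal), rounded to 2 decimals.

    Both formulas are assumptions inferred from the numbers in the table.
    """
    per_synset = counts["synonyms"] / counts["synsets"]
    per_literal = counts["synonyms"] / counts["literals"]
    return round(per_synset, 2), round(per_literal, 2)


ratios = {name: derived_ratios(c) for name, c in TABLE_1_COUNTS.items()}

for name, (per_synset, per_literal) in ratios.items():
    print(f"{name}: {per_synset:.2f} synonyms per synset, "
          f"{per_literal:.2f} word senses per literal")
```
</p>
<p>Under these assumed definitions, normalization lowers the number of synonyms per synset for the GNS data while leaving the number of word senses per literal almost unchanged.</p></div>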
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3 :</head><label>3</label><figDesc>Overview of parameter settings and results for monolingual GeoCLEF experiments with the German document collection. The results displayed are the mean average precision (MAP) and the number of relevant and retrieved documents (rel ret) for a total of 785 documents assessed as relevant.</figDesc><table><row><cell>TEMP(x, y)</cell><cell>temporal specification</cell></row><row><cell>VAL(x, y)</cell><cell>value specification; y is value for attribute x</cell></row><row><cell>*IN(x, y)</cell><cell>semantic function; x is contained in y</cell></row><row><cell>*NEAR(x, y)</cell><cell>semantic function; x is close to y</cell></row><row><cell>*OUTSIDE(x, y)</cell><cell>semantic function; x is not inside y</cell></row><row><cell>*SOUTH OF(x, y)</cell><cell>semantic function; x is south of y</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.sfs.nphil.uni-tuebingen.de/lsd/Intro.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://earth-info.nga.mil/gns/html/cntry_files.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">The GEOnet Names Server contains data from the National Geospatial-Intelligence Agency and the U.S. Board on Geographic Names database.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3">Monolingual GeoCLEF Experiments (German - German). Currently, the GeoCLEF experiments follow our established setup for information retrieval tasks. The WOCADI parser is employed to analyze the newspaper and newswire articles, and concepts (or rather:</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_4">http://www.getty.edu/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_5">http://www.alexandria.ucsb.edu/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_6">http://www.indexdata.dk/zebra</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_7">GIR-InSicht combines the results for a query from the title field and for a query from the description field. All other topic fields are ignored. The information in the attributes DE-concept, DE-spatialrelation, and DE-location could be derived equally well from the parse result of the title or description.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Using ontologies for integrated geographic information systems</title>
		<author>
			<persName><forename type="first">Frederico</forename><forename type="middle">T</forename><surname>Fonseca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Max</forename><forename type="middle">J</forename><surname>Egenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Peggy</forename><surname>Agouris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gilberto</forename><surname>Câmara</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transactions in GIS</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">3</biblScope>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">Sebastian</forename><surname>Hammer</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">Adam</forename><surname>Dickmeiss</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Zebra - User&apos;s Guide and Reference</title>
		<author>
			<persName><forename type="first">Heikki</forename><surname>Levanto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mike</forename><surname>Taylor</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1995">1995-2005</date>
			<pubPlace>Copenhagen</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Hybrid Disambiguation in Natural Language Analysis</title>
		<author>
			<persName><forename type="first">Sven</forename><surname>Hartrumpf</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2003">2003</date>
			<publisher>Der Andere Verlag</publisher>
			<pubPlace>Osnabrück, Germany</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Question answering using sentence parsing and semantic network matching</title>
		<author>
			<persName><forename type="first">Sven</forename><surname>Hartrumpf</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Multilingual Information Access for Text, Speech and Images: Results of the Fifth CLEF Evaluation Campaign</title>
				<editor>
			<persName><forename type="first">C</forename><surname>Peters</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><forename type="middle">D</forename><surname>Clough</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><forename type="middle">J F</forename><surname>Jones</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Gonzalo</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title/>
		<author>
			<persName><forename type="first">M</forename><surname>Kluck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Magnini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Lecture Notes in Computer Science (LNCS)</title>
		<imprint>
			<biblScope unit="volume">3491</biblScope>
			<biblScope unit="page" from="512" to="521" />
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Knowledge Representation and the Semantics of Natural Language</title>
		<author>
			<persName><forename type="first">Hermann</forename><surname>Helbig</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2005">2005</date>
			<publisher>Springer</publisher>
			<pubPlace>Berlin</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">B</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Purves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anne</forename><surname>Ruas</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">Mark</forename><surname>Sanderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Monika</forename><surname>Sester</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Spatial information retrieval and geographical ontologies - an overview of the SPIRIT project</title>
		<author>
			<persName><forename type="first">Marc</forename><forename type="middle">J</forename><surname>van Kreveld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Robert</forename><surname>Weibel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR 2002</title>
				<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="387" to="388" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Anwendungsperspektiven des GermaNet, eines lexikalisch-semantischen Netzes für das Deutsche</title>
		<author>
			<persName><forename type="first">Claudia</forename><surname>Kunze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andreas</forename><surname>Wagner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Chancen und Perspektiven computergestützter Lexikographie</title>
		<title level="s">Lexicographica Series Maior</title>
		<editor>
			<persName><forename type="first">Ingrid</forename><surname>Lemberg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Bernhard</forename><surname>Schröder</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Angelika</forename><surname>Storrer</surname></persName>
		</editor>
		<meeting><address><addrLine>Tübingen, Germany</addrLine></address></meeting>
		<imprint>
			<publisher>Niemeyer</publisher>
			<date type="published" when="2001">2001</date>
			<biblScope unit="volume">107</biblScope>
			<biblScope unit="page" from="229" to="246" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Towards a reference corpus for automatic toponym resolution evaluation</title>
		<author>
			<persName><forename type="first">Jochen</forename><forename type="middle">L</forename><surname>Leidner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Geographic Information Retrieval held at the 27th Annual International ACM SIGIR Conference (SIGIR 2004)</title>
				<meeting>the Workshop on Geographic Information Retrieval held at the 27th Annual International ACM SIGIR Conference (SIGIR 2004)<address><addrLine>Sheffield, UK</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">University of Hagen at CLEF 2003: Natural language access to the GIRT4 data</title>
		<author>
			<persName><forename type="first">Johannes</forename><surname>Leveling</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Comparative Evaluation of Multilingual Information Access Systems, 4th Workshop of the Cross-Language Evaluation Forum, CLEF 2003, Revised Selected Papers</title>
				<editor>
			<persName><forename type="first">Carol</forename><surname>Peters</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Julio</forename><surname>Gonzalo</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title/>
		<author>
			<persName><forename type="first">Martin</forename><surname>Braschler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Kluck</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Lecture Notes in Computer Science (LNCS)</title>
		<imprint>
			<biblScope unit="volume">3237</biblScope>
			<biblScope unit="page" from="412" to="424" />
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">University of Hagen at CLEF 2004: Indexing and translating concepts for the GIRT task</title>
		<author>
			<persName><forename type="first">Johannes</forename><surname>Leveling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sven</forename><surname>Hartrumpf</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Multilingual Information Access for Text, Speech and Images: Results of the Fifth CLEF Evaluation Campaign</title>
				<editor>
			<persName><forename type="first">C</forename><surname>Peters</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><forename type="middle">D</forename><surname>Clough</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J F</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gonzalo</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title/>
		<author>
			<persName><forename type="first">M</forename><surname>Kluck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Magnini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Lecture Notes in Computer Science (LNCS)</title>
		<imprint>
			<biblScope unit="volume">3491</biblScope>
			<biblScope unit="page" from="271" to="282" />
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
