<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Experiments on Robust NL Question Interpretation and Multi-layered Document Annotation for a Cross-Language Question/Answering System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Günter Neumann</string-name>
          <email>neumann@dfki.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bogdan Sacaleanu</string-name>
          <email>bogdan@dfki.de</email>
        </contrib>
        <aff>LT-Lab, Saarbrücken, Germany</aff>
      </contrib-group>
      <abstract>
        <p>This report describes the work done by the QA group of the Language Technology Lab at DFKI for the 2004 edition of the Cross-Language Evaluation Forum (CLEF). Based on the experience gained through our participation in QA@Clef-2003 with our initial cross-lingual QA prototype system BiQue (cf. [NS03]), this year's system extension focused on a) robust NL question interpretation using advanced linguistically based components, b) flexible interface strategies to IR search engines, and c) strategies for off-line annotation of the data collection that support query-specific indexing and answer selection. The paper presents and discusses the overall architecture of the extended system as well as the results obtained in the CLEF-2004 Monolingual German and Bilingual German/English QA tracks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The basic functionality of an open-domain cross-language question/answering (QA) system is
simple: given a natural language query in one language (say German), find answers to that
query in textual documents written in another language (say English). In contrast to a standard
cross-language IR system, the NL queries are usually well-formed NL query clauses (instead of a
set of keywords), and the identified answers should be exact answer strings (instead of complete
documents containing the answer). Thus, for a question like “Welches Pseudonym nahm Norma
Jean Baker an?” (Which pseudonym did Norma Jean Baker use?), the answer should be “Marilyn
Monroe” rather than an English document containing this name. In contrast to QA@Clef-2003,
this year's task was made more difficult by requiring that only one exact answer be
returned instead of a ranked list of (say three) answer candidates.</p>
      <p>Last year our group participated in a QA competition for the very first time. Since the focus
was more on system implementation than on system tuning, our motto was “participation
is everything”. Nevertheless, we learned a lot and found several sources of potential improvement for
our initial system. Two aspects in particular drew our attention.</p>
      <p>Firstly, the statistics-based chunk parser turned out to be a major bottleneck for the
complete NL question processor. In our Clef-2003 system, we implemented a two-stage question
analysis: first, we performed a shallow chunk analysis using a statistics-based chunker (trained for
German as well as English), and then we applied a manually written, specialized question
grammar to its output. The rules of this grammar represented direct relationships between relevant chunks and
their interpretation with respect to the question and the expected answer type. However, it turned out that the error
rate of the first stage produced too much noisy input for the second stage, so that in many
cases we were not able to determine the expected answer type correctly. This was aggravated
by the very low coverage of the manually specified question grammars, so that the question
processor as a whole performed quite poorly. This matters because a high proportion of errors in
question answering is known to be attributable to errors in question analysis (cf. [MPHS02]). Furthermore,
since the Clef-2004 QA task required that only one exact answer be returned, we were
convinced that a deeper, linguistically based question analysis would be the better strategy.</p>
      <p>Secondly, in the Clef-2003 system we applied a very simple strategy for determining the relevant
paragraphs that serve as starting points for identifying possible answer candidates:
we directly used the SGML paragraph tags of the original corpus. Furthermore, the
IR query language of the MG system turned out to be too inflexible, so that we could not
take advantage of preprocessing the corpus along different dimensions. Hence, we could only
perform very basic word/stem-level paragraph indexing.</p>
      <p>Based on these experiences, we decided to extend the Clef-2003 system in the following
directions:
• development of robust NL question interpretation using more sophisticated,
linguistically based strategies,
• development of flexible, “programmable” interface strategies to IR search engines, and
• development of strategies for off-line annotation of the data collection that support
query-specific indexing and answer selection.</p>
      <p>We first give an overview of the complete Clef-2004 system and then highlight some technical
aspects. Finally, we present and discuss the results we obtained for the task.</p>
    </sec>
    <sec id="sec-2">
      <title>System overview</title>
      <sec id="sec-2-1">
        <title>1. the linguistic core engine</title>
      </sec>
      <sec id="sec-2-2">
        <title>2. the multi-dimensional index of the answer source corpus</title>
      </sec>
      <sec id="sec-2-3">
        <title>3. the robust NL query processor</title>
      </sec>
      <sec id="sec-2-4">
        <title>4. the information search component</title>
      </sec>
      <sec id="sec-2-5">
        <title>5. the answer processor</title>
        <p>The linguistic core engine consists of two major sub-components: a) LingPipe, which performs
NE and sentence boundary recognition as well as NE co-reference resolution, and b) a
sentence-based syntax parser. This parser is currently used only for the analysis of German NL questions and
documents.</p>
        <p>For the corpus of each task (German and English), a multi-dimensional index
structure is computed off-line. This is done by first preprocessing the whole corpus with the LingPipe
component of the linguistic core engine, which adds named entities, sentence boundaries,
NE co-references, and abbreviations to each document in the form of XML tags. For each
dimension, a separate index structure is computed, which can be accessed via the IR server (we
use the Jakarta Lucene full-text search engine, cf. http://jakarta.apache.org/lucene/docs/index.html; see also sec. 4.1).</p>
        <p>The main control flow for both QA tasks can now be briefly described as follows:</p>
        <sec id="sec-2-5-1">
          <p>Robust NL Question Analysis. The main purpose of NL question analysis in the context
of a QA system is to determine the expected answer type, the set of relevant keywords, and the set
of recognized NE instances, in order to guide information search and answer processing. Consider,
for example, the NL-query result presented in figure 4, where the value of the tag a-type represents
the expected answer type, the value of s-ctr represents the answer control strategy, and the value of
scope represents additional constraints on the search space (for more details, see sec. 4.2).</p>
          <p>NL Question Refinement. Refinement of the NL-query result covers the translation of
the NL query and its expansion. The cross-language aspect of the system is handled by
using machine translation engines for query translation (along the lines of the approach described
in [NS03]). We selected eight translation services (7 online + 1 offline) in order to
achieve better lexical coverage of the translated queries. The translation results were
linguistically processed, annotated with named entities, and merged into a translation object
consisting of named entity instances and keywords (open-class words that are not part of
named entities). Question analysis yields a similar structure for the original question,
plus additional information about the expected answer type and scope. Using a dictionary-based
alignment technique, this additional information was transferred to the translation object.
The same was done for named entities of types PERSON, DATE (year instances), and
NUMBER, which should remain unchanged through translation. For the remaining types
of named entities (LOCATION, ORGANIZATION, and DATE without year instances), which
might have different lexical representations in the source and target language, the following heuristic
was applied: if the original string and its translation differed (e.g., “Europäische Gemeinschaft”
vs. “European Union”), they were regarded as unreliable, added as keywords, and removed from the
named entities. The distinction between named entities and keywords made during question
analysis is used later on to construct the IR query and to define search strategies.</p>
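          <p>The handling of named entities when the translation object is built amounts to a small decision rule. The following Python sketch illustrates it under our own assumptions: the classes <monospace>NamedEntity</monospace> and <monospace>TranslationObject</monospace>, the <monospace>has_year</monospace> flag for year-denoting DATE instances, and the function <monospace>merge_translation</monospace> are hypothetical, and since the text does not say which string of an unreliable entity becomes a keyword, the sketch assumes it is the translated one.</p>
          <preformat><![CDATA[
```python
from dataclasses import dataclass, field

# NE types assumed to survive translation unchanged (from the text:
# PERSON, NUMBER, and DATE instances that denote years).
STABLE_TYPES = {"PERSON", "NUMBER"}


@dataclass
class NamedEntity:
    text: str
    ne_type: str
    has_year: bool = False  # hypothetical flag: DATE instance denotes a year


@dataclass
class TranslationObject:
    named_entities: list = field(default_factory=list)
    keywords: list = field(default_factory=list)


def is_stable(ne: NamedEntity) -> bool:
    """True if the NE is expected to remain unchanged through translation."""
    return ne.ne_type in STABLE_TYPES or (ne.ne_type == "DATE" and ne.has_year)


def merge_translation(pairs, keywords) -> TranslationObject:
    """Build a translation object from (source NE, translated string) pairs.

    Stable NEs are copied from the source question. The remaining NEs
    (LOCATION, ORGANIZATION, DATE without year) are kept only if the
    translation leaves the string unchanged; otherwise they are considered
    unreliable and the translated string is demoted to a keyword.
    """
    obj = TranslationObject(keywords=list(keywords))
    for source_ne, translated in pairs:
        if is_stable(source_ne) or translated == source_ne.text:
            obj.named_entities.append(source_ne)
        else:
            obj.keywords.append(translated)
    return obj
```
]]></preformat>
          <p>For instance, a PERSON name or a year-denoting DATE is kept as a named entity regardless of its translation, whereas the ORGANIZATION “Europäische Gemeinschaft”, translated as “European Union”, is demoted to the keyword list.</p>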
          <p>In contrast to our previous system, we wanted to implement and test question expansion
methods based on natural language generation (NLG) instead of WordNet (in a later
system, we will of course combine these methods). The main reason is that
the NL-query analysis normalizes all words to their corresponding lemmas. On the
other hand, the morphological component of our German parsing system SMES (cf. sec. 3.2) is
reversible, i.e., it can also be used to generate word forms. Of course, one could directly
use the word forms of the input query (accessible via indices); note that word forms rather than lemmas
have to be submitted to Lucene, because Lucene applies its own stemmer to each IR-query term. However, generating all plausible
word forms from the input query amounts to a controlled morpho-syntactic
query expansion. Thus, in the monolingual German task, for all relevant lemmas of the
NL-query analysis (basically the open-class words), we generate all word
forms that are consistent with the feature description in the parse result.
In other words, the parsing output directly controls the generation input. For
example, for the lemma geben (to give) we generate the word forms gaben, gab, gegeben,
gibt. Our method also allows additional constraints to be specified for the generation process, e.g., that
word forms of a certain tense should (not) be generated; in this way, a more fine-grained morpho-syntactic
control of query expansion is possible.</p>
          <p>Information Search. In order to perform the information search, the (possibly refined) NL
query has to be mapped to a concrete IR query. Most of today's information search engines come
with expressive IR-query interfaces that support flexible, user-driven filtering of the index
space. In order to take advantage of this rich parameterization and to support the use of multiple
IR engines in the future, we perform the mapping from an NL query to an IR query in two
steps (cf. also figure 3):</p>
        </sec>
      </sec>
      <sec id="sec-2-6">
        <title>1. construction of an IR-query schema</title>
      </sec>
      <sec id="sec-2-7">
        <title>2. construction of an IR-query</title>
        <p>An IR-query schema is an under-specified representation of an IR query. It is
constructed directly from the NL-query result. Although it contains all relevant information from the
NL query, this information is under-specified, because it still lacks IR-specific syntax
(e.g., the '+' prefix for required terms) and a specification of logical connectives. The main task
of the IR-query construction component is to create from this schematic description a concrete IR query
that can be fed directly into an IR engine (in our current QA system, we use the
Jakarta Lucene full-text engine; however, a different script could equally well create, say, a Google-specific
expression from it). Which mapping to perform is expressed in the form of
search strategies, which are activated on the basis of the concrete values of the question tags of the NL query.
Furthermore, based on input from the answer validation component (through feedback loops), the
component may use different logical connectives, resulting in different IR queries; e.g.,
figure 3 displays two alternatives: a strict IR query (using only logical AND) and a relaxed
IR query (using only OR).</p>
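        <p>The two-step mapping can be illustrated with a Python sketch that turns a simple IR-query schema into a strict and a relaxed query string in the syntax of Lucene's classic query parser ('+' marks a required clause, and a parenthesized group of alternatives matches if any alternative matches). The dictionary layout of the schema (keys <monospace>named_entities</monospace>, <monospace>keywords</monospace>, and <monospace>answer_type</monospace>), the function <monospace>build_queries</monospace>, and the field name <monospace>text</monospace> are assumptions for illustration; only the field name <monospace>neTypes</monospace> is taken from sec. 4.1.</p>
        <preformat><![CDATA[
```python
def quote(term: str) -> str:
    """Quote multi-word terms so that Lucene treats them as phrases."""
    return f'"{term}"' if " " in term else term


def build_queries(schema: dict) -> dict:
    """Map an under-specified IR-query schema to concrete Lucene query strings.

    Every schema entry is a list of alternatives (e.g., generated word forms),
    which are OR-ed inside one group. The strict query marks every group as
    required ('+', i.e., logical AND); the relaxed query leaves all groups
    optional (logical OR).
    """
    groups = []
    for alternatives in schema["named_entities"] + schema["keywords"]:
        clauses = " ".join(f"text:{quote(t)}" for t in alternatives)
        groups.append(f"({clauses})" if len(alternatives) > 1 else clauses)
    if schema.get("answer_type"):
        # Restrict hits to sentences containing an NE of the expected type.
        groups.append(f"neTypes:{schema['answer_type']}")
    return {
        "strict": " ".join(f"+{g}" for g in groups),
        "relaxed": " ".join(groups),
    }
```
]]></preformat>
        <p>For the John Lennon example of sec. 4.1, the strict query is +text:"John Lennon" +text:die +neTypes:LOCATION and the relaxed query is the same string without the '+' prefixes; generated word forms such as gab and gibt for the lemma geben become one required group, +(text:gab text:gibt).</p>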
        <p>Performing IR-query construction in this way allows us to make selective use of the
different index structures.</p>
        <p>
          Answer Processing. The result of the information search is a set of N pointers (currently N=10),
each pointing to a single sentence of an annotated document. Each sentence is
tagged with all NE instances recognized by LingPipe during the document preprocessing phase.
Note that, because we index at the sentence level
          <xref ref-type="bibr" rid="ref5">(and not at the paragraph level, as in our
Clef-2003 system)</xref>, we can take advantage of cross-document, sentence-level redundancies.
        </p>
        <p>Next, all NE instances that are type-compatible with the expected answer type of the question
are selected as possible answer candidates. All identified exact answer candidates are stored in a
global list together with their frequency counts (determined on the basis of the N selected sentences).
During this step, a similarity function is applied to the NE instances in order to identify variants
of the same name. This means that the quality of the answer extraction step currently
depends directly on the quality of the NE recognition system used.</p>
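        <p>The candidate selection step can be sketched in Python as follows: NE instances whose type matches the expected answer type are collected from the N retrieved sentences, name variants are merged, and the most frequent candidate is returned. The representation of a sentence as a list of (text, type) pairs and the token-subset test in <monospace>is_variant</monospace> (which merges “Clinton” into “Bill Clinton”) are hypothetical stand-ins for the system's actual data structures and similarity function.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from typing import List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # (NE text, NE type) pairs of one sentence


def is_variant(a: str, b: str) -> bool:
    """Hypothetical variant test: one name's tokens are a subset of the other's."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return ta.issubset(tb) or tb.issubset(ta)


def select_answer(sentences: List[Sentence], expected_type: str) -> Optional[str]:
    """Return the most frequent type-compatible NE across the retrieved sentences.

    Name variants are merged into one candidate, which keeps the longest
    surface form seen so far. Returns None if no compatible NE exists.
    """
    counts: Counter = Counter()
    for sentence in sentences:
        for text, ne_type in sentence:
            if ne_type != expected_type:
                continue  # not type-compatible with the expected answer type
            match = next((c for c in counts if is_variant(c, text)), None)
            if match is None:
                counts[text] += 1
            elif len(text) > len(match):
                # Re-key the candidate under the longer surface form.
                counts[text] = counts.pop(match) + 1
            else:
                counts[match] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```
]]></preformat>
        <p>Keeping the longest surface form as the representative is one simple design choice; it favors the most specific name as the exact answer string.</p>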
        <p>By default, we perform the information search with a strict IR query. If no sentence can
be retrieved or no answer can be extracted in this case, we perform a new information search using a relaxed
IR query and call the answer processing component again.</p>
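        <p>The fallback from a strict to a relaxed IR query can be written as a short control loop. In this Python sketch, <monospace>search</monospace> and <monospace>extract</monospace> are hypothetical callables standing in for the Lucene-based information search and the answer processor; passing them in as parameters keeps the loop independent of any particular IR engine.</p>
        <preformat><![CDATA[
```python
from typing import Callable, Optional, Sequence


def answer_question(
    queries: Sequence[str],
    search: Callable[[str, int], list],
    extract: Callable[[list], Optional[str]],
    n: int = 10,
) -> Optional[str]:
    """Try each query in order (e.g., strict first, then relaxed).

    For each query, retrieve up to n sentences and run answer extraction.
    Stop at the first query that yields both sentences and an answer.
    """
    for query in queries:
        sentences = search(query, n)
        if not sentences:
            continue  # nothing retrieved: fall back to the next, more relaxed query
        answer = extract(sentences)
        if answer is not None:
            return answer
    return None
```
]]></preformat>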
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Linguistic Core Engine</title>
      <p>Our linguistic core engine consists of two major components, which we briefly describe in the next
two subsections.</p>
      <p>3.1 LingPipe</p>
      <p>LingPipe, a software package from Alias-i (available at http://www.alias-i.com/lingpipe/), consists of
several language processing modules: a statistical named entity recognizer, a heuristic sentence splitter,
and a heuristic within-document co-reference resolution system.</p>
      <p>LingPipe comes with an English language model. The NE types covered by LingPipe are
locations, persons, and organizations. We retrained LingPipe for both English and German
so as to cover two more named entity types, DATE and NUMBER (the latter available only for German),
and extended the co-reference resolution algorithm to account for German pronouns as well. A large
gazetteer of named entity instances has been used for both languages, and for English a PERSON
gazetteer with gender attributes has been integrated for better co-reference resolution.</p>
      <p>3.2 SMES</p>
      <p>SMES is a robust, wide-coverage, unification-based system for parsing German texts (cf.
[NBB+97, NP02]). It is available at http://www.dfki.de/~neumann/pd-smes/pd-smes.html, has been used in a number
of third-party projects, and has been extensively evaluated. SMES produces a partial analysis of natural language texts by combining
shallow processing techniques (i.e., finite-state regular expression recognizers) with generic linguistic
resources (e.g., subcategorisation, morphology, online compound analysis). In contrast to the
common approach of deep grammatical processing, where the goal is to find all possible
readings of a syntactic expression, we provide a complete but underspecified representation by only
computing a general, coarse-grained syntactic structure, which can be thought of as domain
independent. This rough syntactic analysis can then be made more precise by taking
domain-specific knowledge into account. Our parser recognizes basic syntactic units and grammatical relations
(e.g., subject/object) robustly by using relatively underspecified feature structures, by postponing
attachment decisions, and by introducing a small number of heuristics.</p>
      <p>Originally, SMES was developed as an Information Extraction core system; however, we have now
substantially extended SMES for its use as a core engine in textual question answering
systems. The major extensions of SMES concern the robust interpretation of NL
questions (see sec. 4) and the development of a distributed representation for the dependency
structures, which we now describe in more detail.</p>
      <p>Distributed representation. In the original SMES system, the analysis of a sentence is
represented in the form of a possibly recursive dependency tree, where each node and edge is decorated
with rich feature information. During the development of our Clef-2004 system it turned out
that nested parse trees (which can become very large for long sentences) are unsuited as a
generic interface, because they do not support flexible and efficient access to relevant linguistic
information. Furthermore, a nested representation cannot easily be enriched with additional
linguistic structure, e.g., additional grammatical functions or deeper attachment, scope, etc. The
same is true for a selective, local integration of domain-specific information (e.g., to perform a
sort of concept spotting on the basis of a domain-independent syntactic normalization of relevant text
fragments).</p>
      <sec id="sec-3-1">
        <p>For that reason, we re-represent dependency trees in the form of a distributed representation,
adapting the approach of [Mil00]. A distributed representation provides the robustness of a
bag-of-objects approach together with the ability to use higher-level relational information where this can
provide a more accurate analysis. Thus, distributed representations are more flexible with respect to the
integration of shallow and deep linguistic analysis and of domain knowledge. In
our distributed representation, we explicitly separate the representation of linguistic entities like
words/chunks/named entities (the “bag-of-objects” or BaseObjects) from their structural
relationships like head/modifier/topology/grammatical functions (the “bag-of-links” or LinkObjects).
Both layers are connected through indices, which allow a simple bidirectional traversal between
the different object types. Linguistic and application-specific extensions can then be described
as operations (typing, re-organization of attachment) applied to LinkObjects. In fact, this is
how the strategies of the semantic NL-query interpretation are implemented for determining the
expected answer type and question scope (cf. sec. 4.2). The distributed representation is also the basis for the specification of a
flexible similarity function applied to two distributed dependency trees.</p>
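        <p>The separation of BaseObjects and LinkObjects, connected through indices, can be sketched as a simple Python data structure. The class and field names, the relation labels, and the <monospace>retype</monospace> operation are illustrative assumptions modeled on the description above; they are not the SMES interface.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class BaseObject:
    """A linguistic entity (word, chunk, or named entity)."""
    idx: int
    text: str
    kind: str  # e.g., "word", "chunk", "ne"


@dataclass
class LinkObject:
    """A typed structural relation between two BaseObjects, referenced by index."""
    idx: int
    rel: str   # e.g., "subj", "obj", "mod"
    head: int  # index of the head BaseObject
    dep: int   # index of the dependent BaseObject


@dataclass
class DistributedTree:
    objects: dict = field(default_factory=dict)    # idx -> BaseObject
    links: dict = field(default_factory=dict)      # idx -> LinkObject
    by_object: dict = field(default_factory=dict)  # BaseObject idx -> link idxs

    def add_object(self, obj: BaseObject) -> None:
        self.objects[obj.idx] = obj
        self.by_object.setdefault(obj.idx, [])

    def add_link(self, link: LinkObject) -> None:
        self.links[link.idx] = link
        # Index the link from both ends for bidirectional traversal.
        self.by_object.setdefault(link.head, []).append(link.idx)
        self.by_object.setdefault(link.dep, []).append(link.idx)

    def links_of(self, obj_idx: int) -> list:
        """All LinkObjects touching a BaseObject (object -> links)."""
        return [self.links[i] for i in self.by_object.get(obj_idx, [])]

    def retype(self, link_idx: int, new_rel: str) -> None:
        """An extension operation: re-type one link; BaseObjects stay untouched."""
        self.links[link_idx].rel = new_rel
```
]]></preformat>
        <p>Because a LinkObject refers to BaseObjects only by index, an operation such as re-typing an attachment changes a single link and leaves the word layer untouched, which is what makes local enrichment cheap compared to rewriting a nested tree.</p>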
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Some more details</title>
      <p>4.1 Multi-layered Document Annotation</p>
      <p>The annotation currently performed on the document collection consists of sentence
boundary identification, named entity annotation, and co-reference resolution of named
entities and personal pronouns. A heuristic, syntax-based algorithm for identifying abbreviations and
their possible expansions was run over the collection as well, resulting in an annotation similar to
that of named entities.</p>
      <p>Throughout the document processing part of the system, we have insisted on a systematic
analysis of named entities and on reducing the amount of information necessary to answer a
question. Based on experiments and results with the question set of the previous competition, we
restricted the unit of information to the sentence level and added named entity and abbreviation
types, alongside words, as basic units of information in the indexing process. In this way, we can
query the IR component not only with keywords extracted from the questions, but also with NE types
corresponding to their expected answer types. An example makes this clear: for the question
Where did John Lennon die?, besides creating an IR query containing the keywords {+"John
Lennon", +die}, we can also supply the expected answer type LOCATION by querying an additional
field neTypes: {+text:"John Lennon", +text:die, +neTypes:LOCATION}. This not only
narrows the amount of data to be analyzed for answer extraction, but also guarantees that every
retrieved sentence contains at least one answer candidate of the expected type.</p>
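      <p>The effect of the additional neTypes field can be illustrated with a toy sentence-level inverted index in Python. This is a sketch of the idea only: the system itself uses Lucene, and the class <monospace>SentenceIndex</monospace>, its methods, and the naive lowercase tokenization without stemming are hypothetical simplifications (multi-word names are therefore matched token by token).</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


class SentenceIndex:
    """Toy inverted index with two fields per sentence: 'text' and 'neTypes'."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> set of sentence ids
        self.sentences = []

    def add(self, text: str, ne_types) -> int:
        """Index one annotated sentence and return its id."""
        sid = len(self.sentences)
        self.sentences.append(text)
        for token in text.lower().replace("?", " ").replace(".", " ").split():
            self.postings[("text", token)].add(sid)
        for ne_type in ne_types:
            self.postings[("neTypes", ne_type)].add(sid)
        return sid

    def search_all(self, required):
        """Return ids of sentences matching every (field, term) pair (logical AND)."""
        result = None
        for fld, term in required:
            key = (fld, term.lower() if fld == "text" else term)
            ids = self.postings.get(key, set())
            result = set(ids) if result is None else result.intersection(ids)
        return sorted(result or set())
```
]]></preformat>
      <p>For two indexed sentences, “John Lennon died in New York.” (annotated with PERSON and LOCATION) and “John Lennon died in 1980.” (PERSON and DATE), the keyword-only query matches both, while adding the required clause neTypes:LOCATION leaves only the first.</p>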
      <p>4.2 Robust NL question analysis</p>
      <p>In the context of a QA system, or of information search in general, we interpret the result of an NL
question analysis as a declarative description of search strategy and control information. Consider,
for example, the NL question result presented in figure 3, where the value of the tag a-type represents
the expected answer type, s-ctr the answer control strategy, and scope additional constraints
on the search space. Part of this information can already be determined from simple, local
lexico-syntactic criteria (e.g., for the Wh-phrase where we know that the expected answer type is
LOCATION); in most cases, however, we have to consider larger syntactic units in combination with
information extracted from external knowledge sources. For example, for a definition question
like What is DFKI/a battery?, we have to combine syntactic and type information from the verb
and the relevant NP (e.g., definite/indefinite NPs together with certain auxiliary verb
forms) in order to distinguish it from a description question like What is the name of the German
Chancellor?
<preformat>&lt;IOOBJ msg='quest' s-ctr='C-DESCRIPTION' q-weight='1.0'&gt;
  &lt;A-TYPE&gt;NUMBER&lt;/A-TYPE&gt;
  &lt;SCOPE&gt;analphabet&lt;/SCOPE&gt;
  &lt;BRELS&gt;
    &lt;BREL rel='GOV' level='0'&gt;
      &lt;ARG1 pos='V'&gt;geb&lt;/ARG1&gt;
      &lt;ARG2 pos='N'&gt;analphabet&lt;/ARG2&gt;
    &lt;/BREL&gt;
    &lt;BREL rel='GOV' level='0'&gt;
      &lt;ARG1 pos='V'&gt;geb&lt;/ARG1&gt;
      &lt;ARG2 pos='N'&gt;es&lt;/ARG2&gt;
    &lt;/BREL&gt;
    &lt;BREL rel='GOV' level='0'&gt;
      &lt;ARG1 pos='V'&gt;geb&lt;/ARG1&gt;
      &lt;ARG2 pos='P'&gt;auf&lt;/ARG2&gt;
    &lt;/BREL&gt;
    &lt;BREL rel='GOV' level='1'&gt;
      &lt;ARG1 pos='P'&gt;auf&lt;/ARG1&gt;
      &lt;ARG2 pos='N'&gt;welt&lt;/ARG2&gt;
    &lt;/BREL&gt;
    &lt;BREL rel='GOV' level='0'&gt;
      &lt;ARG1 pos='N'&gt;analphabet&lt;/ARG1&gt;
      &lt;ARG2 pos='WP'&gt;wieviel&lt;/ARG2&gt;
    &lt;/BREL&gt;
  &lt;/BRELS&gt;
  &lt;PRELS&gt;
    &lt;PREL rel='GOV' level='1'&gt;
      &lt;ARG1 pos='N'&gt;welt&lt;/ARG1&gt;
      &lt;ARG2 pos='QUANT'&gt;d-det&lt;/ARG2&gt;
    &lt;/PREL&gt;
  &lt;/PRELS&gt;
  &lt;KWS&gt;
    &lt;KW type='UNIQUE'&gt;
      &lt;TK pos='V'&gt;geb&lt;/TK&gt;
    &lt;/KW&gt;
    &lt;KW type='UNIQUE'&gt;
      &lt;TK pos='N'&gt;analphabet&lt;/TK&gt;
    &lt;/KW&gt;
    &lt;KW type='UNIQUE'&gt;
      &lt;TK pos='N'&gt;welt&lt;/TK&gt;
    &lt;/KW&gt;
  &lt;/KWS&gt;
  &lt;NEL/&gt;
  &lt;NETS/&gt;
&lt;/IOOBJ&gt;</preformat></p>
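      <p>Downstream components need only a few fields of this question object. The following Python sketch reads the expected answer type, the control strategy, the scope, and the keywords from an abbreviated copy of the object shown above, using the standard library module <monospace>xml.etree.ElementTree</monospace>; the function <monospace>read_question_object</monospace> and the layout of the returned dictionary are illustrative assumptions.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the question object above (BRELS/PRELS omitted).
IOOBJ = """<IOOBJ msg='quest' s-ctr='C-DESCRIPTION' q-weight='1.0'>
  <A-TYPE>NUMBER</A-TYPE>
  <SCOPE>analphabet</SCOPE>
  <KWS>
    <KW type='UNIQUE'><TK pos='V'>geb</TK></KW>
    <KW type='UNIQUE'><TK pos='N'>analphabet</TK></KW>
    <KW type='UNIQUE'><TK pos='N'>welt</TK></KW>
  </KWS>
  <NEL/>
</IOOBJ>"""


def read_question_object(xml_text: str) -> dict:
    """Extract the fields that guide information search and answer processing."""
    root = ET.fromstring(xml_text)
    return {
        "a_type": root.findtext("A-TYPE"),  # expected answer type
        "s_ctr": root.get("s-ctr"),         # answer control strategy
        "scope": root.findtext("SCOPE"),    # search-space constraint
        # Each keyword token as a (part-of-speech, lemma) pair.
        "keywords": [(tk.get("pos"), tk.text) for tk in root.iter("TK")],
    }
```
]]></preformat>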
      <p>In our system, we do this by following a two-step parsing scheme: in the first
step, a full syntactic analysis is performed (cf. sec. 3.2), and in the second step, a question-specific
semantic analysis. During the second step, the values of the question tags a-type, scope, and
s-ctr are determined on the basis of syntactic constraints applied to relevant NP and VP phrases,
and by taking into account information from two small knowledge bases (see also figure 2). These
knowledge bases basically map linguistic entities to values of the question tags, e.g., trigger
phrases like name of, type of, abbreviation of, or lexical elements like
town, person, president to expected answer types. Note that for German, we perform a kind of fuzzy
match against the knowledge bases, taking into account on-line compound analysis and string-similarity
tests. For example, given the lexical mapping Stadt ⇒ LOCATION for the lexeme town, we
automatically also map the nominal compounds Hauptstadt (capital) and Großstadt (large
city) to the a-type LOCATION.</p>
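      <p>The fuzzy lookup against the lexical knowledge base can be approximated as follows. This Python sketch is a simplified stand-in, not the SMES mechanism: instead of on-line compound analysis it only checks whether a noun ends with a known lexeme (so that Hauptstadt and Großstadt inherit the entry for Stadt), and it uses <monospace>difflib.SequenceMatcher</monospace> from the standard library as the string-similarity test. The lexicon entries, the function name <monospace>answer_type_for</monospace>, and the threshold of 0.85 are illustrative.</p>
      <preformat><![CDATA[
```python
from difflib import SequenceMatcher
from typing import Optional

# Illustrative lexical knowledge base: lexeme -> expected answer type.
LEXICON = {
    "stadt": "LOCATION",
    "person": "PERSON",
    "präsident": "PERSON",
}


def answer_type_for(noun: str, threshold: float = 0.85) -> Optional[str]:
    """Map a (German) noun to an expected answer type via fuzzy matching."""
    word = noun.lower()
    if word in LEXICON:
        return LEXICON[word]
    # Crude compound approximation: the head of a German compound is its last part.
    for lexeme, a_type in LEXICON.items():
        if word.endswith(lexeme):
            return a_type
    # Fall back to string similarity (e.g., spelling variants).
    best = max(LEXICON, key=lambda lex: SequenceMatcher(None, word, lex).ratio())
    if SequenceMatcher(None, word, best).ratio() >= threshold:
        return LEXICON[best]
    return None
```
]]></preformat>
      <p>Suffix matching over-generates for words that merely happen to end in a lexeme, which is why a real system relies on morphological compound analysis instead.</p>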
    </sec>
    <sec id="sec-5">
      <title>Results and Discussion</title>
      <p>We submitted two runs: one for the monolingual German task and one for the bilingual
German/English task. The results are as follows:</p>
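      <table-wrap>
        <table>
          <thead>
            <tr><th>Track</th><th>#Answ.</th><th>#T (right)</th><th>#F (wrong)</th><th>#Inexact</th><th>#Unsup.</th><th>Overall acc.</th><th>Factoid acc.</th><th>Definition acc.</th><th>NIL precision</th></tr>
          </thead>
          <tbody>
            <tr><td>DE-DE</td><td>197</td><td>50</td><td>143</td><td>1</td><td>3</td><td>25.3%</td><td>28.25%</td><td>0%</td><td>13.6%</td></tr>
            <tr><td>DE-EN</td><td>200</td><td>47</td><td>151</td><td>0</td><td>2</td><td>23.5%</td><td>23.8%</td><td>20%</td><td>10.79%</td></tr>
          </tbody>
        </table>
      </table-wrap>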
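      <p>The counts in the table can be cross-checked: for each run, the numbers of right, wrong, inexact, and unsupported answers add up to #Answ., and the reported overall accuracy equals #T divided by #Answ. (truncated to one decimal place for DE-DE, where 50/197 is about 25.38%). The following Python snippet recomputes these values; reading overall accuracy as #T/#Answ. is our inference from the numbers, not a definition given in the text.</p>
      <preformat><![CDATA[
```python
import math

# Counts from the results table: (answers, right, wrong, inexact, unsupported).
RUNS = {
    "DE-DE": (197, 50, 143, 1, 3),
    "DE-EN": (200, 47, 151, 0, 2),
}


def overall_accuracy(answers: int, right: int) -> float:
    """Overall accuracy in percent, truncated to one decimal place."""
    return math.floor(1000 * right / answers) / 10


for run, (answers, right, wrong, inexact, unsupported) in RUNS.items():
    # Every assessed answer falls into exactly one judgement class.
    assert right + wrong + inexact + unsupported == answers
    print(run, overall_accuracy(answers, right))
```
]]></preformat>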
      <p>Compared to our results at QA@Clef-2003, this is a good improvement, considering that the
tasks were more difficult and that we could use nearly the same system for both the bilingual
and the monolingual track. We now discuss the results for the two individual
tasks, comparing them where possible.</p>
      <p>In both cases, we only considered answers that were directly recognized as NE instances;
i.e., for all questions referring to more general noun phrases, or to NE types and instances
that LingPipe did not recognize, we did not identify any answer candidates. Note that although in
both tasks we were able to correctly analyze all definition questions as such, our current system
only determines possible answer candidates for abbreviation-based questions (incidentally, no
such questions were found in the German test set, which explains the definition accuracy of 0%
for the German run). The fact that we did not answer definition questions (apart from abbreviations)
is a direct consequence of our restriction to only consider NE instances as answer candidates.</p>
      <p>As previously mentioned, the monolingual and the bilingual task shared the
QA framework presented above. Nevertheless, there were task-specific system
configurations, resulting in different retrieval and answer extraction methods, which we briefly
describe below.</p>
      <p>Monolingual Task. For the German monolingual task, the system was able to recognize
named entity instances of type NUMBER, as a result of training LingPipe on a German corpus with
a larger coverage of named entity types than its English counterpart. Although the indexing
method was similar for both tasks, the monolingual task did not make any use of the named entity
type field (neTypes) during information search.</p>
      <p>Bilingual Task. No questions with expected answer type MEASURE were considered, because
the bilingual setting was not able to identify named entity instances of type NUMBER. In contrast to the
monolingual task, the system used a similarity function resembling a co-reference
algorithm, which identifies answers in the candidate set that mention the same NE instance
(e.g., “Bill Clinton” and “Clinton” count as two references to the same person).</p>
      <p>The work presented in this paper has been funded by the BMBF project Quetal, FKZ 01 IW C02.
Many thanks to Jumamurat Bayjanov and Olga Goldmann for their implementation support.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Mil00]
          <source>In Proceedings of the ACL-2000</source>
          , pages
          <fpage>133</fpage>
          -
          <lpage>141</lpage>
          , Hong Kong,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [MPHS02]
          <string-name>
            <given-names>Dan</given-names>
            <surname>Moldovan</surname>
          </string-name>
          , Marius Pasca, Sanda Harabagiu, and
          <string-name>
            <given-names>Mihai</given-names>
            <surname>Surdeanu</surname>
          </string-name>
          .
          <article-title>Performance issues and error analysis in an open-domain question answering system</article-title>
          .
          <source>In Proceedings of the ACL-2002</source>
          , pages
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          , Philadelphia,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [NBB+97]
          <string-name>
            <given-names>G.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Backofen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Baur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Becker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Braun</surname>
          </string-name>
          .
          <article-title>An information extraction core system for real-world German text processing</article-title>
          .
          <source>In ANLP 97</source>
          , pages
          <fpage>208</fpage>
          -
          <lpage>215</lpage>
          , Washington, USA, March
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [NP02]
          <string-name>
            <given-names>G.</given-names>
            <surname>Neumann</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Piskorski</surname>
          </string-name>
          .
          <article-title>A shallow text processing core engine</article-title>
          .
          <source>Journal of Computational Intelligence</source>
          ,
          <volume>18</volume>
          (
          <issue>3</issue>
          ):
          <fpage>451</fpage>
          -
          <lpage>476</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [NS03]
          <string-name>
            <given-names>Günter</given-names>
            <surname>Neumann</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bogdan</given-names>
            <surname>Sacaleanu</surname>
          </string-name>
          .
          <article-title>A cross-language question/answering-system for German and English</article-title>
          .
          <source>In Proceedings of the CLEF 2003 Working Notes of the QA@CLEF</source>
          , Trondheim, August
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>