<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cross-Language Question Answering at the University of Helsinki</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lili Aunimo</string-name>
          <email>aunimo|rkuuskos|jamakkon@cs.helsinki</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Reeta Kuuskoski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juha Makkonen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science</institution>
          <addr-line>P.O. Box 68, FIN-00014 University of Helsinki</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2004</year>
      </pub-date>
      <fpage>2</fpage>
      <lpage>11</lpage>
      <abstract>
        <p>Tikka is a cross-language question answering system developed at the University of Helsinki. For the purposes of the QA@CLEF 2004 evaluation campaign, Tikka was configured to answer Finnish questions using a text corpus in English, but it is designed so that it can be configured to work with other languages as well. Tikka is the first general domain question answering system ever reported to have used Finnish. The question type classifier, the translator, the answer extractor and the answer scorer are the components of Tikka that were developed especially for question answering.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>original question as the basis of processing for as long as possible, because when translation is
performed, the information content of the question is almost always altered.</p>
      <p>In the following section we describe the overall architecture of our QA system. After that,
each of the main components of the system is described in detail. Section 3 describes the processing
of questions, that is, question classification and translation. In Section 4, the information retrieval
component of our system is detailed. Answer processing, which consists of answer extraction
pattern creation and instantiation and of answer selection and scoring, is described in Section 5.
Section 6 is about evaluation and presents our official results at QA@CLEF 2004. It also contains
some discussion of the effects of translation on the overall performance of a QA system. Finally,
Section 7 concludes.</p>
    </sec>
    <sec id="sec-2">
      <title>System Architecture</title>
      <p>The name of our QA system is Tikka (Woodpecker). It has three modules: the Question Processor,
the Information Retrieval (IR) Engine and the Answer Processor. The system architecture is shown in Fig. 1.
The figure also shows the configuration of Tikka for Finnish-English QA and for the document
database used in the QA@CLEF evaluation initiative. The Question and Answer Processors are
the modules that were developed especially for QA. The IR Engine, which is described in more
detail in Section 4, is a standard search engine. The Question Processor, which is described in
Section 3, first produces a syntactic parse of the question, then classifies the question and
finally translates the relevant terms of the question. The Answer Processor first instantiates the
answer extraction pattern prototypes with the translated words of the question. Then it applies
the patterns to the documents retrieved by the IR Engine, and finally it selects the best answer
among the extracted candidates and gives it a confidence value. The Answer Processing module
is described in detail in Section 5.</p>
      <p>When Tikka was used for the Finnish-English experiments of QA@CLEF, its document database
consisted of 670 megabytes of newspaper text (The Glasgow Herald from 1995 and Los Angeles
Times from 1994). Other external knowledge sources that the system used were the MOT
dictionary software from Kielikone Ltd.<sup>3</sup>, the functional dependency grammar parser from Connexor
Ltd.<sup>4</sup> and a Country and Capital Translation Database extracted from the web site of Statistics
Finland<sup>5</sup>.</p>
    </sec>
    <sec id="sec-3">
      <title>Question Processing</title>
      <sec id="sec-3-1">
        <title>Question Classifier for Finnish</title>
        <p>
          Question processing begins by determining the question type. The possible types
were already defined in the Multisix corpus [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]: date, location, measure, object, organization, other, and
person. In addition, CLEF 2004 introduced three new types: manner (answering how-questions),
abstraction and definition. The last type was tagged in the evaluation corpus, and thus the number
of types to be recognized was ten.
        </p>
        <p>Obviously, the question type can often be determined just by looking at the question word.
However, in Finnish this is not always a straightforward task, as the language is morphologically
rich. Typically, instead of prepositions there are agglutinated morphemes denoting the inflected
cases, and within a noun phrase, for example, the words follow agreement, i.e., attributes take
the case of the head word. As a simple example, consider the following uses of 'who' (kuka):</p>
        <p>3 http://www.kielikone.fi/en/
4 http://www.connexor.com/
5 http://www.tilastokeskus.fi/index_en.html</p>
        <p>[Figure 1. System architecture of Tikka, configured for Finnish-English QA. Surviving component labels: Finnish, English, Dictionary, Country and Capital Translation Database.]</p>
        <p>
          There are 15 cases for each noun, adjective, pronoun and numeral in the singular, and another 15 in
the plural. Furthermore, many morphemes also produce changes in the word stem, and thus merely
stripping morphemes from the end of the word is often of little avail. Without a morphological
analysis it would be very difficult to take any further steps, because words are seldom used in
their base forms. We employ Connexor's functional dependency parser [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for Finnish. Consider the
sentence Minä vuonna se alkoi? ('In what year did it start?'). It would be parsed as:
From this we see that, for instance, the pronoun minä is the essive of mikä (what) and that it is
an attribute of the nominal head vuosi (year).
        </p>
        <p>The time-related questions in the test corpus typically fell into one of three categories: a
general 'when' (milloin, koska), a specific interval, e.g., 'what year|month|time' (minä vuonna?,
missä kuussa?, mihin aikaan?) and a duration, 'how long' (kuinka kauan, kauanko, miten kauan,
kuinka pitkän aikaa). The first two are date-questions and the last a measure-question.</p>
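        <p>The three time-related categories can be sketched as a surface-level lookup. The Python fragment below is only an illustration: it assumes that matching the opening words of the question is enough, whereas Tikka itself works on the output of the dependency parser. The function name classify_time_question and the two tuples are hypothetical and do not come from the system.</p>
        <preformat><![CDATA[
```python
# Question openers taken from the categories listed in the text.
# Durations are checked first so that "kuinka kauan" is not mistaken
# for some other "kuinka" (how) question.
DURATION_OPENERS = ("kuinka kauan", "kauanko", "miten kauan", "kuinka pitkän aikaa")
DATE_OPENERS = ("milloin", "koska", "minä vuonna", "missä kuussa", "mihin aikaan")


def classify_time_question(question):
    """Return 'measure', 'date', or None for a Finnish question string.

    Assumption: plain prefix matching on the lower-cased question; no
    morphological analysis is performed here.
    """
    q = question.strip().lower()
    if q.startswith(DURATION_OPENERS):
        return "measure"   # 'how long' questions ask for a duration
    if q.startswith(DATE_OPENERS):
        return "date"      # general 'when' and specific-interval questions
    return None            # not a time-related question under this sketch


print(classify_time_question("Minä vuonna se alkoi?"))
```
]]></preformat>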
        <p>Likewise, many measure-questions are fairly straightforward to recognize. The question is
scanned for occurrences of quantity-related question words (e.g., kuinka|miten moni|kauan|paljon,
montako|moniko|paljonko|kauanko). Then there are 'what-is'-questions, such as Mikä on Suomen
väkiluku? (What is the population of Finland?). The classification of such a question relies on
identifying the complement (population) as a measure-related word. The same technique is used with
person, location, and organization related questions: the type is based on classifying the object or
the complement. Sometimes verbs are helpful indicators of the question type.</p>
        <p>The question is classified as other-type if the complement is a verb or if the complement or
the object does not relate to person, location or organization. The manner-questions typically
start with either miten|kuinka (how) or millä tavoin|tavalla|keinoin (in what manner|way). It has
been difficult to identify object-questions, as they vary considerably. Hence, we regard them as
other-type.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Translator</title>
        <p>Once the question has been classified, it is passed on to the Translator. It decides which of the
words are translated, how to deal with proper names, homonyms and polysemous words and with
words that have no translation in the dictionary. The Translator also decides which words are
used in the query that is given to the IR Engine, and in answer extraction pattern prototype
instantiation. For these decisions, it uses the syntactic parse tree of the question.</p>
        <p>Once a question and its type are received by the Translator, it checks for country and capital
names in the Country and Capital Translation Database. It contains 244 country and capital
names in Finnish and their translations into English. The country and capital information is up
to date, as a new database is fetched periodically from the web pages of Statistics Finland.
The version that we used in the CLEF evaluation exercise dates from 16 April 2004. This caused some
problems, because the world has changed since 1994 and 1995, the years from which the CLEF newspaper
text database dates. For example, two CLEF questions were about Yugoslavia, which our Country
and Capital Translation Database naturally did not contain.</p>
        <p>If the question contains a name that is in the database, it is given a translation and removed
from the list of words that will be passed on to the dictionary software. It is crucial that the proper
names have been transformed into their base forms before their existence in the database is checked,
because the database naturally does not contain any inflected proper names. For example, among
the 34 country and capital names occurring in this year's questions, only 2 were uninflected.</p>
        <p>After the Country and Capital Translation Database check, the Translator determines
which words are passed on to the dictionary software. All nouns are translated. If no translation
is found and the noun is a compound word, it is split into two parts, both of which are used in
the dictionary search. If there are more than two parts in the compound, then the last
part forms the first search word and all the remaining parts together form the second search word. This
is sensible, because quite often the preceding parts together are a modifier of the last part. For
example (compound boundaries are marked with #), in kori#pallo#joukkue (basketball team),
kori#pallo (basketball) modifies joukkue (team). This very coarse heuristic also has many
counterexamples. One of them is kulttuuri#pää#kaupunki (Capital of Culture), where kulttuuri (culture)
modifies pää#kaupunki (capital). In those cases where the noun is a compound word containing
at least three parts and where the first part begins with a capital letter and ends with a hyphen, we split
the word into dictionary search words at the hyphen, because the first part is most probably a
proper noun and an uninflected modifier of the latter part, while the latter part is the main part of
the compound and is inflected. For example, in Andrew-#pyörre#myrsky (Hurricane Andrew),
Andrew is a modifier of pyörre#myrsky (hurricane). The proper noun may also contain several
parts, for example La# Scala -#ooppera#talo (La Scala opera house), where La# Scala modifies
ooppera#talo (opera house).</p>
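        <p>The splitting heuristic described above can be sketched in a few lines of Python. This is a minimal illustration rather than Tikka's implementation: it assumes that compound boundaries are already marked with #, as in the examples, and the function name split_compound and the order of the returned search words are our own choices.</p>
        <preformat><![CDATA[
```python
def split_compound(word):
    """Split a '#'-marked compound into at most two dictionary search words."""
    parts = [p.strip() for p in word.split("#") if p.strip()]
    if len(parts) <= 1:
        return parts  # not a compound: nothing to split

    # Index of the first part that ends with a hyphen, if there is one.
    hyphen_at = next((i for i, p in enumerate(parts) if p.endswith("-")), None)

    # Proper-noun rule: at least three parts, a capitalised first part and a
    # hyphen that closes the proper noun. Split at the hyphen.
    if (len(parts) >= 3 and parts[0][0].isupper()
            and hyphen_at is not None and hyphen_at != len(parts) - 1):
        name = " ".join(parts[:hyphen_at + 1]).rstrip("- ")
        head = "".join(parts[hyphen_at + 1:])
        return [name, head]

    # Default rule: the last part is the first search word and all
    # preceding parts together form the second search word.
    return [parts[-1], "".join(parts[:-1])]


for example in ("kori#pallo#joukkue", "Andrew-#pyörre#myrsky",
                "La# Scala -#ooppera#talo", "kulttuuri#pää#kaupunki"):
    print(example, split_compound(example))
```
]]></preformat>
        <p>The last example reproduces the counterexample from the text: the default rule groups kulttuuri with pää instead of keeping pääkaupunki together.</p>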
        <p>In addition to nouns, all adjectives that are attributes to nouns are translated. For example,
in How many Japanese students were there in the United States in 1990?, Japanese is translated
because it is a modi¯er of students.</p>
        <p>If a word has no translation in the dictionary, and it looks like a proper name (begins with a
capital letter and is not the first word of the question), its case is checked. If it is not nominative, but
one of the other fourteen cases in which a noun can be, the base form is passed on. Otherwise, the
original word in the question is passed on. This is because in the nominative case, no inflection
is added to the proper name, while in the other cases, a suffix is added to the end of the word.
In order to be able to use an inflected proper name as an English query term, we have to find its
base form.</p>
        <p>The main reason for only translating nouns and their attributes is that the verbs used in the
questions tend to be highly polysemous and they tend to have one or more homonyms. For
example, in the case of this year's question number 40, Who directed "Braveheart"?, in Finnish
Kuka ohjasi elokuvan "Braveheart - Taipumaton"?, the verb ohjata (to direct) has 22 different
senses in English, and only the seventh is the correct sense. However, the problem of polysemous
words and homonyms also exists for nouns. For example, in this year's question set, question 192
contained the word laivasto (navy), for which our dictionary software gave 3 different senses and 4
different translations (fleet, naval, forces and navy). If the different translations represent the same
sense, they are often synonyms or regional variants. An example of synonyms: the translation
candidates for laulaja are:</p>
        <p>singer, songster, vocalist
which all represent the same sense according to our dictionary software. An example of regional
variants: the translation candidates for maanalainen are:</p>
        <p>metro, tube (br; the tube), underground (br; the underground), subway (yl am)
where br means British English and am means American English.</p>
        <p>There are two main problems that could be studied further in the Translator. First, we should
investigate whether query terms and answer extraction pattern prototype instantiation terms
should be different. At the moment, the same terms are used for both.</p>
        <p>The second area for further investigation is that of finding the correct translation or
translations for a word in a given question. At the moment we take at most the first two translations
and hope that the correct one is among these. Usually it is, because in general, the dictionary
software lists the translation alternatives in the order of their frequency.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Information Retrieval</title>
      <p>After the query terms have been selected, they are given to the information retrieval engine. We
used Managing Gigabytes (MG)<sup>6</sup> for the IR task in Tikka. MG is an open source text indexing and
retrieval engine developed as a joint venture of multiple Australian universities.</p>
      <p>Prior to indexing, the documents were split so that each document was in its own file. A
more fine-grained segmentation was not applied, since some of the answers to the training questions
were not within one sentence, or even one paragraph. The files were then fed to MG for indexing.
The contents of the documents were not otherwise preprocessed, although preprocessing might have improved
the results, since special characters caused some problems in retrieval. For instance, changing
the dollar signs to corresponding strings might have been worthwhile.</p>
      <p>The maximum number of retrieved documents was limited to one hundred, since we did not
want the document sets to be processed in the Answer Selection phase to grow too large. In our
experiments we noticed that, if found at all, the document containing the correct answer was generally
within the first 100. By default, MG was run in boolean query mode.</p>
      <p>From our point of view, MG has some drawbacks. Firstly, it does not support phrase search
or proximity constraints. This made it difficult to search for compound terms. It would also have
been useful to be able to weight the terms according to their importance. For instance, one would
have wanted to state that the proper names occurring in the question are obligatory and must
be present in the retrieved documents, but that other terms are less important. As it was, each term
was treated individually and given the same weight.</p>
      <p>Especially with the questions that included proper names, the boolean mode proved to work
better than the ranked query. Since the query terms could not be weighted, the ranked query could
sometimes give many irrelevant results. In the boolean mode, at least the presence of the most
fundamental terms can be required. Sometimes the query conditions were too strict, however, and
the result set became empty, in which case the mode was switched to the ranked query. This might cause
the result set to grow so large that the document with the correct answer could be
left out of the set of the 100 best and, hence, not be processed at all.</p>
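      <p>The switch between the two query modes can be sketched as follows. The sketch does not use MG's actual interface: boolean_search and ranked_search are hypothetical callables standing in for the two query modes, and each is assumed to return a list of documents in retrieval order.</p>
      <preformat><![CDATA[
```python
def retrieve(terms, boolean_search, ranked_search, limit=100):
    """Run the boolean query first and fall back to the ranked query.

    boolean_search and ranked_search are assumed stand-ins for the two
    query modes of the search engine; they are not part of any real API.
    """
    documents = boolean_search(terms)
    if not documents:
        # The boolean conditions were too strict: switch the mode.
        documents = ranked_search(terms)
    # Only the first `limit` documents are passed on for answer extraction.
    return documents[:limit]


# Toy stand-ins: the boolean query finds nothing, the ranked query too much.
hits = retrieve(
    ["Reichstag", "Berlin"],
    boolean_search=lambda terms: [],
    ranked_search=lambda terms: ["doc%03d" % i for i in range(150)],
)
print(len(hits))
```
]]></preformat>
      <p>The cut-off after the fallback is where a relevant document can be lost, as described above.</p>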
      <p>According to our experiments with the training data set, it seemed worthwhile to also include
the corresponding adjective as an alternative query term whenever the question contained the name of a
nation. This is because the translations are in some situations more natural if
the part of speech is altered. For example, question 29 in the test set was in English What is the
official German airline called? The corresponding Finnish question is Mikä on Saksan virallisen
lentoyhtiön nimi? Here German is an adjective, but Saksan is the genitive form of the noun Saksa
(Germany).</p>
      <p>Another motivation for adding the corresponding adjectives/nouns is the fact that even within
one natural language, both of these expressions occur in sentences that have the same meaning.
For instance, question 90 in the training set was How many people in U.S. do not have health
insurance?, where U.S. is a noun. The correct answer to it was 37 million, which existed in
the following snippet: ... the existing system, which leaves 37 million Americans without health
insurance and ... There the triggering term is American, which is an adjective.</p>
      <p>The expansion of the query terms with synonyms would probably have improved the results.
The ambiguity of the query terms, especially in a bilingual question answering task, notably enlarges
the set of expansion term candidates, however. Some proper names could quite easily have
been expanded, though, such as United States, which might have been worth expanding with the terms
US, America and American, as discussed above.</p>
      <p>The most important terms in the query seemed to be the proper nouns, as one might expect.
After that came the common nouns, possibly expanded with their synonyms. Next after the common
nouns were verbs, apart from some verbs that are so common that they carry hardly any
meaning (such as do, be). The least important group of words were generally the adjectives,
though there were some questions in which the adjectives were very significant, for instance
question 79: What is the highest active volcano in Europe? This has been taken into account in
query term selection, as was described in Section 3.2.</p>
      <p>The search results are passed on to the Answer Selection module for the execution of the next
phase, the answer extraction.</p>
    </sec>
    <sec id="sec-5">
      <title>Answer Processing</title>
      <sec id="sec-5-1">
        <title>Answer Extraction Patterns</title>
        <p>
          Answer extraction pattern instantiation is the first step in answer processing. This is done by
creating instances of pattern prototypes. Each question type has a set of pattern prototypes that
have been induced from the 1994 L.A. Times and the 1995 Glasgow Herald using the Multisix
Corpus [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The pattern prototypes have slots where translated words from the question are
inserted in order to form pattern instances.
        </p>
        <p>
          Tikka contains pattern prototypes for six question types. They are: date, definition, location,
measure, person and other. Based on the question types in the Multisix Corpus [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], we could
have developed pattern prototypes also for the classes object and organization. However, we
picked the most common categories for pattern prototype development and left the rest for future
development. Other is a class where we classify all those questions that do not belong to the
other five classes. In addition, the CLEF-2004 Question Answering Track Guidelines<sup>7</sup> contained
the classes abstraction and manner, but we did not develop pattern prototypes for these since we had
no training material.
        </p>
        <p>Below are 3 examples of the 11 instantiated location patterns for question 116 Where is the
Reichstag?
[Ii]n (([A-Z][a-z]+ ){1,5}[A-Z][a-z]+), a [a-z]* [a-z]+,[^a-zA-Z0-9]+Reichstag[,\.]
at ([A-Z][a-z]+,? ([A-Z][a-z]+)?), [^\.\?\!0-9\",]* Reichstag</p>
        <p>[Table 1. The pattern classes (date, definition, location, measure, other, person, total) and the number of prototype patterns in each class.]</p>
        <p>The prototypes of these patterns are identical to the instantiated patterns, except that the word
Reichstag is replaced with a wildcard denoting any noun from the question. The third pattern is
the one that matched both of the answers that were found. Both of the answers are Berlin, and
here is their context:</p>
        <p>Two matches for question 116:
WORKERS lower a giant panel of cloth over the entrance to the Reichstag in Berlin, helping
Hungarian artist Christo to fulful a dream of 24 years.</p>
        <p>Reichstag in Berlin, he
He will use 160 assistants to wrap the Reichstag with 90,000 square yards of a silver propylene
fabric, chosen "because it fits with the building, the heaven and light in Berlin." Christo
Reichstag with 90,000 square yards of a silver propylene fabric,
chosen "because it fits with the building, the heaven and light in Berlin."</p>
        <p>The answer pattern prototypes consist of regular expressions and of slots for proper names
and other words that have been picked from the question. The answer pattern prototypes do not
contain any syntactic or morphological information at the moment. Table 1 lists all the pattern
classes and the number of prototype patterns that each class contains. In future research, it would
be interesting to incorporate at least part of speech information into the patterns. Examples of
pattern instances that are derived from the same location pattern prototype:
the city of ([^ ,\.\?\!0-9]+), Mike Kelley[^\.\?\!0-9]*
the town of ([^ ,\.\?\!0-9]+), Mike Kelley[^\.\?\!0-9]*</p>
        <p>In the above example, the word kaupunki has two translations, city and town, and the pattern
prototype is expanded with both.</p>
        <p>Another example:
PROPER NAME[^,\.\?\!0-9]* TITLE,? [^A-Z]*(([A-Z][a-z]+[ -])*[A-Z][a-z]+)</p>
        <p>In the above person pattern prototype the slots for PROPER NAME and TITLE are filled with
words from the question. For example, in question 2 from 2003, Kuka on YK:n pääsihteeri?
(Who is the head of the United Nations?), the slot for PROPER NAME is filled by UN, United
Nations and UN (United Nations). The slot for TITLE is filled by Secretary General and
secretary-general. When all these instantiations are combined, we end up with 6 different pattern instances.
The different variations for the slots, except for the combination UN (United Nations), are retrieved
from the dictionary. For all acronyms that have the longer form listed in the dictionary, the system
performs the same type of expansion as for UN.</p>
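        <p>Slot filling of this kind amounts to taking every combination of the candidate fillers. The Python sketch below illustrates this with the person prototype and the fillers quoted above. It is an assumption-laden illustration, not Tikka's code: the helper names instantiate and extract are hypothetical, the fillers are escaped with re.escape so that they are matched literally, and the test sentence is invented for the example.</p>
        <preformat><![CDATA[
```python
import itertools
import re

# Person pattern prototype quoted in the text; PROPER NAME and TITLE are slots.
PROTOTYPE = r"PROPER NAME[^,\.\?\!0-9]* TITLE,? [^A-Z]*(([A-Z][a-z]+[ -])*[A-Z][a-z]+)"


def instantiate(prototype, slot_fillers):
    """Return one pattern instance per combination of slot fillers."""
    slots = list(slot_fillers)
    instances = []
    for combination in itertools.product(*(slot_fillers[s] for s in slots)):
        pattern = prototype
        for slot, word in zip(slots, combination):
            # Escape the filler so that characters such as '(' are literal.
            pattern = pattern.replace(slot, re.escape(word))
        instances.append(pattern)
    return instances


def extract(instances, text):
    """Apply every instance to the text and collect the captured answers."""
    answers = []
    for pattern in instances:
        match = re.search(pattern, text)
        if match:
            answers.append(match.group(1))
    return answers


fillers = {
    "PROPER NAME": ["UN", "United Nations", "UN (United Nations)"],
    "TITLE": ["Secretary General", "secretary-general"],
}
instances = instantiate(PROTOTYPE, fillers)
print(len(instances))

# Invented sentence, used only to exercise the patterns.
sentence = "UN Secretary General Kofi Annan said the talks would continue."
print(extract(instances, sentence))
```
]]></preformat>
        <p>Three fillers for PROPER NAME and two for TITLE give the six instances mentioned in the text.</p>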
      </sec>
      <sec id="sec-5-2">
        <title>Answer Selection and Scoring</title>
        <p>Answer selection is based on frequency, which means simply that among the answer candidates,
the answer that appears most often is selected. If there are several answer candidates with the
same frequency, the one appearing first in the results retrieved by the IR Engine is selected. This
is a reasonable approach, because the IR Engine search results are ranked in the order of relevance.</p>
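        <p>The selection rule can be written down in a few lines. The following Python sketch assumes that the candidates arrive as a plain list of answer strings in the order in which the retrieved documents produced them; the function name select_answer is hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def select_answer(candidates):
    """Pick the most frequent candidate; ties go to the earliest one.

    `candidates` is assumed to be ordered by the rank of the document
    in which each candidate was found.
    """
    if not candidates:
        return None  # nothing was extracted
    counts = Counter(candidates)
    best = max(counts.values())
    # Walk the candidates in retrieval order so that the first candidate
    # reaching the top frequency wins the tie.
    for candidate in candidates:
        if counts[candidate] == best:
            return candidate


print(select_answer(["Bonn", "Berlin", "Berlin", "Bonn", "Paris"]))
```
]]></preformat>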
        <p>The confidence measure is a function of both the total number of candidates retrieved
and the frequency of the selected candidate. This function is illustrated as an area plot in
Figure 2. In Tikka, the frequencies and numbers of different candidates are discrete and not
continuous as shown in the figure. The confidence score is 1 if the number of different candidates
is between 1 and 5, or if the number of different candidates is between 6 and
14 and the frequency of the candidate is greater than 1 (the area marked with tiles in Figure 2).
The confidence score is 0.5 if the number of different candidates is between 6 and 10 and the
frequency is 1 (the area marked with diagonal lines in Figure 2). The confidence score is 0.25 if
the number of different candidates is between 11 and 14 and the frequency is 1, or if the number
of different candidates is over 14 (the area marked blank in Figure 2). All those questions that we
detected as not having an answer in the text database (answers of type NIL) had a confidence
score of 0. Detecting the degree of confidence for answers of type NIL is a goal for future research.</p>
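        <p>The thresholds above translate directly into a small function. The Python sketch below restates them; the function and parameter names are hypothetical, and treating an empty candidate set as a NIL answer is our assumption.</p>
        <preformat><![CDATA[
```python
def confidence(num_candidates, frequency, is_nil=False):
    """Confidence score from the number of different candidates and the
    frequency of the selected candidate, following the thresholds in the text."""
    if is_nil or num_candidates == 0:
        return 0.0   # NIL answers get a score of 0 (empty set: our assumption)
    if 1 <= num_candidates <= 5:
        return 1.0   # few candidates: full confidence
    if 6 <= num_candidates <= 14:
        if frequency > 1:
            return 1.0
        # The selected candidate was seen only once.
        return 0.5 if num_candidates <= 10 else 0.25
    return 0.25      # more than 14 different candidates


for n, f in [(3, 1), (8, 2), (8, 1), (12, 1), (20, 5)]:
    print(n, f, confidence(n, f))
```
]]></preformat>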
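        <p>[Figure 2. Confidence score as a function of the number of different candidates and the frequency (FREQ) of the selected candidate.]</p>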
        <p>As can be seen from Table 3, the confidence function should have been stricter, i.e.
the score 1 should have been given to fewer answers. However, the confidence function depends
heavily on the data and questions at hand and on how well the answer extraction patterns match
that data. We trained Tikka with questions and answers from QA@CLEF 2003, and it
seems that the answer extraction pattern prototypes were too specific to those answers. With the
2003 questions we got 132 NIL answers, but with this year's material, the number of NIL answers
was 159. The distribution of confidence scores from 2003 is shown in Table 3.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Evaluation</title>
      <p>[Results table; row labels: Accuracy, Accuracy of factoid questions, Accuracy of definition questions, Number of NIL answers, Accuracy of NIL answers, Confidence-weighted score.]</p>
      <p>We had 21 right answers, among which 20 were to factoid questions and 1 was to a definition
question. One of our answers was inexact and there were no unsupported answers in our answer
set.</p>
      <sec id="sec-6-1">
        <title>Inter-Translator Agreement</title>
        <p>The questions for Finnish-English QA were translated from English. The assessor of the
evaluation campaign compared the English questions against the results given by Tikka. However, the
translation process is not straightforward, because for most questions, there seem to be as many
translations as there are translators. In addition, not all questions are sensible when translated.
For example, question 86 (What does a luthier make?) became pointless in Finnish, because
our word for luthier (soitinrakentaja) tells what a luthier does. Another example of the influence
of translation on the questions is question 85, What did the artist Christo wrap up? This can be
translated in two ways which have completely different meanings due to the ambiguity of the
verb to wrap up. To wrap up can be translated as denoting concrete wrapping up, which was
the correct meaning according to the correct answer, which is that the artist Christo wrapped up
the Reichstag in silver fabric tied with blue rope. The other meaning of to wrap up is an abstract
one, and it means finishing something. The translations 1 and 2 translated to wrap up with its
concrete sense paketoida, but translation 2 has the abstract sense saattaa päätökseen. We did two
more translations of the English questions by translators who had not seen the official translation
in order to measure the difficulty of the translation task and the inter-translator agreement rate.
The amount of inter-translator agreement is illustrated in Table 5. Translation 1 is the official
translation where the errors have been corrected<sup>8</sup>.</p>
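        <table-wrap id="tab-5">
          <label>Table 5</label>
          <table>
            <thead>
              <tr><th>Translations in agreement</th><th>Absolute numbers</th></tr>
            </thead>
            <tbody>
              <tr><td>All three translations</td><td>51/200</td></tr>
              <tr><td>Translation 2 and 3</td><td>86/200</td></tr>
              <tr><td>Translation 1 and 2</td><td>74/200</td></tr>
              <tr><td>Translation 1 and 3</td><td>69/200</td></tr>
            </tbody>
          </table>
        </table-wrap>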
        <p>The most common translator disagreement types are lexicographic disagreement, word order
disagreement and disagreement in the use of conventions. Lexicographic disagreement means a
different choice of words where the words are synonyms or semantically very closely related. For
example: manufacture translated as valmistaa or tuottaa. Word order disagreement means that
the words in the question are in a different order. For example: Missä on Hyde Park? (Where
is Hyde Park?) and Missä Hyde Park on? (Where Hyde Park is?). Disagreement on the use of
conventions means that there are many different, equally correct conventions on how to express
a concept. For example, there are several conventions for expressing names of movies that have
originally appeared in a language other than Finnish. Question 175, for instance, is about the
movie Nikita. Nikita was translated into Finnish in three different ways: Nikita, elokuva "Tyttö
nimeltä Nikita" and Nikita (La Femme Nikita). The translation of names of movies is problematic
because some movies have an official translation into Finnish and some don't. In the case of
Nikita, there were two official translations, Nikita and Tyttö nimeltä Nikita. Proper names are
often typed, as in elokuva "Tyttö nimeltä Nikita" (movie "Nikita"), because then the type (movie) gets
inflected and there is no need to inflect the proper name. One convention in expressing movie
names is that of first writing the official translation in Finnish and then adding the name of the
original movie in parentheses after it, as in Nikita (La Femme Nikita). It will be interesting to
compare the QA results with translations 1, 2 and 3 once we have the correct answers for this
year's questions.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusions and Future Work</title>
      <p>To the best of our knowledge, the work presented in this paper is the first time cross-language QA
has been done using Finnish as a source language. Altogether, there has been very little work on
any type of QA for Finnish. Keeping this in mind, it was interesting to get the system up and
running and to observe that it could answer 10.88% of the questions presented to it correctly.</p>
      <p>Due to the very different nature of Finnish in comparison to any of the other languages
participating in QA@CLEF, special attention has been paid to question translation and to the
effects of the translation phase on the overall performance of the system. This is also a subfield
on which we plan to focus our attention in the future.</p>
      <p>Another interesting subfield is that of answer extraction patterns. We plan to study carefully
which patterns matched well and which didn't and to find out the reasons for this. We are also
planning to investigate the use of POS tags and possibly surface syntactic tags in the answer
extraction patterns. The results obtained in this evaluation showed that by developing further the
question and answer processing modules, as well as by tuning the IR engine more carefully, the
performance of Tikka is very likely to improve.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Aunimo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Heinonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kuuskoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Makkonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Petit</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Virtanen</surname>
          </string-name>
          .
          <article-title>Question answering system for incomplete and noisy data: Methods and measures for its evaluation</article-title>
          .
          <source>In Proceedings of the 25th European Conference on Information Retrieval Research</source>
          (ECIR
          <year>2003</year>
          ), pages
          <fpage>193</fpage>
          –
          <lpage>206</lpage>
          , Pisa, Italy,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Busemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schmeier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Arens</surname>
          </string-name>
          .
          <article-title>Message classification in the call center</article-title>
          .
          <source>In Proceedings of 6th Applied Natural Language Processing Conference</source>
          , Seattle, Washington, USA,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          .
          <article-title>Scenarios for interactive cross-language retrieval systems</article-title>
          .
          <source>In Proceedings of the Workshop 1: Cross-Language Information Retrieval: A Research Roadmap</source>
          . Workshop held at the 25th Annual International ACM SIGIR Conference, Tampere, Finland, August
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Järvinen</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Tapanainen</surname>
          </string-name>
          .
          <article-title>A dependency parser for English</article-title>
          .
          <source>Technical Report TR-1</source>
          , Department of General Linguistics, University of Helsinki,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Romagnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vallin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Peñas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Peinado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Verdejo</surname>
          </string-name>
          , and M. de Rijke.
          <article-title>The Multiple Language Question Answering Track at CLEF 2003</article-title>
          . In C. Peters, editor,
          <source>Working Notes for the CLEF 2003 Workshop</source>
          , Trondheim, Norway, August
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>