<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>First evaluation of Esfinge - a question answering system for Portuguese</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luís Costa</string-name>
          <aff>Linguateca at SINTEF ICT, Pb Blindern, Norway</aff>
          <email>luis.costa at sintef.no</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>In this paper I will start by describing Esfinge, a general domain Portuguese question answering system, and then the strategies I used to participate in the CLEF-2004 QA track. Then I will present and discuss the results obtained and finally describe some of the work planned for the near future.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Esfinge</title>
      <p>With a question answering system we want the system, for a given question, to be able to return answers with the help of an information repository. This task requires processing both the question and the information repository. Existing systems use various linguistic resources in this processing, such as taggers, named entity extractors, semantic relations, dictionaries and thesauri.</p>
      <p>Esfinge (http://acdc.linguateca.pt/Esfinge/) is based on the architecture described by Eric Brill in (Brill, 2003). Brill tried to check the results that could be obtained by investing less in the resources to process the question and the information repository and more in the volume of the information repository itself. The Web, as the biggest free information repository that we know, is a good candidate for these experiments. Brill's approach had never been tried for Portuguese, and this language is widely used on the Web (Aires &amp; Santos, 2002). The motivation to start developing Esfinge was to check the results that could be obtained by applying Brill's approach to Portuguese. The planned architecture has four modules: question reformulation, N-grams harvesting, N-grams filtering and N-grams composition.</p>
    </sec>
    <sec id="sec-2">
      <title>Question reformulation</title>
      <p>In this module, patterns of plausible answers to a given question are obtained. These patterns are based on the
words in the question. As an example, for the question “In which year did Vasco da Gama arrive in India?” a
plausible pattern would be “Vasco da Gama arrived in India in”.</p>
      <p>It’s too optimistic to expect the existence of pages with answers in “friendly” formats for all the questions (with
the exact format produced by the question reformulation module). Therefore, patterns of plausible answers
with less ambitious strings, such as the simple conjunction of the question words, are also considered.</p>
      <p>Each of these patterns is scored according to how likely it is to help find correct answers.</p>
      <p>The patterns were initially scored according to my intuition. At the moment the scores range from 1 to 20. The linguistic information of this module is encapsulated in a text file using the regular expression syntax of the Perl programming language. Each triple (question pattern, answer pattern, score) is defined on one line, with its three fields separated by slashes (/).</p>
      <p>Here follows a sample of that text file (actually a simplification, for the sake of clarity):</p>
      <preformat>O que ([^\s?]*) ([^?]*)\??/"$2 $1"/10
O que ([^?]*)\??/$1/1</preformat>
      <p>The first rule says that for a question starting with “O que X Y?” (What X Y?), answers with the pattern “Y X”
should be granted the score 10 (since Y and X are enclosed in double quotes, this is a phrase pattern: Y
must appear just before X). For the question “O que é a MTV?” (What is MTV?), this rule generates the pattern
“a MTV é” with the score 10.</p>
      <p>The second rule says that for a question starting with “O que X?” (What X?), answers with the pattern X should
be granted the score 1. For the question in the previous example, this rule generates the pattern “é” “a” “MTV”
with the score 1. Since the words in the pattern are not all enclosed in a single pair of double quotes, they
don’t need to appear in this order or even in the same sentence.</p>
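      <p>The two sample rules can be tried out with a minimal sketch such as the one below. It is written in Python rather than Perl, and the names RULES and reformulate are hypothetical, not Esfinge's actual code. It assumes that these simple expressions behave the same way in Python's re module as in Perl, and that an unquoted template yields one separately quoted term per word.</p>
      <preformat>
```python
import re

# Rules copied from the sample file: (question regex, answer template, score).
RULES = [
    (r'O que ([^\s?]*) ([^?]*)\??', '"$2 $1"', 10),
    (r'O que ([^?]*)\??', '$1', 1),
]


def reformulate(question):
    """Return (answer pattern, score) pairs for every rule that matches."""
    patterns = []
    for question_re, template, score in RULES:
        match = re.match(question_re, question)
        if match is None:
            continue
        # Replace the Perl-style $1, $2 ... with the captured groups.
        filled = re.sub(r'\$(\d)', lambda m: match.group(int(m.group(1))), template)
        if filled.startswith('"'):
            # Quoted template: a single phrase pattern, word order preserved.
            patterns.append((filled, score))
        else:
            # Unquoted template: every word becomes its own search term.
            terms = ' '.join('"%s"' % word for word in filled.split())
            patterns.append((terms, score))
    return patterns


print(reformulate("O que é a MTV?"))
```
      </preformat>
      <p>For the question “O que é a MTV?” the sketch yields the phrase pattern "a MTV é" with score 10 and the loose pattern "é" "a" "MTV" with score 1, matching the two cases described above.</p>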
    </sec>
    <sec id="sec-3">
      <title>N-grams harvesting</title>
      <p>
        In this module, the resulting patterns of the Question Reformulation module are queried against an information
repository. For that purpose they are submitted to a web search engine (for the moment I’ve been using Google
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]).
      </p>
      <p>In Figure 1 we can see the results of querying Google for the pattern “a antiga capital da Polónia” (the former capital of Poland).</p>
      <p>The next step is to extract word N-grams from the resulting snippets and to measure their frequency (I’m considering the first 100 snippets).</p>
      <p>I’ve been using the Ngram Statistics Package (NSP) (Banerjee &amp; Pedersen, 2003) for that purpose. For the first 100 document snippets of the previous query we got the following N-gram distribution (I’m presenting only the 16 most frequent N-grams):</p>
      <p>There is some hope that the correct answer will be among the extracted N-grams. Next, these N-grams of different lengths are scored according to their frequency, their length and the scores of the patterns that originated them.</p>
      <p>I’m using the following equation, summed over the first 100 snippets resulting from the web search:</p>
      <disp-formula>N-gram score = ∑ (F * S * L)</disp-formula>
      <p>where
F = n-gram frequency
S = score of the search pattern which recovered the document
L = n-gram length</p>
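      <p>The scoring can be sketched as follows. This is only an illustration under stated assumptions: F is read as the frequency of the n-gram within one snippet, the products are summed over all snippets, words are split on white space instead of being counted with NSP, and the function names and the two toy snippets are made up.</p>
      <preformat>
```python
from collections import Counter


def word_ngrams(text, max_len=3):
    """Yield every word n-gram of length 1..max_len as a tuple of words."""
    words = text.split()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])


def score_ngrams(snippets):
    """snippets: list of (snippet text, score S of the pattern that found it).

    Returns a Counter mapping each n-gram to the sum of F * S * L.
    """
    scores = Counter()
    for text, pattern_score in snippets:
        frequencies = Counter(word_ngrams(text))
        for ngram, frequency in frequencies.items():
            scores[ngram] += frequency * pattern_score * len(ngram)
    return scores


# Toy snippets, invented for illustration only.
snippets = [("Cracóvia antiga capital", 10), ("capital Cracóvia", 1)]
scores = score_ngrams(snippets)
for ngram, score in scores.most_common(3):
    print(' '.join(ngram), score)
```
      </preformat>
      <p>Multiplying by the length L favours longer n-grams, which would otherwise always lose to the single words they contain.</p>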
    </sec>
    <sec id="sec-4">
      <title>N-grams filtering</title>
      <p>This module re-evaluates the scores obtained in the module “N-grams harvesting”. In this module the N-grams
are analysed according to their particular features.</p>
      <p>For a given question, even if we don’t know the answer, we can predict the type of the expected answer.</p>
      <p>For example:
- A “When?” question implies an answer of type “date”. It can be more or less precise: a year (like 1973) or a
complete date (like 11/10/1973), but answers like “Lisboa” or “George W. Bush” don’t make sense in this
context.
- A “How many?” question implies an answer of type “number”. Answers like “Oslo” or “5/8/2004” are not
acceptable.</p>
      <p>Analysing the N-grams for the presence of digits, capitalization and typical patterns may allow reclassifying
those N-grams or even discarding them. The PoS information provided by a morphologic analyser may also be
used to raise the scores of N-grams with interesting sequences of PoS categories.</p>
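      <p>A minimal sketch of such an answer-type check is given below. The surface patterns, the mapping from question words to types and the function names are assumptions made for illustration; the paper does not list the patterns that are planned for Esfinge.</p>
      <preformat>
```python
import re

# Hypothetical surface patterns for two answer types.
DATE_RE = re.compile(r'^(\d{4}|\d{1,2}/\d{1,2}/\d{4})$')
NUMBER_RE = re.compile(r'^\d+([.,]\d+)?$')


def expected_type(question):
    """Guess the answer type from the first word of a Portuguese question."""
    q = question.lower()
    if q.startswith('quando'):                  # When?
        return 'date'
    if q.startswith(('quantos', 'quantas')):    # How many?
        return 'number'
    return None


def matches_type(candidate, answer_type):
    """True if the candidate n-gram is compatible with the expected type."""
    if answer_type == 'date':
        return DATE_RE.match(candidate) is not None
    if answer_type == 'number':
        return NUMBER_RE.match(candidate) is not None
    return True  # unknown type: keep the candidate


for candidate in ['1973', '11/10/1973', 'Lisboa', 'George W. Bush']:
    print(candidate, matches_type(candidate, 'date'))
```
      </preformat>
      <p>Candidates that fail the check could be discarded outright or simply have their score lowered, which is the more cautious choice when the type guess itself may be wrong.</p>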
    </sec>
    <sec id="sec-5">
      <title>N-grams composition</title>
      <p>This module tries to deal with questions with a set of answers, like “Who were the musicians in Queen?”. The
complete answer to this question demands the composition of the word N-grams "Freddie Mercury", "Brian
May", "Roger Taylor" and "John Deacon", which can be expected to be among the top scored word N-grams obtained
from the three previous modules.</p>
      <p>The first task in this module is to determine whether the type of answer is singular (e.g. “Who was the first king
of Norway?”), plural with a known number of items (e.g. “Which are the three largest cities in Portugal?”) or
plural with an unknown number of items (e.g. “What are the colours of Japan’s flag?”).</p>
      <p>For the first type this module will return the best scored word N-gram resulting from the previous modules. For
the second type it will return the required number of best scored word N-grams (three, in the example above).
For the third type, it will need to decide which word N-grams will be part of the answer. This can be done using
a threshold on the score that defines which word N-grams are included. The
proximity of the score values can also be used as a decisive factor.</p>
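      <p>The three cases can be sketched as below. The function name, the relative threshold of half the best score and the toy scores are all assumptions made for illustration; the module is described here only as planned work.</p>
      <preformat>
```python
def compose_answer(ranked, expected_count=None, threshold=0.5):
    """Pick the n-grams that form the answer.

    ranked: list of (ngram, score) pairs sorted by descending score.
    expected_count: 1 for a singular answer, k for a plural answer with a
        known number of items, None when the number of items is unknown.
    threshold: hypothetical cut-off, as a fraction of the best score.
    """
    if not ranked:
        return []
    if expected_count is not None:
        return [ngram for ngram, _ in ranked[:expected_count]]
    best_score = ranked[0][1]
    return [ngram for ngram, score in ranked if score >= threshold * best_score]


# Toy scores, invented for illustration only.
ranked = [("Freddie Mercury", 40), ("Brian May", 38), ("Roger Taylor", 35),
          ("John Deacon", 30), ("rock", 8)]
print(compose_answer(ranked))      # unknown number of items
print(compose_answer(ranked, 1))   # singular answer
```
      </preformat>
      <p>A relative threshold is used in the sketch because absolute scores depend on how many snippets a query returns.</p>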
    </sec>
    <sec id="sec-6">
      <title>Strategies for CLEF 2004</title>
      <p>Esfinge is still in its first stages of development, but participating in the CLEF-2004 QA track seemed a good
way of evaluating the work done so far, experiencing some of the difficulties in this IR field and getting in touch with the
state of the art of current QA systems and their approaches.</p>
      <p>For the QA-CLEF monolingual track, one had to supply, along with each answer, one document in the document
collection that supported it. As said above, my system originally used Google’s search results and was mainly
statistical (it tried to use the redundancy existing in the Web), so I knew I would need to add some extra
functionality.</p>
      <p>I tested three different strategies. In the first, the system searched for the answers in the CLEF document collection
(run 1). In the second, it searched for the answers in the Web and used the CLEF document collection to confirm these
answers (run 2). Finally, in the third strategy my system searched for the answers in the Web only (this one was not
submitted to the organization).</p>
      <sec id="sec-6-1">
        <title>Run 1</title>
        <p>The first thing I needed was some way of searching in the document collection. I have some experience in encoding corpora using IMS Workbench (Christ et al., 1999) as well as in using its querying capabilities. So it seemed a good idea to use it to encode the CLEF document collection and to search for the desired patterns.</p>
        <p>Another important decision to be made was the size of the text unit to be considered when searching for patterns:
the entire text of each document, or a passage (a fixed number of sentences or a fixed number of words). I did
not have a definitive answer to this question, so I chose to do some experiments.</p>
        <p>
          Since the whole document seemed too big for a unit, I tried the following three strategies:
- Considered the text unit as 50 contiguous words. This is done dynamically: it is possible to query corpora
encoded using IMS Workbench for the context (in terms of words) in which the required patterns co-occur.
- Divided each document into sentences. Those sentences were considered as the text unit. To segment the
document collection into sentences, I used the Perl module Lingua::PT::Segmentador [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The resulting
sentences had an average of 28 words per sentence.
- Divided each document into sets of three sentences. Those sets of three sentences were considered as the text
unit.
        </p>
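        <p>The three kinds of text unit can be illustrated with the sketch below. It is not how Esfinge does it: the naive splitter merely stands in for Lingua::PT::Segmentador, the word windows are computed in memory rather than by IMS Workbench, the sets of three sentences are assumed not to overlap, and all function names are hypothetical.</p>
        <preformat>
```python
import re


def sentences(text):
    """Naive sentence splitter on ., ! and ? (an assumption, see above)."""
    return [s.strip() for s in re.findall(r'[^.!?]+[.!?]*', text) if s.strip()]


def sentence_groups(text, size=3):
    """Non-overlapping groups of `size` consecutive sentences."""
    sents = sentences(text)
    return [' '.join(sents[i:i + size]) for i in range(0, len(sents), size)]


def windows_with_terms(words, terms, width=50):
    """Every window of `width` contiguous words in which all terms co-occur."""
    wanted = set(terms)
    hits = []
    for start in range(max(1, len(words) - width + 1)):
        window = words[start:start + width]
        if wanted.issubset(window):
            hits.append(' '.join(window))
    return hits


print(sentence_groups("A. B. C. D."))
print(windows_with_terms("a b c d e".split(), ["b", "d"], width=3))
```
        </preformat>
        <p>With sentences averaging 28 words, a set of three sentences is noticeably larger than the 50-word window, so the three strategies give increasing amounts of context around a match.</p>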
        <p>For each question in the QA track, Esfinge performed the following steps:</p>
        <p>1. Question reformulation
- Submitted the question to the question reformulation module. The result was a set of pairs (answer pattern,
score).</p>
      </sec>
      <sec id="sec-6-2">
        <title>2. Passage extraction</title>
        <p>- Searched for each of these patterns in the document collection and extracted the text units (50 contiguous words,
one sentence or three sentences) where the pattern was found. The system discards stop-words without context.
For example, in the query “a” “antiga” “capital” “da” “Polónia”, the words “a” and “da” are discarded, while in
the query “a antiga capital da Polónia” (phrase pattern) they are not discarded. Currently I’m discarding the 22
most frequent words in the CETEMPúblico corpus (Santos &amp; Rocha, 2001). At this stage the system had retrieved a
set of document passages {P1, P2 … Pn}.</p>
      </sec>
      <sec id="sec-6-3">
        <title>3. N-grams harvesting</title>
        <p>- Computed the distribution of word n-grams (from length 1 to length 3) in the document excerpts.
- Ordered the list of word n-grams according to a score based on the frequency, the length and the scores of the
patterns that originated the document excerpts where the n-grams were found (the formula given in the N-grams harvesting section). At this
stage the system had an ordered set of possible answers {A1, A2 … An}.</p>
      </sec>
      <sec id="sec-6-4">
        <title>4. N-grams filtering</title>
        <p>- The next step was to discard some of these possible answers using a set of filters.</p>
        <p>The filters used were:
- First, a filter to discard answers that are contained in the question. For example, for the question “Qual é a capital da
Rússia?” (What is the capital of Russia?), the answer “capital da Rússia” (capital of Russia) is not desired and
should be discarded.
- Then, a filter that used the morphologic analyser jspell (Simões &amp; Almeida, 2001) to check the PoS of the
various words in each answer. The analyser returns a set of possible PoS tags for each word. I erroneously
assumed that the order in which the PoS tags were returned was related to their frequency. With that in mind, I
was using only the first PoS tag for each word. Recently I found out that this assumption was wrong. This filter
considered some PoS as “interesting”: adjectives (adj), common nouns (nc), numbers (card) and proper nouns
(np). All answers whose first or final word didn’t belong to one of these “interesting” PoS were discarded. It is
worth noting that my misinterpretation of the analyser’s results most probably led to a poor performance
by this filter.</p>
      </sec>
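      <p>The two filters can be sketched as follows. The function names are hypothetical and the PoS tags are hard-coded instead of being obtained from jspell; the candidates and their tags are a subset of those listed in the paper for the question “Quem é Andy Warhol?”.</p>
      <preformat>
```python
# PoS tags treated as interesting: adjective, common noun, number, proper noun.
INTERESTING = {'adj', 'nc', 'card', 'np'}


def contained_in_question(answer, question):
    """First filter: the candidate answer merely repeats part of the question."""
    return answer.lower() in question.lower()


def has_interesting_edges(tags):
    """Second filter: first and final word must both carry an interesting PoS.

    tags: the first PoS tag returned for each word of the candidate answer.
    """
    return tags[0] in INTERESTING and tags[-1] in INTERESTING


candidates = {
    'que': ['prel'],
    'de Andy': ['prep', 'np'],
    'pela primeira vez': ['cp', 'nord', 'nc'],
    'artista': ['nc'],
    'segundo andar chamado': ['nord', 'nc', 'v'],
    'cola em garrafa': ['nc', 'prep', 'nc'],
}
kept = [answer for answer, tags in candidates.items()
        if has_interesting_edges(tags)]
print(kept)
```
      </preformat>
      <p>Only “artista” and “cola em garrafa” survive the PoS filter in this sketch, since every other candidate starts or ends with a word outside the interesting categories.</p>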
      <sec id="sec-6-6">
        <title>Example</title>
        <p>For the question “Quem é Andy Warhol?” (Who is Andy Warhol?), the system had the following answers among the highest scored, shown here with the information given by the morphologic analyser:</p>
        <preformat>que: prel
um: art
de Andy: prep np
por: prep
como: con
pela primeira vez: cp nord nc
sua: ppos
mais: pind
ou: con
artista: nc
que Andy: prel np
com esta dimensão: prep pdem nc
segundo andar chamado: nord nc v
cola em garrafa: nc prep nc</preformat>
        <p>After applying the filter, the set of highest scored answers will be:</p>
        <preformat>artista: nc
cola em garrafa: nc prep nc</preformat>
        <p>The final answer will be the candidate answer with the highest score in the set of candidate answers which
were not discarded by any of the filters above. If all the answers were discarded by the filters, then the final
answer is NIL (meaning the system is not able to find an answer in the document collection).</p>
      </sec>
      <sec id="sec-6-7">
        <title>Run 2</title>
        <p>It was possible to send two sets of results to the organization. I also wanted to do some experiments using the Web as source, since that’s the line of work where I expect to get better results.</p>
        <p>For that purpose I selected one of the previous experiments to send to the organization (my run 1): the one considering sets of three sentences as the text unit, because it seemed to be the one with the best results, even though the results were quite similar in all three experiments.</p>
        <p>The next experiment used the strategy described in (Brill et al., 2001).
First, it looked for answers in the Web, and then tried to find documents in the document collection supporting
those answers. It submitted the patterns obtained in the question reformulation module to Google. Then the
document snippets {S1, S2 … Sn} were extracted from Google’s results pages. These snippets are usually
composed of fragments of the different sentences in the retrieved documents that contain the query words, and
have approximately 25 words.</p>
        <p>The next step was to compute the distribution of word n-grams (from length 1 to length 3) existing in these
document snippets. From this point the algorithm followed the one described for run 1, with an extra filter in the
N-grams filtering module: a filter that searched the document collection for documents supporting the answer, that is,
containing both the candidate answer and a pattern obtained from the question reformulation module. This filter
is necessary because the task guidelines stated that the system should return the code of a document
supporting each answer.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Brazilian Portuguese. A problem?</title>
      <p>Using texts in Brazilian web pages may enlarge the corpus the system uses to find answers, but may also bring
some problems. The system may return an answer in the Brazilian variant that cannot be supported in the
document collection, which was built with newspaper texts written in European Portuguese.
Example: For the question “Qual é a capital da Rússia?” (What is the capital of Russia?), my system returned the
answer “Moscou” (in the Brazilian variant). It would be much easier to support the answer “Moscovo” (same
word in the European variant).</p>
      <p>Another problem may occur when the score gets diluted between the two variants (like “Moscou” and “Moscovo” in
the example), thus allowing other answers to get better scores. Searching only in pages from Portugal can obviate
this problem, but will shrink the corpus to search in.</p>
      <p>Yet another problem can be illustrated by the query used in the N-grams harvesting section: “a antiga capital da Polónia”. Even though
the word “Polónia” (European Portuguese variant) is used in the query, this word is not among the top 10 harvested n-grams.
On the other hand, “Polônia” (in the Brazilian variant) is in third place in the n-gram ranking. The reason for this
is that Google doesn’t differentiate between accented and non-accented characters, so the characters “ó”,
“ô” and “o” are exactly the same thing to this search engine. This can be a serious problem when one is
processing a language with the variety and heavy use of accents found in Portuguese. One way to
obviate this problem is to develop a post-Google filter to discard non-interesting documents, thus overcoming
Google’s limitations regarding Portuguese.</p>
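      <p>A post-search filter of this kind could look like the minimal sketch below. The function names and the two toy snippets are made up; strip_accents only imitates the accent-insensitive matching attributed to the search engine, while keep_exact_variant keeps the snippets that contain the query term with exactly the same accents.</p>
      <preformat>
```python
import unicodedata


def strip_accents(text):
    """Remove diacritics, imitating accent-insensitive matching."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


def keep_exact_variant(snippets, term):
    """Keep only the snippets containing `term` spelled with the same accents."""
    return [s for s in snippets if term.lower() in s.lower()]


# Toy snippets, invented for illustration only.
snippets = ["a antiga capital da Polónia", "a antiga capital da Polônia"]
print(strip_accents("Polónia") == strip_accents("Polônia"))
print(keep_exact_variant(snippets, "Polónia"))
```
      </preformat>
      <p>Such a filter trades recall for precision: it removes the Brazilian spelling from the harvested n-grams, but it also shrinks the set of snippets the statistics are computed on.</p>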
    </sec>
    <sec id="sec-8">
      <title>Web-only experiment</title>
      <p>For the present paper, I did an extra “run” using the Web as the document collection and without crosschecking the
answers in CLEF’s document collection. I thought this experiment could give some insight into whether there are
advantages in combining two different information sources (the Web and CLEF’s document collection) or whether
one can get better results using only one of these information sources.</p>
    </sec>
    <sec id="sec-9">
      <title>Results by type of question</title>
      <p>Table 1. Results by type of question. Rows (question types): Quem (Who); Qual (Which); Onde (Where); O que (What); Em que (In which); Quanto(a)s (How many); Como (How); Que (What, Which); Quando (When); De que (Of what, which); A que (To which, what); Mencione, Nomeie, Indique (Name); X ... em que (... in which); Total.</p>
      <p>In Table 1 we can see that the results of run 2 (the one which used the Web and crosschecked the results in the
document collection) are slightly better. However, we can also see that the type of question is not irrelevant to the
results. For example, run 1 had better results for questions of type “Qual” (Which). There are also some relatively
frequent question types without any right answer in either of the runs (like “Como”, “Quando” and “De que”). This
probably means that there is something in these types of questions that Esfinge doesn’t deal with properly in the
answer-finding procedure.</p>
      <p>
        Both run 1 and run 2 were evaluated by the organization. To evaluate my Web-only experiment I needed to know
the right answers. For that purpose I created a list with my “right answers” to the question set.
The Web-only experiment is in some aspects a different task from the one proposed in CLEF. For example, it
was stated in CLEF’s guidelines [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] that some questions might have no answer in the document collection (NIL
answer), but it is much more difficult to say such a thing when using the Web as the document collection. For that
reason I counted unanswered questions as wrong when evaluating this experiment. Since my system was not
recording the addresses of the documents it used to get the answers from the Web, it was not possible to check whether
the answers were supported or not.
      </p>
      <p>Globally, we can see that the best results were obtained by combining the use of the document collection and the
Web. The worst results are the ones obtained using solely the Web. It is somewhat surprising that the results
using solely the document collection are better than the ones using solely the Web, since the approach I’m trying
to test was designed to take advantage of the redundancy in larger corpora. Possible explanations for this are:
- My system is not extracting text from the Web efficiently. Possibly it is getting control symbols and documents
in other languages; according to Nuno Cardoso (p.c.), it is common for search engines to mistake UTF for the
iso8859-1 character encoding. Also, the snippets returned by the search engine are most probably not good
enough to extract good answers from, since most of those snippets are formed by truncated sentences.
- Some documents in the Web, rather than helping to find answers, do the exact opposite (jokes, blogs, …).
- The text unit of 3 sentences (roughly 90 words) gives a larger context, while many Google snippets do not even
include all the words in the query.</p>
      <sec id="sec-9-2">
        <title>Results by question length</title>
        <p>Table 2. Results by question length. Rows (question lengths): 3 words; 4 words; 5 words; 6 words; 7 words; 8 words; 9 words; 10 words; 11 words; 12 words; 13 words; 16 words; Total.</p>
        <p>In Table 2, I try to study the influence of the question length on the results of run 1 and run 2.</p>
        <p>
          In order to determine the length of the questions, I used the Perl module Lingua::PT::Atomizador [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] to tokenize
the questions.
        </p>
        <p>In run 1 the most significant results are obtained for questions of length 6 to 8, while in run 2 the system
gets better results for questions of length 5 to 6. This difference can be explained by the different lengths of the
passages retrieved from the Web and from the document collection. The passages retrieved from the Web,
being shorter, may be better suited to answering shorter questions, while the passages retrieved from the document
collection, being longer, need the questions to be longer in order to get the appropriate context.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>Results considering 5 answers per question</title>
      <p>The system can be configured to give more than one answer to a question. It returns these answers ordered by
their probability of being the right answer. In this table a question is considered rightly answered if a right
answer is found in the top 5 answers returned by the system.</p>
      <p>Table: results considering 5 answers per question. % right (run 1): 12,5 %.</p>
      <p>To do this evaluation I also needed to know whether the answers were supported in the document collection. For that
purpose I included in my list of “right answers” the list of questions for which I couldn’t find any answer
supported by the document collection (NIL answers).</p>
      <p>The results are obviously better than the ones presented in the previous table, but not dramatically so. This suggests
that most problems are located before the scoring of the candidate answers.</p>
      <sec id="sec-10-1">
        <title>Causes for wrong answers</title>
        <p>Table: causes for wrong answers. Columns: # wrong answers in first 30 questions (run 2); % wrong answers in first 30 questions (run 2). Rows: Failure in document recovery; Filter “discard answers contained in questions”; Filter “interesting PoS”; Filter “documents supporting answer”; Answer scoring algorithm; Answer length &gt;3. The only cell value that survives is “9”, listed after the last row label.</p>
        <p>In the table above I tried to find out why the system produces wrong answers. Finding the causes takes some
time, so I started with the run with the best results (run 2) and did the evaluation only for the first 30 questions of the
question set. For some questions I counted more than one reason for failure. This sort of evaluation can give
some insight into which system modules are causing more errors and therefore should be looked into in more
detail.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>Future work</title>
    </sec>
    <sec id="sec-12">
      <title>Development of Esfinge</title>
    </sec>
    <sec id="sec-13">
      <title>Question reformulation</title>
      <p>In this module the linguistic information is encapsulated in a text file using Perl’s regular expression syntax. This
syntax is quite powerful; however, it is much better suited to the thought processes of computer scientists than to
those of linguists. If linguists are to be involved in improving the question reformulation
patterns at a more advanced stage of development, it would be better to use a friendlier syntax.</p>
    </sec>
    <sec id="sec-14">
      <title>N-grams harvesting</title>
      <p>Experiments are planned on extracting word N-grams not from the snippets returned by the search engine, but from the actual pages. Other planned experiments concern the type of web pages to be considered: only European Portuguese pages, pages written in other languages, only news sites…</p>
    </sec>
    <sec id="sec-15">
      <title>Machine learning techniques</title>
      <p>An interesting experiment/refinement that is planned is to use a set of questions associated with their answers as
a training set for the system.
The results of the system on the training set questions can be compared with the correct answers. The scores of
the patterns and/or the word n-grams can then be changed and the system run again on the training set,
with the new results compared with the right answers to check whether the system is
improving.</p>
    </sec>
    <sec id="sec-16">
      <title>Further evaluation of Esfinge</title>
      <p>I plan to use a multitude of sources to further evaluate Esfinge:
- The questions and answers created by QA@CLEF
- A set of real questions and answers found on the web, created by humans, using several distinct methods for collecting them (oráculo)
- A set of questions posed by real users (from Esfinge’s logs)
- A set of questions with answers, created and validated by myself</p>
    </sec>
    <sec id="sec-17">
      <title>Acknowledgements</title>
      <p>I thank my colleague Diana Santos (Linguateca / SINTEF) for all the valuable suggestions and for helping me to
write this paper in a much more understandable way than it was written in the preliminary versions. I thank
Nuno Cardoso (Linguateca/XLDB) for the final revision of this paper. I thank Alberto Simões (Linguateca/Universidade do Minho) for the hints on using the Perl modules “jspell” [11], “Lingua::PT::Atomizador” [7] and “Lingua::PT::Segmentador” [7]. I also thank the Fundação para a Ciência e Tecnologia for the grant POSI/PLP/43931/2001, co-financed by POSI.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] <string-name><surname>Aires</surname>, <given-names>Rachel</given-names></string-name> &amp; <string-name><given-names>Diana</given-names> <surname>Santos</surname></string-name>. <article-title>"Measuring the Web in Portuguese"</article-title>. In <source>Euroweb 2002 conference</source>. Oxford, UK, 17-18 December <year>2002</year>, http://www.linguateca.pt/Diana/download/AiresSantosEuroWeb2002.html</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] <string-name><surname>Banerjee</surname>, <given-names>Satanjeev</given-names></string-name> &amp; <string-name><given-names>Ted</given-names> <surname>Pedersen</surname></string-name>. <article-title>"The Design, Implementation, and Use of the Ngram Statistic Package"</article-title>. In <source>Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics</source>, pp. <fpage>370</fpage>-<lpage>381</lpage>, February <year>2003</year>, Mexico City</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] <string-name><surname>Brill</surname>, <given-names>Eric</given-names></string-name>. <article-title>"Processing Natural Language without Natural Language Processing"</article-title>, in A. Gelbukh (ed.), <source>CICLing 2003</source>, LNCS 2588, Springer-Verlag Berlin Heidelberg, <year>2003</year>, pp. <fpage>360</fpage>-<lpage>9</lpage>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] <string-name><surname>Brill</surname>, <given-names>Eric</given-names></string-name>, <string-name><given-names>Jimmy</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>Michele</given-names> <surname>Banko</surname></string-name>, <string-name><given-names>Susan</given-names> <surname>Dumais</surname></string-name> &amp; <string-name><given-names>Andrew</given-names> <surname>Ng</surname></string-name>. <article-title>"Data-Intensive Question Answering"</article-title>, in E.M. Voorhees &amp; D.K. Harman (eds.), <source>Information Technology: The Tenth Text Retrieval Conference, TREC 2001</source>. NIST Special Publication 500-250, pp. <fpage>393</fpage>-<lpage>400</lpage>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Christ</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schulze</surname>
            ,
            <given-names>B. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Koenig</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>The IMS Corpus Workbench: Corpus Query Processor (CQP): User's Manual</article-title>
          . University of Stuttgart, March 8, 1999 (CQP V2.2).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <source>Google Help Central</source>
          . http://www.google.com/help/index.html
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] “Lingua::PT::Atomizador”, “Lingua::PT::Segmentador”, http://linguateca.di.uminho.pt/cvsinfo/modules.html, http://search.cpan.org/dist/Lingua-PT-Atomizador/, http://search.cpan.org/dist/Lingua-PT-Segmentador/</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Magnini</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Romagnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vallin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Peñas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Peinado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Verdejo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Rijke</surname>
          </string-name>
          .
          <article-title>The Multiple Language Question Answering Track at CLEF 2003</article-title>
          , in Carol Peters, editor,
          <source>Working Notes for the CLEF 2003 Workshop</source>
          , 21-22 August, Trondheim, Norway,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <source>QA@CLEF-2004 Guidelines</source>
          . http://clef-qa.itc.it/2004/guidelines.html
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>Diana</given-names>
          </string-name>
          &amp;
          <string-name>
            <given-names>Paulo</given-names>
            <surname>Rocha</surname>
          </string-name>
          .
          <article-title>"Evaluating CETEMPúblico, a free resource for Portuguese"</article-title>
          .
          <source>In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics</source>
          , pp.
          <fpage>442</fpage>
          -
          <lpage>449</lpage>
          . Toulouse, 9-11 July
          <year>2001</year>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Simões</surname>
            ,
            <given-names>Alberto Manuel</given-names>
          </string-name>
          &amp;
          <string-name>
            <given-names>José João</given-names>
            <surname>Almeida</surname>
          </string-name>
          .
          <article-title>"Jspell.pm - um módulo de análise morfológica para uso em Processamento de Linguagem Natural"</article-title>
          .
          In Anabela Gonçalves &amp; Clara Nunes Correia (eds.),
          <source>Actas do XVII Encontro da Associação Portuguesa de Linguística (APL 2001)</source>
          , pp.
          <fpage>485</fpage>
          -
          <lpage>495</lpage>
          . Lisboa: APL. Lisboa, 2-4 October
          <year>2001</year>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>