An Approach to the Main Task of QA4MRE-2013

Marília Santos
Departamento de Informática, ECT, Universidade de Évora, Portugal

José Saias (jsaias@uevora.pt)
Departamento de Informática, ECT, Universidade de Évora, Portugal

Paulo Quaresma
Departamento de Informática, ECT, Universidade de Évora, Portugal

Keywords: MRE, QA, NLP

This article describes the participation of a group from the University of Évora in the CLEF 2013 QA4MRE main task. Our system takes an approach based on superficial text analysis. The methodology starts with the preprocessing of the background collection documents, whose texts are lemmatized and then indexed. Named entities and numerical expressions are sought in the questions and their candidate answers. Then the lemmatizer is applied and stop words are removed. An answer pattern is formed for each question+answer pair, together with a search query for document retrieval. The original search terms are expanded with synonyms and hypernyms. Finally, the texts retrieved for each candidate answer are segmented and scored for answer selection. Considering only the main questions, the system's best result was obtained in the third run, which answered 206 questions, with 51 correct answers and a c@1 of 0.24. When both main and auxiliary questions are evaluated, the third run again gave our best result: 245 questions answered, 64 of them correctly, and a c@1 of 0.26. The use of hypernyms proved to be an improvement factor in the third run, whose results show a 12% increase in correct answers and a 0.02 gain in c@1.

Introduction

This article describes the participation of a group from the University of Évora in the Question Answering for Machine Reading Evaluation (QA4MRE) challenge of the 2013 edition of the Cross Language Evaluation Forum (CLEF)¹. Although some authors of this paper have previous work in other QA4MRE editions [4,5], this work is based on a new system for the QA4MRE Main Task, associated with the first author's master's thesis work, and focused on the English language. The objective of this task is the automatic understanding of one or more texts, and the subsequent identification of the answer to several questions about information that is stated or implied in those texts. While answering the questions, systems must process single documents, as well as Background Collections (BC) with documents that can be used as auxiliary information sources [2]. This year's QA4MRE Main Task was composed of 4 topics, namely "Aids", "Climate Change", "Music and Society" and "Alzheimer", each with its own background collection of documents. Each topic had 4 reading tests with 15 to 20 questions each, and each question had 5 candidate answers [1]. The test was composed of 240 main questions and 44 auxiliary questions. The latter are duplicates of main questions, but without the inference previously required, which makes it possible to test the ability of systems to use inference and its impact on question treatment. The next section presents our system architecture. Section 3 describes the methodology we used to process the questions, the answers and the background information. The evaluation of the obtained results is detailed in section 4, while the last two sections are devoted to an analysis of those results, some conclusions and a balance of our participation.

Architecture

The system architecture is shown in Figure 1 and has the following components:

• XML Parser - Extracts texts, questions and answers from the input and stores them in the system;

• Indexing Component - Documents from the BC pass through the lemmatizer (C&C tools / Boxer²) and are then indexed with Lucene³;

• Consult Index Component - Responsible for processing the question and answers and for performing document retrieval. With keywords from the question and answers, this component uses Lucene to search for relevant documents in the BC. The analysis and search query creation is based on:

  - Lemmatizer - the words of the question and answers are reduced to their lemma form;
  - Named Entity Recognition (NER) - through regular expressions, the system tries to identify entity names or mentions;
  - WordNet module from the Natural Language Toolkit⁴ - the system uses synonyms, derivationally related forms and hypernyms;
  - Numerical expressions - through regular expressions, the system tries to identify numerical expressions;
  - Stop word removal.

• Filter Component - Responsible for selecting relevant text segments and assigning a score to each segment and to each candidate answer. This component applies a set of criteria to choose the most plausible answer.
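As a rough illustration of how the Consult Index Component might turn keywords into a boolean search query, the sketch below removes stop words, expands each keyword and joins the resulting OR-groups. It is not the authors' implementation: the function names, the stop-word list and the two small expansion tables (filled with terms from the paper's Alzheimer example) are assumptions made for this example, and a real system would obtain the expansions from WordNet and pass the query string to Lucene.

```python
from typing import Dict, List

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"how", "can", "the", "of", "through", "with", "by", "in"}

# Hypothetical expansion tables (terms taken from the paper's example).
# Synonyms and derivationally related forms are used in every run.
RELATED: Dict[str, List[str]] = {
    "regain": ["recover", "recoverer"],
    "treatment": ["treat"],
}
# Hypernyms are used only in the third run.
HYPERNYMS: Dict[str, List[str]] = {
    "regain": ["get"],
    "sense": ["awareness"],
    "smell": ["sensation"],
    "treatment": ["care"],
}


def keywords(lemmas: List[str]) -> List[str]:
    """Drop stop words, keeping the order of the remaining lemmas."""
    return [w for w in lemmas if w.lower() not in STOP_WORDS]


def term_group(keyword: str, use_hypernyms: bool) -> str:
    """Build one OR-group: the keyword followed by its expansions."""
    terms = [keyword] + RELATED.get(keyword, [])
    if use_hypernyms:
        terms += HYPERNYMS.get(keyword, [])
    # Multi-word terms are joined with "_" (assumed convention).
    return "(" + " OR ".join(t.replace(" ", "_") for t in terms) + ")"


def build_query(question_lemmas: List[str], answer_lemmas: List[str],
                use_hypernyms: bool = True) -> str:
    """Join the question groups and the answer groups into one query string."""
    parts = []
    for lemmas in (question_lemmas, answer_lemmas):
        groups = [term_group(k, use_hypernyms) for k in keywords(lemmas)]
        if groups:  # skip a side that has no keywords left
            parts.append("(" + " OR ".join(groups) + ")")
    return " OR ".join(parts)


query = build_query(
    ["regain", "the", "sense", "of", "smell"],
    ["through", "treatment", "with", "bexarotene"],
)
print(query)
```

Calling `build_query(..., use_hypernyms=False)` gives the narrower query that, under the same assumptions, would correspond to the runs without hypernym expansion.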

Methodology

The system is based on a simple approach, without deep linguistic processing. In this edition of QA4MRE, our system generated 3 runs with minor differences in configuration, as explained below. The processing performed on the BC texts, the reading tests and the questions comprises four steps: 1. Indexing Component, 2. XML Parser, 3. Consult Index Component and 4. Filter Component. In the final sub-step, Answer Selection (4.d), the following rules apply:

- If the filter returns no relevant documents, then the system selects the answer "5 - None of the above";
- The system returns Unanswered when there is more than one maximum, or when there is only a small difference between the maximum and another answer's score;
- If none of the above applies, the system returns the answer with the maximum score.
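The answer selection rules above can be sketched as a small decision function. This is only an illustration: the function name, the score dictionary and in particular the `min_margin` threshold are assumptions, since the text does not state how small a difference must be for the system to leave a question unanswered.

```python
from typing import Dict, Optional

NONE_OF_THE_ABOVE = 5  # answer option "5 - None of the above"


def select_answer(scores: Dict[int, float], has_relevant_docs: bool,
                  min_margin: float = 1.0) -> Optional[int]:
    """Return the chosen answer id, or None when the question is left unanswered.

    `min_margin` is a hypothetical threshold for "a small difference".
    """
    # Rule 1: no relevant documents -> "None of the above".
    if not has_relevant_docs or not scores:
        return NONE_OF_THE_ABOVE
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_id, best_score = ranked[0]
    # Rule 2: a tie or a near-tie with the runner-up -> unanswered.
    if len(ranked) > 1 and best_score - ranked[1][1] < min_margin:
        return None
    # Rule 3: otherwise return the answer with the maximum score.
    return best_id
```

Leaving a question unanswered is not free of consequences, but under the c@1 measure used in the evaluation it is penalised less than a wrong answer, which is why a tie is not resolved by guessing.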

The difference between the runs is reflected in the number of answers given and in the system's accuracy. This difference can be observed in the following examples:

Example 1: How can Alzheimer's patients regain the sense of smell? Unanswered in the first and the second run; Answered correctly in the third run.

Example 2: How can apolipoprotein E help people with Alzheimer's? Answered wrongly in the first and the second run; Answered correctly in the third run.

Example 3: What is U.S. AIDS policy dominated by? Unanswered in the second run; Answered correctly in the first and the third run.

Examples 1 and 2 are cases where the use of hypernyms brings a small improvement in the Filter Component. Example 3 shows the importance of applying step 4.c of the methodology when the information is not dispersed.

Results

In QA4MRE, the evaluation of all runs submitted is based on the c@1 measure, discussed in [3]:

\[ \mathrm{c@1} = \frac{1}{n}\left(n_R + n_U\,\frac{n_R}{n}\right) \tag{1} \]

In Equation (1), $n_R$ is the number of correctly answered questions, $n_U$ is the number of unanswered questions, and $n$ is the total number of questions.
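As a quick check of Equation (1), the following sketch computes c@1 from the three counts and applies it to the third-run figures reported in this paper (206 of 240 main questions answered with 51 correct; 245 of 284 questions answered with 64 correct). The function name is chosen for this example only.

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """c@1 as in Equation (1): (n_R + n_U * n_R / n) / n."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_correct + n_unanswered * n_correct / n_total) / n_total


# Third run, main questions: 240 - 206 = 34 unanswered.
main_run3 = c_at_1(n_correct=51, n_unanswered=240 - 206, n_total=240)
# Third run, main + auxiliary questions: 284 - 245 = 39 unanswered.
all_run3 = c_at_1(n_correct=64, n_unanswered=284 - 245, n_total=284)

print(round(main_run3, 2), round(all_run3, 2))
```

Unanswered questions contribute in proportion to the accuracy over all questions, so a system that abstains gets partial credit, while one that answers wrongly gets none.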

Evaluation on the main questions

In the first run the system answered 188 of the 240 questions, of which only 45 were correct, resulting in a c@1 of 0.23. In the second run, 185 questions were answered, with a c@1 of 0.18. In the last run we answered 206 questions, with 51 correct answers and a c@1 of 0.24. Table 1 shows the detail of the system result assessment, by topic and by run.

Evaluation on all questions

In the first run, the system answered 224 out of 284 questions. Of those, 57 were answered correctly, and the c@1 was 0.24. In the second run, 219 questions were answered and the c@1 was 0.19. In the final run, our system answered 245 questions, finding 64 right answers and obtaining a c@1 of 0.26. Table 2 shows these results in greater detail.

Discussion

One of the main causes of failure in this system is the lack of an entity disambiguation module, because entities are quite often referred to by different expressions.

Other identified causes are: 1. Yes/no questions; 2. Answers supported by adverbs of frequency (rarely, always, never, sometimes, ...); 3. Words with high frequency, which have a negative impact on our system due to the way the scoring algorithm works. This is especially noticeable when it causes the selection of non-relevant documents and incorrect answers and, in this way, rules out the possibility of answering "5 - None of the above". These failures were observed essentially for the Aids topic.

We have also observed that using a second analysis in the Filter Component (step 4.c in the methodology section) is only effective when the information about the correct answer is not dispersed over several documents. However, this approach allowed an improvement of 5-8% relative to the base option (run 2), with the exception of the topic "Music and Society", where there was no impact. The use of hypernyms did not bring any improvement in the Aids topic, but in the "Alzheimer" and "Climate Change" topics it allowed an improvement of 10% relative to the base option, and in the "Music and Society" topic an improvement of 5%.

Conclusion

We described our experience in the QA4MRE challenge, using a simple system with an approach based on superficial text analysis. This system clearly needs further development, aiming to improve the analysis of the questions and answers. Namely, we intend to work on the disambiguation of entities, on the establishment of relations between acronyms and entities, and on handling the failure causes described in the previous sections. One of the critical aspects is to change the way our system evaluates answer patterns composed of words with high frequency; we need to add a new component to improve the answer selection process and, namely, to take into account the question and answer types. We have also found that incorporating an anaphora resolution module would allow the system to answer more questions and to improve its performance.

On a more abstract level, we intend to assess the strengths of the system used by Évora's team last year and combine strategies with some new ideas tested in this year's work.

Fig. 1. System Architecture

1. Indexing Component (this component is used only once)
   (a) Lemmatization is applied to the text of all documents in the BC;
   (b) BC documents are indexed, considering the lemmatizer outcome;

2. XML Parser
   (a) The information from the input is extracted and stored in the system;

3. Consult Index Component - Each question is processed with the following steps, as illustrated in the examples:
   (a) Entities and numerical expressions from the question and candidate answers are stored in the system. The filter uses them to score answers and text segments;

         How can Alzheimer's patients regain the sense of smell?
         1 through chemotherapy
         2 through clinical trials
         3 through treatment with bexarotene
         4 by lying in the sun
         5 None of the above
         Entities: Alzheimer's patients

   (b) Question and candidate answers pass through the lemmatizer;

         How can Alzheimer's patient regain the sense of smell?
         1 through chemotherapy
         2 through clinical trial
         3 through treatment with bexarotene
         4 by lie in the sun
         5 None of the above

   (c) Stop words are removed from the question and candidate answers;

         Alzheimer's patient regain sense smell
         1 chemotherapy
         2 clinical trial
         3 treatment bexarotene
         4 lie sun
         5 none

   (d) For each pair (question, candidate answer), try to form an Answer Pattern. The Answer Pattern is composed of: keywords from the question; keywords from the answer; synonyms, derivationally related forms and hypernyms (used only in the third run) of each keyword;

         - Alzheimer's    synonyms: Alzheimer's disease | hypernyms: dementia
         - patient        hypernyms: case
         - regain         synonyms: recover | related forms: recoverer | hypernyms: get
         - sense          hypernyms: awareness
         - smell          hypernyms: sensation
         - chemotherapy   related forms: chemotherapeutical | hypernyms: therapy
         - clinical       related forms: clinic
         - trial          synonyms: test | hypernyms: attempt
         - treatment      related forms: treat | hypernyms: care

   (e) Document retrieval, using Lucene to get relevant documents, with the generated Answer Patterns used to query the indexed BC;

         Query:
         ((Alzheimer's OR dementia OR Alzheimer's_disease) OR (case OR patient) OR (regain OR recoverer OR recover OR get) OR (awareness OR sense) OR (smell OR sensation)) OR
         ((chemotherapy OR chemotherapeutical OR therapy)) OR
         ((clinic OR clinical) OR (test OR trial OR attempt)) OR
         ((care OR treatment OR treat) OR (bexarotene)) OR
         ((lie) OR (sun))

4. Filter Component - For each question:
   (a) Each document is validated for each Answer Pattern:
       - If it doesn't contain 50% of the keywords from the question and 50% of the keywords from the answer, it is discarded;
       - If the answer has a numerical expression which does not exist in the document, it is discarded;
       - If the answer or the question has entities and the document does not contain 30% of them, it is discarded;
   (b) When a document is valid:
       - Each Answer Pattern that validates the current document receives a score with the sum of:
         • Number of entities in the text;
         • Number of numerical expressions in the text;
         • Number of times that each keyword, from the current Answer Pattern, occurs in the text;
       - The document score is the sum of the scores of each of its Answer Patterns;
   (c) Thereafter, a second analysis is performed, only on the top 5 resulting documents from the filter (this step is used only in the first and in the third runs):
       - Documents are split into text segments;
       - The current Answer Pattern's score is incremented if 80% of the Answer Pattern's words are present in the current text segment and the distance between them is less than or equal to 5;
   (d) Answer Selection: see the selection rules listed in the Methodology section.
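The document validation of step 4(a) and the segment check of step 4(c) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: documents and segments are taken to be lists of lemmatized tokens, entities and numerical expressions are treated as single tokens, the percentages are read as minimum fractions, and "the distance between them" is interpreted as the gap between consecutive matched words. All function and parameter names are hypothetical.

```python
from typing import Sequence, Set


def fraction_present(terms: Sequence[str], tokens: Set[str]) -> float:
    """Share of `terms` found in `tokens`; an empty term list counts as fully present."""
    if not terms:
        return 1.0
    return sum(1 for t in terms if t in tokens) / len(terms)


def is_valid_document(doc_tokens: Sequence[str],
                      question_kw: Sequence[str],
                      answer_kw: Sequence[str],
                      entities: Sequence[str] = (),
                      answer_numbers: Sequence[str] = ()) -> bool:
    """Step 4(a): keep a document only if it passes all three checks."""
    tokens = set(doc_tokens)
    # At least 50% of the question keywords and 50% of the answer keywords.
    if fraction_present(question_kw, tokens) < 0.5:
        return False
    if fraction_present(answer_kw, tokens) < 0.5:
        return False
    # Every numerical expression of the answer must occur in the document.
    if any(num not in tokens for num in answer_numbers):
        return False
    # At least 30% of the entities, when there are any.
    if entities and fraction_present(entities, tokens) < 0.3:
        return False
    return True


def segment_supports_pattern(segment_tokens: Sequence[str],
                             pattern_words: Sequence[str],
                             min_fraction: float = 0.8,
                             max_gap: int = 5) -> bool:
    """Step 4(c): enough pattern words in the segment, and close together."""
    wanted = set(pattern_words)
    if not wanted:
        return False
    positions = [i for i, tok in enumerate(segment_tokens) if tok in wanted]
    found = {segment_tokens[i] for i in positions}
    if len(found) / len(wanted) < min_fraction:
        return False
    # Assumed reading of "distance": gap between consecutive matches.
    return all(b - a <= max_gap for a, b in zip(positions, positions[1:]))


segment = "bexarotene treatment help patient regain sense of smell".split()
pattern = ["patient", "regain", "sense", "smell", "treatment", "bexarotene"]
print(segment_supports_pattern(segment, pattern))
```

In a fuller version, each segment for which `segment_supports_pattern` holds would increment the score of the corresponding Answer Pattern; the size of that increment is not stated in the text and is left out here.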

Table 1. Results of the main questions

Table 2. Results of the main + auxiliary questions

¹ http://clef2013.org/
² http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer
³ Apache Lucene is an open source information retrieval software library. http://lucene.apache.org/
⁴ http://nltk.org

References

[1] QA4MRE. http://celct.fbk.eu/QA4MRE (2013)
[2] Track Guidelines QA4MRE@CLEF2013. http://celct.fbk.eu/QA4MRE
[3] Peñas, A., Rodrigo, A.: A simple measure to assess non-response. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. HLT '11, Association for Computational Linguistics (2011)
[4] Saias, J., Quaresma, P.: The di@ue's participation in qa4mre: from qa to multiple choice challenge. In: Petras, V., Forner, P., Clough, P.D. (eds.) CLEF 2011 Labs and Workshop: Notebook Papers. Amsterdam, The Netherlands (2011)
[5] Saias, J., Quaresma, P.: Di@ue in clef2012: question answering approach to the multiple choice qa4mre challenge. In: Proceedings of CLEF 2012 Evaluation Labs and Workshop - Working Notes Papers. Rome, Italy (September 2012)