                  Answer Validation on English and Romanian Languages

                                     Adrian Iftene1, Alexandra Balahur-Dobrescu1, 2
                   1
                    UAIC: Faculty of Computer Science, “Alexandru Ioan Cuza” University, Romania
              2
                  University of Alicante, Department of Software and Computing Systems, Alicante, Spain
                                            {adiftene, abalahur}@info.uaic.ro



       Abstract. The present article presents the steps involved in transforming the TE system used in the RTE3
       competition in 2007 into a component for the AVE 2008 exercise. We describe the rules followed in building
       the patterns for question transformation, the generation of the corresponding hypotheses and, finally, the
       answer ranking. We conclude with an overview of the performance obtained by this approach and a critical
       analysis of the errors encountered.



1 Introduction

AVE1 (Answer Validation Exercise) is a task introduced in the QA@CLEF competition, with the aim of promot-
ing the development and evaluation of subsystems validating the correctness of the answers given by QA sys-
tems. Participant systems receive a set of triplets (Question, Answer, and Supporting Text) and they must return a
judgment of SELECTED, VALIDATED or REJECTED for each triplet.
   This year, for our second participation in the AVE competition, we improved the system used last year and,
additionally, introduced a question analysis part, which is specific to a question answering system. In this year’s
AVE competition we also participated with a system for Romanian, built around a Textual Entailment (TE)
system for Romanian. The latter is similar to the TE system for English with which we participated in the RTE 3
competition in 2007 (Iftene, Balahur-Dobrescu, 2007b). For this reason, the present paper describes solely the
AVE system for English. The following sections present the new functionalities that have been added to our
English TE system.



2 Textual Entailment System

The main architecture of our Textual Entailment system remains the same (Iftene, Balahur-Dobrescu, 2007a).
The goal of the system is to transform the hypothesis, making use of extensive semantic knowledge from re-
sources like DIRT, WordNet, Wikipedia, and a database of acronyms. Additionally, we built a system to acquire
the extra Background Knowledge needed and applied complex grammar rules for rephrasing in English. The
tools used are LingPipe2 and MINIPAR3 (Lin, 1998).
   Our approach is based on a tree edit distance algorithm (Kouylekov and Magnini, 2005); its main goal is to
map every entity in the dependency tree associated with the hypothesis to an entity in the dependency tree asso-
ciated with the text.
   For every mapping, we compute a local fitness value, which indicates the degree of appropriateness between
entities. Based on the local fitness values, an extended local fitness is computed and, eventually, all partial values
are summed up into the global fitness. Two rules are also added for the global fitness calculation, namely:
       •    The Semantic Variability Rule – a rule regarding the negation of verbs and words that either stress
            certainty (preserving the sense of the sentence) or diminish certainty (negating the sense of the sen-
            tence);
       •    The Rule for Named Entities – this rule is applied for named entities from the hypothesis which have
            no correspondence in the text. If the word is marked as a named entity by LingPipe, we try to use the
            acronyms’ database or obtain information related to it from the background knowledge. In last year’s

1
  http://nlp.uned.es/QA/ave/
2
  http://www.alias-i.com/lingpipe/
3
  http://www.cs.ualberta.ca/~lindek/minipar.htm
            version of the TE system, if even after these operations we could not map a word from the hypothesis
            to a word from the text, we set the value of the global fitness to 0.
   The main change this year concerns the Rule for Named Entities. There are cases in which all the hypotheses
built for a question have Named Entity problems. With the old rule, the global fitness for all these pairs is set to
0, and it is thus impossible to select the best value (corresponding to the SELECTED answer).
   With the new rule, we compute the global fitness value for the current pair, but we also mark the current pair
as having an “NE Problem”. Further on, we will see how this marking is used in ordering the answers and in the
final evaluation for the AVE task.
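   A minimal sketch of the revised rule is given below; the data structure and function names are our own
illustration and are not part of the actual system.

   from dataclasses import dataclass
   from typing import Optional

   @dataclass
   class TEOutput:
       pair_id: str
       global_fitness: float
       ne_problem: Optional[str] = None   # unmapped named entity from the hypothesis, if any

   def apply_ne_rule(pair_id, global_fitness, unmapped_entity):
       # Old rule (2007): an unmapped named entity forced the global fitness to 0.
       # New rule (2008): keep the computed fitness and only record the problematic entity.
       return TEOutput(pair_id, global_fitness, ne_problem=unmapped_entity)

   # e.g. a pair whose hypothesis contains an unmapped entity keeps its fitness value
   # but carries the entity name in ne_problem (see Table 4 below).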
   Changing this rule helps our program in cases such as that of the question with id = “0054”:


                                       Table 1: Question with id = “0054”
                In what date did Mathieu Orfila write his "Traité des poisons"?


   All the justification snippets for the candidate answers contain the name “Mathieu Orfila”, but do not contain
the exact article name “Traité des poisons”. From all the possible answers, we select as correct the answer
“1813”, with the justification snippet:


                           Table 2: Justification snippet for question with id = “0054”
                Mathieu Orfila is considered to be the modern father of
                toxicology, having given the subject its first formal
                treatment in 1813 in Mathieu Orfila Trait des poisons,
                also called Toxicologie generate.




3 Using the TE System in the AVE track

The system architecture for this year is presented below:

                    [Figure 1: AVE System. The structure is similar for English and Romanian. The AVE test data
                    (questions, answers, justification texts) feed the pattern building, hypothesis building and text
                    building steps; the resulting Hypothesis (H) and Text (T) pairs are passed to the TE System.
                    Answers are ordered by global fitness (AVE Results1) and, additionally, after checking that the
                    Answer Type (AT) equals the Expected Answer Type (EAT) (AVE Results2).]
The steps executed by our system are the following:
 • From the system built for AVE 2007, we keep the following steps:
        o We build a pattern with variables for every question, according to the question type;
        o Using the pattern and all the possible answers, we build a set of hypotheses for each of the questions:
          H1, H2, H3, etc.;
        o We assign the justification snippet the role of text T and we run the TE system on all the obtained
          pairs: (T1, H1), (T2, H2), (T3, H3), etc.
 • Additionally, we perform the following steps:
        o We identify the Answer Type (AT) of the answers;
        o We identify the Expected Answer Type (EAT) of the questions.

  Lastly, we submit two results for our system:
       1. In the first one, we consider as the correct answer for the current question the candidate from the hy-
          pothesis for which we obtain the greatest global fitness;
       2. In the second one, we consider as the correct answer for the current question the candidate with AT
          equal to EAT and for which we obtain the greatest global fitness.


3.1 Pattern Building

In order to use the TE system for ranking the possible answers in the AVE task, the questions are first trans-
formed according to the algorithm presented in (Bar-Haim et al., 2006).
   For question 13 we have:
  Question: What is the occupation of Richard Clayderman?
  Our program generates the following pattern:
  Pattern: The occupation of Richard Clayderman is JOB.
   where JOB is the variable in this case. This year we generate more specific patterns according to the following
answer types: City, Count, Country, Date, Job, Measure, Location, Person, Organization, Year and Other. The
next table presents the identified types of patterns:

                                          Table 3: Examples of Patterns

            Answer type     Cases Number   Question example                                Pattern
            City                    3      What is the capital of Latvia?                  The capital of Latvia is CITY.
            Count                  14      How many "real tennis" courts are there?        COUNT "real tennis" courts are there.
            Date                   10      When was Irish politician Willie O'Dea born?    Irish politician Willie O'Dea was born at DATE.
            Job                     4      What is the occupation of Jerry Hickman?        The occupation of Jerry Hickman is JOB.
            Measure                14      What distance is run in a "Marathon"?           MEASURE is run in a "Marathon".
            Location               14      Where does Muriel Herkes live?                  Muriel Herkes lives in LOCATION.
            Person                 17      Which composer wrote "pacific 231"?             PERSON wrote "pacific 231".
            Organization           14      What is the political party of Tony Blair?      ORGANIZATION is the political party of Tony Blair.
            Year                    6      In what year did Emerson Lake&Palmer form?      Emerson Lake&Palmer was form in YEAR.
            Other                  21      In Japanese, what is "bungo"?                   "bungo" is OTHER.

   Following the building of the pattern, we proceed to construct the corresponding hypotheses. A special case is
that of DEFINITION questions, for which we do not build any pattern (in this case, the answer alone will be the
hypothesis).
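
   Below is a minimal, hedged sketch of this pattern building step for a few of the answer types in Table 3; the
regular expressions and the function name are our own illustration, not the actual rules of the system.

   import re

   def build_pattern(question, expected_type):
       # Turn a question into a declarative pattern containing a typed placeholder.
       q = question.strip().rstrip("?")
       if expected_type == "JOB":
           m = re.match(r"What is the occupation of (.+)", q, re.I)
           if m:
               return "The occupation of " + m.group(1) + " is JOB."
       if expected_type == "CITY":
           m = re.match(r"What is the capital of (.+)", q, re.I)
           if m:
               return "The capital of " + m.group(1) + " is CITY."
       if expected_type == "LOCATION":
           m = re.match(r"Where does (.+) live", q, re.I)
           if m:
               return m.group(1) + " lives in LOCATION."
       # DEFINITION questions get no pattern: the answer alone becomes the hypothesis.
       return ""

   # build_pattern("What is the occupation of Richard Clayderman?", "JOB")
   # -> "The occupation of Richard Clayderman is JOB."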


3.2 Hypothesis building

   Using the pattern building mechanism above and the answers provided within the AVE data, we build the
corresponding hypotheses. For example, for question 13, we build, according to the answers from the English test
data (“a_str” tags), the following hypotheses:

  H13_1: The occupation of Richard Clayderman is Number.
  H13_2: The occupation of Richard Clayderman is teacher         Qualifications.
  H13_3: The occupation of Richard Clayderman is ways.
  H13_8: The occupation of Richard Clayderman is pianist.
  H13_11: The occupation of Richard Clayderman is artist.
  H13_12: The occupation of Richard Clayderman is Composer.
  H13_13: The occupation of Richard Clayderman is teachers.

   For each of these hypotheses, the corresponding justification text (the content of the “t_str” tag) is considered
to play the role of the text T.
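
   A minimal sketch of this substitution step, under the assumption that the candidate answers and their
justification snippets come in parallel lists (the variable names are illustrative):

   def build_pairs(pattern, placeholder, answers, snippets):
       # Replace the typed placeholder with each candidate answer ("a_str") and
       # pair the resulting hypothesis with its justification snippet ("t_str").
       pairs = []
       for answer, snippet in zip(answers, snippets):
           hypothesis = pattern.replace(placeholder, answer)
           pairs.append((snippet, hypothesis))   # (T, H) pair passed to the TE system
       return pairs

   # build_pairs("The occupation of Richard Clayderman is JOB.", "JOB",
   #             ["pianist", "Composer"], [snippet_8, snippet_12])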


3.3 Global Fitness Calculation

   We consider the pairs built above as input for our Textual Entailment system. After running the TE system, the
global fitness values and the marked “NE Problems” for these pairs are the following:

                                              Table 4: TE System output

                                      Pair      Global Fitness     NE Problem
                                      13_1            1.5          Clayderman
                                      13_2           2.35          Clayderman
                                      13_3           2.31          Clayderman
                                      13_8           1.92
                                      13_11          1.82
                                      13_12          1.86
                                      13_13          1.89          Clayderman
3.4 Answer Type and Expected Answer Type Identification

   The aim of this step is to eliminate the cases in which the answer type differs from the expected answer type.
For example, in the case of question 13, since the expected answer type is JOB, it is normal to try to identify the
correct answer within the sub-set of answers of type JOB.
   The patterns used in the identification of the expected answer type (EAT) are similar to the patterns used in
3.1. For the identification of the answer type (AT), we use GATE4 for the following types: Job, City, Country,
Location, Person, Organization, and we build specific patterns in order to identify the following types: Date,
Measure, and Count. When an answer cannot be classified with GATE or with our patterns, it is considered of
type Other. For question number 13, we have:

                                            Table 5: EAT and AT comparison

                        Pair        EAT               Answer               AT     Match score
                        13_1        JOB               Number             OTHER          0.25
                        13_2        JOB     teacher     Qualifications   OTHER          0.25
                        13_3        JOB                Ways              OTHER          0.25
                        13_8        JOB               Pianist             JOB               1
                        13_11       JOB                Artist             JOB               1
                        13_12       JOB            Composer               JOB               1
                        13_13       JOB               teachers            JOB               1

   The last column shows the matching score between EAT and AT. In order to compute this value, we use a set
of rules. The most important rules are listed below (a schematic sketch follows the table):


                                      Table 6: Rules for matching score calculation

                                                Rule                                        Match score
                                              AT = EAT                                            1
                             (EAT = “DEFINITION”) and (AT = “OTHER”)                              1
                 EAT and AT are in the same class of entities: {CITY, COUNTRY,                   0.5
                 REGION, LOCATION} or {YEAR, DATE} or {COUNT, MEASURE, YEAR}
                             (AT = “OTHER”) or (EAT = “OTHER”)                                   0.25
                                             OTHERWISE                                            0
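
   A minimal sketch of this matching score, following the rules in Table 6 (the function and constant names are
our own illustration):

   ENTITY_CLASSES = [
       {"CITY", "COUNTRY", "REGION", "LOCATION"},
       {"YEAR", "DATE"},
       {"COUNT", "MEASURE", "YEAR"},
   ]

   def match_score(eat, at):
       # Compatibility score between the Expected Answer Type and the Answer Type.
       if at == eat:
           return 1.0
       if eat == "DEFINITION" and at == "OTHER":
           return 1.0
       if any(eat in cls and at in cls for cls in ENTITY_CLASSES):
           return 0.5
       if at == "OTHER" or eat == "OTHER":
           return 0.25
       return 0.0

   # e.g. match_score("JOB", "OTHER") -> 0.25 and match_score("JOB", "JOB") -> 1.0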


3.5 Answer classification

   We submit two runs for each of the languages (English and Romanian), according to whether or not some sys-
tem components are used. The systems are similar, and only the external resources used by the TE system or by
GATE are language-specific.
   First run: it is based on the TE System output. The answers for which we have NE problems are considered
REJECTED (for question 13, using Table 4, we can deduce that answers 1, 2, 3 and 13 are REJECTED). Answers

4
    http://www.gate.ac.uk/
without NE problems are considered VALIDATED (answers 8, 11, 12) and, among them, the answer with the
highest global fitness is considered SELECTED (answer 8). If all the answers have NE problems, then all of them
are considered REJECTED, except the answer with the highest global fitness, which is considered SELECTED.
   Second run: in addition to the first run, we add the comparison between EAT and AT. Answers with NE Prob-
lems are again considered REJECTED, and so are the answers for which the matching score between EAT and
AT is 0 (incompatible types). All the remaining answers, for which the matching score is not 0, are VALIDATED.
For the identification of the SELECTED answer, we take the answers with the highest matching score (8, 11, 12)
and select the one with the highest global fitness. In this case, the results are the same as in the first run.
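
   The sketch below summarizes the two classification strategies under the stated assumptions; the Candidate
structure, the fallback behaviour when no answer survives filtering, and the function names are our own
illustration and are not taken from the actual implementation.

   from dataclasses import dataclass
   from typing import List, Optional

   @dataclass
   class Candidate:
       answer_id: str
       global_fitness: float
       ne_problem: Optional[str]   # unmapped named entity, or None
       match_score: float          # EAT/AT compatibility (used only in the second run)

   def classify_run1(cands: List[Candidate]) -> dict:
       # First run: reject NE-problem answers, validate the rest, and select the
       # highest global fitness (falling back to all answers when every pair has
       # an NE problem). Assumes cands is non-empty.
       labels = {}
       ok = [c for c in cands if c.ne_problem is None]
       pool = ok if ok else cands
       best = max(pool, key=lambda c: c.global_fitness)
       for c in cands:
           if c is best:
               labels[c.answer_id] = "SELECTED"
           elif c.ne_problem is None:
               labels[c.answer_id] = "VALIDATED"
           else:
               labels[c.answer_id] = "REJECTED"
       return labels

   def classify_run2(cands: List[Candidate]) -> dict:
       # Second run: additionally reject answers whose EAT/AT match score is 0,
       # then select by highest match score and, within it, highest global fitness.
       labels = {}
       ok = [c for c in cands if c.ne_problem is None and c.match_score > 0]
       pool = ok if ok else cands   # fallback behaviour is our assumption
       best = max(pool, key=lambda c: (c.match_score, c.global_fitness))
       for c in cands:
           if c is best:
               labels[c.answer_id] = "SELECTED"
           elif c in ok:
               labels[c.answer_id] = "VALIDATED"
           else:
               labels[c.answer_id] = "REJECTED"
       return labels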


3.6 Results

Our AVE systems have the following results:

                                          Table 7: AVE Results in 2008

                                                             English           Romanian
                                                        Run1       Run2       Run1     Run2
                       F measure                           0.17     0.19      0.22         0.23
                       Precision over YES pairs            0.09     0.11      0.12         0.13
                       Recall over YES pairs               0.76     0.85      0.92         0.92
                       QA accuracy                         0.19     0.24      0.17         0.24
                       Estimated QA performance            0.19     0.24      0.17         0.25


                    Table 8: Distribution of our results on answer classes for Run2 in English

                Answers Class in Gold file        Unknown         Validated     Rejected          Total
                Correct                                0               67            398           465
                Incorrect                             36               49            542           627



4 Conclusions

Last year, we showed how the TE system used in the RTE3 competition can successfully be used as part of the
AVE system, resulting in improved ranking between the possible answers, especially in the case of questions with
answers of type Person, Location, Date and Organization. This year, changing some of the rules employed in the
Textual Entailment system and adding the question and answer type classification and matching component, we
showed how we improved, on the one hand, the correct classification of the answers, and on the other hand, the
validation of more answers.
   One of the main problems encountered was the class of UNKNOWN answer types, which our system does not
identify. In order to detect these cases, we intend to use in the future a three-way classification of the answers,
such as the one proposed in the RTE 3 pilot task.
   The rule regarding the presence of NEs remains of great importance, identifying the correct cases. However, in
the cases where the NER or NEC is not correctly performed, the system fails. Moreover, this rule is not enough
to identify the entire class of REJECTED answers, as shown in Table 8. In order to better identify these situations,
additional rules must be explored to further improve the system.
References

Bar-Haim, R., Dagan, I., Dolan, B., Ferro, L., Giampiccolo, D., Magnini, B., Szpektor, I. 2006. The Second
  PASCAL Recognising Textual Entailment Challenge. In Proceedings of the Second PASCAL Challenges
  Workshop on Recognizing Textual Entailment. Venice, Italy.
Kouylekov, M., Magnini, B. 2005. Recognizing Textual Entailment with Tree Edit Distance Algorithms. In Pro-
  ceedings of the First Challenge Workshop Recognising Textual Entailment, Pages 17-20, 25–28 April, 2005,
  Southampton, U.K.
Iftene, A., Balahur-Dobrescu, A. 2007a. Hypothesis Transformation and Semantic Variability Rules Used in
   Recognizing Textual Entailment. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and
   Paraphrasing. Pages 125-130. 28-29 June, Prague, Czech Republic.
Iftene, A., Balahur-Dobrescu, A. 2007b. Improving a QA System for Romanian Using Textual Entailment. In
   Proceedings of RANLP workshop “A Common Natural Language Processing Paradigm For Balkan Lan-
   guages”. Pages 7-14, September 26, Borovets, Bulgaria.
Lin, D. 1998. Dependency-based Evaluation of MINIPAR. In Workshop on the Evaluation of Parsing Systems,
   Granada, Spain, May.