=Paper=
{{Paper
|id=Vol-1178/CLEF2012wn-QA4MRE-IfteneEt2012
|storemode=property
|title=Enhancing a Question Answering System with Textual Entailment for Machine Reading Evaluation
|pdfUrl=https://ceur-ws.org/Vol-1178/CLEF2012wn-QA4MRE-IfteneEt2012.pdf
|volume=Vol-1178
}}
==Enhancing a Question Answering System with Textual Entailment for Machine Reading Evaluation==
Enhancing a Question Answering System with Textual Entailment for Machine Reading Evaluation

Adrian Iftene1, Alexandru-Lucian Gînscă1, Alex Moruz1, 2, Diana Trandabăț1, 2, Maria Moruz3, Emanuela Boroș1

1 UAIC: Faculty of Computer Science, "Alexandru Ioan Cuza" University, Romania
2 Institute of Computer Science, Romanian Academy, Iasi Branch
3 Center of Biblical-Philological Studies "Monumenta linguae Dacoromanorum", "Alexandru Ioan Cuza" University, Romania
{adiftene, lucian.ginsca, amoruz, dtrandabat, mhusarciuc, emanuela.boros}@info.uaic.ro

Abstract. This paper describes the Question Answering for Machine Reading Evaluation systems of UAIC (University "Al. I. Cuza" of Iasi, Romania) participating in the QA4MRE 2012 evaluation task. We submitted two types of runs: the first type is based on our system from the 2011 edition of QA4MRE, while the second type is based on a Textual Entailment system. For the second type of runs, we construct the Text and the Hypothesis required by the Textual Entailment system from the initial test data (the document tag was used to build the Text, while the question and answer tags were used to build the Hypothesis). The results provided by the organizers showed that, for English, the second type of runs performed better than the first.

Keywords: Question Answering for Machine Reading Evaluation, Information Retrieval, Textual Entailment

1 Introduction

As in the 2011 campaign, the Question Answering for Machine Reading Evaluation (QA4MRE, http://celct.fbk.eu/QA4MRE/index.php) task in 2012 aims to evaluate the ability of systems to read and understand texts. The task focuses on reading a document and identifying the correct answer from a set of five multiple choice answers, using inferences and previously acquired background knowledge. The test data and background knowledge are related to four topics: AIDS, Climate Change, Music and Society (the same topics adopted last year [1]) and a new one, Alzheimer [2]. An important note is that, for all involved languages (English, Spanish, German, Italian and Romanian), the test data was the same (parallel translations) and the background knowledge was available to all participants.

For UAIC's participation in the QA4MRE task in 2012, we used as a base the system built for the 2011 QA4MRE edition [3], which was, in its turn, an updated version of our previous systems from the 2009 and 2010 QA@CLEF editions [4], [5]. The base system was further improved by adapting a Textual Entailment component for the Question Answering module, similar to the approach in [6].

The rest of the paper is structured as follows: Section 2 details the general architecture of our Question Answering system for Machine Reading Evaluation and the new textual entailment module, Section 3 presents the results and an error analysis, while the last section discusses the conclusions.

2 System components

In QA4MRE 2012, UAIC submitted two types of runs for Romanian and English. For the first type of runs, we use the system from the previous edition, QA4MRE 2011 [3], consisting of modules specialized for test data processing, background knowledge indexing, snippet extraction and identification of the correct answer. For the second type of runs, we use the Romanian and English textual entailment systems [7, 8], similar to the approach detailed in [9]. The English system is similar to the Romanian one, with the difference that some of the modules presented in Section 2.1 were only partially used.
2.1 The base architecture

In 2012, the Romanian background knowledge consisted of a collection of 184,263 documents in text format (28,826 corresponding to the AIDS topic, 57,160 to the Climate Change topic, 88,687 to the Music and Society topic and 9,590 to the Alzheimer topic). The test data consists of an XML file with 16 test documents (4 documents for each of the four topics), 10 questions for each document (160 questions in total) and 5 possible answers for each question (800 possible answers in total).

The base architecture is similar to the system used for the 2011 edition of the QA4MRE competition, presented in [3]. Thus, after indexing the background collection using the Lucene (http://lucene.apache.org/) libraries, the system processes the test data applying three operations: (a) extracting documents from the background knowledge, (b) analyzing the test questions and (c) processing the possible answers. While the first step relies on the Lucene index of the background collection, for analyzing the question we used our question processing module [1] and the web services available for both Romanian and English from the Sentimatrix (http://www.sentimatrix.eu/) project [10] to eliminate stop words, perform lemmatization and identify the Named Entities in the question. Then, a Lucene query is built. For instance, in the case of the question with q_id = "8":

Ro: Care dintre următoarele nu este o cauză a vulnerabilității femeilor căsătorite față de infecțiile cu HIV?
En: Which of the following is not a cause of HIV infection for married women?

the execution of the above steps has the following results:
- in the first step, the following stop words are eliminated: care, dintre, următoarele, o, a, de, cu (En: which, of, following, a, for);
- in the next step, lemmas for the words cauză, vulnerabilității, femeilor, infecțiile (En: cause, vulnerability, women, infections) are identified;
- in the third step, HIV is identified as a Named Entity;
- in the last step, the Lucene query is built: "nu (cauză^2 cauza) (vulnerabilității^2 vulnerabilitate) (femeilor^2 femeie) (căsătorite^2 căsătorit) (infecțiile^2 infecție) HIV^3".

From the above Lucene query, one can notice that we consider named entities to be the most relevant (hence receiving a boost of 3, expressed using the ^ operator), while the inflected forms of the words occurring in the question receive a lower boost value (2 in the example above) and their lemmas are added without a boost.

Another module analyzes the types and features of the possible answers, using the ontology presented in [11], more specifically the relations between regions and cities and between cities and countries, in order to eliminate the answers with a low probability of being the required answer. For instance, for the question:

Ro: În ce stat american oamenii de știință universitari au calculat că pentru combaterea SIDA în Africa fiecare american trebuie să plătească un cost de 5 dolari anual?
En: In what American state did university scientists calculate the cost to each American of spending 5 dollars annually to combat AIDS in Africa?

we eliminate from the list of possible answers those containing non-American states.

As presented in [3], the index of the background knowledge is queried, and all retrieved documents are placed in separate indexes. The result of this step is a set of 160 separate indexes, one for each question in the initial test data.
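To make the query-construction step above concrete, the following minimal sketch assembles the boosted Lucene query string from an already annotated question. The stop-word list, lemmas and named entities are assumed to come from the Sentimatrix services mentioned above, whose API is not reproduced here; the function is an illustration of the described boosting scheme, not the exact implementation used in the submitted runs.

```python
def build_lucene_query(tokens, lemmas, named_entities, stop_words):
    """Assemble a boosted Lucene query string from an annotated question.

    tokens         -- question words, in order
    lemmas         -- dict mapping an inflected token to its lemma
    named_entities -- set of tokens recognized as named entities
    stop_words     -- set of stop words to drop
    """
    parts = []
    for token in tokens:
        if token.lower() in stop_words:
            continue                        # step 1: stop words are eliminated
        if token in named_entities:
            parts.append(f"{token}^3")      # named entities get the highest boost
        elif token in lemmas and lemmas[token] != token:
            # inflected form boosted with 2, lemma added without a boost
            parts.append(f"({token}^2 {lemmas[token]})")
        else:
            parts.append(token)
    return " ".join(parts)

# Toy example mirroring the Romanian question with q_id = "8"
tokens = ["nu", "cauză", "vulnerabilității", "femeilor",
          "căsătorite", "infecțiile", "HIV"]
lemmas = {"cauză": "cauza", "vulnerabilității": "vulnerabilitate",
          "femeilor": "femeie", "căsătorite": "căsătorit",
          "infecțiile": "infecție"}
print(build_lucene_query(tokens, lemmas, {"HIV"}, set()))
# -> nu (cauză^2 cauza) (vulnerabilității^2 vulnerabilitate) ... HIV^3
```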
Then every index is searched for every answer, and a list of documents with Lucene relevance scores is returned, where Score(d, a) is the relevance score for document d when we search with the Lucene query associated with answer a. Finally, a normalized value is computed for all answers associated with a question, and the answer with the highest value is selected as the most probable answer.

2.2 Enhancing the base architecture with Textual Entailment

The architecture of the components that use the Textual Entailment (TE) system is presented in Figure 1. It is based on the system used in the AVE exercises in 2007 and 2008 [9] and is similar to the architecture of one of the best systems from the QA4MRE 2011 edition [13]. The steps executed by our system are as follows:
- we build a pattern with variables for every question, according to the question type;
- using the pattern and all possible answers, we build a set of 5 hypotheses for each question: H1, H2, H3, H4, H5;
- we assign to the document tag from the initial XML file the role of text T and we run the TE system for all obtained pairs: (T, H1), (T, H2), (T, H3), (T, H4), (T, H5);
- finally, we consider the candidate from the hypothesis for which we obtain the greatest global fitness to be the correct answer for the current question.

Figure 1: The second architecture, based on the Textual Entailment (TE) system (the test documents provide the text T, the hypotheses are built from question patterns and the candidate answers, and the TE system outputs partial and global scores per answer)

Pattern building

In order to use the TE system for ranking the possible answers in the QA4MRE task, all the questions are first transformed according to the algorithm presented in [14]. For example, for the following question:

Question: What is the goal of the ABC strategy?

our program generates the following pattern:

Pattern: ANSWER is the goal of the ABC strategy.

where ANSWER is the variable in this case. We generate specific patterns according to the following answer types: Measure (How many, How much), Person (Who, Name), Location (In what), Date (On what date, When) and Other. After building the pattern, we proceed to constructing the corresponding hypotheses.

Hypothesis building

Using the pattern building mechanism above and the answers provided within the QA4MRE input XML data, we build the corresponding hypotheses. For example, for the question above, we built, according to the answers from the English test data, the following hypotheses:

H1: promoting women's social and economic rights is the goal of the ABC strategy.
H2: combating women's inequalities is the goal of the ABC strategy.
H3: promoting abstinence, faithfulness, and use of condoms is the goal of the ABC strategy.
H4: ignoring poverty, social inequality and traditional ways is the goal of the ABC strategy.
H5: promoting the prevention and treatment programs is the goal of the ABC strategy.

For each of these hypotheses, we consider the corresponding text from the "document" tag as having the role of text T.

Answers classification

We consider the pairs built above as input for our Textual Entailment system.
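A minimal sketch of the hypothesis building and answer classification steps is given below. The global fitness score is produced by the TE system [7, 8] and appears here only as a placeholder function, and the NOA cut-off value is purely illustrative (the actual thresholds vary between runs, as described in Section 3); the sketch shows the control flow, not the submitted implementation.

```python
# Illustrative sketch: build hypotheses from the question pattern and pick
# the candidate whose (T, H) pair gets the highest global fitness.
NOA_THRESHOLD = 0.1   # hypothetical value; real thresholds differ per run

def build_hypotheses(pattern, answers):
    """Instantiate the question pattern with every candidate answer."""
    return [pattern.replace("ANSWER", answer) for answer in answers]

def global_fitness(text, hypothesis):
    """Placeholder for the Textual Entailment system's global fitness score."""
    raise NotImplementedError("provided by the TE system [7, 8]")

def classify_answers(text, pattern, answers):
    """Return the most probable answer, or None (NOA) if no pair scores high enough."""
    hypotheses = build_hypotheses(pattern, answers)
    scores = [global_fitness(text, h) for h in hypotheses]
    best = max(range(len(answers)), key=lambda i: scores[i])
    if scores[best] <= NOA_THRESHOLD:   # maximum fitness close to 0 -> NOA
        return None
    return answers[best]

pattern = "ANSWER is the goal of the ABC strategy."
answers = ["promoting women's social and economic rights",
           "combating women's inequalities",
           "promoting abstinence, faithfulness, and use of condoms",
           "ignoring poverty, social inequality and traditional ways",
           "promoting the prevention and treatment programs"]
print(build_hypotheses(pattern, answers)[2])
```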
After running the TE system, the global fitness values for the exemplified pairs are the following:

GlobalFitness(H1, T) = 2.1854732
GlobalFitness(H2, T) = 1.3577608
GlobalFitness(H3, T) = 1.92097
GlobalFitness(H4, T) = 2.2404695
GlobalFitness(H5, T) = 2.2766914

Since in this case the highest value is obtained for answer 5, "promoting the prevention and treatment programs", we consider it the most probable answer. A question receives a NOA (no answer) response when the maximum GlobalFitness value over its pairs is very close to 0.

3 Results and Evaluation

For the QA4MRE 2012 task, our team submitted 10 runs, of which 5 were for the Romanian-Romanian language pair and 5 for the English-English pair. The evaluation of the results is done from two different perspectives, in a manner similar to the 2011 QA4MRE edition. The first one is equivalent to a traditional evaluation, in which all the answers are gathered in a single set that is then compared to a gold standard, without taking into account the document associated with a particular answer. The reading perspective, on the other hand, offers insight into how well the system "understands" a particular document. First, the C@1 measure [1], which rewards a system for leaving a question unanswered rather than answering it incorrectly, is computed for each test of 10 questions per document. These results are then used to obtain statistical measures, such as the median, average and standard deviation over values grouped by topic or overall.

In the following 4 tables, we detail the results obtained by each of the 5 configurations for Romanian and the 5 configurations for English. In each case, the first two configurations (C1 and C2) refer to the first architecture design. The difference between C1 and C2 lies in the choice of the threshold for providing the "NOA" response: our intent was to evaluate the impact of a more permissive configuration, which gives fewer "NOA" answers, versus a more restrictive one. The last three configurations represent runs in which the architecture involving the Textual Entailment system was used. The difference between these three configurations resides, as in the case of the first two, in the different choice of threshold for the "NOA" answers: we tested a permissive, a moderate and a restrictive threshold.

3.1 Evaluation at the question answering level

In Table 1, we present the results for the 5 runs on Romanian; in Table 2, the same results are provided for the 5 runs on English.

Table 1: Results of UAIC's Ro-Ro runs at question answering level

                     C1     C2     C3     C4     C5
Answered right       34     38     34     33     21
Answered wrong       114    111    113    104    67
Total answered       148    149    147    137    88
Unanswered right     3      3      4      0      12
Unanswered wrong     9      8      9      23     60
Total unanswered     12     11     13     23     72
Overall accuracy     0.21   0.24   0.21   0.21   0.13
C@1 measure          0.23   0.25   0.23   0.24   0.19

As can be seen in Table 1, the best result of our system in terms of the C@1 measure is obtained for the run in which the first type of architecture was used together with a slightly more permissive threshold for the unanswered questions. In contrast, for English, two of the three query reformulation runs (those using the Textual Entailment architecture) outperform the best result of the first two configurations. This shift can be explained by the increased effectiveness of the patterns applied for query rewriting when working on English.
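For reference, the C@1 measure used throughout this section is the non-response-aware measure defined in [1]:

C@1 = \frac{1}{n}\left(n_R + n_U \cdot \frac{n_R}{n}\right)

where n is the total number of questions (here 160), n_R the number of correctly answered questions and n_U the number of unanswered questions. For the best Romanian run (C2 in Table 1), n_R = 38 and n_U = 11, so C@1 = (38 + 11 · 38/160)/160 ≈ 0.25, the value reported in the table.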
Table 2: Results of UAIC's En-En runs at question answering level

                     C1     C2     C3     C4     C5
Answered right       34     23     34     37     25
Answered wrong       96     65     78     104    62
Total answered       130    88     112    141    87
Unanswered right     7      16     6      3      15
Unanswered wrong     23     54     42     16     58
Total unanswered     30     72     48     19     73
Overall accuracy     0.21   0.14   0.21   0.23   0.16
C@1 measure          0.25   0.21   0.28   0.26   0.23

We can observe the influence of the correctly unanswered questions on the C@1 measure when comparing the number of right answers for the best Romanian run with that of the best English run. Although in the Ro-Ro run a higher number of questions were correctly answered (38 right answers) than in the En-En run (34 right answers), the C@1 measure obtained for the English run (0.28) is higher than the one obtained for the best Romanian run (0.25). This is explained by the difference in the number of correctly unanswered questions.

A common denominator between the results for Romanian and those for English is that a balanced threshold provided the best results. This is best observed when comparing the last three configurations for both English and Romanian. For example, in Table 1, the C4 configuration, with 23 unanswered questions, outperformed C3 (13 unanswered questions) and C5 (72 unanswered questions). The same pattern is found in Table 2, for the En-En runs.

3.2 Evaluation at the reading test level

In Table 3, we present the median and average for each of the 4 topics, Topic 1 (AIDS), Topic 2 (Climate Change), Topic 3 (Music and Society) and Topic 4 (Alzheimer), together with their overall values for the Ro-Ro runs. In Table 4, the same results are provided for the En-En runs.

Table 3: Results of UAIC's Ro-Ro runs at reading test level

                     C1     C2     C3     C4     C5
Topic 1 median       0.16   0.26   0.26   0.24   0.18
Topic 2 median       0.31   0.31   0.31   0.20   0.21
Topic 3 median       0.20   0.20   0.15   0.18   0.07
Topic 4 median       0.29   0.29   0.21   0.28   0.21
Overall median       0.24   0.29   0.26   0.23   0.16
Topic 1 average      0.18   0.28   0.26   0.25   0.16
Topic 2 average      0.27   0.27   0.27   0.25   0.26
Topic 3 average      0.20   0.20   0.18   0.18   0.07
Topic 4 average      0.26   0.26   0.19   0.26   0.20
Overall average      0.23   0.25   0.22   0.23   0.17

Table 4: Results of UAIC's En-En runs at reading test level

                     C1     C2     C3     C4     C5
Topic 1 median       0.25   0.30   0.31   0.30   0.33
Topic 2 median       0.21   0.23   0.25   0.21   0.22
Topic 3 median       0.20   0.14   0.28   0.23   0.17
Topic 4 median       0.20   0.08   0.22   0.18   0.16
Overall median       0.21   0.16   0.29   0.26   0.22
Topic 1 average      0.28   0.29   0.34   0.34   0.32
Topic 2 average      0.22   0.22   0.24   0.24   0.20
Topic 3 average      0.23   0.17   0.25   0.24   0.18
Topic 4 average      0.22   0.08   0.22   0.19   0.16
Overall average      0.23   0.19   0.26   0.25   0.21

These results, in terms of average and median, are consistent with the trend shown in Table 1 and Table 2. The best overall average was obtained for the third configuration on English and for the second one on Romanian.

3.3 Error analysis

In addition to the analysis carried out above, we have also performed an error analysis over the reported results. The analysis was carried out exclusively over the questions in topic 2 (the topic was arbitrarily chosen), and a report of the most relevant error sources is given below. In interpreting the analysis results, two important factors need to be taken into account. Firstly, the submitted runs are grouped according to the basic philosophy regarding query generation: in the first case (the first three submitted runs), queries were generated on the basis of the question alone, and then the potential answer was searched for in the top scoring results.
In the second case (the last two submitted runs), we generated 5 queries for each question, one for each potential answer (the potential answer was included in the query); the textual entailment system was also used in this case. Secondly, the various runs for each case were obtained by tweaking the threshold at which the system decides to provide an answer.

One of the first types of error we encountered stems from the fact that the second type of runs uses different queries than the first type. This is the case of the first question in the second topic, reading 5:

Ro: Care dintre următorii este un biocombustibil?
En: Which of the following is a biofuel?

The first three runs provide the correct answer (using the method described above), while the last two runs provide wrong answers. This is due to the fact that the query "(următorii^2 următor) biocombustibil etanol" provides lower scoring snippets than the query "(următorii^2 următor) biocombustibil carbon" (the correct answer is "ethanol", but the provided answer is "carbon"). Upon examining the snippets returned by the five queries extracted for this particular question, we discovered that the correct answer scored only fourth overall, which practically excludes it from consideration. This type of error was also encountered for questions 4, 5 and 6, reading 5, in the 2nd topic.

A more subtle type of error is the one which generated an incorrect answer for question 2, reading 5, topic 2. In this case, regardless of the manner of creating the query, the chance of obtaining the correct answer is low because of the nature of the base text. The fault lies with the answer extraction module, which is unable to resolve coreference and therefore cannot extract the correct answer:

Ro: ... combustibilul lichid este atât de valoros. Până în prezent, este câştigătorul evident atunci când avem nevoie de energie pentru transport - în special transport aerian şi transport maritim greu pe distanţe mari - deoarece ne permite să înghesuim o grămadă de energie într-un spaţiu de stocare relativ mic şi să realimentăm cu ușurință...
En: ... liquid fuel is so valuable. So far, it is the clear winner when we need energy for transport - especially air transport and heavy long-distance shipping - because it allows us to cram a lot of energy into a relatively small storage space and to refuel easily...

Some errors are caused by the query generation module, such as in the case of question 3, reading test 5, topic 2. In this instance, none of the five submitted runs provided the correct answer, mainly because most of the query words are not found in the vicinity of the correct answer. The query generated by the question analysis module is:

Ro: (poate^2 putea) (mărită^2 mări) (cantitatea^2 cantitate) (culturi^2 cultură) (cultivate^2 cultivat) simultan bucată pământ

to which the system then adds the query for the expected answer:

Ro: (folosind^2 folosi) (culturi^2 cultură) (anuale^2 anual) (succesive^2 succesiv)

The text span which contains the correct answer is:

Ro: A doua premiză greșită din scenariile cele mai nefavorabile este aceea că de pe aceeași suprafață de teren nu se pot obține mai multe recolte. Amestecurile perene pentru biocombustibili celulozici ar putea fi, de fapt, cultivate alături de culturile anuale sau, pe acelaşi teren, între recoltare şi însămânțare...
En: The second wrong premise of the worst-case scenarios is that more than one harvest cannot be obtained from the same area of land. Perennial mixtures for cellulosic biofuels could, in fact, be grown alongside the annual crops or, on the same land, between harvesting and sowing...

and we can see that some of the keywords of the query are not found within it. The only way in which the system could solve this is by using synonyms for the keywords (in this case, culturi in the question and recolte in the answer); a possible synonym expansion step is sketched below.
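The synonym-based fix suggested above could take the form of a small query expansion step such as the following sketch. The synonym table is a hypothetical stand-in for a lexical resource (for instance a Romanian wordnet), and this step was not part of the submitted runs.

```python
# Illustrative sketch of synonym expansion for the boosted Lucene queries.
# SYNONYMS is a hypothetical lexical resource, not part of the submitted system.
SYNONYMS = {
    "cultură": ["recoltă"],   # culturi (crops) vs. recolte (harvests)
    "recoltă": ["cultură"],
}

def expand_with_synonyms(query_terms):
    """query_terms: list of (inflected_form, lemma, boost) tuples."""
    expanded = []
    for form, lemma, boost in query_terms:
        group = [f"{form}^{boost}"] if boost > 1 else [form]
        if lemma != form:
            group.append(lemma)                 # lemma kept without a boost
        group += SYNONYMS.get(lemma, [])        # synonyms added without a boost
        expanded.append("(" + " ".join(group) + ")" if len(group) > 1 else group[0])
    return " ".join(expanded)

terms = [("culturi", "cultură", 2), ("cultivate", "cultivat", 2),
         ("pământ", "pământ", 1)]
print(expand_with_synonyms(terms))
# -> (culturi^2 cultură recoltă) (cultivate^2 cultivat) pământ
```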
The fact that most of the keywords in the query are not found in the supporting text is also highlighted by the fact that the system did not provide an answer for this question in run 5, because of the low score of all the retrieved snippets. The same type of error was observed in the case of question 9, reading 5, topic 2 and question 6, reading 6, topic 2. An extreme case of this error can be seen in question 1, reading 6, topic 2, where none of the runs gave any answer, although the extracted answer was correct, because of the low score of the supporting snippets.

In some cases, errors arise from the addition of the second query in the case of the first three runs. This can be seen for the answer generated for question 7, reading 5, topic 2, where the initial query provides high scoring snippets which contain the correct solution:

Ro: O parte din problemă provine din apetitul deosebit al porumbului pentru stimulente, cum ar fi îngrăşămintele.
En: Part of the problem comes from corn's particular appetite for stimulants, such as fertilizers.

but these snippets are then penalized because of the second query: in this particular case, the secondary query "(absoarbe^2 absorbi) (cantități^2 cantitate) (reduse^2 reduce) (gaze^2 gaz) efect seră" yields a much better score for a different snippet, because it contains far more keywords than the query for the correct solution, "nevoie (cantități^2 cantitate) (mari^2 mare) fertilizatori^2". This type of issue could be corrected to some extent by the use of synonymy, which would increase the score of the correct snippet. The same issue can also be seen in the case of question 8, reading 5, topic 2 and questions 2 and 4, reading 6, topic 2.

A type of error that stems from the lack of sufficient background knowledge can be found for question 10, reading 5, topic 2:

Ro: Care este biocombustibilul a cărui producție reduce cel mai mult emisiile de gaze cu efect de seră?
En: Which is the biofuel whose production reduces greenhouse gas emissions the most?

The correct answer for this question, etanol celulozic (En: cellulosic ethanol), cannot be found as such in the text, although it is referred to in another form:

Ro: o versiune celulozică de etanol
En: a cellulosic version of ethanol

as can be seen in the snippet containing the correct answer:

Ro: Există o mulţime de moduri diferite de a face biocombustibili celulozici, inclusiv o versiune celulozică de etanol, şi ei reduc emisiile cu un procent enorm, între 82% și 87%.
En: There are plenty of different ways of making cellulosic biofuels, including a cellulosic version of ethanol, and they reduce emissions by an enormous percentage, between 82% and 87%.

This type of problem can only be corrected by the appropriate background knowledge.

4 Conclusions

This paper presents the updated Question Answering system developed by UAIC for the Machine Reading Evaluation task within the CLEF 2012 labs. The presented systems were built starting from the main components of our QA systems (the question processing and information retrieval modules), but the multiple choice questions were addressed using a textual entailment component. The evaluation shows a best overall median over all 4 topics of 0.29 for both the Romanian and English monolingual tasks.

We can observe the influence of the correctly unanswered questions on the C@1 measure when comparing the number of right answers for the best Romanian run with that of the best English run. Although in the Ro-Ro run a higher number of questions were correctly answered (38 right answers) than in the En-En run (34 right answers), the C@1 measure obtained for the English run (0.28) is higher than the one obtained for the best Romanian run (0.25). This is explained by the difference in the number of correctly unanswered questions.

Acknowledgement.
The research presented in this paper was funded by the Sectoral Operational Programme for Human Resources Development through the project "Development of the innovation capacity and increasing of the research impact through post-doctoral programs" POSDRU/89/1.5/S/49944.

References

1. Peñas, A., Rodrigo, A.: A Simple Measure to Assess Non-response. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics - Human Language Technologies (ACL-HLT 2011), Portland, Oregon, USA, June 19-24. (2011)
2. Peñas, A., Hovy, E., Forner, P., Rodrigo, Á., Sutcliffe, R., Sporleder, C., Forascu, C., Benajiba, Y., Osenova, P.: Overview of QA4MRE at CLEF 2012: Question Answering for Machine Reading Evaluation. CLEF 2012 Evaluation Labs and Workshop Working Notes Papers, 17-20 September 2012, Rome, Italy. (2012)
3. Iftene, A., Gînscă, A. L., Moruz, A., Trandabăţ, D., Husarciuc, M.: Question Answering for Machine Reading Evaluation on Romanian and English Languages. Notebook Paper for the CLEF 2011 LABs Workshop, 19-22 September, Amsterdam, Netherlands. (2011)
4. Iftene, A., Trandabăţ, D., Moruz, A., Pistol, I., Husarciuc, M., Cristea, D.: Question Answering on English and Romanian Languages. In C. Peters et al. (Eds.): CLEF 2009, LNCS 6241, Part I (Multilingual Information Access Evaluation Vol. I, Text Retrieval Experiments), pp. 229-236. Springer, Heidelberg. (2010)
5. Iftene, A., Trandabăţ, D., Moruz, A., Husarciuc, M.: Question Answering on Romanian, English and French Languages. Notebook Paper for the CLEF 2010 LABs Workshop, 22-23 September, Padua, Italy. (2010)
6. Iftene, A., Balahur-Dobrescu, A.: Improving a QA System for Romanian Using Textual Entailment. In Proceedings of the RANLP workshop "A Common Natural Language Processing Paradigm For Balkan Languages", pp. 7-14, September 26, 2007, Borovets, Bulgaria. (2007)
7. Iftene, A.: Textual Entailment (Ph.D. Thesis). Technical Report, "Al. I. Cuza" University. ISSN 1224-9327. 169 pages. October 2009, Iasi, Romania. (2009)
8. Iftene, A., Balahur-Dobrescu, A.: Textual Entailment on Romanian. The Third Workshop on Romanian Linguistic Resources and Tools for Romanian Language Processing. ISSN 1843-911X, pp. 109-118, 14-15 December, Iasi, Romania. (2007)
9. Iftene, A., Balahur, A.: Answer Validation on English and Romanian Languages. In Evaluating Systems for Multilingual and Multimodal Information Access, 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, Aarhus, Denmark, September 17-19, 2008, Revised Selected Papers. Lecture Notes in Computer Science, Vol. 5706/2009, pp. 385-392. (2009)
10. Gînscă, A. L., Boroș, E., Iftene, A., Trandabăţ, D., Toader, M., Corîci, M., Perez, C. A., Cristea, D.: Sentimatrix - Multilingual Sentiment Analysis Service. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (ACL-WASSA 2011), Portland, Oregon, USA, June 19-24. (2011)
11. Iftene, A., Balahur-Dobrescu, A.: Named Entity Relation Mining Using Wikipedia. In Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), 28-30 May, Marrakech, Morocco. (2008)
12. LUCENE: http://lucene.apache.org/java/docs/
13. Pakray, P., Bhaskar, P., Banerjee, S., Chandra Pal, B., Bandyopadhyay, S., Gelbukh, A.: A Hybrid Question Answering System based on Information Retrieval and Answer Validation. Notebook Paper for the CLEF 2011 LABs Workshop, 19-22 September, Amsterdam, Netherlands. (2011)
14. Bar-Haim, R., Dagan, I., Dolan, B., Ferro, L., Giampiccolo, D., Magnini, B., Szpektor, I.: The Second PASCAL Recognising Textual Entailment Challenge. In Proceedings of the Second PASCAL Challenges Workshop on Recognizing Textual Entailment, Venice, Italy. (2006)