    Two Approaches for QA4MRE: Information Retrieval
       and Graph-based Knowledge Representation

             Helena Gómez-Adorno, David Pinto, Darnes Vilariño

                           Faculty of Computer Science
                   Benemérita Universidad Autónoma de Puebla
               Av. San Claudio y 14 Sur,C.P. 72570, Puebla, Mexico
                  {helena.gomez,dpinto,darnes}@cs.buap.mx,
                             http://www.cs.buap.mx/



      Abstract. In this paper we present our approaches for tackling the
      QA4MRE 2013 main task. We have built two different methodologies,
      one based on information retrieval and the other based on graph
      representations of the text; additionally, we have built a third, hybrid
      methodology that combines the previous two. The first methodology
      uses the Lucene information retrieval engine to carry out information
      extraction, employing additional automated linguistic processing such
      as stemming, anaphora resolution and part-of-speech tagging. This
      approach validates the answers through a textual entailment assessment
      together with lexical and semantic similarity measures. In the second
      methodology, the documents along with their hypotheses are parsed to
      produce a lexical, morphological and syntactic graph representation.
      Thereafter, we traverse different paths over the document and the
      hypothesis in order to find features in those graphs by counting text
      components (word lemmas, PoS tags, grammatical tags). As a result of
      this procedure, we obtain two feature vectors for each traversed path.
      Finally, a cosine-based similarity is calculated over the feature vectors
      in order to select the correct hypothesis.

      Keywords: Question answering system, reading comprehension, information
      retrieval, graph-based representation


1   Introduction

In this paper we present the experiments carried out as part of the participation
in the main task of QA4MRE@CLEF 2013. The QA4MRE task is associated
with the ability of a system to understand the main ideas established in a given
text. The task consists of reading a document and identifying answers for a
set of questions about the information that is expressed or implied in the text.
The questions are posed in multiple-choice form; each question has 5
different options, and only one option is the correct answer. The detection of
the correct answer is specifically designed to require various types of inference,

and the consideration of prior knowledge acquired from a collection of reference
documents [1, 2]. Answering a question about a given text in an automatic way
to evaluate the understanding of that text, is a very difficult task that has
often been tackled in the literature through Natural Language Processing
(NLP) techniques, such as Question Answering (QA). Information retrieval and
QA are related; however, QA assumes that, given a query, the result must be the
correct answer to that question, instead of a number of references to documents
that contain the answer.
    The main idea behind the QA4MRE task is to answer questions based on a
single document. This approach is different from that of traditional QA systems,
which have a very large corpus in which to search for the requested information,
and this implies in some cases a very different system architecture.
    Since the first edition of this task in 2011, and later in 2012, it has provided
a single evaluation platform for experimentation with new techniques and
methodologies aimed at solving this problem. In this sense, we can take the
systems presented at this conference as state-of-the-art work for this research
field.
    The rest of the paper is organized as follows. Section 2 describes the devel-
oped approaches. Section 3 presents the evaluation results on the document
collection of the QA4MRE task at CLEF 2013. Finally, Section 4 presents the
conclusions obtained and outlines some future work directions.


2     Developed Approaches

We have developed two different methodologies for tackling the problem for two
languages, English and Spanish. The first one is based on Information Retrieval
techniques, and the other one is based on text representations by means of
graphs. Both of them include a general Document Processing module. In the
following sections each of the methodologies is discussed.


2.1   Document Processing

This module is executed for both data sets, English and Spanish. An XML parser
receives as input a corpus structured in XML format which contains all the
documents, along with their respective questions and multiple-choice answers,
as shown in Figure 1. The XML parser extracts the documents, questions and
associated answers. It stores the questions and answers, identifying them
according to the document to which they belong, so that they can be used in the
following processes. Later, the queries associated with each document are
analyzed, applying a Part-Of-Speech (POS) tagger in order to identify the
“question keywords” (what, where, when, who, etc.), and the result is passed to
the hypothesis generation module. Thereafter, the hypothesis generation module
receives the set of questions with their multiple-choice answers, previously
processed. We construct what we call a hypothesis as the concatenation of the
question with each of the possible

answers. This hypothesis is intended to become the input to the Information Re-
trieval (IR) module, i.e., the query, as well as to the graph generation module. In
order to generate the hypothesis, first the “question keyword” is identified and
subsequently replaced by each of the five possible answers, thereby obtaining
five hypotheses for each question. For example, given the question “Who is the
founder of the SING campaign?” and a possible answer “Annie Lennox”, the
obtained hypothesis is “Annie Lennox is the founder of the SING campaign”.
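
    A minimal sketch of this substitution step (the keyword list and the
build_hypotheses helper are our own illustration, not the system's actual
implementation):

    # Sketch: turn one question and its candidate answers into hypotheses
    # by replacing the question keyword with each answer.
    QUESTION_KEYWORDS = {"who", "what", "where", "when", "which", "why", "how"}

    def build_hypotheses(question, answers):
        tokens = question.rstrip("?").split()
        hypotheses = []
        for answer in answers:
            replaced = [answer if t.lower() in QUESTION_KEYWORDS else t
                        for t in tokens]
            hypotheses.append(" ".join(replaced))
        return hypotheses

    print(build_hypotheses("Who is the founder of the SING campaign?",
                           ["Annie Lennox"]))
    # ['Annie Lennox is the founder of the SING campaign']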

[Figure omitted: block diagram of the Document Processing module. An XML
interpreter separates each document from its questions and answers; question
analysis (with anaphora resolution) feeds the hypothesis generator (question +
answer), which outputs the hypotheses together with the document.]

                       Fig. 1. Document Processing Architecture




    Afterwards, we perform anaphora resolution for the English documents
associated with the questions using the JavaRAP1 system. It has been observed
that applying anaphora resolution in QA systems improves the results obtained,
in terms of precision [3].
    The output of this module is the set of hypotheses along with their reference
documents. These sets are the input for the two approaches previously
mentioned.


2.2    Information Retrieval Approach

The Information Retrieval approach consists of the following two submodules:
Information Retrieval (IR) and Answer Validation. Both submodules are illus-
trated in Figure 2.


2.3    Information Retrieval module

The IR module was built using the Lucene2 IR library. It is responsible for
indexing the document collection, and for the subsequent passage retrieval,
given a query.
1
  http://wing.comp.nus.edu.sg/~qiu/NLPTools/JavaRAP.html
2
  http://lucene.apache.org/core/

[Figure omitted: block diagram of the IR approach. The Information Retrieval
submodule takes the hypotheses and documents and produces “hypothesis +
retrieved passage + IR score” triples; the Answer Validation submodule applies
textual entailment and semantic similarity before answer selection yields the
final answer.]

                           Fig. 2. Information Retrieval Approach



Each hypothesis obtained in the hypothesis generation module is processed
in order to identify the query keywords, removing stop words (using the stop
word list of Python NLTK3). Every processed hypothesis is sent to the IR
module, which returns a relevant passage for each hypothesis. This passage is
used as a support text to decide whether or not the hypothesis can be the right
answer. For each hypothesis only the first passage returned is taken, which is
considered the most relevant one. This process generates a pair “Hypothesis
+ Passage (H-P)”, along with a lexical similarity score calculated by Lucene.
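
    As an illustration, the following sketch reproduces this retrieval step with
Whoosh, a pure-Python library with a Lucene-like query model, used here as a
stand-in for the Lucene engine actually employed (the sample passages and
hypotheses are placeholders):

    import os
    from whoosh.fields import ID, TEXT, Schema
    from whoosh.index import create_in
    from whoosh.qparser import OrGroup, QueryParser

    passages = ["I'm going to share my story with you.",
                "Annie Lennox founded the SING campaign."]
    hypotheses = ["Annie Lennox is the founder of the SING campaign"]

    # Index the passages of one reference document.
    os.makedirs("indexdir", exist_ok=True)
    schema = Schema(pid=ID(stored=True), text=TEXT(stored=True))
    ix = create_in("indexdir", schema)
    writer = ix.writer()
    for i, passage in enumerate(passages):
        writer.add_document(pid=str(i), text=passage)
    writer.commit()

    # For each hypothesis, keep only the top-ranked passage and its lexical
    # similarity score, forming the H-P pair.
    pairs = []
    with ix.searcher() as searcher:
        parser = QueryParser("text", ix.schema, group=OrGroup)
        for hyp in hypotheses:
            hits = searcher.search(parser.parse(hyp), limit=1)
            if hits:
                pairs.append((hyp, hits[0]["text"], hits[0].score))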

Answer validation module The answer validation module aims to assign a
score, based on the textual entailment judgment, to the pair H-P generated in
the Information Retrieval module. In addition to the textual entailment process,
a semantic similarity measure is calculated over the H-P pair.
    It has been shown that the textual entailment judgment may improve the
performance of the hypothesis validation, given a support text, which in this case
is the retrieved passage [4–6]. The aim of this module is to obtain the textual
entailment judgment over all the H-P pairs that it receives as input. In order
to determine whether or not the passage P entails a hypothesis H, we imple-
mented an approach based on a research work [7] presented in the Cross-lingual
Textual Entailment task of SEMEVAL-20124. In this work, the data set provided
in that competition is used as training data, and the textual entailment judgment
is performed over the hypothesis-passage set as test data.
    For this particular problem all the previously developed models were tested,
determining that the best performance is obtained when the following 29 features
are used: the number of word n-grams (n = 1, ..., 4) and character n-grams
(n = 1, ..., 5) shared by each pair of sentences, and the number of word n-grams
(n = 1, ..., 4) and character n-grams (n = 1, ..., 5) that are in the hypothesis but
not in the support text, and vice versa. In addition, the lengths of both sentences
are added to the feature set, since they have been shown to help in obtaining the
textual entailment judgment. Given that this problem can be seen as a classifi-
cation one, after several experiments it was decided to use a 4-layer neural
network, implemented with the WEKA5 data mining tool.
3
  http://nltk.org/
4
  http://www.cs.york.ac.uk/semeval-2012/task8/
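
    A sketch of how these 29 overlap features can be computed (the function
names are ours; the exact tokenization used in [7] may differ):

    def ngrams(seq, n):
        # Set of contiguous n-grams of a token or character sequence.
        return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

    def entailment_features(hypothesis, passage):
        # 27 overlap counts plus the two sentence lengths = 29 features.
        feats = []
        units = [(hypothesis.split(), passage.split(), range(1, 5)),  # words
                 (list(hypothesis), list(passage), range(1, 6))]      # chars
        for h_seq, p_seq, ns in units:
            for n in ns:
                h_ng, p_ng = ngrams(h_seq, n), ngrams(p_seq, n)
                feats += [len(h_ng & p_ng),    # shared by both sentences
                          len(h_ng - p_ng),    # in H but not in P
                          len(p_ng - h_ng)]    # in P but not in H
        feats += [len(hypothesis.split()), len(passage.split())]
        return feats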
   The semantic similarity measure used in this work [8] gives a weight to each
word of the sentence according to the degree of specificity of the word. For
example, the words catastrophe and disaster gain more weight than the words
could and should. The word-to-word similarity between the two sentences is
integrated into this measure. The word-to-word similarity measures employed
are knowledge-based measures built on the WordNet taxonomy (path, lin and
wup).
   The similarity between the pair H and P is given by Equation 1:

\[
sim(H, P) = \frac{1}{2}\left(\frac{\sum_{w \in H} maxSim(w, P) \cdot idf(w)}{\sum_{w \in H} idf(w)} + \frac{\sum_{w \in P} maxSim(w, H) \cdot idf(w)}{\sum_{w \in P} idf(w)}\right) \tag{1}
\]
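
    A simplified sketch of this measure using NLTK's WordNet interface (only
the path measure is shown; lin and wup would replace path_similarity, and the
idf table is assumed to be precomputed from a corpus):

    from nltk.corpus import wordnet as wn

    def max_sim(word, sentence):
        # Highest WordNet path similarity between `word` and any word of
        # the other sentence.
        best = 0.0
        for other in sentence:
            for s1 in wn.synsets(word):
                for s2 in wn.synsets(other):
                    sim = s1.path_similarity(s2)
                    if sim is not None and sim > best:
                        best = sim
        return best

    def sim_hp(H, P, idf):
        # Equation (1): two directional scores, averaged.
        def directional(a, b):
            den = sum(idf.get(w, 1.0) for w in a)
            num = sum(max_sim(w, b) * idf.get(w, 1.0) for w in a)
            return num / den if den else 0.0
        return 0.5 * (directional(H, P) + directional(P, H))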



    For the Answer Selection process, we have developed a method based on the
following rules (a sketch of this logic follows the list):

1. Check the entailment judgment between each hypothesis and its recovered
   passage. If the judgment is “no entailment” for a hypothesis, the algorithm
   discards that answer; otherwise, the lexical similarity score obtained by
   Lucene is used.
2. For each question, the answer obtaining the highest sum of scores is selected
   as the correct answer.
3. Finally, we check the Lucene score: if the score is lower than 0.1 and
   higher than 0.0, we answer the question with the option “5) None of the
   above”; if the score is equal to 0.0, the question is not answered.
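
   A minimal sketch of these rules (each candidate answer is assumed to carry
its entailment judgment and a single collapsed score; the thresholds are those
stated above):

    NONE_OF_THE_ABOVE = "5) None of the above"

    def select_answer(candidates):
        # candidates: {answer_id: (entailment_judgment, lucene_score)}
        scored = {a: s for a, (judgment, s) in candidates.items()
                  if judgment != "no entailment"}        # rule 1: discard
        if not scored:
            return None
        best, score = max(scored.items(), key=lambda kv: kv[1])  # rule 2
        if score == 0.0:
            return None                                  # rule 3: abstain
        if score < 0.1:
            return NONE_OF_THE_ABOVE                     # rule 3: low score
        return best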

   The reason for discarding a hypothesis with a “no entailment” judgment is
that, even though the IR module returned a passage for the hypothesis, that
passage does not share sufficient information to support the selection of that
hypothesis as the correct answer to the question. The lexical similarity score
obtained by Lucene allows the system to determine which answer is more similar
to its support text.


2.4    Graph-based Knowledge Extraction Approach

For many problems in natural language processing, a graph structure is an intu-
itive, natural and direct way to represent the data. There exist several research
works that have employed graphs for text representation in order to solve some
particular problem [9]. The Graph-based Knowledge Extraction Approach con-
sists of the following two submodules: Graph Generation and Answer Validation.
Both submodules are illustrated in Figure 3.
5
    http://www.cs.waikato.ac.nz/ml/weka/

[Figure omitted: block diagram of the graph-based approach. The Graph
Generation submodule tags the hypotheses and documents at the syntactic,
morphological and lexical levels, and a features extractor traverses the resulting
graphs; the Answer Validation submodule computes similarities and selects the
final answer.]

         Fig. 3. Graph-based Knowledge Extraction Approach Architecture



Graph Generation module This module receives the set of hypotheses along
with their associated documents from the Document Processing module. Text
documents along with their hypotheses are parsed to produce their graph-based
representation. For this representation we took into account the different
linguistic levels (lexical, syntactic, morphological and semantic) in order to
capture the majority of the features present in the text. By including these
linguistic analyses we attempt to represent how different text components
(words, phrases, clauses, sentences, etc.) are related.
    The process of graph generation is described by the following submodules:

The Syntactic Level Parser is the base of the graph-based representation.
  At the syntactic level we deal with the rules and principles that govern the
  structure of a given text. Different syntactic parsers exist in the literature;
  for the purposes of this work, we use the Stanford Dependency Parser6
  for the English language set, and Freeling7 for the Spanish language set.
  With this type of parsing, we may take advantage of the grammatical
  relation obtained between two components of a sentence.
The Morphological Level Parser deals with the identification of the mor-
  pheme structure of a given language and other linguistic units, such as
  word roots, affixes and Part-Of-Speech (POS) tags. With the aim of introduc-
  ing these morphological components into the proposed representation, we
  have used the Stanford Log-linear Part-Of-Speech Tagger8 (English) and
  Freeling (Spanish) in order to obtain the POS tags. Furthermore, the
  Lancaster stemming algorithm was used in order to obtain truncated words.
The Lexical Level At this level we deal with words, one of the most basic
  text units, describing their meaning in relation to the physical world or
  to abstract concepts, without reference to any sentence in which they may
  occur.
6
  http://nlp.stanford.edu/software/lex-parser.shtml
7
  http://nlp.lsi.upc.edu/freeling/
8
  http://nlp.stanford.edu/software/tagger.shtml

    As a result of this process, each document is represented as a tree rooted in
a ROOT-0 node, which branches into sub-trees representing all the sentences in
the document. The nodes of the tree represent the word lemmas of the sentences
along with their part-of-speech tags. The branches represent the dependency tag
between the two connected nodes, together with a frequency label defined as the
number of occurrences of the pair (initial node, final node) in the graph plus the
frequency of the dependency tag for the same pair of nodes. In the same way,
the hypotheses are represented as trees with the same characteristics.
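
    As an illustration, a minimal sketch of this construction with NetworkX
(the dependency triples are assumed to come from the parsers described above;
build_graph and its plain pair-frequency update are our simplification of the
paper's combined frequency label):

    import networkx as nx

    def build_graph(sentences):
        # sentences: one list of (head, dep_tag, dependent) triples per
        # sentence; nodes are "lemma-POS" strings, and the root of every
        # sentence hangs from the shared ROOT-0 node.
        g = nx.DiGraph()
        g.add_node("ROOT-0")
        for triples in sentences:
            for head, dep, child in triples:
                if g.has_edge(head, child):
                    g[head][child]["freq"] += 1    # repeated pair
                else:
                    g.add_edge(head, child, dep=dep, freq=1)
        return g

    # The hypothesis of Figure 4 below, as dependency triples:
    hyp = [[("ROOT-0", "root", "founder-NN"),
            ("founder-NN", "nsubj", "Lennox-NNP"),
            ("Lennox-NNP", "nn", "Annie-NNP"),
            ("founder-NN", "cop", "is-VBZ"),
            ("founder-NN", "prep", "of-IN"),
            ("of-IN", "pobj", "campaign-NN"),
            ("campaign-NN", "det", "the-DT"),
            ("campaign-NN", "nn", "SING-NNP")]]
    hypothesis_graph = build_graph(hyp)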
    In Figure 4 we show the graph-based representation of the hypothesis “An-
nie Lennox is the founder of the SING campaign”, whereas Figure 5 shows
the graph-based representation of the first sentences of the reference document
associated with the given question.


[Figure omitted: dependency-graph representation of the hypothesis, rooted in
ROOT-0, with lemma-POS nodes (founder-NN, Lennox-NNP, Annie-NNP, is-VBZ,
of-IN, campaign-NN, the-DT, SING-NNP) and labeled edges such as root:2,
nsubj:2, cop:2, prep:2, pobj:2, nn:3 and det:3.]

Fig. 4. Graph-based representation of the hypothesis “Annie Lennox is the founder of
the SING campaign”




    In the Features Extractor module, the process starts by fixing the root node
of the hypothesis graph as the initial node, whereas the selected final nodes
correspond to the remaining nodes of the hypothesis graph. We have used
Dijkstra's algorithm [10] for finding the minimum path between the initial node
and each final node. Thereafter, we count the occurrences of all the multi-level
linguistic features considered in the text representation, such as part-of-speech
tags and dependency tags, found in the path. The same procedure is performed
on the document graph, using as initial and final nodes the pair of words
identified in the hypothesis. As a result of this procedure, we obtain two sets of
feature vectors: one for the answer hypothesis, and another one for the reference
document.
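
    A sketch of this traversal with NetworkX's Dijkstra implementation (the
dep/freq edge attributes match the construction sketch above; feature_keys,
the fixed inventory of POS and dependency tags, is our assumption):

    import networkx as nx
    from collections import Counter

    def path_features(graph, source, target, feature_keys):
        # Count POS tags and dependency tags along the minimum path; an
        # unreachable target yields an all-zero vector.
        try:
            path = nx.dijkstra_path(graph, source, target)
        except nx.NetworkXNoPath:
            return [0] * len(feature_keys)
        counts = Counter()
        for u, v in zip(path, path[1:]):
            counts[graph[u][v]["dep"]] += 1     # dependency tag of the edge
            counts[v.rsplit("-", 1)[-1]] += 1   # POS tag of the visited node
        return [counts[k] for k in feature_keys]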


Answer Validation module The answer validation module receives the set
of feature vectors $\vec{f}_{t,i}$ for each text. Thus, the reference document $d$ will
now be represented by $m$ feature vectors ($d^* = \{\vec{f}_{d,1}, \vec{f}_{d,2}, \ldots, \vec{f}_{d,m}\}$), as well as
the answer hypothesis $h$ ($h^* = \{\vec{f}_{h,1}, \vec{f}_{h,2}, \ldots, \vec{f}_{h,m}\}$), $m$ being the number of
different paths that may be traversed in both graphs, using the ROOT-0 vertex
as the initial node and each one of the words appearing in the hypothesis as the
final node.

[Figure omitted: dependency-graph representation of the first sentences of the
reference document, rooted in ROOT-0, with lemma-POS nodes (e.g., activist-NN,
Lennox-NNP, Annie-NNP, campaigner-NN, campaign-NN, Campaign-NNP,
SING-NNP) and labeled dependency edges.]

              Fig. 5. Graph-based representation for one reference document
    Since each path in the answer hypothesis contains exactly the same number
and types of components as the corresponding path in the reference document,
it is possible to calculate the degree of similarity between each pair of traversed
paths. For the purposes of this case study, we have used the cosine similarity
measure, which is calculated as shown in Eq. 2.

\[
cosine(d, h) = cosine(d^*, h^*) = \sum_{i=1}^{m} \frac{\vec{f}_{h,i} \cdot \vec{f}_{d,i}}{\|\vec{f}_{h,i}\|\,\|\vec{f}_{d,i}\|} \tag{2}
\]

After obtaining all the similarity scores for the five hypotheses of one question,
the hypothesis achieving the highest score is selected as the correct answer. For
the experiments carried out with this methodology we decided to answer every
question. The only case in which a question is not answered is when the
similarity score is 0.0, but this case only occurred for the Spanish language.
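
    A minimal sketch of Eq. 2 over the two sets of path feature vectors (the
vectors are assumed to be aligned, one pair per traversed path):

    import math

    def cosine(u, v):
        # Standard cosine similarity between two feature vectors.
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0

    def hypothesis_score(h_vectors, d_vectors):
        # Eq. 2: sum the per-path cosine similarities between hypothesis
        # and document vectors; the highest-scoring hypothesis wins.
        return sum(cosine(fh, fd) for fh, fd in zip(h_vectors, d_vectors))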

3   Experimental Results

This section describes the data sets used for evaluating the methodologies
proposed in this paper. Additionally, the results obtained in the experiments
carried out are reported and discussed.

3.1   Corpus Description - QA4MRE Task 2013

The features of the test dataset are detailed in Table 1.
    It is worth mentioning that this year's data set was composed of a total
of 284 questions, of which:

                         Table 1. Features of the test dataset

            Features                                       2013
            1. Topics                                      4
            2. Topic details                               Climate Change,
                                                           Music & Society,
                                                           Alzheimer and AIDS
            3. Reading tests (documents)                   4
            4. Questions per document                      15/20
            5. Multiple-choice answers per question        5
            6. Total of questions                          240/320
            7. Total of answers                            1200/1600


 – 240 are main questions
 – 44 are auxiliary questions

The difference between main and auxiliary questions resides in the presence of
an inference: an auxiliary question is just a duplicate of a main question minus
the inference. The idea is that the simpler (auxiliary) versions can be paired
with the main questions: if a system gets the difficult version wrong and the
easy version right, it may be that it could not perform the required inference.


3.2   Obtained Results

The main measure used in this evaluation campaign is c@1, which is defined as
shown in Equation 3. This measure was introduced in the QA4MRE task at
CLEF 2011 with the purpose of allowing the systems to decide whether or not
to answer a given question. The aim of this procedure is to reduce the amount
of incorrect answers while maintaining the number of correct ones.

\[
c@1 = \frac{1}{n}\left(n_R + n_U \frac{n_R}{n}\right) \tag{3}
\]

where:
$n_R$: number of correctly answered questions,
$n_U$: number of unanswered questions,
$n$: total number of questions.
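
    For reference, a one-line implementation of Eq. 3 (the example values are
illustrative, not taken from our runs):

    def c_at_1(n_correct, n_unanswered, n_total):
        # Eq. 3: unanswered questions are credited at the system's own
        # accuracy rate, so abstaining beats answering incorrectly.
        return (n_correct + n_unanswered * n_correct / n_total) / n_total

    print(round(c_at_1(60, 19, 240), 2))  # 0.27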

   We submitted seven runs for the English language data set and three runs for
the Spanish language data set. Table 2 presents the results obtained by all runs
for both languages on the main question set. The column NoA indicates the
number of answered questions, the column NoU shows the number of unanswered
questions, the column PcD represents the percentage of correctly discarded
answers, and the last column reports the c@1 measure.
   In particular, the runs buap1301enen and buap1309enen were executed using
the IR approach. The difference lies in the addition of the semantic similarity
measure in the buap1309enen run, which answers all questions. The inclusion of
the semantic similarity measure allows this run to overcome the results of the
buap1301enen run, but it could have achieved a better performance if we had
included the “no answer” rules.
    The runs buap1302enen and buap1310enen were executed using the graph-
based approach, differing only in the algorithm used to obtain the shortest path
between an initial and a final node. The buap1302enen run uses Dijkstra's
algorithm, while buap1310enen uses the all-shortest-paths algorithm, both
implemented in the NetworkX9 tool of Python. This methodology attempts
to find the similarity between the hypothesis and the complete document, with-
out the use of IR or further NLP techniques. The purpose of using this methodo-
logy was to test a framework for this particular task; it is still a basic approach
which can be improved by including other preprocessing techniques, such as
question analysis and inference mechanisms, and by adding more elements to
the graph, like named entity recognition and semantic relations (synonyms,
hyponyms and hypernyms).
    The runs buap1303enen and buap1304enen were executed using a hybrid ap-
proach; that is, we mix the IR approach with the graph-based approach. With
the IR system we recovered 5 passages for each hypothesis, which we use instead
of the reference document in order to compute the similarity measure of the
graph-based approach. The rest of the methodology remains the same. The run
buap1304enen includes the validation for detecting whether none of the candi-
date answers is correct. Finally, the run buap1305enen is a voting system between
buap1301enen, buap1302enen and buap1304enen: if two of the three candidate
runs give the same answer for a given question, that answer is selected; other-
wise, the question is not answered.
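
    A sketch of this voting rule (answers are assumed to be normalized option
identifiers, with None marking an unanswered question):

    from collections import Counter

    def vote(run_answers):
        # run_answers: the answer of each of the three runs for one question.
        counts = Counter(a for a in run_answers if a is not None)
        if counts:
            answer, support = counts.most_common(1)[0]
            if support >= 2:        # at least two runs agree
                return answer
        return None                 # no agreement: leave unanswered

    print(vote(["b", "a", "a"]))    # 'a'
    print(vote(["a", "b", "c"]))    # None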


                Table 2. Evaluation results on the main questions.

                      Run           NoA NoU           PcD c@1
                                English Data Set
                     buap1301enen 221        19      0.84   0.27
                     buap1309enen 240         0      0.00   0.28
                     buap1302enen 240         0      0.00   0.20
                     buap1310enen 240         0      0.00   0.19
                     buap1303enen 240         0      0.00   0.24
                     buap1304enen 240         0      0.00   0.25
                     buap1305enen 198        42      0.70   0.24
                                Spanish Data Set
                     buap1306eses 233         7      0.86   0.27
                     buap1307eses 238         2      1.00   0.24
                     buap1308eses 238         2      1.00   0.23



   For the Spanish language, we submitted one run (buap1306eses) using the
graph-based approach and two other runs (buap1307eses and buap1308eses)
using the hybrid approach, similar to the English language runs. In contrast
with the English results, the best performance was obtained by the graph-based
approach.
9
  http://networkx.github.io/
    Table 3 presents the results obtained by all runs for both languages on the
main plus auxiliary question set. In this table we can observe that the results
are higher than those on the main questions alone, which suggests that our
system is not able to perform the inference needed to solve the more difficult
questions. Otherwise, the behavior of the results is similar to that observed on
the main question set.


         Table 3. Evaluation results on all questions (main + auxiliary).

                    Run          NoA NoU             PcD c@1
                              English Data Set
                   buap1301enen 264      20         0.80   0.33
                   buap1309enen 284       0         0.00   0.31
                   buap1302enen 284       0         0.00   0.24
                   buap1310enen 284       0         0.00   0.24
                   buap1303enen 284       0         0.00   0.31
                   buap1304enen 284       0         0.00   0.32
                   buap1305enen 240      44         0.70   0.32
                              Spanish Data Set
                   buap1306eses 274      10         0.90   0.30
                   buap1307eses 282       2         1.00   0.28
                   buap1308eses 282       2         1.00   0.28




4   Conclusion and Future Work

We have developed two different methodologies, plus a third hybrid one, as part
of our participation in the QA4MRE 2013 task. The first one was built using
a basic IR approach, the second one was built by means of graph-based
representations and feature extraction, and the third, hybrid approach is a
combination of the first two.
    In particular, we submitted ten different runs. Our best performance for the
English language data set was obtained with the IR approach, while the hybrid
approach performed slightly lower. In the case of the Spanish language, the
graph-based approach yielded the best results.
    We consider that, even though the graph methodology did not achieve the
best results for the English language, it is an interesting framework for repre-
senting documents, and it could be improved by adding characteristics particular
to this task, such as question analysis, question expansion (by synonyms,
hyponyms, etc.), and mechanisms that allow us to detect whether or not to
answer a given question.

References
 1. Peñas, A., Hovy, E.H., Forner, P., Rodrigo, Á., Sutcliffe, R.F.E., Forascu, C.,
    Sporleder, C.: Overview of QA4MRE at CLEF 2011: Question answering for
    machine reading evaluation. In: CLEF (Notebook Papers/Labs/Workshop) (2011)
 2. Peñas, A., Hovy, E.H., Forner, P., Rodrigo, Á., Sutcliffe, R.F.E., Sporleder, C.,
    Forascu, C., Benajiba, Y., Osenova, P.: Overview of QA4MRE at CLEF 2012:
    Question answering for machine reading evaluation. In: CLEF (Online Working
    Notes/Labs/Workshop) (2012)
 3. Vicedo, J.L., Ferrandez, A.: Importance of pronominal anaphora resolution in
    question answering systems. In: Proceedings of the 38th Annual Meeting of the
    Association for Computational Linguistics (ACL) (2000) 555–562
 4. Pakray, P., Bhaskar, P., Banerjee, S., Pal, B.C., Bandyopadhyay, S., Gelbukh, A.F.:
    A hybrid question answering system based on information retrieval and answer
    validation. In: CLEF (Notebook Papers/Labs/Workshop) (2011)
 5. Bhaskar, P., Pakray, P., Banerjee, S., Banerjee, S., Bandyopadhyay, S., Gelbukh,
    A.F.: Question answering system for QA4MRE@CLEF 2012. In: CLEF (Online
    Working Notes/Labs/Workshop) (2012)
 6. Clark, P., Harrison, P., Yao, X.: An entailment-based approach to the QA4MRE
    challenge. In: CLEF (Online Working Notes/Labs/Workshop) (2012)
 7. Vilariño, D., Pinto, D., Tovar, M., León, S., Castillo, E.: BUAP: Lexical and se-
    mantic similarity for cross-lingual textual entailment. In: Proceedings of the 6th
    International Workshop on Semantic Evaluation, Montréal, Canada, ACL (2012)
 8. Mihalcea, R., Corley, C., Strapparava, C.: Corpus-based and knowledge-based
    measures of text semantic similarity. In: AAAI '06 (2006) 775–780
 9. Mihalcea, R., Radev, D.: Graph-based Natural Language Processing and Informa-
    tion Retrieval. Cambridge University Press (2011)
10. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische
    Mathematik 1(1) (1959) 269–271