=Paper=
{{Paper
|id=Vol-1180/CLEF2014wn-QA-GomezAdornoEt2014
|storemode=property
|title=Graph Based Approach for the Question Answering Task Based on Entrance Exams
|pdfUrl=https://ceur-ws.org/Vol-1180/CLEF2014wn-QA-GomezAdornoEt2014.pdf
|volume=Vol-1180
|dblpUrl=https://dblp.org/rec/conf/clef/Gomez-AdornoSPG14
}}
==Graph Based Approach for the Question Answering Task Based on Entrance Exams==
Graph-based Approach to the Question Answering Task Based on Entrance Exams

Helena Gómez-Adorno (1), Grigori Sidorov (1), David Pinto (2), and Alexander Gelbukh (1)

(1) Centro de Investigación, Instituto Politécnico Nacional, México D.F., Mexico
helena.adorno@gmail.com, sidorov@cic.ipn.mx, www.gelbukh.com

(2) Facultad de Ciencias de la Computación, Benemérita Universidad Autónoma de Puebla, Puebla, Mexico
dpinto@cs.buap.mx

Abstract. This paper describes the approach used in our system for the Question Answering Task based on Entrance Exams, presented at CLEF 2014. The task aims to evaluate methods of text understanding with reading comprehension tests: a system must read a given document and answer multiple-choice questions about it. Our approach transforms the documents, along with the multiple-choice answers, into a graph-based representation that contains lexical, morphological, and syntactic features. It then traverses paths both in the document graph and in the graphs of the answers, counting the text components found along each path: lemmas, PoS tags, and grammatical tags. As a result of this procedure, the system constructs several feature vectors, one for each traversed graph. Finally, a cosine-based similarity is calculated over these feature vectors in order to rank the multiple-choice answers and select as correct the one with the highest similarity to the graph of the text itself. Our system obtained a c@1 of 0.375, which was outperformed by only one system in the competition.

Keywords: Question answering system, reading comprehension, entrance exams, graph-based representation, graph similarity, extraction of features from graphs

1 Introduction

In this paper we present the experiments carried out as part of our participation in the Question Answering track based on Entrance Exams at CLEF 2014. The Entrance Exams task was proposed in 2013 as a pilot task [1] in the Question Answering for Machine Reading Evaluation Lab (QA4MRE), which has been offered at the Conference and Labs of the Evaluation Forum (CLEF) since 2011 [2, 3].

The Entrance Exams task evaluates systems in the same situation in which high school students are evaluated for entering a university. The challenge consists of reading a document and identifying the correct answer (from multiple choices) for a set of questions about information that is expressed or implied in the text. Each question has 4 different options, and only one option is the correct answer. The exams are created by the Japanese National Center for University Admissions Tests; the Entrance Exams corpus is provided by NII's Todai Robot Project (http://21robot.org/About/) and NTCIR (http://research.nii.ac.jp/ntcir/index-en.html).

Since the first edition of the QA4MRE task in 2011, and later in 2012 and 2013, a single evaluation platform for experimentation with new techniques and methodologies for this problem has been provided. In this sense, we can take the systems presented at this conference as the state of the art in this research field.

The rest of the paper is organized as follows. Section 2 describes our approach and the system architecture. Section 3 presents the configuration of the submitted runs and the evaluation results. Finally, Section 4 presents the conclusions and outlines some directions of future work.
2 System Architecture

For many problems in natural language processing, a graph structure is an intuitive, natural, and direct way to represent data. Several research works have employed graphs for text representation in order to solve particular problems [4]. We propose an approach based on a graph methodology for document understanding, described in detail in [5], and built the corresponding system. The system consists of the following submodules: document preprocessing, graph generation, and answer validation. The general architecture is illustrated in Figure 1.

2.1 Document Preprocessing

First we perform document preprocessing. An XML parser receives as input a structured corpus in XML format, as shown in Figure 1. This XML file contains all the documents, along with their respective questions and multiple-choice answers. An XML interpreter extracts the documents, questions, and associated answers. It stores the questions and answers, identifying them according to the document to which they belong, for use in further processing. Next, the questions associated with each document are analyzed, identifying the "question keywords" (what, where, when, who, etc.), and the result is passed to the next module. After this, the hypothesis generation module formulates several candidate "answer hypotheses" as modified versions of the original question, replacing the question keywords with one of the possible answers given in the test data. For example, given the question "Who is the founder of the SING campaign?" and the possible answer "Annie Lennox", the obtained hypothesis is "Annie Lennox is the founder of the SING campaign."

Afterwards, we perform anaphora resolution in the documents using the JavaRAP system (http://wing.comp.nus.edu.sg/~qiu/NLPTools/JavaRAP.html). It has been observed that applying anaphora resolution in QA systems improves precision [6]. The output of this module is the set of answer hypotheses along with their reference documents.

[Figure 1. Graph-based system architecture: an XML interpreter feeds question analysis and anaphora resolution; a hypothesis generator combines each question with each answer; documents and hypotheses pass through a syntactic parser, morphological tagger, and semantic expansion into graph generation; a feature extractor (traversing graphs) feeds similarity computation and answer selection, which produce the final answer.]

2.2 Graph Generation Module

In the graph generation module, text documents along with their hypotheses are parsed to produce their graph-based representation. For the graph representation we took into account various linguistic levels (lexical, syntactic, morphological, and semantic relationships) in order to capture the majority of features present in the text. The graph generation is performed by the following submodules (a construction sketch follows the list):

Syntactic Parser is the base of the graph structure. We use the Stanford Dependency Parser (http://nlp.stanford.edu/software/lex-parser.shtml) to produce the parse tree for each sentence of the documents. This type of parsing detects the grammatical relations between the words of each sentence.

Morphological Tagger obtains the PoS tags of the words. For this, we used the Stanford Log-linear Part-Of-Speech Tagger (http://nlp.stanford.edu/software/tagger.shtml) for English. The Lancaster stemming algorithm was used to obtain word stems.

Semantic Expansion uses the WordNet taxonomy [7] to add semantic relations between nodes of the graphs, i.e., the graph is expanded using the taxonomy: some nodes are added.
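To make the representation concrete, the following minimal sketch builds such a graph with NetworkX, the package the module is implemented with (see below). The dependency triples are transcribed by hand from the hypothesis of Figure 2; the helper name add_sentence and the simplified frequency scheme are ours, for illustration only.

```python
import networkx as nx

def add_sentence(graph, dependencies):
    """Add one dependency-parsed sentence to a graph.

    `dependencies` is a list of (head, dep_tag, dependent) triples where
    each node is a "word-POS" string, as in the paper's figures. Repeated
    (head, dependent) pairs accumulate a frequency count on the edge
    (a simplification of the pair+tag frequency label described above).
    """
    for head, dep_tag, dependent in dependencies:
        if graph.has_edge(head, dependent):
            graph[head][dependent]["freq"] += 1
        else:
            graph.add_edge(head, dependent, dep=dep_tag, freq=1)

# Hypothesis "Annie Lennox is the founder of the SING campaign".
hypothesis = nx.DiGraph()
add_sentence(hypothesis, [
    ("ROOT-0", "root", "founder-NN"),
    ("founder-NN", "nsubj", "Lennox-NNP"),
    ("Lennox-NNP", "nn", "Annie-NNP"),
    ("founder-NN", "cop", "is-VBZ"),
    ("founder-NN", "det", "the-DT"),
    ("founder-NN", "prep", "of-IN"),
    ("of-IN", "pobj", "campaign-NN"),
    ("campaign-NN", "nn", "SING-NNP"),
    ("campaign-NN", "det", "the-DT"),
])
```

A document graph is built the same way, calling add_sentence once per sentence so that all sentence trees hang from the shared ROOT-0 node.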
As a result of this process, each document is represented as a tree rooted in a ROOT-0 node, whose edges to sub-trees represent all sentences in the document. The nodes of the trees represent the words or lemmas of the sentences along with their part-of-speech tags. The edges between nodes are labeled with the dependency tags of the connected nodes, and the frequency label shows the number of occurrences of the pair (initial node, final node) in the graph plus the frequency of the dependency tag for the same pair of nodes. In the same way, all answer hypotheses are represented as trees with the same characteristics.

In Figure 2 we present the graph-based representation of the hypothesis "Annie Lennox is the founder of the SING campaign", whereas Figure 3 shows the graph-based representation of the first three sentences of the reference document associated with this question.

[Figure 2. Graph-based representation of the hypothesis "Annie Lennox is the founder of the SING campaign": a tree rooted in ROOT-0 with word-PoS nodes (Annie-NNP, Lennox-NNP, is-VBZ, founder-NN, the-DT, of-IN, campaign-NN, SING-NNP) and edges labeled with dependency tag and frequency, e.g., root:2, nsubj:2, cop:2, nn:3, det:3, prep:2, pobj:2.]

The feature extraction starts by fixing the root node of the hypothesis graph as the initial node, whereas the selected final nodes correspond to the remaining nodes of the hypothesis graph. We use Dijkstra's algorithm [8] to find the shortest path between the initial node and each final node. After this, we count the occurrences of all the multi-level linguistic features considered in the text representation, such as PoS tags and dependency tags, found along the path. The same procedure is performed on the document graph, using the pair of nodes identified in the hypothesis as the initial and final nodes. As a result of this procedure, we obtain two feature vectors: one for the answer hypothesis and another one for the reference document. This module was implemented in Python, using the NetworkX package (https://networkx.github.io/) for the creation and manipulation of graphs.

[Figure 3. Example of the graph-based representation of a reference document (first three sentences): a single tree rooted in ROOT-0 covering word-PoS nodes such as Annie-NNP, Lennox-NNP, activist-NN, campaigner-NN, SING-NNP, and Campaign-NNP, with dependency-tag:frequency edge labels of the same form as in Figure 2.]

2.3 Answer Validation Module

This module receives several feature vectors $\vec{f}_{t,i}$ for each text. Thus, the reference document $d$ is now represented by $m$ features, $d^* = \{\vec{f}_{d,1}, \vec{f}_{d,2}, \ldots, \vec{f}_{d,m}\}$, as is the answer hypothesis $h$, $h^* = \{\vec{f}_{h,1}, \vec{f}_{h,2}, \ldots, \vec{f}_{h,m}\}$, where $m$ is the number of different paths that can be traversed in both graphs.

We use the following cosine similarity measure, summing the degree of similarity over each traversed path:

$$
\mathit{Similarity}(h^*, d^*) = \sum_{i=1}^{m} \mathit{Cosine}(\vec{f}_{h,i}, \vec{f}_{d,i})
= \sum_{i=1}^{m} \frac{\vec{f}_{h,i} \cdot \vec{f}_{d,i}}{\|\vec{f}_{h,i}\| \cdot \|\vec{f}_{d,i}\|}
= \sum_{i=1}^{m} \frac{\sum_{j=1}^{|V|} f_{(h,i),j} \times f_{(d,i),j}}{\sqrt{\sum_{j=1}^{|V|} (f_{(h,i),j})^2} \times \sqrt{\sum_{j=1}^{|V|} (f_{(d,i),j})^2}}. \quad (1)
$$
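A minimal sketch of this scoring, assuming graphs built as in the construction sketch of Section 2.2 (each edge carries a `dep` attribute); the function names are ours, and the feature inventory is reduced to PoS and dependency tags for brevity:

```python
import math
from collections import Counter
import networkx as nx

def path_features(graph, source, target):
    """Count PoS tags and dependency tags along one shortest path."""
    try:
        path = nx.dijkstra_path(graph, source, target)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return Counter()
    # PoS tag of each node (ROOT-0 contributes the dummy tag "0" to both
    # vectors alike), plus the dependency tag of each traversed edge.
    feats = Counter(node.rsplit("-", 1)[-1] for node in path)
    feats.update(graph[u][v]["dep"] for u, v in zip(path, path[1:]))
    return feats

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity(hyp_graph, doc_graph, root="ROOT-0"):
    """Equation (1): sum of per-path cosines over all hypothesis paths."""
    score = 0.0
    for node in hyp_graph.nodes:
        if node == root:
            continue
        f_h = path_features(hyp_graph, root, node)
        f_d = path_features(doc_graph, root, node)
        score += cosine(f_h, f_d)
    return score
```

Hypothesis nodes absent from the document graph simply contribute a zero cosine, so a hypothesis is rewarded in proportion to how much of its path structure is mirrored in the document.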
After obtaining the similarity scores of all four hypotheses for each question, the hypothesis achieving the highest score is selected as the correct answer. We answered every question of the task using this methodology.

3 Experimental Results

This section describes the data sets of the challenge used for the evaluation of the systems. Additionally, the results obtained in the experiments are reported and discussed.

Following the 2013 edition, the 2014 test set based on Entrance Exams was composed of reading comprehension tests taken from the Japanese Center Test, a nation-wide achievement test for admission to Japanese universities. The organizers released two different datasets, one for training and one for testing. Both data sets were composed of the following elements:

– 12 test documents,
– 60 questions (5 questions for each document),
– 240 choices/options (4 for each question).

The main measure used in this evaluation is c@1, defined as shown in equation (2). This measure was defined in the QA4MRE task at CLEF 2011 with the purpose of allowing systems to decide whether or not to answer a given question. The aim is to reduce the number of incorrect answers while maintaining the number of correct ones, i.e., a system is penalized for answering incorrectly, and it is better not to answer at all when unsure:

$$
c@1 = \frac{1}{n}\left(n_R + n_U \frac{n_R}{n}\right), \quad (2)
$$

where:
– $n_R$: number of correctly answered questions,
– $n_U$: number of unanswered questions,
– $n$: total number of questions.
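Equation (2) is straightforward to compute; the following sketch (names ours) reproduces the values in Table 1 below, using the totals the table implies (e.g., 21 + 35 = 56 answered questions for cicnlp-8):

```python
def c_at_1(n_correct, n_unanswered, n_total):
    """c@1 measure (equation 2): unanswered questions earn partial
    credit proportional to the accuracy on the answered ones."""
    return (n_correct + n_unanswered * n_correct / n_total) / n_total

# All our runs answered every question, so c@1 reduces to plain accuracy:
print(round(c_at_1(21, 0, 56), 3))  # 0.375 (run cicnlp-8)
```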
Table 1 presents the comparative results obtained in the 2014 QA competition for Entrance Exams. Approximately thirty runs were submitted to the competition by various research teams. We submitted eight runs with different configurations of the system. The run that ranked highest was cicnlp-8, which obtained a c@1 of 0.375, answering 21 questions correctly. At the reading comprehension level, the run cicnlp-8 passed only 3 out of 12 tests, whereas the run cicnlp-3 passed 4 out of 12 tests. Even though the run cicnlp-3 passed one more test than our best run, it obtained a c@1 of only 0.303. In the last column of Table 1 we show the number of tests passed by each run (tests with c@1 > 0.5).

Table 1. Results of participation in the 2014 QA task based on Entrance Exams

Run       Correctly Answered   Incorrectly Answered   Unanswered   c@1     Tests with c@1 > 0.5
cicnlp-1  16                   40                     0            0.285   3/12
cicnlp-2  19                   37                     0            0.339   2/12
cicnlp-3  17                   39                     0            0.303   4/12
cicnlp-4  17                   39                     0            0.303   2/12
cicnlp-5  13                   43                     0            0.232   1/12
cicnlp-6  16                   40                     0            0.285   3/12
cicnlp-7  20                   36                     0            0.357   2/12
cicnlp-8  21                   35                     0            0.375   3/12

Table 2 shows the features included in the graph-based representation and which of those features were used in the feature extraction technique for each of the eight configurations. These configurations were selected during the evaluation process on the training dataset released by the organizers.

Table 2. Configurations of the graph representation and the extracted features used in the runs

          Representation scheme                                                    Extracted features
Run       Words  POS tags  Dep. tags  Frequency  Stemming  Syn. exp.  Hyp. exp.   Words count  POS tags count  Dep. tags count
cicnlp-1  X      X         X                                                                   X               X
cicnlp-2  X      X         X                                                      X            X               X
cicnlp-3  X      X         X          X                                                        X               X
cicnlp-4  X      X         X                                                      X            X               X
cicnlp-5  X      X         X                               X                      X            X               X
cicnlp-6  X      X         X                     X         X                      X            X               X
cicnlp-7  X      X         X                                          X                        X               X
cicnlp-8  X      X         X                     X                    X                        X               X

The cicnlp-1 run includes the words and the PoS tags in the nodes of the graph-based representation, as well as the dependency tags, which represent the relation between each pair of connected nodes. Note that in this run we do not use lemmas in graph nodes and do not consider frequencies for the edges. The features extracted in this run are PoS tags and dependency tags. In the case of cicnlp-2, the graph representation is the same, but we add the word count to the extracted feature set. For the runs cicnlp-1 and cicnlp-2, we used an algorithm that obtains all shortest paths between an initial and a final node; the rest of the runs use Dijkstra's algorithm to obtain only one shortest path. The cicnlp-3 run includes the frequency count of the edges (initial node to final node + dependency tags) in the same way as explained in Section 2.2. The cicnlp-4 run uses the same graph representation and extracted features as cicnlp-2; the only difference is the traversal algorithm.

The cicnlp-5 and cicnlp-6 runs apply the feature extraction technique when the words in the hypothesis graph are expanded with their corresponding sets of synonyms (without applying word sense disambiguation). The run cicnlp-6 uses stems instead of full words in the graph-based representations. The features extracted in both runs are words (count), PoS tags (count), and dependency tags (count). The runs cicnlp-7 and cicnlp-8 apply the feature extraction technique when the words in the hypothesis graph are expanded with their corresponding sets of hypernyms (again without word sense disambiguation). The run cicnlp-8 uses stems instead of full words in the graph-based representations. The features extracted in both runs are PoS tags (count) and dependency tags (count).
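A minimal sketch of this expansion step, assuming NLTK's WordNet corpus reader (the library choice and function name are ours; WordNet itself is the taxonomy cited as [7]), with the union taken over all senses since no disambiguation is applied:

```python
from nltk.corpus import wordnet as wn

def expand(word, relation="synonyms"):
    """Collect synonyms or hypernyms of `word` over all of its senses
    (no word sense disambiguation, as in runs cicnlp-5 through cicnlp-8)."""
    lemmas = set()
    for synset in wn.synsets(word):
        if relation == "synonyms":
            lemmas.update(synset.lemma_names())
        else:  # hypernyms
            for hyper in synset.hypernyms():
                lemmas.update(hyper.lemma_names())
    lemmas.discard(word)
    return lemmas

print(expand("campaign", relation="hypernyms"))
```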
We found that, as far as the reading level is concerned, different runs were able to pass different tests. For example, the run cicnlp-8 passed tests 13, 16, and 18; the run cicnlp-3 passed tests 13, 14, 15, and 16; the run cicnlp-6 passed tests 13, 16, and 23. In the future, we plan to combine different runs; in this manner we would be able to pass 6 out of 12 tests. Moreover, the run cicnlp-8 passed test 13 answering all of its questions correctly.

4 Conclusion and Future Work

We described the approach and the system developed as part of our participation in the 2014 QA task based on Entrance Exams. The approach uses a graph structure for representing the documents and the answer hypotheses. It extracts linguistic features from both graphs (documents and answer hypotheses) by traversing shortest paths. The features are then used for computing the similarity between the document and the answer hypotheses. We submitted eight runs to the competition. The best run of our system (cicnlp-8) achieved a c@1 of 0.375, which was outperformed by only one system.

For future work, we expect that the use of domain-specific question answering techniques will improve the performance of the algorithm for this particular problem. Textual entailment, named entity recognition, and analysis of the question type can improve the final results of our system in the task.

References

1. Peñas, A., Miyao, Y., Forner, P., Kando, N.: Overview of QA4MRE 2013 Entrance Exams task. In: CLEF (Online Working Notes/Labs/Workshop) (2013)
2. Peñas, A., Hovy, E.H., Forner, P., Rodrigo, Á., Sutcliffe, R.F.E., Forascu, C., Sporleder, C.: Overview of QA4MRE at CLEF 2011: Question Answering for Machine Reading Evaluation. In: CLEF (Notebook Papers/Labs/Workshop) (2011)
3. Peñas, A., Hovy, E.H., Forner, P., Rodrigo, Á., Sutcliffe, R.F.E., Sporleder, C., Forascu, C., Benajiba, Y., Osenova, P.: Overview of QA4MRE at CLEF 2012: Question Answering for Machine Reading Evaluation. In: CLEF (Online Working Notes/Labs/Workshop) (2012)
4. Mihalcea, R., Radev, D.: Graph-based Natural Language Processing and Information Retrieval. Cambridge University Press (2011)
5. Pinto, D., Gómez-Adorno, H., Ayala, D.V., Singh, V.K.: A graph-based multi-level linguistic representation for document understanding. Pattern Recognition Letters 41 (2014) 93–102
6. Vicedo, J.L., Ferrandez, A.: Importance of pronominal anaphora resolution in question answering systems. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000) (2000) 555–562
7. Miller, G.A.: WordNet: A lexical database for English. Communications of the ACM 38 (1995) 39–41
8. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische Mathematik 1(1) (1959) 269–271