<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Distributional Semantics for Answer Re-ranking in Question Answering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Piero Molino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierpaolo Basile</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Annalina Caputo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pasquale Lops</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Semeraro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Science, University of Bari Aldo Moro</institution>
          ,
          <addr-line>Via Orabona, 4, I-70125 Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper investigates the role of Distributional Semantic Models (DSMs) in a Question Answering (QA) system. Our purpose is to exploit DSMs for answer re-ranking in QuestionCube, a framework for building QA systems. DSMs model words as points in a geometric space, also known as a semantic space, in which words are similar if they are close to each other. Our idea is that DSM approaches can help to compute the relatedness between users' questions and candidate answers by exploiting paradigmatic relations between words, thus providing better answer re-ranking. The results of the evaluation, carried out on the CLEF 2010 QA dataset, show the effectiveness of the proposed approach.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This paper aims to exploit DSMs for a task to which they have
never been applied before, namely candidate answer re-ranking in Question
Answering (QA), and explores how to integrate them into a pre-existing QA system. Our
insight is based on the ability of these spaces to capture paradigmatic relations
between words, which should help promote the candidate answers most related to the
user’s question.</p>
      <p>
        To test the effectiveness of DSMs for QA, we rely on a pre-existing
QA framework called QuestionCube [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. QuestionCube is a general framework
for building QA systems which exploits NLP algorithms, for both English and
Italian, to analyze questions and documents so that the candidate answers
obtained from the retrieved documents can be re-ranked by a pipeline of
scorers. Each scorer assigns a score to a candidate answer, taking into
account several linguistic and semantic features. Our strategy for exploiting DSMs
consists of adding to this pipeline a new scorer based on vector spaces built
using DSMs. In particular, we propose four types of spaces: a classical Term-Term
co-occurrence Matrix (TTM), used as a baseline; Latent Semantic Analysis (LSA)
applied to the TTM; Random Indexing (RI), used to reduce the TTM dimension;
and finally an approach which combines LSA and RI. The scorer assigns a
score based on the similarity between the question and each candidate answer
in the semantic space.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>
        QuestionCube is a multilingual QA framework built using NLP and IR
techniques. Question analysis is carried out by a full-featured NLP pipeline. The
passage search step is carried out by Lucene, a standard off-the-shelf retrieval
framework that supports TF-IDF and BM25 weighting. The answer re-ranking
component is designed as a pipeline of different scoring criteria. We derive a
global re-ranking function by combining the scores with CombSum. More details
on the framework and a description of the main scorers are reported in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The
only scorers employed in the evaluation are the Terms Scorer, the Exact Sequence
Scorer and the Density Scorer, which assigns a score to a passage based
on the distance between the question terms occurring in it. All the scorers have an enhanced
version which uses the combination of lemmas and PoS tags as features.
      </p>
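      <p>CombSum adds up, for each candidate answer, the scores assigned by the individual scorers, and the candidates are then sorted by this total. The Python sketch below illustrates the idea under assumptions that the paper does not state: each scorer is modelled as a hypothetical function taking a question and a passage and returning a float, and each scorer's outputs are min-max normalized before summing so that scorers with different ranges contribute comparably. The two toy scorers are illustrative stand-ins, not QuestionCube's Terms and Exact Sequence scorers.</p>
      <preformat>
```python
from typing import Callable, Dict, List

# Hypothetical scorer type: (question, passage) -> score.
Scorer = Callable[[str, str], float]


def min_max(scores: List[float]) -> List[float]:
    """Rescale one scorer's outputs to [0, 1] (the normalization is an assumption)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def comb_sum_rerank(question: str, passages: List[str],
                    scorers: Dict[str, Scorer]) -> List[str]:
    """Re-rank passages by the sum (CombSum) of their normalized scores."""
    totals = [0.0] * len(passages)
    for scorer in scorers.values():
        raw = [scorer(question, p) for p in passages]
        for i, s in enumerate(min_max(raw)):
            totals[i] += s
    # Highest combined score first; ties keep the retrieval order (stable sort).
    order = sorted(range(len(passages)), key=lambda i: totals[i], reverse=True)
    return [passages[i] for i in order]


# Toy stand-ins for real scorers (hypothetical, for illustration only).
def term_overlap(q: str, p: str) -> float:
    """Fraction of question terms that also appear in the passage."""
    q_terms = set(q.lower().split())
    return len(q_terms.intersection(p.lower().split())) / len(q_terms)


def exact_sequence(q: str, p: str) -> float:
    """1.0 if the whole question string occurs verbatim in the passage."""
    return 1.0 if q.lower() in p.lower() else 0.0


if __name__ == "__main__":
    passages = ["dogs bark loudly", "the cat sat on the mat", "a cat is here"]
    scorers = {"terms": term_overlap, "exact": exact_sequence}
    print(comb_sum_rerank("cat sat", passages, scorers))
    # prints ['the cat sat on the mat', 'a cat is here', 'dogs bark loudly']
```
      </preformat>
      <p>Min-max normalization is only one possible choice; CombSum can also be applied to raw scores when all scorers already share a comparable range.</p>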
      <p>Our DSMs are constructed over a co-occurrence matrix. The linguistic
context taken into account is a window w of co-occurring terms. Given a reference
corpus (in our case, the collection of documents indexed by the QA system) and its
vocabulary V of n terms, the <inline-formula><tex-math>n \times n</tex-math></inline-formula> co-occurrence matrix is defined as the
matrix <inline-formula><tex-math>M = (m_{ij})</tex-math></inline-formula> whose coefficients <inline-formula><tex-math>m_{ij} \in \mathbb{R}</tex-math></inline-formula> are the number of co-occurrences of
the words <inline-formula><tex-math>t_i</tex-math></inline-formula> and <inline-formula><tex-math>t_j</tex-math></inline-formula> within a predetermined distance w. The term-term matrix
M, based on simple word co-occurrences, represents the simplest semantic space,
called Term-Term co-occurrence Matrix (TTM). In the literature, several methods to
approximate the original matrix by rank reduction have been proposed. Their aims
range from discovering high-order relations between entries to
improving efficiency by reducing noise and dimensionality.</p>
      <p>We exploit three
methods for building our semantic spaces: Latent Semantic Analysis (LSA),
Random Indexing [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (RI) and LSA over RI (LSARI). LSARI applies the SVD
factorization to the reduced approximation of M obtained through RI. All these
methods produce a new matrix <inline-formula><tex-math>\hat{M}</tex-math></inline-formula>, which is an <inline-formula><tex-math>n \times k</tex-math></inline-formula> approximation of the
co-occurrence matrix M, with n row vectors corresponding to vocabulary terms,
where k is the number of reduced dimensions. We integrate the DSMs into the
QuestionCube framework (www.questioncube.com) by creating a new scorer, the
Distributional Scorer, which represents both the question and the passage by
applying the addition operator to the vectors of the terms they are composed of.
The similarity between question and passage is then computed as the cosine
similarity between the resulting vectors, using each of the different matrices.</p>
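      <p>The construction of the spaces and the Distributional Scorer can be sketched in a few lines of Python with NumPy. This is a minimal illustration on a toy, already tokenized corpus, not the QuestionCube implementation. LSA keeps, as term vectors, the first k left singular vectors scaled by the corresponding singular values. RI is written as the equivalent operation for word co-occurrence contexts: every term receives a sparse ternary random index vector, and each term's context vector is the count-weighted sum of the index vectors of its co-occurring terms, i.e. M multiplied by the matrix of index vectors. The number of non-zero entries, the seed and all function names are illustrative assumptions.</p>
      <preformat>
```python
import numpy as np

# Minimal sketch, not the QuestionCube implementation; all names are hypothetical.


def build_ttm(docs, w):
    """Term-term matrix M: M[i, j] counts how often t_j occurs within w tokens of t_i."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    m = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        for pos, t in enumerate(doc):
            window = doc[max(0, pos - w):pos] + doc[pos + 1:pos + 1 + w]
            for other in window:
                m[index[t], index[other]] += 1
    return m, index


def lsa(m, k):
    """LSA: rank-k term vectors U_k * S_k from the SVD of m (shape n x k)."""
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    return u[:, :k] * s[:k]


def random_indexing(m, k, nonzeros=2, seed=0):
    """RI: project m onto k dims using sparse ternary random index vectors."""
    rng = np.random.default_rng(seed)
    r = np.zeros((m.shape[1], k))
    for row in r:  # one random index vector per vocabulary term
        cols = rng.choice(k, size=nonzeros, replace=False)
        row[cols] = rng.choice([-1.0, 1.0], size=nonzeros)
    return m @ r  # row i = sum over j of m[i, j] * r[j]


def text_vector(tokens, space, index):
    """Addition operator: sum the vectors of the tokens found in the space."""
    v = np.zeros(space.shape[1])
    for t in tokens:
        if t in index:
            v += space[index[t]]
    return v


def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0 or nb == 0 else float(a @ b / (na * nb))


def distributional_score(question, passage, space, index):
    """Cosine between the summed question vector and the summed passage vector."""
    return cosine(text_vector(question, space, index),
                  text_vector(passage, space, index))


if __name__ == "__main__":
    docs = [["cat", "sat", "mat"], ["dog", "sat", "log"], ["cat", "chased", "dog"]]
    m, index = build_ttm(docs, w=2)
    ri = random_indexing(m, k=4)
    spaces = {"TTM": m, "LSA": lsa(m, 2), "RI": ri, "LSARI": lsa(ri, 2)}
    for name, space in spaces.items():
        print(name, round(distributional_score(["cat"], ["sat", "mat"], space, index), 3))
```
      </preformat>
      <p>On a real document collection the matrices are large and sparse, so a sparse matrix format and a truncated SVD solver would be more appropriate than the dense NumPy calls used in this sketch.</p>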
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
      <p>The goal of the evaluation is twofold: (1) proving the effectiveness of DSMs
in our question answering system and (2) comparing the different
DSMs.</p>
      <p>
        The evaluation has been performed on the ResPubliQA 2010 Dataset adopted
in the 2010 CLEF QA Competition [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The dataset contains about 10,700
documents of European Union legislation and European Parliament
transcriptions, aligned in several languages including English and Italian, together with 200
questions. The adopted metric is the accuracy a@n (also called success@n),
calculated considering only the first n answers: if the correct answer occurs in the
top n retrieved answers, the question is marked as correctly answered. In
particular, we consider n = 1, 5, 10 and 30. We also
adopt the Mean Reciprocal Rank (MRR), which takes into account the rank of the
first correct answer. The framework setup used for the evaluation adopts Lucene as
document searcher and an NLP pipeline made of a stemmer, a lemmatizer,
a PoS tagger and a named entity recognizer. The different DSMs and the classic
TTM have been used both as scorers alone, which means that no other scorer is adopted
in the scorer pipeline, and combined with the standard scorer pipeline
consisting of the Simple Terms (ST), Enhanced Terms (ET), Enhanced Density
(ED) and Exact Sequence (E) scorers. The parameters of the DSMs were chosen
empirically: the window w of terms considered for computing the
co-occurrence matrix is 4, while the number of reduced dimensions used
in LSA, RI and LSARI is 1,000.
      </p>
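      <p>Both metrics can be derived from the rank of the first correct answer for each question. The Python sketch below assumes that a run is summarized as a list holding, for each question, that 1-based rank, or None when no correct answer was returned; this representation and the function names are illustrative assumptions. A question without a correct answer counts as a miss for every a@n and contributes 0 to the MRR.</p>
      <preformat>
```python
from typing import List, Optional

# Assumed run format: ranks[q] is the 1-based rank of the first correct
# answer for question q, or None if no correct answer was retrieved.


def accuracy_at_n(ranks: List[Optional[int]], n: int) -> float:
    """a@n: fraction of questions whose first correct answer is within the top n."""
    hits = sum(1 for r in ranks if r is not None and n >= r)
    return hits / len(ranks)


def mean_reciprocal_rank(ranks: List[Optional[int]]) -> float:
    """MRR: mean of 1/rank of the first correct answer (0 when none was found)."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


if __name__ == "__main__":
    ranks = [1, 3, None, 2]  # toy run over four questions
    for n in (1, 5, 10, 30):
        print(f"a@{n} = {accuracy_at_n(ranks, n):.3f}")
    print(f"MRR = {mean_reciprocal_rank(ranks):.3f}")
```
      </preformat>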
      <p>The performance of the standard pipeline, without the distributional scorer,
is shown as a baseline. The experiments have been carried out for both English
and Italian. Results are shown in Table 1, which reports the accuracy a@n
computed for different numbers of answers, the MRR, and the significance
of the results with respect to both the baseline (†) and the distributional model
based on TTM (‡). Significance is computed using the non-parametric
Randomization test. The best results are reported in bold.</p>
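      <p>The paper does not detail which variant of the Randomization test is used. A common choice is the paired approximate randomization test, sketched below in Python under that assumption: for each question, the scores of the two compared systems (for instance their reciprocal ranks) are swapped at random, and the p-value is the fraction of shuffles whose absolute score difference is at least as large as the observed one. The function name, the number of iterations and the seed are illustrative.</p>
      <preformat>
```python
import random
from typing import List

# Assumed variant: paired, two-sided approximate randomization test.


def randomization_test(a: List[float], b: List[float],
                       iterations: int = 10000, seed: int = 0) -> float:
    """Return the p-value of the summed per-question difference between a and b."""
    rng = random.Random(seed)
    observed = abs(sum(x - y for x, y in zip(a, b)))
    extreme = 0
    for _ in range(iterations):
        diff = 0.0
        for x, y in zip(a, b):
            if rng.random() >= 0.5:  # swap the two systems' scores for this question
                x, y = y, x
            diff += x - y
        if abs(diff) >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimate conservative (never exactly zero).
    return (extreme + 1) / (iterations + 1)


if __name__ == "__main__":
    baseline = [1.0, 0.5, 0.0, 0.33, 1.0, 0.25, 0.0, 0.5]  # toy reciprocal ranks
    system = [1.0, 1.0, 0.5, 0.5, 1.0, 0.5, 0.33, 1.0]
    print(f"p = {randomization_test(system, baseline):.4f}")
```
      </preformat>
      <p>For MRR, a and b would hold per-question reciprocal ranks; for a@n, per-question 0/1 hit indicators.</p>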
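      <p>Table 1. Accuracy a@n (n = 1, 5, 10, 30) and MRR of the distributional scorers (including TTM, RI and LSA), used alone and combined with the standard scorer pipeline. First row (TTM, alone): a@1 = 0.060, a@5 = 0.145, a@10 = 0.215, a@30 = 0.345, MRR = 0.107.</p>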
      <p>Considering each distributional scorer on its own, the results show that all
the proposed DSMs outperform the TTM, and the improvement is always
significant. The best MRR improvement is obtained by LSA for English
(+180%) and by LSARI for Italian (+161%). When the distributional scorers
are combined with the standard scorer pipeline, all the combinations outperform
the baseline. For English we obtain an improvement in MRR of about 16% with
respect to the baseline, and the result obtained by the TTM is significant. For
Italian, we achieve an even higher improvement in MRR of 26% with respect to the
baseline using LSARI. The slight difference in performance between LSA and LSARI
suggests that LSA applied to the matrix obtained by RI produces results comparable
to those of LSA applied to the TTM, while requiring less computation time, since
the matrix obtained by RI has fewer dimensions than the TTM.</p>
      <p>Finally, each distributional scorer used on its own yields a higher improvement
(with respect to the TTM) than its combination with the standard scorer pipeline
(with respect to the baseline). This suggests that a more sophisticated method for
combining scorers should be used in order to strengthen the contribution of each
of them. To this end, we plan to investigate learning-to-rank approaches as future work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Kanerva</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Sparse Distributed Memory</article-title>
          . MIT Press (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Molino</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>QuestionCube: a Framework for Question Answering</article-title>
          . In:
          <string-name>
            <surname>Amati</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carpineto</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Semeraro</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (eds.)
          <source>IIR. CEUR Workshop Proceedings</source>
          , vol.
          <volume>835</volume>
          , pp.
          <fpage>167</fpage>
          -
          <lpage>178</lpage>
          . CEUR-WS.org (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Penas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Forner</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodrigo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutcliffe</surname>
            ,
            <given-names>R.F.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Forascu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mota</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Overview of ResPubliQA 2010: Question Answering Evaluation over European Legislation</article-title>
          . In:
          <string-name>
            <surname>Braschler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pianta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (eds.)
          <source>Working notes of ResPubliQA 2010 Lab at CLEF 2010</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>