<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Feature Engineering for Italian Question Answering Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonio Uva</string-name>
          <email>antonio.uva@unitn.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Moschitti</string-name>
          <email>amoschitti@qf.org.qa</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DISI, University of Trento</institution>
          ,
          <addr-line>38123 Povo (TN)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Qatar Computing Research Institute</institution>
          ,
          <addr-line>5825 Doha</addr-line>
          ,
          <country country="QA">Qatar</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we propose automatic feature engineering for Italian QA systems. Our approach only requires a shallow syntactic representation of the questions and the answer passages. We apply Support Vector Machines with tree kernels to such trees to automatically generate relational syntactic patterns, which significantly improve on BM25 retrieval models.</p>
      </abstract>
      <kwd-group>
        <kwd>Question Answering</kwd>
        <kwd>Support Vector Machines</kwd>
        <kwd>Kernel Methods</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Question Answering (QA) systems can be a valuable solution to the problem of
information overload, as they are effective tools for searching relevant information
in unstructured text. QA systems differ from traditional search engines in that
they accept questions expressed in natural language. The major challenge in QA
research is the design of effective answer search and extraction modules that can
exploit the relations between the input question and the passages containing the
answers.</p>
      <p>Such relations, or patterns, can be used to decide whether a retrieved passage
contains the correct answer. In past QA work, these patterns were mainly designed
manually, with a consequently high engineering cost. However, machine learning
has made this process much easier by enabling automatic pattern engineering.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref1">1</xref>
          ], we presented an automatic feature engineering approach based on
Support Vector Machines using tree kernels for ranking answer passages. The
approach consists of the following steps: (i) the candidate answer passages
for all input questions are retrieved by means of a search engine; (ii) each
question is paired with all its candidate answer passages: pairs containing
a correct answer are positive and all the others are negative; (iii) each
pair is represented with two syntactic trees, one for the question and one
for the candidate answer; and (iv) an SVM classifier is trained to rank
the answer passages represented as trees.
      </p>
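<p>The pairing and labeling step (ii) can be sketched in a few lines; the function and variable names below are illustrative, not part of the system described here.</p>

```python
def make_training_pairs(question, candidates, relevant_ids):
    """Pair a question with each candidate passage retrieved by the
    search engine; a pair is positive when the passage contains the
    correct answer, and negative otherwise."""
    pairs = []
    for cand_id, passage in candidates:
        label = +1 if cand_id in relevant_ids else -1
        pairs.append((question, passage, label))
    return pairs

# Toy example: two candidate passages, of which only d1 is relevant.
candidates = [("d1", "Rome is the capital of Italy."),
              ("d2", "Milan hosts the Italian stock exchange.")]
pairs = make_training_pairs("What is the capital of Italy?",
                            candidates, {"d1"})
```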
      <p>In this paper, we present a similar system that can rerank answer passages
for factoid questions in Italian. This system is built on top of the Unstructured
Information Management Architecture (UIMA, https://uima.apache.org/) framework
developed by IBM. UIMA eases the task of assembling many text annotators for
performing different types of analysis over many text documents. These analytics are
then used to encode questions and answers as linguistic structures and to train the
reranking module of our QA pipeline.</p>
      <sec id="sec-1-1">
        <title>QA system</title>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Learning to rank relevant documents</title>
      <p>The QA system has a simple architecture: it takes a question as input and
retrieves a list of candidate passages from an indexed dump of the Italian
Wikipedia. This list is then reranked according to its relevance to the input question.
The analysis of the question and its candidate answers (e.g., PoS tagging,
chunking, named entity recognition, and others) is performed using the TextPro
suite of NLP components for the Italian language [<xref ref-type="bibr" rid="ref4">4</xref>]. TextPro has been integrated
as a stand-alone annotator in our UIMA pipeline. The produced annotations
are used to build the tree representations of both questions and answers. The
resulting question/answer tree pairs are used to train a classifier able to rank
candidate passages according to their relevance to the input question. The
learned model is then used to improve the ordering of the answer passages
provided by the search engine.</p>
      <sec id="sec-2-1">
        <title>Answer reranking</title>
        <p>
          Our goal is to rank text passages containing the correct answer higher in the
list than irrelevant passages. For this purpose, we use the model we presented in
[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], which is based on preference ranking [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. This approach treats the reranking problem
as a binary classification task, where each instance is a pair (p1, p2)
of question/answer pairs, i.e., p1 = (q, a1) and p2 = (q, a2). A training
instance is positive when a1 is a relevant passage and a2 is an irrelevant
one; otherwise, (p1, p2) is considered a negative example.
        </p>
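<p>A minimal sketch of how such preference instances can be generated from one question's labeled candidates (illustrative code, not the actual implementation):</p>

```python
from itertools import permutations

def preference_instances(candidates):
    """candidates: list of (passage, is_relevant) for one question.
    Build (p1, p2) instances labeled +1 when the first passage is
    relevant and the second is not, and -1 in the opposite case;
    pairs of two relevant or two irrelevant passages are skipped."""
    instances = []
    for (a1, r1), (a2, r2) in permutations(candidates, 2):
        if r1 and not r2:
            instances.append(((a1, a2), +1))
        elif r2 and not r1:
            instances.append(((a1, a2), -1))
    return instances

insts = preference_instances([("rome passage", True),
                              ("milan passage", False)])
```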
      <p>These pairs can be used to train a binary classifier and build a reranking
model. At classification time, the model reranks the q/a pairs
representing the test instances by simply acting as a voter: a positive
classification is a vote for a1, whereas a negative outcome is a vote for a2. The
more votes an answer receives, the higher its rank.</p>
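<p>The voting scheme can be sketched as follows, where classify is any trained binary classifier over answer pairs (a hypothetical stand-in for the SVM):</p>

```python
def rerank(candidates, classify):
    """Rank candidates by round-robin voting: classify((a1, a2)) returns
    a positive score to vote for a1 and a negative one to vote for a2;
    each answer's final score is its number of won comparisons."""
    votes = {a: 0 for a in candidates}
    for i, a1 in enumerate(candidates):
        for a2 in candidates[i + 1:]:
            if classify((a1, a2)) > 0:
                votes[a1] += 1
            else:
                votes[a2] += 1
    return sorted(candidates, key=lambda a: votes[a], reverse=True)

# Toy classifier that prefers longer passages, for illustration only.
ranked = rerank(["a", "bbb", "cc"], lambda p: len(p[0]) - len(p[1]))
```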
      </sec>
      <sec id="sec-2-2">
        <title>Q/A pair representation</title>
        <p>
          In our model, questions and answer passages are encoded as shallow syntactic
trees we introduced in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. In each tree, the word lemmas constitute the terminal
nodes, and the Part-of-Speech (PoS) tags associated with each word constitute
the pre-terminal nodes.
        </p>
        <p>The words are further organized into constituents by adding
an additional layer of chunk nodes. Since a chunk of text spans several words,
each chunk node is connected to the PoS nodes of its words. The sentence node
is located at the top level and is linked to the chunk nodes, and a ROOT node
is used to connect several sentence nodes. In addition, we encode the relationships
between the question and answer trees by means of a special tag, REL.</p>
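<p>The resulting structure can be sketched with plain nested lists (an illustrative encoding, not the data structures of the actual system):</p>

```python
def shallow_tree(sentences):
    """sentences: list of sentences, each a list of chunks
    (chunk_label, [(lemma, pos), ...]). Returns a nested-list tree
    with layers ROOT -> S -> chunk -> PoS -> lemma."""
    return ["ROOT"] + [
        ["S"] + [
            [label] + [[pos, lemma] for lemma, pos in words]
            for label, words in sentence
        ]
        for sentence in sentences
    ]

tree = shallow_tree([[("NP", [("what", "WP")]),
                      ("VP", [("be", "VBZ")]),
                      ("NP", [("capital", "NN"), ("italy", "NNP")])]])
```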
        <p>
          The strategy used to establish the REL nodes is very simple: if the two trees
share the same terminal node (word lemma), then we mark both the parent
and the grandparent of that node with the REL tag. The REL approach leads to more accurate
results [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
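<p>This marking strategy can be sketched on a nested-list tree encoding (node = [label, child, ...], with plain lemma strings as leaves); the code is illustrative only:</p>

```python
def leaf_lemmas(node):
    """Collect the terminal nodes (lemma strings) of a tree encoded as
    nested lists [label, child, ...] with plain strings as leaves."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node[1:]:
        found |= leaf_lemmas(child)
    return found

def mark_rel(question_tree, answer_tree):
    """For every lemma shared by the two trees, prefix REL- to its
    parent (PoS) and grandparent (chunk) nodes in both trees."""
    shared = leaf_lemmas(question_tree) & leaf_lemmas(answer_tree)
    for tree in (question_tree, answer_tree):
        for sentence in tree[1:]:            # ROOT -> S
            for chunk in sentence[1:]:       # S -> chunk
                for pos in chunk[1:]:        # chunk -> PoS -> lemma
                    if pos[1] in shared:
                        pos[0] = "REL-" + pos[0]
                        if not chunk[0].startswith("REL-"):
                            chunk[0] = "REL-" + chunk[0]
    return question_tree, answer_tree

q = ["ROOT", ["S", ["NP", ["NN", "capital"], ["NNP", "italy"]]]]
a = ["ROOT", ["S", ["NP", ["NNP", "italy"]]]]
mark_rel(q, a)
```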
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>For our experiments, we used factoid questions from the open-domain TREC
corpus. We collected a subset of the questions from TREC 8, TREC 9, TREC
2000, TREC 2001 and TREC 2002, for a total of 1228 questions. An expert
annotator translated the questions and the gold answer keywords from English to
Italian. The answers were searched in the Italian Wikipedia; we therefore trained our
reranker on such data.</p>
      <p>Specifically, we split the Wikipedia corpus into paragraphs and considered each
of them as a separate document to be indexed by an off-the-shelf search engine.
After performing some text cleaning, we collected a total of 10 million
documents. We used Lucene with the BM25 scoring function for indexing and
retrieval.</p>
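<p>For reference, the BM25 scoring function used for retrieval can be sketched in a few lines; this toy standalone version is only illustrative, since in practice the scoring is done inside Lucene:</p>

```python
import math

def bm25(query, doc, corpus, k1=1.2, b=0.75):
    """Score a tokenized document against a tokenized query.
    corpus is the list of all tokenized documents, used for the
    document-frequency and average-length statistics."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in set(query):
        tf = doc.count(term)
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        norm = tf + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf * (k1 + 1) / norm
    return score

docs = [["rome", "capital", "italy"],
        ["milan", "stock", "exchange"]]
```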
      <p>We trained our rerankers with the first 10 candidate answers retrieved by the
search engine for each question of the training set. At test time, we retrieved the
top 40 candidates for each test question and reranked them.</p>
      <sec id="sec-3-1">
        <title>Metrics</title>
        <p>To evaluate our systems, we used the metrics most frequently employed in
QA: Precision at rank 1 (P@1), i.e., the percentage of relevant
documents ranked at position 1; Mean Reciprocal Rank (MRR); and Mean Average
Precision (MAP). The reported metrics are computed with 3-fold
cross-validation.</p>
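<p>The three metrics can be computed as follows, given, for each question, the list of binary relevance flags of its ranked passages (illustrative implementations):</p>

```python
def p_at_1(rankings):
    """Percentage of questions whose top-ranked passage is relevant."""
    return 100.0 * sum(r[0] for r in rankings) / len(rankings)

def mrr(rankings):
    """Mean of 1/rank of the first relevant passage (0 if none),
    expressed as a percentage."""
    total = 0.0
    for r in rankings:
        for rank, rel in enumerate(r, start=1):
            if rel:
                total += 1.0 / rank
                break
    return 100.0 * total / len(rankings)

def average_precision(r):
    """Precision averaged over the ranks of the relevant passages."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(r, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / max(hits, 1)

def map_score(rankings):
    """Mean Average Precision over all questions."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Two questions: relevant passage at rank 1 and at rank 2, respectively.
rankings = [[1, 0, 0], [0, 1, 0]]
```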
      </sec>
      <sec id="sec-3-2">
        <title>Results</title>
        <p>
          The following table reports the performance of the reranking models trained
using di erent strategies:
1. the baseline model using the BM25 score of the search engine;
2. the reranker model trained only using feature vectors containing text
similarity measures5;
5 We used the same String-based and Character/word n-grams features included in
the system that performed best in the Semantic Textual Similarity (STS) task at
SemEval-2012. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
3. the reranker model trained using syntactic trees and feature vectors.
        </p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Performance of the reranking models.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Models</th><th>MAP</th><th>MRR</th></tr>
            </thead>
            <tbody>
              <tr><td>BM25</td><td>0.18</td><td>23.11</td></tr>
              <tr><td>Feature vectors</td><td>0.21</td><td>26.85</td></tr>
              <tr><td>Tree + Feature vectors</td><td>0.25</td><td>30.74</td></tr>
            </tbody>
          </table>
        </table-wrap>
      <p>As can be seen from the results reported in Tab. 1, the reranking model using
structural representations yields an improvement of about 3 absolute points
in MAP, MRR and P@1 when compared with the vector model, and about 7
absolute points when compared with the baseline model. It is interesting to note
that we did not tune the tree kernel model in any way: we simply
built an Italian pipeline and trained our models.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>In this paper, we presented an approach to QA that requires no manual feature
engineering. Its main characteristic is the use of tree kernels to exploit syntactic
representations of question and answer passage pairs.</p>
      <p>In the future, we would like to assess the performance of the reranking model
using structural representations that can take into account additional
information such as the category and the lexical answer type of the question.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Severyn</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nicosia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Moschitti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2013</year>
          , August).
          <article-title>Learning adaptable patterns for passage reranking</article-title>
          .
          <source>In Proceedings of the Seventeenth Conference on Computational Natural Language Learning</source>
          (pp.
          <fpage>75</fpage>
          -
          <lpage>83</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2002</year>
          , July).
          <article-title>Optimizing search engines using clickthrough data</article-title>
          .
          <source>In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          (pp.
          <fpage>133</fpage>
          -
          <lpage>142</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Severyn</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Moschitti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2012</year>
          , August).
          <article-title>Structural relationships for large-scale learning of answer re-ranking</article-title>
          .
          <source>In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval</source>
          (pp.
          <fpage>741</fpage>
          -
          <lpage>750</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Pianta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girardi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zanoli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2008</year>
          , May).
          <article-title>The TextPro Tool Suite</article-title>
          . In LREC.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bär</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , et al.
          <article-title>UKP: Computing semantic textual similarity by combining multiple content similarity measures</article-title>
          .
          <source>In: Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation. Association for Computational Linguistics</source>
          ,
          <year>2012</year>
          . p.
          <fpage>435</fpage>
          -
          <lpage>440</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>