<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Commercial Applications through Community Question Answering Technology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonio Uva</string-name>
          <email>antonio.uva@unitn.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valerio Storch</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Casimiro Carrino</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ugo Di Iorio</string-name>
          <email>ugo.diiorio@rgigroup.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Moschitti</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe our experience in using current methods developed for Community Question Answering (cQA) in a commercial application focused on an Italian help desk. Our approach is based on (i) a search engine to retrieve previously answered question candidates and (ii) kernel methods applied to advanced linguistic structures to rerank the most promising candidates. We show that methods developed for cQA also work well when applied to data generated in customer service scenarios, where users seek explanations about products and a database of previously answered questions is available. The experiments with our system demonstrate its suitability for an industrial scenario.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
[...] specialized in the insurance business. One
important task carried out by their help desk software
is answering customers’ questions through a
ticket system. Already answered tickets are stored
in specialized databases, but manually finding and
routing them to the users is time-consuming. We
show that our approach, which uses a standard search
engine and an advanced reranker based on machine
learning and NLP technology, can achieve an answer
recall of almost 85% when considering the top
three retrieved tickets. This is particularly
interesting because the data and models used in the experiments
are entirely in Italian, demonstrating the
maturity of this technology for this language as well.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        The first step for any system that aims at
automatically answering questions on cQA sites is to
retrieve a set of questions similar to the user’s
input. Over time, different approaches have been
proposed. Early methods used statistical machine
translation to retrieve similar questions from large
question archives
        <xref ref-type="bibr" rid="ref9">(Zhou et al., 2011)</xref>
        . Other
approaches
        <xref ref-type="bibr" rid="ref1 ref4">(Cao et al., 2009; Duan et al., 2008)</xref>
        use
language models with smoothing to compute
semantic similarity between two questions. A
different approach that exploits syntactic information
was proposed in
        <xref ref-type="bibr" rid="ref8">(Wang et al., 2009)</xref>
        . The authors
find similar questions by computing similarity
between the syntactic trees of the two questions. In
this work, we use pairs of similar questions to train
our relational model, which detects if two
questions have similar semantics.
      </p>
      <p>From an industrial viewpoint, NLP (and
especially QA) has been one of the hot topics of recent years,
although it is still largely unexplored. Many
platforms are emerging in the wide area of chatbot
development, e.g., Wit.ai and Api.ai (offered by
Facebook and Google, respectively), which
enable intent classification and entity extraction, and
Meya.ai, which can be used to develop rule-based
chatbot systems. However, most of them do not
integrate QA models, with the notable exception of
Expert System’s Cogito Answer, recently adopted
by ING Direct and Responsa.</p>
    </sec>
    <sec id="sec-3">
      <title>3 The RGI application scenario</title>
      <p>The scope of the experiments for this research is
the evaluation of state-of-the-art QA models to
automate the help desk (HD) processes of RGI. RGI
is an Independent Software Vendor specialized in
the insurance industry, with 800
professionals and 12 offices spread across the EMEA region
(Italy, Ireland, France, Germany, Tunisia and
Luxembourg). Its main product, PASS, is a
modular Policy Administration System that enables the
end-to-end management of policies, claims and
insurance product configuration across all
insurance channels and business lines. With 103
installations for insurance companies and another
300 for brokers, RGI is a leader in its sector in
the European market.</p>
      <p>The application scenario described in this
paper focuses on the HD services for PASS offered
by RGI during the roll-out phase (the delivery of the
new system to the clients). RGI considers the use of
effective and robust QA models crucial for
improving the
quality of its HD process in terms of (i)
reduction of the response time, (ii) enhancement of the
coverage of the services, and (iii) general
customer satisfaction.</p>
      <sec id="sec-3-1">
        <title>3.1 Task description</title>
        <p>During the roll-out phase, new users from a client
company start to interact with the PASS system
and, in case of a problem, contact the HD
provided by RGI. The HD is structured as a hierarchical
organization of operators with different skill
levels, who answer the user requests:
HD1 involves Level 1 operators and
covers basic knowledge; HD2 (Level 2) is
managed by functional analysts with deeper domain
knowledge; and so on. When a request is sent to
an HD operator, a ticket is generated and stored in
a trouble ticketing system along with all the
relevant information about that request, including a
description of the problem and the solution
found. The ticket is then managed, passed on and
possibly escalated by the operators involved in
solving the problem.</p>
        <p>In order to search and provide the right answer
to the customer, each HD operator may use the
following sources of information: tickets opened in
the past; Frequently Asked Questions (FAQ) and
their solutions, stored in a shared repository; a
forum, where HD operators share their knowledge;
user manuals of the PASS system released for the
client; and the domain knowledge and expertise of
the operators themselves.</p>
        <p>The objective of this paper is to study the
impact of advanced QA systems on the
automation of HD1, using the FAQ and ticket data stored in
the related repositories.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Data description</title>
        <p>Data was gathered from the HD support
system, where technical issues are tracked and fixed.
The tickets are organized in
Question/Answer (Q/A) pairs, along with fields
carrying specific information, such as the ticket ID and
the problem domain. The original data comprised
around 40,000 tickets, but most of them do not
provide useful information. Thus, we designed
a preprocessing phase to clean the data and prepare
a valid data set. First, we detected and filtered
out spurious Q/A pairs concerning
unanswered problems, using basic heuristics.
Second, we extracted a subset of general-knowledge
problems by selecting only tickets belonging to
HD1 with a resolution time of less than two days. In
addition, an expert team reviewed the data
to further filter out invalid tickets. As a
result, the preprocessing yielded a dataset of 656
Q/A pairs spread over 10 question domains.
Examples of our data are shown in Table 1.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Our QA System</title>
      <p>Our system consists of (i) a search engine
that retrieves questions (along with their associated
tickets) similar to the new input question and (ii)
a reranker built with state-of-the-art NLP and
machine learning technology.</p>
      <sec id="sec-4-1">
        <title>4.1 Question and Ticket Retrieval</title>
        <p>We used a standard keyword-based Search Engine
(SE) to retrieve a list of questions from our dataset
similar to the input one. The score produced by
SE is the standard cosine similarity between the
vectors of the new and the candidate questions. In
particular, we built our SE using the Lucene TF-IDF-based
indexing available in the open-source
Elasticsearch platform.</p>
        <p>To improve the retrieval quality, we
merged the user request description (the question) and
the solution fields into a single joint text to build the
ticket index. Note that we used only
the question text to build the query for SE because, in a
real scenario, the question being asked is not associated
with any answer yet.</p>
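        <p>The retrieval step can be illustrated with a minimal sketch. It assumes a toy in-memory TF-IDF index in place of the Lucene/Elasticsearch setup described above; the example tickets, the field names and the function names (build_index, search) are hypothetical. Each ticket is indexed on the joint text of its question and solution, whereas the query uses the question text only.</p>
        <preformat>
```python
import math
from collections import Counter

# Hypothetical toy tickets; the field names are assumptions for illustration.
TICKETS = [
    {"id": "T1", "question": "cannot print the policy document",
     "solution": "reinstall the printer driver and print the policy again"},
    {"id": "T2", "question": "login fails after password change",
     "solution": "reset the password from the admin console"},
    {"id": "T3", "question": "claim form shows a wrong premium",
     "solution": "update the premium table of the insurance product"},
]


def tokenize(text):
    return text.lower().split()


def build_index(tickets):
    """Index each ticket on the joint text of its question and solution."""
    docs = {t["id"]: Counter(tokenize(t["question"] + " " + t["solution"]))
            for t in tickets}
    n_docs = len(docs)
    doc_freq = Counter(term for tf in docs.values() for term in tf)
    # Smoothed inverse document frequency (one common variant).
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0
           for term, df in doc_freq.items()}
    vectors = {doc_id: {term: count * idf[term] for term, count in tf.items()}
               for doc_id, tf in docs.items()}
    return idf, vectors


def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(question, idf, vectors, top_k=10):
    """Query with the question text only: no answer exists yet."""
    tf = Counter(tokenize(question))
    query = {term: count * idf[term] for term, count in tf.items() if term in idf}
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


idf, vectors = build_index(TICKETS)
results = search("the printer driver does not print", idf, vectors, top_k=3)
```
        </preformat>
        <p>In this toy example, ticket T1 is ranked first even though the words "printer" and "driver" occur only in its solution field, which is the effect the joint index is meant to achieve.</p>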
        <p>For each question in the filtered data mentioned
above, we created a list of
<inline-formula><tex-math>\langle q_{original}, q_{related} \rangle</tex-math></inline-formula>
pairs by querying the SE with each ticket question
and collecting the first 10 relevant results. The
resulting clustered data set is a list of
<inline-formula><tex-math>\langle q_{original}, q_{related} \rangle</tex-math></inline-formula>
pairs of size 656 (tickets) × 10 (retrieved
questions). These pairs were annotated by a team
of experts with relevant vs. irrelevant labels to
create the training and test sets. For example, Table 1
shows a question pair: an original ticket with
question and answer on the left, and a similar retrieved
ticket on the right.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Reranking Pipeline</title>
        <p>
          Given the initial rank provided by SE, we apply
an advanced NLP pipeline to rerank the questions
so that those with the highest probability of being
similar to the query are ranked at the top.
        </p>
        <p>
          NLP pipeline. We used various Italian NLP
processors of TextPro
          <xref ref-type="bibr" rid="ref7">(Pianta et al., 2008)</xref>
          and
embedded them in a UIMA pipeline to analyze each
ticket question as well as the questions of the
tickets in the rank. The NLP components
include part-of-speech tagging, chunking, named
entity recognition, constituency and dependency
parsing, etc. The result of the processing is used
to produce syntactic representations of the ticket
questions, which are then enriched with relational
links between the two questions of a pair, e.g.,
between matching words. The resulting tree pairs are
then used to train a kernel-based reranker.
        </p>
        <p>
          Kernel-based reranker. A kernel reranker is a
function <inline-formula><tex-math>r : Q \times Q \rightarrow \mathbb{R}</tex-math></inline-formula>,
where <inline-formula><tex-math>Q</tex-math></inline-formula> is a set of
questions. This function tells whether two questions are similar
and can be used to sort a set of retrieved questions
<inline-formula><tex-math>q_r</tex-math></inline-formula> with respect to an original one
<inline-formula><tex-math>q_o</tex-math></inline-formula>. Such
functions can be implemented in many ways; in this
work, we used (i) a kernel function applied to the
syntactic structures of the question pairs, together
with (ii) features capturing the text similarity
between two questions.
        </p>
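        <p>The role of the scoring function r can be sketched as follows. The function toy_r is a hypothetical stand-in (a plain token-overlap score) for the trained kernel-based model, and the example questions are invented; only the interface, a real-valued score for a pair of questions that is used to sort the candidates, reflects the description above.</p>
        <preformat>
```python
def toy_r(q_o, q_r):
    """Stand-in for the learned reranker score: Jaccard overlap of tokens.

    The real r is a trained model based on tree kernels and similarity
    features; this toy version only illustrates the interface.
    """
    a, b = set(q_o.lower().split()), set(q_r.lower().split())
    union = a | b
    return len(a.intersection(b)) / len(union) if union else 0.0


def rerank(q_o, candidates, r=toy_r):
    """Sort the retrieved questions by decreasing score r(q_o, q_r)."""
    return sorted(candidates, key=lambda q_r: r(q_o, q_r), reverse=True)


original = "how do I reset my password"
retrieved = [
    "printing a policy fails",
    "how can I reset the password",
    "where do I change my address",
]
reranked = rerank(original, retrieved)
```
        </preformat>
        <p>With these inputs, the password question moves to the first position and the unrelated printing question to the last one.</p>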
        <p>Feature Vector model. This feature vector
embeds a set of text similarity features that
capture the relationship between two questions. More
specifically, we compute a total of 20 similarities
such as n-grams, greedy string tiling, longest
common subsequences, Jaccard coefficient, word
containment, cosine similarity and many others.</p>
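        <p>As an illustration, the sketch below computes five of the similarity measures named above on whitespace-tokenized questions. It is a simplified example under our own assumptions (tokenization, normalization and function names are hypothetical) and does not reproduce the 20 features of the actual model; greedy string tiling is omitted.</p>
        <preformat>
```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def jaccard(a, b):
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a.intersection(set_b)) / len(union) if union else 0.0


def containment(a, b):
    """Share of the distinct items of a that also occur in b."""
    set_a = set(a)
    return len(set_a.intersection(b)) / len(set_a) if set_a else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def cosine(a, b):
    count_a, count_b = Counter(a), Counter(b)
    dot = sum(count_a[t] * count_b[t] for t in count_a)
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_features(q1, q2):
    t1, t2 = q1.lower().split(), q2.lower().split()
    longest = max(len(t1), len(t2), 1)
    return [
        jaccard(t1, t2),                        # word-level Jaccard
        jaccard(ngrams(t1, 2), ngrams(t2, 2)),  # bigram Jaccard
        containment(t1, t2),                    # word containment
        lcs_length(t1, t2) / longest,           # normalized LCS
        cosine(t1, t2),                         # bag-of-words cosine
    ]


features = similarity_features("the policy cannot be printed",
                               "the policy cannot be saved")
```
        </preformat>
        <p>The resulting list can be used directly as the feature vector of a question pair.</p>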
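        <p>
          Tree Kernel model. This model takes two tickets as
input and measures the similarity
between their syntactic trees. In particular, we
build two macro-trees, one for each ticket in the
pair, containing the syntactic trees of the sentences
in each ticket question. In addition, we link
the two macro-trees by connecting the phrases of
the two questions, as done in
          <xref ref-type="bibr" rid="ref3">(Da San Martino et al., 2016)</xref>
          . Then, we apply the Partial Tree Kernel
          <xref ref-type="bibr" rid="ref5">(Moschitti, 2006)</xref>
          and obtain the following kernel:
          <disp-formula><tex-math>K(\langle q_o, q_r \rangle_i, \langle q_o, q_r \rangle_j) = TK(t(q_o, q_r)_i, t(q_o, q_r)_j),</tex-math></disp-formula>
where <inline-formula><tex-math>q_o</tex-math></inline-formula> is the original ticket question and
<inline-formula><tex-math>q_r</tex-math></inline-formula> are
the questions of similar tickets. The
function <inline-formula><tex-math>t(x, y)</tex-math></inline-formula> extracts the syntactic tree from the
text <inline-formula><tex-math>x</tex-math></inline-formula>, enriching it with REL tags, which encode its relational links with
<inline-formula><tex-math>y</tex-math></inline-formula>.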
        </p>
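        <p>The kernel above can be illustrated with a simplified sketch. Two substitutions are assumptions of this example and not the method of the paper: the tree kernel is the simpler subset tree kernel of Collins and Duffy rather than the Partial Tree Kernel, and t(x, y) is reduced to prefixing REL- to the tag of every word of x that also occurs in y. Trees are hand-written nested tuples; all function names are hypothetical.</p>
        <preformat>
```python
def label(node):
    return node if isinstance(node, str) else node[0]


def production(node):
    """A node label together with the ordered labels of its children."""
    return (node[0], tuple(label(child) for child in node[1:]))


def internal_nodes(tree):
    """Yield every non-leaf node of a tree given as nested tuples."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)


def leaves(tree):
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def delta(n1, n2, decay):
    """Weighted count of the common tree fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str) and not isinstance(c2, str):
            score *= 1.0 + delta(c1, c2, decay)
    return score


def tree_kernel(t1, t2, decay=1.0):
    """Subset tree kernel: sum of delta over all pairs of internal nodes."""
    return sum(delta(n1, n2, decay)
               for n1 in internal_nodes(t1)
               for n2 in internal_nodes(t2))


def add_rel(tree, other_words):
    """Simplified t(x, y): mark the tag of each word shared with y by REL-."""
    if isinstance(tree, str):
        return tree
    children = tuple(add_rel(child, other_words) for child in tree[1:])
    shared = any(isinstance(c, str) and c in other_words for c in children)
    return (("REL-" + tree[0]) if shared else tree[0],) + children


def pair_kernel(pair_i, pair_j, decay=1.0):
    """K of two question pairs, each given as (tree of q_o, tree of q_r)."""
    (qo_i, qr_i), (qo_j, qr_j) = pair_i, pair_j
    tree_i = add_rel(qo_i, set(leaves(qr_i)))
    tree_j = add_rel(qo_j, set(leaves(qr_j)))
    return tree_kernel(tree_i, tree_j, decay)


# Hand-written toy parse trees (hypothetical examples).
t_a = ("S", ("NP", ("PRON", "I")),
       ("VP", ("V", "print"), ("NP", ("DET", "the"), ("N", "policy"))))
t_b = ("S", ("NP", ("PRON", "I")),
       ("VP", ("V", "save"), ("NP", ("DET", "the"), ("N", "policy"))))
t_c = ("S", ("NP", ("PRON", "we")), ("VP", ("V", "call")))

k_related = pair_kernel((t_a, t_b), (t_b, t_a))
k_mixed = pair_kernel((t_a, t_b), (t_a, t_c))
```
        </preformat>
        <p>In this toy setting, k_related is 32.0 and k_mixed is 6.0: two pairs whose questions share the same words in the same positions receive matching REL tags and therefore a higher kernel value than a related pair compared with an unrelated one.</p>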
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Experiments</title>
      <p>
        To evaluate our approach, we performed
experiments on a dataset composed of 6,650 pairs of
ticket questions annotated with similarity
judgments, i.e., Relevant and Irrelevant. We selected
only questions having at least one answer in the
first 10 retrieved tickets. We performed 5-fold
cross-validation and used the SVM-Light-TK
        <fn><p>http://disi.unitn.it/moschitti/Tree-Kernel.htm</p></fn>
software to train 5 different reranking models.
SVM-Light-TK allows us to learn a reranking model that
combines both feature vectors and Tree Kernels.
The latter are especially useful because they avoid the
burden of manually engineering features for this
task. A more detailed description of the Tree
Kernel models and Text Similarity features employed
by the model is reported in
        <xref ref-type="bibr" rid="ref3">(Da San Martino et al., 2016)</xref>
        . Then, we used the learned model to
predict similarities for all pairs of questions present
in each test fold.
      </p>
      <sec id="sec-5-1">
        <title>5.1 Results</title>
        <p>We conducted three experiments to assess the
effectiveness of different feature sets in the reranking
model: similarity features (Sim), TK, and TK+Sim.
The baseline is the
rank given by Lucene. Following previous work of
the SemEval challenge, we evaluated our ranking
with Mean Average Precision (MAP), Mean
Reciprocal Rank (MRR) and Precision at k (P@k).</p>
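        <p>For reference, the three measures can be computed as in the sketch below. It uses the standard textbook definitions and assumes that every relevant question of a query appears in its ranked list; the exact variant of P@k used in our evaluation is not specified here, so the precision_at_k function is an assumption. The relevance lists are invented toy data.</p>
        <preformat>
```python
def average_precision(labels):
    """labels: relevance flags (1 or 0) of one ranked list, best first."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(labels):
    """Inverse of the rank of the first relevant item, 0 if there is none."""
    for rank, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def precision_at_k(labels, k):
    """Fraction of relevant items among the top k (standard definition)."""
    return sum(labels[:k]) / k


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# One list of relevance flags per query (toy data).
rankings = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
map_score = mean(average_precision(r) for r in rankings)
mrr_score = mean(reciprocal_rank(r) for r in rankings)
p_at_1 = mean(precision_at_k(r, 1) for r in rankings)
```
        </preformat>
        <p>Each measure is averaged over the queries, i.e., over the original ticket questions, and not over the individual question pairs.</p>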
        <p>The results are reported in Table 2. As can
be seen, the best results are obtained by
combining Sim and TK in the reranker, which improved
the MRR and MAP of the IR baseline by 4.22
and 5.33 absolute points, respectively. In
addition, P@1, P@2 and P@3 improved by 3.87, 6.08
and 6.71 absolute points, respectively. This shows
the effectiveness of using syntactic structures in
powerful algorithms such as TK.</p>
        <p>We analyzed some selected errors of our
system, focusing on the cases where the search
engine performs better than our reranking model.
We note that, for each cluster of
<inline-formula><tex-math>\langle q_{original}, q_{related} \rangle</tex-math></inline-formula>
pairs, when the P@1 is high, our
model does not perform better than the search
engine, or performs even worse. However, our
reranking model always tends to push relevant
results to the top.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusions</title>
      <p>In this paper, we have described our experience in
building a QA model for an Italian help desk in the
field of insurance policies. Our main findings are:
(i) Italian NLP technology seems accurate enough
to support advanced cQA technology based on
syntactic structures; (ii) cQA models can boost
retrieval systems targeting text in Italian; and (iii)
the achieved accuracy seems appropriate for creating
business, at least in the field of help desk
applications, although it should be considered that our
results refer only to questions having an answer in
our database.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Xin</given-names>
            <surname>Cao</surname>
          </string-name>
          , Gao Cong, Bin Cui, Christian Søndergaard Jensen, and
          <string-name>
            <given-names>Ce</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>The use of categorization information in language models for question retrieval</article-title>
          .
          <source>In Proceedings of the 18th ACM conference on Information and knowledge management</source>
          , pages
          <fpage>265</fpage>
          -
          <lpage>274</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Annalina</given-names>
            <surname>Caputo</surname>
          </string-name>
          , Marco de Gemmis, Pasquale Lops, Francesco Lovecchio, Vito Manzari, and Acquedotto Pugliese AQP Spa
          .
          <year>2016</year>
          .
          <article-title>Overview of the EVALITA 2016 Question Answering for Frequently Asked Questions (QA4FAQ) task</article-title>
          .
          <source>In CLiC-it/EVALITA</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          , Alberto Barrón-Cedeño, Salvatore Romeo, Antonio Uva, and
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Moschitti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Learning to re-rank questions in community question answering using advanced features</article-title>
          .
          <source>In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management</source>
          , pages
          <fpage>1997</fpage>
          -
          <lpage>2000</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Huizhong</given-names>
            <surname>Duan</surname>
          </string-name>
          , Yunbo Cao, Chin-Yew Lin, and
          <string-name>
            <given-names>Yong</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Searching questions by identifying question topic and question focus</article-title>
          .
          <source>In ACL</source>
          , volume
          <volume>8</volume>
          , pages
          <fpage>156</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Moschitti</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Efficient convolution kernels for dependency and constituent syntactic trees</article-title>
          .
          <source>In ECML</source>
          , volume
          <volume>4212</volume>
          , pages
          <fpage>318</fpage>
          -
          <lpage>329</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Preslav</given-names>
            <surname>Nakov</surname>
          </string-name>
          , Doris Hoogeveen, Lluís Màrquez, Alessandro Moschitti, Hamdy Mubarak, Timothy Baldwin, and
          <string-name>
            <given-names>Karin</given-names>
            <surname>Verspoor</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>SemEval-2017 task 3: Community question answering</article-title>
          .
          <source>In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)</source>
          , pages
          <fpage>27</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Emanuele</given-names>
            <surname>Pianta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Girardi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Zanoli</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>The TextPro tool suite</article-title>
          .
          <source>In LREC.</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Kai</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhaoyan</given-names>
            <surname>Ming</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tat-Seng</given-names>
            <surname>Chua</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>A syntactic tree matching approach to finding similar questions in community-based QA services</article-title>
          .
          <source>In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>187</fpage>
          -
          <lpage>194</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Guangyou</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Li Cai,
          <string-name>
            <given-names>Jun</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and Kang Liu.
          <year>2011</year>
          .
          <article-title>Phrase-based translation model for question retrieval in community question answer archives</article-title>
          .
          <source>In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume</source>
          <volume>1</volume>
          , pages
          <fpage>653</fpage>
          -
          <lpage>662</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>