<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dialog-based Help Desk through Automated Question Answering and Intent Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonio Uva</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierluigi Roberti</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Moschitti</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Modern personal assistants need to access unstructured information in order to successfully fulfill user requests. In this paper, we study the use of two machine learning components to design personal assistants: intent classification, to understand the user request, and answer sentence selection, to carry out question answering from unstructured text. The evaluation results obtained on five different real-world datasets, associated with different companies, show high accuracy for both tasks. This suggests that modern QA and dialog technology is effective for real-world tasks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Modern personal assistants need to
access unstructured information
in order to successfully fulfill user
requests. In this paper,
we study the use of machine
learning to design two components
of a personal assistant: intent
classification, to understand the user
request, and answer sentence
selection, to answer questions from
unstructured text. The evaluation
results obtained on five different real-world datasets,
associated with different
companies, show high accuracy for
both models. This suggests that
modern question answering and dialog
technology is effective for real-world tasks.</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction</title>
      <p>Help-desk applications use machine learning to
classify users’ requests into intents. The information
owned by companies is generally in free-text
form, drawn from company documents or websites. For
example, corporate knowledge is typically encoded
within documents in an unstructured way. This
limits the effectiveness of standard
information access. For example, searching
documents by keywords is not a viable solution for
users, as they can seldom find an answer to their
questions. Using QA systems to
search for information in a corpus of documents,
possibly through a dialogue system, offers an attractive
solution for extracting the best information from
company knowledge bases.</p>
      <p>Copyright ©2020 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).</p>
      <p>IMSL offers virtual agents that can be
retrained according to customer needs. The agent is
composed of many Natural Language
Understanding components, such as classifiers that map each
input user utterance to its corresponding
intent. However, since it is not possible to forecast
all the intents corresponding to the questions that
users are going to ask, which are potentially
infinite, it is of paramount importance to have an
automated QA system able to
provide the best answer (paragraph) extracted from a
company-owned knowledge base.</p>
      <p>Information access is becoming an increasingly
critical issue. Traditional Information Retrieval
systems, used in industry, help the user in accessing
information, but are often imprecise and
impractical. Current search engines are an example of
this. Searching for information on the web often
requires a double effort from the user: first, it is
necessary to understand how to formulate a query in
the most effective manner, and then to filter the
proposed results in order to find the most relevant
information.</p>
      <p>In this paper, we describe our QA system based
on answer sentence selection and intent detection,
and how we integrate them in a conversational
agent.</p>
    </sec>
    <sec id="sec-3">
      <title>2 Related Work</title>
      <p>As of today, most general-purpose QA
services are provided by big tech companies, e.g.,
Amazon Alexa, Google Home, Ask Yahoo!,
Quora and many others. Unfortunately, these types
of applications are not easily accessible to smaller
companies, as the offered QA service cannot be
easily adapted to handle corporate knowledge, which
is in the form of unstructured text. To build their own
solutions, SMEs can exploit QA components such
as Answer Sentence Selection.</p>
      <p>
        In recent years, deep learning approaches have
been successfully applied for automatically
modeling text pairs, e.g.,
        <xref ref-type="bibr" rid="ref11 ref20">(Lu and Li, 2013; Yu et al.,
2014)</xref>
        . Additionally, a number of deep learning
models have recently been applied to QA, e.g., Yih
et al. (2013) applied CNNs to open-domain QA;
Bordes et al. (2014) proposed a neural embedding
model combined with a knowledge base for
open-domain QA. Iyyer et al. (2014) applied recursive
neural networks to factoid QA over paragraphs.
Miao et al. (2016) proposed a neural variational
inference model and a Long Short-Term Memory
network for the same task. Yin et al. (2016)
proposed a siamese convolutional network for
matching sentences that employs an attentive average
pooling mechanism, obtaining state-of-the-art results on
various tasks and datasets.
      </p>
      <p>The works closest to this paper are those by Yu et al.
(2014) and Severyn and Moschitti (2015). The
former presented a CNN architecture for answer
sentence selection that uses bigram convolution and
average pooling, whereas the latter uses convolution
with k-max pooling.</p>
      <p>Nowadays, supporting customers in their
activities across applications and websites is becoming
increasingly demanding, due to the large number of
customers and the variety of topics that have to be
covered.</p>
      <p>New tools, such as chatbots able to answer
frequently asked questions (FAQs), are emerging in
response to these needs. Classifying the user need
expressed in a natural language question into a predefined
set of categories allows conversational agents to
recognize which users are asking which types of
questions and to react accordingly.</p>
      <p>
        Traditional approaches to this problem include
supervised methods such as Support
Vector Machines (SVM)
        <xref ref-type="bibr" rid="ref3">(Cortes and Vapnik, 1995)</xref>
        ,
Boosting
        <xref ref-type="bibr" rid="ref14 ref5">(Iyer et al., 2000; Schapire and Singer,
2000)</xref>
        , kernel machines operating on structured
input objects
        <xref ref-type="bibr" rid="ref10 ref13">(Moschitti, 2006; Lodhi et al., 2002)</xref>
        and Maximum Entropy models
        <xref ref-type="bibr" rid="ref17">(Yaman et al.,
2008)</xref>
        .
      </p>
      <p>
        In recent years, new models such as Recurrent
Neural Networks (RNN), Long Short-Term Memory
(LSTM)
        <xref ref-type="bibr" rid="ref3">(Cortes and Vapnik, 1995)</xref>
        , Gated
Recurrent Units (GRU)
        <xref ref-type="bibr" rid="ref2">(Chung et al., 2014)</xref>
        and
Convolutional Neural Networks (CNN)
        <xref ref-type="bibr" rid="ref8 ref9">(Lecun et al., 1998;
Kim, 2014)</xref>
        have been established as state-of-the-art
approaches for text classification.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3 System Description</title>
      <p>Our QA system extracts portions of
text from company documents or from websites.
This text is then organized into paragraphs,
which are used to answer the
user’s questions.</p>
      <p>One practical problem is the fact that not all PDF
files encode text, and many fail to preserve the
logical order of the text. Thus, in order to extract
paragraphs, we used pdf2text.</p>
      <p>Another practical problem we needed to solve was
to keep together portions of text that are separated by punctuation
but belong together, such as bullet lists or highly structured
paragraphs. Our tool automatically assigns a
reference index or summary to each paragraph to
improve subsequent searches (see Figure 1).</p>
      <p>Subsequently, each question and answer
pair must be annotated with correctness (label
TRUE/FALSE). This allows us to create a
training set to train the re-ranking network (see Figure
2).</p>
      <p>The final system, shown in Figure 3, therefore
uses the target company data,
appropriately reorganized into paragraphs, to
answer the user’s requests. On average, we provide
from 3 to 5 answers for each question. For each answer,
we also provide a reference to the document and
the summary to which the paragraph refers.</p>
    </sec>
    <sec id="sec-5">
      <title>4 Answer Sentence Selection (AS2)</title>
      <p>The goal of AS2 is to rank a list of answer candidates
by their similarity to an input question
q<sub>i</sub>. We design a network that includes relational
information between questions and answers. Our
results show that CNNs reach better performance
than traditional IR models based on bags of words.</p>
      <sec id="sec-5-1">
        <title>4.1 Model</title>
        <p>The architecture of the network used for mapping
sentences into embedding vectors is shown in
Figure 4 and is inspired by the CNNs employed by
Severyn and Moschitti (2015) to perform many
classification tasks over sentences. It includes
two main components:</p>
        <p>(i) an encoder that maps an input document s<sub>i</sub> into
a vector x<sub>s<sub>i</sub></sub> and (ii) a feed-forward network that
computes the similarity between input sentences.</p>
        <p>Our network takes two sentences as input, i.e., a
question and a text paragraph that may contain an
answer, and represents each of them as a vector
of fixed dimension, x<sub>s</sub> ∈ ℝ<sup>m</sup>.</p>
        <p>
          The sentence model is composed of a sequence
of convolutional maps followed by some pooling
operations. Such a model achieves the state of the
art in many NLP tasks
          <xref ref-type="bibr" rid="ref7 ref8">(Kalchbrenner et al., 2014;
Kim, 2014)</xref>
          .
        </p>
        <p>Then, the sentence vectors x<sub>s<sub>i</sub></sub> corresponding
to the question and the answer are concatenated
and passed to the following neural network
layers. These are composed of a non-linear hidden
layer and an output layer with a sigmoid activation
unit. The network thus returns a value
between 0 and 1 corresponding to the relevance of
the answer with respect to the question.</p>
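        <p>The scoring step described above can be sketched as follows. This is a minimal illustration and not the trained network: the tanh non-linearity, the toy dimensions, the random weights and the function name score_pair are all assumptions, since the text only specifies a non-linear hidden layer followed by a sigmoid output.</p>
        <preformat><![CDATA[
```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score_pair(x_q, x_a, W_h, b_h, w_out, b_out):
    """Return a relevance score between 0 and 1 for one question/answer pair."""
    joint = x_q + x_a  # concatenate the two sentence vectors
    # Non-linear hidden layer (tanh is an assumption, not stated in the paper).
    hidden = [
        math.tanh(sum(w * v for w, v in zip(row, joint)) + b)
        for row, b in zip(W_h, b_h)
    ]
    # Output layer with a single sigmoid unit.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Toy example with made-up sizes and random, untrained weights.
rng = random.Random(0)
m, n_hidden = 4, 3
x_q = [rng.uniform(-1, 1) for _ in range(m)]
x_a = [rng.uniform(-1, 1) for _ in range(m)]
W_h = [[rng.uniform(-0.5, 0.5) for _ in range(2 * m)] for _ in range(n_hidden)]
b_h = [0.0] * n_hidden
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
score = score_pair(x_q, x_a, W_h, b_h, w_out, 0.0)
```
]]></preformat>
        <p>At ranking time, the candidate paragraphs of a question would be sorted by this score in decreasing order.</p>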
        <p>
          Finally, we included word overlap embeddings
encoding relational information between words
in questions and answers
          <xref ref-type="bibr" rid="ref12 ref16">(Severyn and Moschitti,
2016)</xref>
          .
        </p>
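        <p>As an illustration of the word overlap information, the sketch below marks each token that also occurs in the other sentence. It assumes whitespace tokenization and a hypothetical stopword set; in a full model each 0/1 flag would typically index a small embedding that is appended to the word embedding, which is not shown here.</p>
        <preformat><![CDATA[
```python
def overlap_flags(question, answer, stopwords=frozenset()):
    """Flag each token with 1 if it also appears in the other sentence, else 0."""
    q_tokens = question.lower().split()
    a_tokens = answer.lower().split()
    # Stopwords are ignored so that function words do not count as overlap.
    q_set = set(q_tokens) - stopwords
    a_set = set(a_tokens) - stopwords
    q_flags = [1 if tok in a_set else 0 for tok in q_tokens]
    a_flags = [1 if tok in q_set else 0 for tok in a_tokens]
    return q_flags, a_flags


# Made-up example pair; the stopword list is illustrative only.
q_flags, a_flags = overlap_flags(
    "how do i block my card",
    "to block a card call the number on the back",
    stopwords=frozenset({"a", "the", "to", "i", "do", "my", "on"}),
)
# q_flags -> [0, 0, 0, 1, 0, 1]
```
]]></preformat>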
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5 Intent Classification</title>
      <p>We adopted advanced techniques, such as deep
learning models, to classify the user need, which is
semantically expressed by the user question, into a
predefined set of categories, i.e., intents.</p>
      <p>We used some common deep learning models for
solving the intent detection task. The main point
of our study is to test those models and observe
how they perform on datasets containing real user
questions addressed to a virtual agent, operating in
the banking/financial sector.</p>
      <p>
        At this stage, we do not consider novel methods
based on the transformer architecture, such as BERT
        <xref ref-type="bibr" rid="ref4">(Devlin et al., 2019)</xref>
        , which require a large
amount of resources, typically not available to
SMEs. Instead, we focus on lighter approaches
that can run on small GPUs. We report our
experiments and discuss the results obtained with such
lighter models.
      </p>
      <sec id="sec-6-1">
        <title>5.1 Models</title>
        <p>SVM (baseline) fed with word features, derived
from the text of the utterances.</p>
        <p>LSTM uses recurrent units that take as input
the embedding x<sub>t</sub> of the current word at time step
t and the hidden vector encoding the sub-phrase
at the previous step, i.e., h<sub>t−1</sub>, and return the vector
representation of the phrase at step t, i.e., h<sub>t</sub>.</p>
        <p>CNN uses a set of convolutional filters of
different sizes and max pooling operations to extract
the most important features, e.g., bigrams, trigrams,
etc., which represent the sentence meaning.</p>
        <p>LSTM + CNN based on an architecture
composed of two layers: an LSTM layer that builds
a fixed-size vector representation of the sentence
at each word, and a convolutional layer. The
latter applies a set of convolutional operations on the
representations returned by the first layer.</p>
        <p>CNN + CNN composed of two CNN layers,
where the second layer takes the previous layer
representation as input, and applies a set of
convolutional filters and pooling operations to compute
the final vector representation of the sentence.</p>
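        <p>The convolution and max pooling used by the CNN-based models above can be illustrated with a small sketch. It assumes linear filters without bias or activation, toy 2-dimensional embeddings, and a sentence at least as long as the filter width; the function name conv_max_pool and all values are hypothetical.</p>
        <preformat><![CDATA[
```python
def conv_max_pool(embeddings, filters, width):
    """Apply each filter to every window of `width` words, then max-pool over positions."""
    windows = [
        [v for vec in embeddings[i:i + width] for v in vec]  # flatten one n-gram
        for i in range(len(embeddings) - width + 1)
    ]
    # One feature per filter: its strongest response over all n-grams.
    return [
        max(sum(w * v for w, v in zip(f, win)) for win in windows)
        for f in filters
    ]


# Toy sentence of 4 words with 2-dimensional embeddings (made-up values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# Two bigram filters, each spanning 2 words x 2 dimensions.
bigram_filters = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
features = conv_max_pool(sentence, bigram_filters, width=2)
# features -> [2.0, 2.0]
```
]]></preformat>
        <p>Using filters of several widths (e.g., 2 and 3) and concatenating the pooled features yields a fixed-size sentence vector regardless of sentence length.</p>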
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6 Experiments</title>
      <p>In this section, we first describe the datasets we
used in our experiments, then we provide the results
on the answer sentence selection and the intent
classification tasks. Finally, we report an
end-to-end evaluation of our system.</p>
      <sec id="sec-7-1">
        <title>6.1 Data Description</title>
        <p>We built our datasets by collecting samples of
questions asked by users to conversational agents on
either Credit Institution or Bank websites. We
collected two intent corpora from each data provider,
resulting in a total of four datasets.</p>
        <p>Istituto Credito - synthetic (ICs): This corpus
was created by expert dialog engineers. It contains
a set of utterances annotated with their
corresponding intents. The subjects of the questions are diverse
and span many topics. For example, some
questions seek information about bank branch
locations, problems with cashing checks,
and the availability of financial products. It
contains 2,305 training examples and 593 test
examples, for a total of 2,898 examples.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Istituto Credito - full (ICf)</title>
      <p>This dataset is composed of synthetic questions, generated by
language engineers, and was subsequently
augmented with real sentences
retrieved from the website chatbot of a well-known
credit institution operating in Italy. It contains
2,898 training examples and 770 test examples, for
a total of 3,668 examples.</p>
    </sec>
    <sec id="sec-9">
      <title>Banca - Area Informativa (BancaAI)</title>
      <p>This dataset contains real questions asked by users about
the Area Informativa of a bank. It includes 3,947
training examples and 987 test examples, for a
total of 4,934 examples divided into 282 intents.</p>
    </sec>
    <sec id="sec-10">
      <title>Banca - Internet Banking (BancaIB)</title>
      <p>This dataset includes real questions asked by users about
the iBanking service offered by a well-known
Italian bank. It includes 4,380 training instances
and 1,906 test instances divided into 251 intents.</p>
      <sec id="sec-10-1">
        <title>Answer Sentence Selection data</title>
        <p>We used an in-house dataset called IMSL-WIKI, which
contains a list of questions and answers regarding some
of the products and services sold by IM Service Lab.
For each question, a list of 10 candidate paragraphs was collected
using an off-the-shelf search engine, i.e., Lucene, and
manually annotated as either relevant or irrelevant.
The dataset is divided into a training set
and a test set, which contain a total of 5,190 and
1,240 QA pairs, respectively.</p>
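        <p>To make the candidate retrieval step concrete, the following sketch ranks paragraphs for a question with a common form of the BM25 scoring function, which is also the baseline ranker in our experiments. It is an illustrative stand-in and not the Lucene configuration actually used: the whitespace tokenization, the parameter values k1 = 1.2 and b = 0.75, and the example paragraphs are assumptions.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def bm25_rank(query, paragraphs, top_k=10, k1=1.2, b=0.75):
    """Return the indices of the top_k paragraphs by BM25 score, best first."""
    docs = [p.lower().split() for p in paragraphs]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of paragraphs containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    order = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


# Made-up paragraphs standing in for a company knowledge base.
paragraphs = [
    "the card can be blocked from the app",
    "branches are open from monday to friday",
    "to request a new card visit a branch",
]
ranking = bm25_rank("new card", paragraphs)
# ranking -> [2, 0, 1]
```
]]></preformat>
        <p>Each retrieved paragraph would then be paired with the question and labeled as relevant or irrelevant by an annotator.</p>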
      </sec>
      <sec id="sec-10-2">
        <title>6.2 Model results</title>
        <p>In this section, we report the performance of the two
main machine learning components of our system:
Answer Sentence Selection and Intent
Classification.</p>
      </sec>
      <sec id="sec-10-3">
        <title>6.2.1 Answer Sentence Reranking</title>
        <p>Table 1 reports the performance of the neural
network and the baseline system. The first row, i.e.,
BM25, shows the baseline system, while the
second row shows the performance of the CNN. The
systems are evaluated according to Mean
Average Precision (MAP), Mean Reciprocal Rank
(MRR) and Precision at 1 (P@1). The final results
reported at the bottom are obtained as the average of
5 different models trained and evaluated on the test
set. For each measure in the table, we report both the
mean and the standard deviation computed on the dev. and
test sets.</p>
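        <p>For reference, the three ranking measures can be computed as in the sketch below. Each question is represented by the relevance labels of its candidates in ranked order; the example labels are made up, and counting a question with no relevant candidate as zero is an assumption, since some evaluations exclude such questions instead.</p>
        <preformat><![CDATA[
```python
def average_precision(labels):
    """Mean of the precision values at the rank of each relevant candidate."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(labels):
    """Inverse of the rank of the first relevant candidate."""
    for rank, relevant in enumerate(labels, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


def evaluate(ranked_lists):
    """Average the per-question measures over all questions."""
    n = len(ranked_lists)
    return {
        "MAP": sum(average_precision(labels) for labels in ranked_lists) / n,
        "MRR": sum(reciprocal_rank(labels) for labels in ranked_lists) / n,
        "P@1": sum(1.0 for labels in ranked_lists if labels and labels[0]) / n,
    }


# Two toy questions; True marks a relevant candidate at that rank.
ranked_lists = [
    [True, False, True, False],
    [False, True, False, False],
]
metrics = evaluate(ranked_lists)
# MAP = (5/6 + 1/2) / 2, MRR = 0.75, P@1 = 0.5
```
]]></preformat>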
        <p>We used a small fraction of the training set, i.e.,
15% of the data, for early stopping. As can be
seen from the table, the CNN outperforms
the baseline algorithm (BM25) by about 1 point
of MAP on the dev. set, and by almost 10 absolute
points of MAP on the test set.</p>
        <p>In addition, we observe an increase of 9.8
absolute points in terms of MRR, and 10.65 absolute
points of P@1 on the test set. The difference
between the results on the dev. and test sets can be explained
by the fact that the dev. set is very small: only
124 lists of questions and 1,239 Q/A pairs, which
made it difficult to optimize the three ranking
metrics at the same time, so we focused on MAP.</p>
        <p>[Tables 2 and 3: results of the models Baseline (SVM), CNN, CNN + CNN, LSTM, and LSTM + CNN on the ICs and other datasets; values not recovered.]</p>
      </sec>
      <sec id="sec-10-4">
        <title>6.2.2 Intent Classification</title>
        <p>We ran the state-of-the-art neural classifiers described
in Section 5.1 on the Credit Institute and Bank
datasets. To select the best hyperparameters, we used
30% of the training data as a validation set.
We compare the
performance of the neural models against a strong
baseline classifier, i.e., an SVM, and report the results
in terms of Accuracy (Table 2) and F1 (Table 3).
The tables show that the final performance heavily
depends on the dataset and the model used.</p>
        <p>Istituto Credito (IC) datasets. On the IC
synthetic dataset, the best model, i.e., LSTM+CNN,
obtains an Accuracy of 77.37 and a micro-avg F1 of
77.42. This is about one absolute point of
Accuracy higher than the base SVM model (77.37 vs.
76.22) and 1.47 absolute points of F1 more than
the base model (77.42 vs. 75.95). Similarly, on the
IC full dataset, the best model,
i.e., LSTM+CNN, achieved an accuracy of 82.31,
which is 0.66 absolute points better than the base
model (82.31 vs. 80.65) and a micro-avg F1 of
82.52, which is about one point better than the base
SVM model (82.52 vs. 81.51).</p>
        <p>Banca datasets. On the BancaAI dataset, the
best model, i.e., LSTM, obtained an accuracy of 85.29,
which is about 4 absolute points better than the
base SVM model (85.29 vs. 81.97). Also, in terms
of F1, the best model obtained 3.77 absolute points
more than the baseline (82.86 vs. 80.09). On
the BancaIB dataset, the best model, i.e., LSTM,
obtained around 6 points more in terms of both
Accuracy (78.43 vs. 72.35) and F1 (76.91 vs. 71.08).</p>
      </sec>
      <sec id="sec-10-5">
        <title>6.3 End-to-End system evaluation</title>
        <p>We trained and evaluated our system using samples
of data collected from IMSL customers.</p>
        <p>We noted that the accuracy of the system
improved because several answers (from 3 to 5) are generally
provided for each user question, which makes it
very likely that the correct answer is among
them.</p>
        <p>One caveat is that the company
knowledge does not always contain a valid answer to the user’s request.
Indeed, questions related
to the user’s personal profile or data cannot be
precisely answered from the company documentation.</p>
        <p>Furthermore, company
policy often prevents explicit answers to
specific user problems. In all these cases, it is therefore
necessary to support the QA system with operators,
who can provide personal answers or answers not
encoded in the corporate knowledge.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>7 Conclusions</title>
      <p>In this paper, we have presented a modern dialog
system for real-world applications. We have tested
advanced technology for QA and intent
classification on several datasets derived from the data of companies
such as banks and credit institutions. The results
show a promising direction for SMEs to build their
own effective access to unstructured data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Antoine</given-names>
            <surname>Bordes</surname>
          </string-name>
          , Sumit Chopra, and
          <string-name>
            <given-names>Jason</given-names>
            <surname>Weston</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Question answering with subgraph embeddings</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pages
          <fpage>615</fpage>
          -
          <lpage>620</lpage>
          , Doha, Qatar, October. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Junyoung</given-names>
            <surname>Chung</surname>
          </string-name>
          , Çaglar Gülçehre, KyungHyun Cho, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Empirical evaluation of gated recurrent neural networks on sequence modeling</article-title>
          .
          <source>CoRR, abs/1412.3555</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Corinna</given-names>
            <surname>Cortes</surname>
          </string-name>
          and
          <string-name>
            <given-names>Vladimir</given-names>
            <surname>Vapnik</surname>
          </string-name>
          .
          <year>1995</year>
          .
          <article-title>Support-vector networks</article-title>
          .
          <source>In Machine Learning</source>
          , pages
          <fpage>273</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          , Minneapolis, Minnesota, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Raj D</given-names>
            <surname>Iyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David D</given-names>
            <surname>Lewis</surname>
          </string-name>
          , Robert E Schapire, Yoram Singer, and
          <string-name>
            <given-names>Amit</given-names>
            <surname>Singhal</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Boosting for document routing</article-title>
          .
          <source>In Proceedings of the ninth international conference on Information and knowledge management</source>
          , pages
          <fpage>70</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Mohit</given-names>
            <surname>Iyyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jordan</given-names>
            <surname>Boyd-Graber</surname>
          </string-name>
          , Leonardo Claudino, Richard Socher, and Hal Daumé III.
          <year>2014</year>
          .
          <article-title>A neural network for factoid question answering over paragraphs</article-title>
          .
          <source>In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          , pages
          <fpage>633</fpage>
          -
          <lpage>644</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Nal</given-names>
            <surname>Kalchbrenner</surname>
          </string-name>
          , Edward Grefenstette, and
          <string-name>
            <given-names>Phil</given-names>
            <surname>Blunsom</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>A convolutional neural network for modelling sentences</article-title>
          .
          <source>arXiv preprint arXiv:1404.2188</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Yoon</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          <source>CoRR, abs/1408.5882</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lecun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Haffner</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Gradient-based learning applied to document recognition</article-title>
          .
          <source>Proceedings of the IEEE</source>
          ,
          <volume>86</volume>
          (
          <issue>11</issue>
          ):
          <fpage>2278</fpage>
          -
          <lpage>2324</lpage>
          , Nov.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Huma</given-names>
            <surname>Lodhi</surname>
          </string-name>
          , Craig Saunders, John Shawe-Taylor, Nello Cristianini, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Watkins</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Text classification using string kernels</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>2</volume>
          (Feb):
          <fpage>419</fpage>
          -
          <lpage>444</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Zhengdong</given-names>
            <surname>Lu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hang</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A deep architecture for matching short texts</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>1367</fpage>
          -
          <lpage>1375</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Yishu</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lei</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Phil</given-names>
            <surname>Blunsom</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Neural variational inference for text processing</article-title>
          .
          <source>In International conference on machine learning</source>
          , pages
          <fpage>1727</fpage>
          -
          <lpage>1736</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Moschitti</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Making tree kernels practical for natural language learning</article-title>
          .
          <source>In 11th conference of the European Chapter of the Association for Computational Linguistics.</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Robert E</given-names>
            <surname>Schapire</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yoram</given-names>
            <surname>Singer</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Boostexter: A boosting-based system for text categorization</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>39</volume>
          (
          <issue>2-3</issue>
          ):
          <fpage>135</fpage>
          -
          <lpage>168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Aliaksei</given-names>
            <surname>Severyn</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Moschitti</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Learning to rank short text pairs with convolutional deep neural networks</article-title>
          .
          <source>In Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval</source>
          , pages
          <fpage>373</fpage>
          -
          <lpage>382</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Aliaksei</given-names>
            <surname>Severyn</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Moschitti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Modeling relational information in question-answer pairs with convolutional neural networks</article-title>
          .
          <source>arXiv preprint arXiv:1604.01178</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Sibel</given-names>
            <surname>Yaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Li</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dong</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ye-Yi</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alex</given-names>
            <surname>Acero</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>An integrative and discriminative technique for spoken utterance classification</article-title>
          .
          <source>IEEE Transactions on Audio, Speech, and Language Processing</source>
          ,
          <volume>16</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1207</fpage>
          -
          <lpage>1214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Scott Wen-tau</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chris</given-names>
            <surname>Meek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Andrzej</given-names>
            <surname>Pastusiak</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Question answering using enhanced lexical semantic models</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Wenpeng</given-names>
            <surname>Yin</surname>
          </string-name>
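          ,
          <string-name>
            <given-names>Hinrich</given-names>
            <surname>Schütze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bing</given-names>
            <surname>Xiang</surname>
          </string-name>
          , and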
          <string-name>
            <given-names>Bowen</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>ABCNN: Attention-based convolutional neural network for modeling sentence pairs</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>4</volume>
          :
          <fpage>259</fpage>
          -
          <lpage>272</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Lei</given-names>
            <surname>Yu</surname>
          </string-name>
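          ,
          <string-name>
            <given-names>Karl Moritz</given-names>
            <surname>Hermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Phil</given-names>
            <surname>Blunsom</surname>
          </string-name>
          , and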
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Pulman</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Deep learning for answer sentence selection</article-title>
          .
          <source>arXiv preprint arXiv:1412.1632</source>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>