<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Challenges of Building an Intelligent Chatbot</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>ITMO University</institution>
          ,
          <addr-line>St.-Petersburg, Kronverkskiy Prospekt, 49, 197101</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>There can be no doubt that the way of human-computer interaction has changed drastically over the last decade. Dialogue systems (or conversational agents), including voice control interfaces, personal digital assistants and chatbots, are examples of industrial applications developed to interact with customers in a human-like way using natural language. With continued growth in messaging applications and increasing demand for machine-based communication, conversational chatbots are likely to play a large part in companies' customer experience strategies. As systems designed for personalized interaction with users, conversational chatbots are becoming increasingly sophisticated in an attempt to mimic human dialogue. However, building an intelligent chatbot is challenging, as it requires spoken language understanding, dialogue context awareness and the demonstration of human-like traits. In this paper, we present the results of a data-driven chatbot implementation in order to better understand the challenges of building an intelligent agent capable of replying to users with coherent and engaging responses. The developed chatbot demonstrates the balance between domain-specific responding and users' need for a comprehensive dialogue experience. The retrieval-based model that achieved the best dialogue performance is proposed. Furthermore, we present the datasets collected for the purpose of this paper. In addition, natural language understanding issues and aspects of human-machine dialogue quality are discussed in detail. Finally, directions for further study are described.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Dialogue Systems</kwd>
        <kwd>Conversational AI</kwd>
        <kwd>Intelligent Chatbot</kwd>
        <kwd>Retrieval-Based Chatbot</kwd>
        <kwd>Word Embeddings</kwd>
        <kwd>Text Vectorization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Intelligent dialogue agents are designed to conduct a coherent and emotionally
engaging conversation with users. Chatbots have become the basis of modern personal
assistants that help users perform everyday tasks. Among the most popular are
Apple Siri, Google Assistant, Microsoft Cortana, Amazon Alexa and Yandex.Alice.</p>
      <p>There are two major types of dialogue systems: goal-oriented (closed-domain) and
open-domain (i.e., chatbots or chitchat systems). Goal-oriented dialogue systems are primarily
built to understand the user request within a finite number of pre-defined agent skills
(e.g., play music or set a reminder). Chatbots are meant to involve users in
intelligent conversation in order to improve their engagement.</p>
      <p>* Equal contribution to the work</p>
      <p>Copyright ©2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        Building an intelligent conversational agent that interacts with people in a
human-like way is an extremely challenging and complex task; at the same time, it is a
promising research direction in the field of dialogue systems [
        <xref ref-type="bibr" rid="ref14">1, 14</xref>
        ].
      </p>
      <p>
        A modern dialogue system architecture includes three main modules: natural
language processing (NLP), a dialogue manager, and natural language generation
(NLG). The core of a dialogue system is the analysis of the user utterance fed into the NLP
module [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Typically, in this module, the utterance is mapped to a text vector
representation (i.e., embeddings) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. These vector representations are then used by
the internal model to provide a response to the user. A chatbot can be considered
intelligent if its responses are coherent and meaningful to the user. This behavior is
highly dependent on the chatbot architecture and the text vectorization methods.
      </p>
      <p>The goal of this paper is to analyze modern approaches to the development of
chatbots that can provide the user with emotionally satisfying and meaningful
responses. First, we describe the historical background of conversational agents and
consider the main data-driven architectures; in particular, we focus on the
retrieval-based approach. Next, we briefly review the state-of-the-art text vectorization models
and present the results of a comparative analysis. Then we describe our experiment in
building a retrieval-based chatbot, starting with the collection of a training dataset
that provides the chatbot with a wide range of utterances about a specific topic. The topic of
film/analogue photography has been chosen as an example. The basic implementation
of the chatbot and its improvements are proposed. Finally, the main challenges of
building an intelligent conversational agent and future work are discussed.</p>
    </sec>
    <sec id="sec-2">
      <title>Chatbot Architectures</title>
      <p>
        Chatbots can be roughly divided into the following three categories based on their
response generation architectures [
        <xref ref-type="bibr" rid="ref27 ref4">4, 27</xref>
        ]:
- rule-based chatbots, which analyze key characteristics of the input utterance and
respond to the user relying on a set of pre-defined hand-crafted templates;
- retrieval-based (IR-based) chatbots, which select a response from a large
pre-collected dataset, choosing the best potential response from the top-k ranked
candidates;
- generative-based chatbots, which produce a new text sequence as a response
instead of selecting it from a pre-defined set of candidates.
      </p>
      <p>
        One of the most influential examples of conversational programs is ELIZA [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ],
an early dialogue system designed at the MIT Artificial Intelligence
Laboratory by Joseph Weizenbaum, which simulated a human-like conversation in the role of a
psychotherapist. ELIZA is a rule-based chatbot that responds to the user by combining
complex heuristics and "if-then-else" rules from a set of hand-crafted templates
developed for the system's specific domain. All early rule-based chatbots, including
ELIZA, required much manual human effort and expert knowledge to build, enhance
and maintain [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ].
      </p>
      <p>Fortunately, as a result of recent progress in internet technology and data
science, fully data-driven architectures have been proposed. Divided by machine learning
approach, there are two chatbot architectures that rely on massive text collection analysis
and natural language processing: generative-based and retrieval-based.</p>
      <p>
        Generative-based chatbots reply to users by applying natural language generation
(NLG). They produce new responses from scratch, word by word: given a previous
conversation history, they predict the most likely next utterance. An early response
generation model, proposed by Ritter in 2011, was inspired by Statistical Machine
Translation (SMT) techniques [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Nowadays, the state of the art in NLG are
Encoder-Decoder Sequence-to-Sequence (seq2seq) architectures [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] based on deep
recurrent LSTM/GRU neural networks [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] with an attention mechanism [
        <xref ref-type="bibr" rid="ref33 ref39">33, 39</xref>
        ]. The
first adaptation of the seq2seq architecture to the task of building a conversational agent
was presented by [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ]. Unquestionably, the fundamental advantage of
generative-based chatbots is that they rely neither on a pre-defined set of rules nor on a
response repository. Thus, generative models tend to be more robust to new,
unseen input utterances and, as a result, seem more coherent to the user. However,
due to the specifics of the learning procedure, generative models also have
weaknesses: the problem of short uninformative responses (e.g. "I don't know", "okay") [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ];
grammatical and semantic mistakes in generated text that humans would never make;
and dialogue inconsistency, where the model analyzes only the current user utterance
without the previous context ("context-blindness"). The above-mentioned problems
remain unresolved despite researchers' attempts to handle them [
        <xref ref-type="bibr" rid="ref18 ref34">18, 34</xref>
        ].
      </p>
      <p>
        Recent works [1] show researchers' high interest in generative-based chatbot
architectures, so rapid progress in this area can be expected. However, it is worth
noting that generative models require a huge amount of training data and
computational resources while still being likely to respond unpredictably.
Therefore, today, most industrial production solutions remain
retrieval-based [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        In this paper, we focus on the features of the retrieval-based architecture.
Retrieval-based chatbots do not generate new utterances; instead, they select an appropriate,
grammatically correct response from a large set of pre-collected Utterance-Response
pairs. Given a dialogue context, both the input utterance and the candidate responses are encoded
into a vector space representation; the system then computes a semantic similarity
score for each pair (e.g., dot product or cosine similarity) and selects the best response from
the highest-matched candidates. This approach, based on the information retrieval paradigm [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
became quite popular in the area of conversational agents [
        <xref ref-type="bibr" rid="ref12 ref15 ref25 ref26">12, 25, 26, 15</xref>
        ].
Considering the learning process, there are two approaches to best response selection
by a retrieval-based model: supporting a single-turn conversation, matching the current
user utterance with candidate pairs without any context information, or conducting a
multi-turn conversation, taking into account the previous utterances, which are
typically defined as the dialogue context. Building a retrieval-based chatbot supporting
a multi-turn conversation is a promising and challenging problem, and in recent years
there has been growing interest in this research area [
        <xref ref-type="bibr" rid="ref32 ref38 ref45">32, 45, 38</xref>
        ].
      </p>
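      <p>As an illustration of this selection step, the following sketch ranks pre-encoded candidate responses by cosine similarity to the encoded input utterance. The helper name, the toy vectors and the top_k parameter are assumptions made for the example, not part of any system described in this paper.</p>

```python
import numpy as np

def select_response(query_vec, candidate_vecs, responses, top_k=3):
    """Rank candidate responses by cosine similarity to the query vector
    and return the best one, plus the scored top-k candidates."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q                        # cosine similarity per candidate
    top = np.argsort(scores)[::-1][:top_k]
    return responses[top[0]], [(responses[i], float(scores[i])) for i in top]

vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
best, ranked = select_response(np.array([1.0, 0.0]), vecs,
                               ["hello", "goodbye", "maybe"], top_k=2)
```

In a real system the encoding would come from one of the vectorization models discussed below, and the candidate matrix would be precomputed over the whole response repository.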
      <p>In the next section we consider the concept of text similarity in detail and briefly
review various vectorization models relevant to the task of retrieval-based chatbot
implementation.</p>
    </sec>
    <sec id="sec-3">
      <title>Text Vectorization Models</title>
      <p>
        The text vectorization models that are popular today are based on the ideas of distributional
semantics [
        <xref ref-type="bibr" rid="ref10 ref24">10, 24</xref>
        ]. According to the distributional hypothesis, words
that occur in similar contexts with a similar frequency are considered semantically
close. The corresponding dense vector representations (i.e., embeddings), whose
dimension is much smaller than the dictionary size, are close to each other by
the cosine measure in the word vector space.
      </p>
      <p>
        One of the most basic vectorization methods is the statistical measure TF-IDF [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ],
which determines a word's importance to a document in a text collection. TF-IDF is
the product of the frequency of a word in the text and the inverse frequency of the
word across the collection of documents, so the TF-IDF value increases proportionally
with the number of times a word appears in the document. TF-IDF vectors have a size
equal to the dictionary size, which can turn out to be quite large, and they will
be close only for those documents that contain matching words [2].
      </p>
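      <p>A minimal sketch of this method using scikit-learn's TfidfVectorizer; the toy documents are invented for the example:</p>

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "film photography uses analogue cameras",
    "analogue cameras expose photographic film",
    "the weather was sunny yesterday",
]

vectorizer = TfidfVectorizer()           # vector size equals the vocabulary size
tfidf = vectorizer.fit_transform(docs)   # sparse (n_docs, n_terms) matrix

# Documents sharing words get a non-zero similarity; disjoint ones score 0.
sims = cosine_similarity(tfidf)
```

This illustrates the drawback noted above: the first and third documents share no words, so their TF-IDF similarity is exactly zero even though both are well-formed sentences.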
      <p>
        Text vectorization models gained wide popularity in 2013 after Tomas Mikolov's
publication [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] on the approach known as Word2Vec. This approach has two
implementations: CBOW (continuous bag of words) and Skip-Gram. The CBOW model
predicts the probability of each word of a text in a particular context, while the
Skip-Gram model calculates the probability of a context around a particular word.
Word2Vec embeddings capture the semantic similarity of words; that is, semantically
close words will have high cosine similarity in the model's vector space.
      </p>
      <p>
        However, extending the Word2Vec vector space with a new word embedding requires
retraining the model. A solution to this missing-words problem was proposed in the
fastText model [
        <xref ref-type="bibr" rid="ref16 ref6">16, 6</xref>
        ], a Word2Vec modification that produces
character n-gram embeddings. Also worth mentioning is the GloVe model [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]
proposed by the Stanford NLP Group. GloVe combines the ideas of
matrix factorization with the Word2Vec approach.
      </p>
      <p>
        The text vector representations described above are commonly referred to as "static
word embeddings". One of the problems of static models is polysemy: the same
word in different contexts will have the same embedding. The recent progress in
text vector representation is the contextualized (dynamic) language model.
Contextualized models calculate word embeddings depending on their context. Thus,
the BERT model, released in late 2018, which helped researchers reach a new
state of the art in most NLP problems, became, undoubtedly, the key achievement of recent
years in the field of NLP. Among other successful contextualized language
models, ELMo [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ], XLNet [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ] and GPT-2 [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] are particularly to be noted.
      </p>
      <p>
        People often use foreign words or whole phrases in spoken language [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
Thus, multilingualism can be one of the challenges in building chatbots.
Contextualized models allow a multilingual format, but separate models must be
trained for each language. There is another approach to multilingualism,
which transfers NLP models from one language to scores of others by preparing a
model that is able to generalize different languages in a common vector space: the
vectors of the same statement in any language then end up closely placed in the same
neighborhood. The LASER embedding model [3], developed by a group of Facebook
researchers, is a promising method that implements this idea.
      </p>
      <p>The model maps entire sentences into the vector space, which is an advantage
when creating embeddings for retrieval-based chatbots. In the next section, we describe the
steps of the retrieval-based chatbot implementation and present the results of a
comparison between the considered text vectorization models applied to this task.</p>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <sec id="sec-4-1">
        <title>Data Sources</title>
        <p>Regardless of its architecture, a chatbot requires a large dataset of natural language
dialogues for training. Such a dataset should include all topics that are supposed to be
discussed with the bot. Additional meta-information about the dialogues (i.e. author
name and age, message date and time, or response links) can improve chatbot
responses. The most notable conversational open data sources for Russian are the
following:
- Movies and TV Series Subtitles. Subtitles can be a source of general
conversation topics. However, the movie genre introduces the main theme of the
dialogues, so the collected dataset must be analyzed for peculiar vocabulary.
Another drawback of subtitles is the lack of clear separation between dialogues.
- Twitter. Twitter messages in threads contain information about authors and reply
details, and conversations have clear boundaries. But Twitter users tend to
discuss multimedia content, which makes the dialogue lexically and
semantically narrow.
- Public Group Chats (i.e. Telegram, Slack). Public chats can provide a rich
source of dialogues on specific topics (programming, history, photography, etc.).
However, it is necessary to remember that poorly moderated public group
messages likely contain hate speech, political statements and obscene language.
- Other Web Sources. There are many other sources of conversational data that
could be used for chatbot training: social network discussions, forum threads,
movie transcripts, fiction (i.e. plays), etc.</p>
        <p>Depending on a practical goal, several data sources can be used for training a
retrieval-based chatbot, but it still may not be enough for supporting a coherent
conversation. Here it is also worth paying attention to ethical issues and removing
offensive utterances and obscene language from the data.</p>
        <p>The key idea of our experiment is to create a chatbot that seems intelligent
enough, responding to the input utterance coherently, and makes a good
impression on users. The bot should behave that way both within small talk and
within some pre-selected narrow topics that users are interested in. As the
specific topic, we chose analogue/film photography. Two public
Telegram chats1 and an open set of subtitles2 were chosen as the data sources. Thus,
the overall text collection consists of 358,545 records with the following columns:
message identifier, reply message identifier, author, addressee and utterance.</p>
        <sec id="sec-4-1-1">
          <title>1 https://t.me/filmpublic, https://t.me/plenkachat</title>
        </sec>
        <sec id="sec-4-1-2">
          <title>2 http://opus.nlpl.eu/OpenSubtitles.php</title>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Data Preprocessing</title>
        <p>When users interact with a retrieval-based chatbot, they usually input a phrase that
does not appear word-for-word in the predefined responses.</p>
        <p>
          Therefore, relevant responses can be selected only by the semantic similarity
between the user's input and the conversation context of candidate utterances. In the area
of retrieval-based chatbots, various methods for defining the context have been
proposed [
          <xref ref-type="bibr" rid="ref20 ref21 ref36 ref44">36, 20, 44, 21</xref>
          ]. Since the chat-specific
conversational data (i.e., Telegram chats) contains information about the authors and
reply_to links, our dataset can be split into many short conversations of the form
start_utterance-&gt;response-&gt;...-&gt;response-&gt;last_utterance. Figure 1
demonstrates multi-turn conversations of the dataset. The structure is a directed graph,
where each node corresponds to an utterance labeled by its message identifier and each
edge corresponds to the "is reply to" relationship between messages.
After multi-turn extraction, the initial dataset was transformed into the
Context-Response form, where the Response is the last utterance of a turn and the Context is all
previous utterances of that turn.
        </p>
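        <p>The turn-extraction step can be sketched as follows; the function name and the message-dictionary layout are assumptions made for illustration, not the exact implementation:</p>

```python
def extract_context_response_pairs(messages):
    """Walk 'is reply to' links to rebuild conversation turns, then emit
    (context, response) pairs: the response is the last utterance of a turn,
    the context is all utterances before it.

    `messages` maps message_id -> {"reply_to": id or None, "text": str}.
    """
    replied_to = {m["reply_to"] for m in messages.values() if m["reply_to"]}
    pairs = []
    # A turn ends at any message nobody replied to (a leaf of the reply graph).
    for mid in messages:
        if mid in replied_to:
            continue
        turn, cur = [], mid
        while cur is not None:             # follow reply links back to the start
            turn.append(messages[cur]["text"])
            cur = messages[cur]["reply_to"]
        turn.reverse()
        if len(turn) >= 2:
            pairs.append((" ".join(turn[:-1]), turn[-1]))
    return pairs
```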
        <p>Further, the text data was pre-processed according to the following steps:
1. tokenization;
2. removal of special characters, links and punctuation;
3. removal of stop-words;
4. token normalization.</p>
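        <p>A minimal sketch of these four steps; the stop-word list is a tiny illustrative subset, and token normalization is approximated here by lowercasing, whereas a full Russian pipeline would lemmatize (e.g. with pymorphy2):</p>

```python
import re

STOP_WORDS = {"и", "в", "на", "не", "что"}   # a tiny illustrative stop list

def preprocess(utterance):
    """Tokenize, strip special characters/links/punctuation, drop stop-words,
    and normalize tokens (lowercasing stands in for lemmatization here)."""
    text = re.sub(r"https?://\S+", " ", utterance)   # remove links
    tokens = re.findall(r"[\w']+", text)             # tokenize, dropping punctuation
    tokens = [t.lower() for t in tokens]             # normalize
    return [t for t in tokens if t not in STOP_WORDS]
```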
        <p>After the last step of data preprocessing, the final training dataset contained
134,307 Context-Response pairs. The average Context length is 11 tokens
and the average Response length is 9 tokens, which is, in fact, quite short for this kind
of retrieval-based task.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Results</title>
        <p>A vector representation of a text can be calculated by averaging its word embeddings.
In particular, for the Word2Vec text vector calculations, two word-averaging methods
were used: simple averaging (Averaged Word2Vec) and weighted averaging of W2V
vectors by TF-IDF (TF-IDF-weighted W2V).</p>
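        <p>Both averaging schemes can be sketched as follows; the function names, toy vectors and idf dictionary are assumptions made for the example:</p>

```python
import numpy as np

def averaged_vector(tokens, word_vecs, dim):
    """Averaged Word2Vec: the plain mean of the token embeddings."""
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def tfidf_weighted_vector(tokens, word_vecs, idf, dim):
    """TF-IDF-weighted W2V: each embedding is weighted by the word's IDF,
    so rare, informative words contribute more to the text vector."""
    vecs, weights = [], []
    for t in tokens:
        if t in word_vecs:
            vecs.append(word_vecs[t])
            weights.append(idf.get(t, 1.0))
    if not vecs:
        return np.zeros(dim)
    return np.average(vecs, axis=0, weights=weights)
```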
        <p>
          To evaluate chatbot responses based on the various text vectorization models,
we use the Recall@k metric. Recalln@k (denoted Rn@k below) measures the percentage
of examples for which the ground-truth utterance appears among the top-k ranked of n candidate responses [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. This kind of
metric is often applied to retrieval tasks and can be calculated automatically, but it
requires a validation set structured differently from the training dataset. Concretely, we
created a dataset with 134,307 records, where each record contains the three
following columns: the context, the ground-truth response, and a list of 9 false responses
chosen randomly from the training Context-Response pairs. During
the evaluation process, the R10@1, R10@2 and R10@5 measures were
calculated: the model selects the 1, 2 or 5 best responses among 10 possible
candidates, and its choice is marked as correct if the ground-truth
utterance is ranked in the top-k. Our experimental results are shown in Table 1.
        </p>
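        <p>A sketch of how Rn@k can be computed from model scores; the array layout, with the ground-truth candidate in column 0, is an assumption made for the example:</p>

```python
import numpy as np

def recall_n_at_k(scores, k):
    """R_n@k over a validation set: `scores` is an (examples, n) array of
    model scores where column 0 is always the ground-truth response and the
    remaining columns are false candidates. A prediction counts as correct
    when the ground truth is ranked among the top-k of the n candidates."""
    ranks = (scores > scores[:, [0]]).sum(axis=1)  # candidates scored above truth
    return float((ranks < k).mean())
```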
        <p>It is worth noting that, as a retrieval metric, Rn@k has a significant drawback: in
practice, there may exist more than one relevant response that could be marked as
the ground truth, so appropriate responses may be regarded as incorrect.</p>
        <p>
          According to Table 1, the chatbot demonstrated different results for each text
vectorization method. For R10@1 the baseline TF-IDF has the highest
score, for R10@2 TF-IDF-weighted W2V, and for R10@5 LASER.
TF-IDF-weighted W2V and LASER can be considered the best overall models on the
retrieval metrics. Even so, a model that performs well on the chosen retrieval
metrics is not guaranteed to achieve good performance on new response generation.
Our assumption is that improvements of a model with regard to the Rn@k metric will
eventually lead to improvements on the generation task. On the other hand, the
human evaluation of conversational agents is still the most accurate and preferable
approach [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Therefore, we evaluated the quality of the two best-performing methods
(TF-IDF-weighted W2V and LASER) by human judgement. Finally, on the
generation task, the chatbot based on LASER embeddings seemed significantly more
coherent, so it was considered the best text vectorization model in our
experiments.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>One of the most rapidly developing subfields of dialogue systems is the area of
conversational agents (i.e. chatbots). Building an intelligent chatbot is a major focus of
current business and research interest.</p>
      <p>A strong product hypothesis is that the more human-like and intelligent a
conversational product interface is, the more engaging and satisfying the customers'
digital experience becomes. In this paper three main chatbot architectures have been briefly reviewed:
the rule-based approach and the fully data-driven retrieval-based and generative models.
The advantages and disadvantages of these architectures have also been described.
Nowadays, retrieval-based chatbots are the most commonly used conversational
models built into business production solutions. Typically, retrieval-based
models learn faster than generative models, are less likely to suffer from the
problem of short general responses, and are more controllable for filtering grammatical
mistakes and inappropriate language.</p>
      <p>In this paper, the main challenges of data-driven conversational agents have been
considered. We present the results of a retrieval-based chatbot implementation that
supports both small talk and conversation within the narrow topic of
analogue photography in Russian. Semantic relations between the context and potential
responses are captured by text vector representations (word embeddings), a crucial
technique for building an intelligent retrieval-based chatbot. In order to create
a chatbot that replies to users coherently and engagingly enough, state-of-the-art text
vectorization models have been compared and applied in our experiment. The
LASER sentence embedding model performed the best. The programming code
and datasets have been shared in a public repository3.</p>
      <p>Furthermore, we have analyzed current open web sources of conversational data
and outlined their main problems and features. It is essential to underline the critical
need for a high-quality dataset for training a retrieval-based chatbot. It is necessary to
remember that poorly moderated conversational data likely contains offensive, toxic
and noisy utterances, which must be removed from the dataset. This issue is one of
the future research directions we plan to focus on.</p>
      <sec id="sec-5-1">
        <title>3 https://github.com/yuliazherebtsova/plenka-chatbot</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name><surname>Adiwardana</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Luong</surname>, <given-names>M.-T.</given-names></string-name>,
          <string-name><surname>So</surname>, <given-names>D. R.</given-names></string-name>,
          <string-name><surname>Hall</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Fiedel</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Thoppilan</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Yang</surname>, <given-names>Z.</given-names></string-name>,
          <string-name><surname>Kulshreshtha</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Nemade</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Lu</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Le</surname>, <given-names>Q. V.</given-names></string-name>:
          <article-title>Towards a Human-like Open-Domain Chatbot</article-title>
          . arXiv:2001.09977. (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Arora</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
          </string-name>
          , T.:
          <article-title>A simple but tough-to-beat baseline for sentence embeddings</article-title>
          , https://openreview.net/pdf?id=SyK00v5xx, last accessed
          <year>2020</year>
          /02/17 (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Artetxe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
          </string-name>
          , H.:
          <article-title>Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond</article-title>
          .
          <source>CoRR</source>
          . arXiv:1812.10464
          . (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Almansor</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hussain</surname>
            ,
            <given-names>F.K.</given-names>
          </string-name>
          :
          <article-title>Survey on Intelligent Chatbots: State-of-the-Art and Future Research Directions</article-title>
          . In: Complex, Intelligent, and Software Intensive Systems, P.
          <fpage>534</fpage>
          -
          <lpage>543</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bellegarda</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <article-title>Large-Scale Personal Assistant Technology Deployment: the Siri Experience</article-title>
          . INTERSPEECH. P.
          <fpage>2029</fpage>
          -
          <lpage>2033</lpage>
          . (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Enriching Word Vectors with Subword Information</article-title>
          . arXiv:1607.04606
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation</article-title>
          .
          <source>EMNLP</source>
          .
          pp.
          <fpage>1724</fpage>
          -
          <lpage>1734</lpage>
          . (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Deriu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodrigo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Otegi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Echegoyen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosset</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cieliebak</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>Survey on Evaluation Methods for Dialogue Systems</article-title>
          . arXiv:
          <volume>1905</volume>
          .
          <fpage>04071</fpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galley</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Neural Approaches to Conversational AI</article-title>
          . arXiv:
          <volume>1809</volume>
          .
          <fpage>08267</fpage>
          . 95 pp. (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>Z.S.:</given-names>
          </string-name>
          <article-title>Distributional structure</article-title>
          .
          <source>Word</source>
          ,
          <volume>10</volume>
          , Issue 2-3. pp.
          <fpage>146</fpage>
          -
          <lpage>162</lpage>
          . (
          <year>1954</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Douze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Learning Joint Multilingual Sentence Representations with Neural Machine Translation</article-title>
          ,
          <source>ACL workshop on Representation Learning for NLP. arXiv:1704</source>
          .
          <fpage>04154</fpage>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural network architectures for matching natural language sentences</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          . pp.
          <fpage>2042</fpage>
          -
          <lpage>2050</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>P.-S.</given-names>
          </string-name>
          :
          <article-title>Learning Deep Structured Semantic Models for Web Search using Clickthrough Data</article-title>
          , https://posenhuang.github.io/papers/cikm2013_DSSM_fullversion.pdf, last accessed 2020/02/17. (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Challenges in building intelligent open-domain dialog systems</article-title>
          . arXiv preprint arXiv:
          <volume>1905</volume>
          .
          <fpage>05709</fpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Inaba</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takahashi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Neural Utterance Ranking Model for Conversational Dialogue Systems</article-title>
          .
          <source>In: Proceedings of the SIGDIAL 2016 Conference. Association for Computational Linguistics</source>
          . pp.
          <fpage>393</fpage>
          -
          <lpage>403</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Bag of Tricks for Efficient Text Classification</article-title>
          . arXiv:
          <volume>1607</volume>
          .
          <fpage>01759</fpage>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>J. H.</given-names>
          </string-name>
          :
          <source>Speech and Language Processing. 2nd edition</source>
          . Prentice Hall. 988 p. (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galley</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brockett</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dolan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A Diversity-Promoting Objective Function for Neural Conversation Models</article-title>
          . arXiv:
          <volume>1510</volume>
          .
          <fpage>03055</fpage>
          . (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>C.-W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Serban</surname>
            ,
            <given-names>I. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noseworthy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Charlin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pineau</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation</article-title>
          . arXiv:
          <volume>1603</volume>
          .
          <fpage>08023</fpage>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pow</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Serban</surname>
            ,
            <given-names>I. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pineau</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems</article-title>
          . arXiv:
          <volume>1506</volume>
          .
          <fpage>08909</fpage>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cui</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shao</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>W.-N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>TripleNet: Triple Attention Network for Multi-Turn Response Selection in Retrieval-based Chatbots</article-title>
          . arXiv:
          <volume>1909</volume>
          .
          <fpage>10666</fpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <source>Introduction to Information Retrieval</source>
          . Stanford NLP Group, Cambridge University Press. URL: https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Distributed Representations of Words and Phrases and their Compositionality</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          , https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf, last accessed 2020/02/17. (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Osgood</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suci</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tannenbaum</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>The measurement of meaning</article-title>
          . University of Illinois Press. 354 p. (
          <year>1957</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Nio</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sakti</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neubig</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toda</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Developing Non-goal Dialog System Based on Examples of Drama Television</article-title>
          . In:
          <source>Natural Interaction with Robots, Knowbots and Smartphones</source>
          . pp.
          <fpage>355</fpage>
          -
          <lpage>361</lpage>
          . (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Prakash</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brockett</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Emulating Human Conversations using Convolutional Neural Network-based IR</article-title>
          . arXiv:
          <volume>1606</volume>
          .
          <fpage>07056</fpage>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>X.:</given-names>
          </string-name>
          <article-title>A survey on construction and enhancement methods in service chatbots design</article-title>
          .
          <source>CCF Transactions on Pervasive Computing and Interaction. 10.1007/s42486-019-00012-3</source>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Child</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amodei</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Language Models are Unsupervised Multitask Learners</article-title>
          .
          <source>Technical Report, OpenAI</source>
          , https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Ritter</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Data-Driven Response Generation in Social Media</article-title>
          .
          <source>Conference on Empirical Methods in Natural Language Processing</source>
          . Edinburgh. pp.
          <fpage>583</fpage>
          -
          <lpage>593</lpage>
          . (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C. D.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          .
          Association for Computational Linguistics
          . pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>arXiv preprint arXiv:1802</source>
          .
          <fpage>05365</fpage>
          . (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Serban</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pow</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pineau</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems</article-title>
          .
          <source>arXiv preprint arXiv:1506</source>
          .
          <fpage>08909</fpage>
          . (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Shang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Neural responding machine for short-text conversation</article-title>
          .
          <source>In Proc. of ACL-IJCNLP</source>
          . pp.
          <fpage>1577</fpage>
          -
          <lpage>1586</lpage>
          . (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Shao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gouws</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Britz</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldie</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strope</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurzweil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Generating High-Quality and Informative Conversation Responses with Sequence-to-Sequence Models</article-title>
          . arXiv:
          <volume>1701</volume>
          .
          <fpage>03185</fpage>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Sountsov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sarawagi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Length bias in Encoder Decoder Models and a Case for Global Conditioning</article-title>
          . arXiv:
          <volume>1606</volume>
          .
          <fpage>03402</fpage>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Sordoni</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galley</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brockett</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>J.-Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dolan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A Neural Network Approach to Context-Sensitive Generation of Conversational Responses</article-title>
          . arXiv:
          <volume>1506</volume>
          .
          <fpage>06714</fpage>
          . (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q. V.</given-names>
          </string-name>
          :
          <article-title>Sequence to Sequence Learning with Neural Networks</article-title>
          . arXiv:
          <volume>1409</volume>
          .
          <fpage>3215</fpage>
          . (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <surname>Tao</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>W.:</given-names>
          </string-name>
          <article-title>Multi-Representation Fusion Network for Multi-Turn Response Selection in Retrieval-Based Chatbots</article-title>
          . In: ACM International Conference. pp.
          <fpage>429</fpage>
          -
          <lpage>437</lpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Attention Is All You Need</article-title>
          . arXiv:
          <volume>1706</volume>
          .
          <fpage>03762</fpage>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          :
          <article-title>A neural conversational model</article-title>
          .
          <source>arXiv preprint arXiv:1506</source>
          .
          <fpage>05869</fpage>
          . (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <source>The Elements of AIML Style. ALICE A.I. Foundation</source>
          , 86 pp. (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          42.
          <string-name>
            <surname>Weizenbaum</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>ELIZA - A computer program for the study of natural language communication between man and machine</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <fpage>36</fpage>
          -
          <lpage>45</lpage>
          . (
          <year>1966</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          43.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carbonell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q. V.</given-names>
          </string-name>
          :
          <article-title>XLNet: Generalized Autoregressive Pretraining for Language Understanding</article-title>
          . arXiv:
          <volume>1906</volume>
          .
          <fpage>08237</fpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          44.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polymenakos</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Addressee and Response Selection in Multi-Party Conversations with Speaker Interaction RNNs</article-title>
          . arXiv:
          <volume>1709</volume>
          .
          <fpage>04005</fpage>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          45.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Sequential matching network: A new architecture for multi-turn response selection in retrieval-based chatbots</article-title>
          .
          <source>ACL</source>
          . arXiv:
          <volume>1612</volume>
          .
          <fpage>01627</fpage>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>