=Paper=
{{Paper
|id=Vol-2693/paper5
|storemode=property
|title=Domain-Independent, Task-Oriented Chatbot Creation
and Conversation Policy Management Framework
|pdfUrl=https://ceur-ws.org/Vol-2693/paper5.pdf
|volume=Vol-2693
|authors=Tolga Çekiç,Yusufcan Manav,Enes Burak Dündar,Osman Fatih Kılıç,Onur Deniz,Seçil Arslan
|dblpUrl=https://dblp.org/rec/conf/ecai/CekicMDKDA20
}}
==Domain-Independent, Task-Oriented Chatbot Creation and Conversation Policy Management Framework==
<pdf width="1500px">https://ceur-ws.org/Vol-2693/paper5.pdf</pdf>
<pre>
              Proceedings of the Workshop on Hybrid Intelligence for Natural Language Processing Tasks HI4NLP (co-located at ECAI-2020)
                                        Santiago de Compostela, August 29, 2020, published at http://ceur-ws.org


    Domain-Independent, Task-Oriented Chatbot Creation
      and Conversation Policy Management Framework
    Tolga Çekiç (1,2), Yusufcan Manav (1,3), Enes Burak Dündar (4), Osman Fatih Kılıç (5),
                        Onur Deniz (6) and Seçil Arslan (7)

1 Equal Contribution
2 YapıKredi Teknoloji, Turkey, tolga.cekic@ykteknoloji.com.tr
3 YapıKredi Teknoloji, Turkey, yusufcan.manav@ykteknoloji.com.tr
4 YapıKredi Teknoloji, Turkey, enesburak.dundar@ykteknoloji.com.tr
5 YapıKredi Teknoloji, Turkey, osmanfatih.kilic@ykteknoloji.com.tr
6 YapıKredi Teknoloji, Turkey, onur.deniz@ykteknoloji.com.tr
7 YapıKredi Teknoloji, Turkey, secil.arslan@ykteknoloji.com.tr

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).


Abstract. In this paper, we present a chatbot creation framework that can help people with no
technical expertise design and create chatbots for any domain. This framework enables the creation
of highly customizable chatbots that range from simple question-answering systems to chatbots that
can handle more complex dialogue flows. To be domain independent, we created a general Turkish
language model using the ELMo architecture, and intent detection models for chatbots are trained
on embeddings generated by this language model. Additionally, to make conversations more seamless
and cohesive, dialogue act classification is integrated into conversation policy management. The
framework also includes an additional tool that allows monitoring of past chatbot conversations
and provides analytic tools supported by clustering algorithms.

1   Introduction

Conversational systems, or chatbots as they are commonly called, are ubiquitous nowadays, as they
are used for simple general conversation or for more specialized tasks such as customer service.
With their popularity, much research is focused on making chatbots more effective in solving users'
problems and on ensuring that conversations with a chatbot are akin to conversations between humans.
   While general conversation chatbots usually consider only a few user utterances without coming
to a point, a task-oriented chatbot must sometimes keep information from a much earlier utterance
and should try to steer the conversation toward a certain point, thus completing its appointed
task, such as booking a ticket. While the two approaches have much in common in terms of the
research and the techniques used, they present different challenges [7].
   Research into chatbots started decades ago with the advent of ELIZA [13], which was designed to
mimic psychotherapy and uses pre-programmed rules to generate answers. It receives user utterances
and, by using certain words and part-of-speech tags, prepares a response by following predetermined
rules. Other chatbots following a rule-based design similar to ELIZA's, such as PARRY [4] and
ALICE [12], have also been developed over the years with their own advancements.
   For chatbots that must carry out more complex conversations, the rule-based approach can be
limiting, since creating rules that cover more cases can become unmanageable. Thus, new methods
have been devised to create more effective chatbots. IRIS [3], a general conversation chatbot, uses
data from many conversations and extracts the most appropriate responses to user utterances. For
information retrieval, IRIS uses a vector space model. Gandhe and Traum also proposed a TF-IDF
model to retrieve answers for a chatbot from a dataset of text dialogue scripts.
   With the advancements in deep neural networks, machine learning based chatbots have become
increasingly successful. Vinyals and Le introduced sequence-to-sequence (seq2seq) learning for
conversation systems to generate dynamic responses to each user utterance [11]. Although such
models are very successful in generating human-like sentences, as pointed out by Sordoni et al.,
they are limited in context-sensitive conversations and in carrying over information from previous
utterances [10].
   While machine learning methods can help create human-like general conversation, task-oriented
chatbots usually require more complex tools for conversation management. Since task-oriented
chatbots are generally deployed commercially and interact with customers, their responses must be
more precise and carefully constructed. Thus, the unpredictability of sentences generated by
seq2seq models can be undesirable for such chatbots. Also, some information must be specifically
collected from users and kept in order to complete the task. For instance, a chatbot that sells
flight tickets must collect the departure and arrival locations as well as the date in order to
present an offer and sell a ticket. To create such chatbots, the intent-slot model is used [5].
Intents are what a user wants to do with a chatbot, and slots are the entities that must be
collected or filled for the chatbot to complete the task related to the intent. To create chatbots
with the intent-slot model, machine learning methods for intent detection and entity extraction
are combined with other techniques such as state tracking and policy management.
   Task-oriented chatbots built on the intent-slot model can be used in many different domains with
similar designs. To create a scalable system for developing task-oriented chatbots for multiple
domains, new frameworks were devised. These frameworks, such as Amazon Lex, Google Dialogflow and
Microsoft LUIS, help create chatbots with little need for programming expertise. In this paper, we
offer a novel chatbot creation framework equipped with tools to handle common problems encountered
by conversation systems. To have a powerful intent detection mechanism that works in multiple
domains, our framework uses a general ELMo-based language model and a deep neural network
classifier that uses ELMo embeddings [8]. Furthermore, an additional hybrid intent detection
classifier is built with rule-based intent detection as well as machine learning, so as to find
intents even
if training data is sparse or unevenly distributed. Our framework also has a recurrent neural
network (RNN) based dialogue act classifier that is used in addition to intent detection and
entity extraction to provide flexible policy management that can handle complex conversations.
Additionally, an actively used chatbot may need to be updated to understand and perform new tasks
over time. To find these new tasks effectively and to collect the related data for the machine
learning models, we have developed an analytics tool that reinforces the capabilities of our
chatbot generation framework. With this tool, new intents can be discovered and data sets for
existing intents can be expanded.

2   Dialogue Design

Generally, chatbot content is gathered and structured by people who work in public relations or
marketing. They mostly have little or no coding experience and need people with programming
expertise to maintain the chatbot content for them. We wanted the people who supervise the chatbot
content to also manage the dialogue design. Thus, the content design is decoupled from the
technical parts that manage the conversation, which simplifies creating chatbots. We developed an
interface that lets users create new topics and structure the content according to their needs,
for example by editing, deleting and creating flows swiftly.
   To provide this user interface and ease the creation of chatbots, we streamlined the process
and decided to use basic building blocks. These are:

1. Intent
Intents are the tasks users want to accomplish. They are the main building blocks of the chatbots
in this framework, and they contain some or all of the following components. They can be used on
their own or chained to each other to create flows for more complex tasks.
2. Entity
Entities are used when information must be collected from the user to complete a task associated
with an intent. Two collection methods can be used: prompt, in which a question is asked and the
chatbot waits for the user's input; or choice, in which the answer is selected from a
predetermined set of choices. There are multiple built-in entity types, such as phone number and
date, and chatbot content supervisors can also create their own custom entities.
3. Training Sentences
Training sentences are used to train the intent detection classifier. They are the possible
sentences that customers use to state a specific intent.
4. Response Actions
This component determines the responses given by the chatbot for a specific intent. After these
responses are given, either the chatbot reverts to its default state, waiting for user utterances
to find an intent, or a follow-up intent is set and the chatbot prompts the user according to this
new intent. Before the responses of the chatbot are determined, some other actions may need to be
taken first; these actions are determined by the functions component.
5. Functions
Functions are actions taken by the chatbot to complete the required task of an intent. These
actions generally use collected entities as inputs, and they can connect to external systems, for
example when making a money transfer.

The organized and compact chatbot structure has shortened the time and eliminated the coding
knowledge needed to create chatbots, while providing the capability to complete tasks independent
of domains. Also, as can be seen from Figure 1, the domain model has configurations to satisfy the
conversation flow needs of different domains.
   Chatbot models are created by defining one or more intents according to the topics that have
been determined by the content supervisors. While creating each intent, possible user utterances
that can specify the intent are added as training sentences. If any entity needs to be collected
for an intent, the corresponding entity types are inserted into the intent and their collection
type is selected. After that, any integrations needed are added as functions. Finally, the response
actions are added to the intents. There can be one or multiple response actions, and they may have
different conditions, which provide dynamic responses depending on the user's input for the
entities.

                              Figure 1: Domain Model Structure

   Each deployed chatbot model is uniquely identified in the conversation manager, so that users
can go back to an older model if they want to withdraw their changes and continue on top of that
older model. These uniquely identified models also make it possible to serve multiple models in a
domain- and channel-agnostic way. Users from different domains are directed to their respective
chatbot models and given an answer from that model.

3   Intent Detection

Task-oriented chatbots need to understand users to help them with their problems. Chatbots created
with this framework can contain multiple intents. Smaller-scale chatbots that are tailored to one
specific task can be created with this framework, but in many domains more complex chatbots with
multiple intents are desired. Hence, a powerful and flexible intent detection classifier is a must
in task-oriented chatbots. We developed a two-stage hybrid structure for this task.
   The first stage is a machine learning based intent classifier that uses ELMo contextual
embeddings [8]. For Turkish, we have trained an ELMo language model from scratch using the Turkish
Wikipedia dump and the Bogazici Web Corpus as data sets [9]. Our combined training data contains
more than 500 million tokens. The character-based nature
of ELMo is very helpful for obtaining better word representations for Turkish, since it is an
agglutinative language.
   The ML-based intent classifier is trained with the sample sentences in the chatbot model.
Contextual embeddings of these sentences are extracted using ELMo, and those embeddings are then
fed to a multi-layer perceptron with a softmax output layer. For chatbot usage, the three most
probable intents of a message are retrieved using this model.
   The rule-based intent classifier is the second part of the hybrid intent detection module. This
part is used for intents with fewer than a minimum number of training sentences, or for specific
intents that can be identified by very distinct keywords or phrases. For these intents, basic
phrases are defined, and these patterns are searched for in the user utterances. The matched
patterns are ranked according to their token counts, on the assumption that longer phrases are
more specific in describing an intent. Some intents can be reliably determined at this stage.
However, this method does not consider semantic or contextual information. It performs only
pattern-based matching and can miss words with similar meanings that are not present in the list
of predetermined phrases.
   In the general flow of intent detection, these two methods are used in tandem. The possible
intents from the first stage are checked: if one of them is returned with enough confidence, this
intent is selected. If none of them has enough confidence, the second stage is used to check
whether any of the phrase patterns match the user utterance. If a single intent is matched, this
intent is selected. If zero or more than one intent is matched, then up to a configurable number
of possible intents from both stages are presented to the user, who is asked to choose one of them.

4   Entity Finder

Task-oriented chatbots sometimes need to extract information from user utterances to complete
their tasks. The entities to be collected have many different types, and users can specify
entities of a given type in different ways. For example, users can specify a date as "next Monday"
or "06.01.2020", and the entity finder should determine that both are entities that denote a date.
   The first step of the entity finder is spell correction of the utterance. For spell correction
in Turkish, we use Zemberek, a natural language processing library [2]. If a word is not in the
dictionary, it is replaced by the dictionary word with the smallest Levenshtein distance to the
original one [6]. Then, numbers written out in words are converted to digits to simplify the
extraction of entities with numeric parts.
   Different entity types pose different extraction challenges. For basic types such as numbers
and times, we use regular expressions that cover the possible notations users can write. To
extract more complex types such as dates, we created a module that detects part-of-speech (POS)
tags using the aforementioned Zemberek library and then extracts entities using those POS tags and
regular expressions.
   There can also be custom entity types, which have a finite number of values predetermined by
chatbot content supervisors. These entity types have choices, and different possible words or
phrases are added to them as synonyms. Those choices and their synonyms are then searched for in
the user utterances in a fuzzy manner using Levenshtein distance. If a similar or exact pattern is
found, this entity value is extracted.
   Each entity that must be collected for its related intent has a prompt that asks the user
specifically for a value. However, users may have provided a value for an entity without being
prompted, in which case the chatbot need not ask for that entity again. Therefore, after finding
the intent, the entity finder module extracts the entities related to that intent, even those it
has not yet explicitly prompted for, to give users a more realistic conversation.

5   Conversation Management

Managing the conversation meticulously is important for helping users in task-oriented chatbots.
Conversation management is about tracking the chatbot's state as well as adapting to the changing
purposes of the users to create more natural conversations. Users may change their question while
a chatbot is trying to collect entities for an intent, or a user may give negative feedback about
an answer provided, and chatbots should handle those utterances as in a human-to-human
conversation rather than being stuck in the same conversation state.
   The chatbot state is determined by the intent found and by the chatbot's tasks in this intent,
such as collecting entities or responding. The chatbot states are:

1. Idle: The default state of the chatbot. Chatbots in this state wait for user utterances to
detect an intent.
2. Slot Filling: If an entity is needed in an intent, the chatbot actively tries to collect entity
values by asking questions and, if possible, giving choices to the user.
3. Confirmation: Some intents require user confirmation before taking an action. In this state,
the chatbot asks the user to confirm proceeding with the task and waits for the response.
4. Return Action: In this state, the chatbot returns an answer to the user or takes an action
using functions inserted from the UI.
5. Next Intent: If the current intent is to be followed by another intent in the return action
state, the chatbot transitions to this state.
6. User Refine: If the chatbot is unsure about the intent of the user and has some possible
intents, the user is asked to clarify whether they meant any of the possible intents found.

   These states alter the actions taken by the chatbot to complete tasks, but they form a linear
structure that may not work in some cases. Sometimes users change their questions or give negative
feedback on a chatbot reply. To accommodate such user behavior, intent detection alone is not
enough. Besides the intent of an utterance, its dialogue act should also be extracted. A dialogue
act is the function of an utterance in a conversation context, such as question, statement or
command. We tailored a distinct set of dialogue act classes for the chatbot system by analyzing
user behavior in utterances collected from real-life conversations. These are: Statement,
Wh-Question, Yes-No Question, Answer Accepted, Answer Rejected, Not Understanding Feedback,
Command, Greeting.
   The dialogue act classifier uses the same ELMo-based general Turkish language model used in
intent detection for word representations. These word representations are fed to a bi-directional
LSTM with an attention layer. At the end of the pipeline, a multi-layer perceptron decides the
dialogue act class. The data set used for training the dialogue act classifier was created by
manually tagging the utterances received by our chatbots.
   Figure 2 shows the architecture of the conversation management and analytics. The chatbot
state, the dialogue act of the user and the intent found are fed to a policy manager, which decides
the next action of the chatbot in the dialogue. For example, if a user rejected an answer, giving
that answer again to the next user utterance would be frustrating for the user. Also, the chatbot
state is readjusted according to dialogue
acts. For instance, if a user asks a question while the chatbot is in the slot filling state, the
state is changed to idle and that user utterance is sent to the intent detection module to find
the new intent of the user.

                        Figure 2: Conversation Management Architecture

6   Analytics

After a chatbot is created, its content must be maintained continuously with updated or new
intents in order to respond to the changing demands of the users. Content supervisors should
analyse previous dialogues in existing chatbots to determine when this should be done. Such an
analysis would take a considerable amount of time and workforce if done manually. To address these
issues, we developed an analysis tool for chatbot conversations that employs unsupervised
clustering algorithms and search capabilities.
   Each utterance received by the chatbots is indexed with its metadata in Lucene, a search
library written in Java that supports fuzzy, phrase and wildcard searches, among others [1]. Using
the indexed information, supervisors can search utterances thoroughly. They can look for
utterances with no intent to see what sort of questions chatbots cannot answer. Supervisors can
also search for a specific phrase or keyword to see user utterances around that topic. This allows
supervisors to gain granular insights from the data, starting from general topics.
   Despite being helpful, examining each utterance individually is not enough for a complete
analysis and may lead to wrong conclusions. Clustering methods are useful for inspecting vast
numbers of utterances, because they help humans look at a topic from a broader perspective by
grouping similar utterances together.
   We created an n-gram based clustering method for this task. Utterances to be clustered are
spell corrected and then lemmatized to reduce the diversity of the n-grams caused by the
agglutinative nature of Turkish. Next, stop words are eliminated and the n-grams are created. In
the next step, the sentences are grouped together using the co-occurrence of these n-grams. This
step iterates a number of times (which can be set by supervisors) with an increasing token
co-occurrence threshold for merging clusters. The threshold is not constant because in the first
iteration each sentence is its own cluster with a limited number of n-grams, so each co-occurrence
is more important than in later steps with bigger clusters. A higher threshold in the early
iterations prevents sentences from forming clusters, while lower thresholds in the later stages
gravitate towards one big cluster that covers all utterances. In the last stage, the created
clusters are visualized. The clustered utterances can be imported as training sentences to intents
in the model.
   The clustering can be combined with the search: supervisors can select a set of utterances with
a search and then execute the clustering algorithm over this set for a more specific analysis.
Furthermore, the clustering algorithm can be triggered automatically for each domain at fixed time
intervals. This step can be performed over the utterances with no intent to detect emerging
topics.

7   Conclusion

In this paper, we presented our chatbot creation framework, which can generate chatbots for any
domain. Our framework is supported by state-of-the-art NLP techniques to better understand user
messages and to be flexible and adaptable in generating replies that better mimic human
conversation behavior. Also, because every step of chatbot generation is configurable, chatbots
with widely different behaviors can be generated and used.
   Our framework has been in active use since December 2018, and chatbots for different domains
have been created with it (Table 1). The first chatbot was created as Yapı Kredi Bank's customer
service chatbot to serve customers on WhatsApp. Since its inception, this chatbot has had 765,000
conversations and replied to 2.1 million messages. With the help of our framework, the content
managers also updated their models daily for a better bot coverage rate, increasing the intent
count from 55 to 234. This chatbot model was also extended to other channels simply by tweaking
the responses of the bot to each channel's multimedia capabilities. The second one is a chatbot
for the bank's help desk that helps employees with their technical and business problems; it is
now in the pilot phase, serving 100 branches of the bank with 346 intents. Other use cases, for
the tourism and HR domains, are in the POC phase.

    Domain       Channels                Intent Count   Content
    Banking      WhatsApp,               234            Answers frequently asked questions in banking,
                 Facebook Messenger,                    makes calculations and gives information to
                 Yapı Kredi Web Page                    users. (Topics are consistent across the
                                                        channels; only the responses are customised
                                                        to each channel's visualisation capacity.)
    Help Desk    Help Desk Page Pilot    346            Helps bank employees with their questions and
                                                        technical problems.
    Tourism      POC                     24             Helps users find hotels or transportation, and
                                                        make and edit reservations.
    HR           POC                     36             Helps employees get answers about personal
                                                        benefits such as leave.

              Table 1: Chatbots From Different Domains Created Using this Framework

8   Acknowledgements

The authors would like to thank Atılberk Çelebi for his contributions to the work presented in
this paper.
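
As an illustration of the two-stage intent detection flow described in Section 3, the decision
logic can be sketched roughly as follows. This is a minimal sketch and not the authors'
implementation: the names Decision, match_phrases and detect_intent, the confidence threshold of
0.8 and the plain whitespace tokenisation are all assumptions made for the example. The ML stage
is represented only by its output, a list of (intent, probability) pairs sorted by probability.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Decision:
    intent: Optional[str]                             # selected intent, or None if the user must choose
    choices: List[str] = field(default_factory=list)  # candidates to present to the user


def contains_phrase(tokens: List[str], phrase_tokens: List[str]) -> bool:
    """True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return n > 0 and any(tokens[i:i + n] == phrase_tokens
                         for i in range(len(tokens) - n + 1))


def match_phrases(utterance: str, phrase_patterns: Dict[str, List[str]]) -> List[str]:
    """Stage 2 (rule-based): intents whose phrases occur in the utterance.

    Intents are ranked by the token count of their longest matched phrase,
    following the assumption that longer phrases are more specific.
    """
    tokens = utterance.lower().split()
    longest: Dict[str, int] = {}
    for intent, phrases in phrase_patterns.items():
        for phrase in phrases:
            phrase_tokens = phrase.lower().split()
            if contains_phrase(tokens, phrase_tokens):
                longest[intent] = max(longest.get(intent, 0), len(phrase_tokens))
    return sorted(longest, key=lambda intent: -longest[intent])


def detect_intent(utterance: str,
                  ml_top: List[Tuple[str, float]],
                  phrase_patterns: Dict[str, List[str]],
                  min_confidence: float = 0.8,   # hypothetical threshold
                  max_choices: int = 3) -> Decision:
    # Stage 1: accept the ML prediction if it is confident enough.
    if ml_top and ml_top[0][1] >= min_confidence:
        return Decision(ml_top[0][0])
    # Stage 2: fall back to phrase patterns; a single match is accepted.
    matched = match_phrases(utterance, phrase_patterns)
    if len(matched) == 1:
        return Decision(matched[0])
    # Zero or several matches: offer candidates from both stages to the user.
    candidates: List[str] = []
    for intent in matched + [name for name, _ in ml_top]:
        if intent not in candidates:
            candidates.append(intent)
    return Decision(None, candidates[:max_choices])
```

A caller would treat a Decision with intent set as final and one with intent None as a request to
show the listed choices to the user, which corresponds to the User Refine state of Section 5.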

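The fuzzy search for custom entity values described in Section 4 can be illustrated with a small
sketch. It assumes that each entity value comes with a list of synonyms, that utterances are
tokenised on whitespace, and that a match is accepted when the Levenshtein distance is at most
max_distance; the function names and the default threshold are hypothetical and not taken from the
framework.

```python
from typing import Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                        # deletion
                               current[j - 1] + 1,                     # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def find_custom_entity(utterance: str,
                       choices: Dict[str, List[str]],
                       max_distance: int = 1) -> Optional[str]:
    """Return the entity value whose name or synonym best matches the utterance.

    Every window of tokens with the same length as the phrase is compared
    to the phrase; the closest match within max_distance wins.
    """
    tokens = utterance.lower().split()
    best: Optional[Tuple[int, str]] = None  # (distance, entity value)
    for value, synonyms in choices.items():
        for phrase in [value] + list(synonyms):
            phrase = phrase.lower()
            size = len(phrase.split())
            for start in range(len(tokens) - size + 1):
                window = " ".join(tokens[start:start + size])
                distance = levenshtein(window, phrase)
                if distance <= max_distance and (best is None or distance < best[0]):
                    best = (distance, value)
    return best[1] if best else None
```

For example, with the values "savings" and "checking" and their synonyms, the misspelled utterance
"open a savngs account please" would still be resolved to "savings", because "savngs" is one edit
away from it.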

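The state readjustment performed by the policy manager in Section 5 can be sketched as a small
function over the chatbot states and dialogue act classes listed there. Only the first rule (a
question during slot filling resets the state to idle and triggers intent detection again) is
stated in the text; sending the chatbot to the User Refine state after a rejected answer is an
assumption made for illustration, as are all identifiers.

```python
from enum import Enum, auto
from typing import Tuple


class State(Enum):
    IDLE = auto()
    SLOT_FILLING = auto()
    CONFIRMATION = auto()
    RETURN_ACTION = auto()
    NEXT_INTENT = auto()
    USER_REFINE = auto()


class DialogueAct(Enum):
    STATEMENT = auto()
    WH_QUESTION = auto()
    YES_NO_QUESTION = auto()
    ANSWER_ACCEPTED = auto()
    ANSWER_REJECTED = auto()
    NOT_UNDERSTANDING_FEEDBACK = auto()
    COMMAND = auto()
    GREETING = auto()


QUESTION_ACTS = {DialogueAct.WH_QUESTION, DialogueAct.YES_NO_QUESTION}


def adjust_state(state: State, act: DialogueAct) -> Tuple[State, bool]:
    """Return (new_state, rerun_intent_detection) for the latest user utterance."""
    # Stated in the text: a question asked during slot filling means the user
    # changed topic, so go back to idle and detect the new intent.
    if state is State.SLOT_FILLING and act in QUESTION_ACTS:
        return State.IDLE, True
    # Assumption: after a rejected answer, ask the user to pick among other
    # candidate intents instead of repeating the same answer.
    if act is DialogueAct.ANSWER_REJECTED:
        return State.USER_REFINE, False
    # Otherwise keep following the linear flow of the current intent.
    return state, False
```

Keeping the rules in one pure function makes it straightforward to add further
(state, dialogue act) combinations without touching intent detection or entity extraction.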
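
The iterative n-gram clustering of Section 6 can be illustrated with the following sketch. It is a
simplified reading of the description, not the authors' algorithm: spell correction and
lemmatization are omitted, unigrams and bigrams are used as n-grams, clusters are merged greedily
in input order, and the threshold schedule (2, 3, 4) is a hypothetical default.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple


def ngrams(tokens: List[str], n_max: int = 2) -> Set[str]:
    """All n-grams of the token list for n = 1 .. n_max."""
    return {" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)}


def cluster_utterances(utterances: Sequence[str],
                       stop_words: FrozenSet[str] = frozenset(),
                       thresholds: Sequence[int] = (2, 3, 4)) -> List[List[str]]:
    """Group utterances that share enough n-grams.

    Each utterance starts as its own cluster. In every iteration, a cluster
    is merged into the first earlier cluster with which it shares at least
    `threshold` n-grams; the threshold grows from one iteration to the next.
    """
    clusters: List[Tuple[List[str], Set[str]]] = []
    for utterance in utterances:
        tokens = [t for t in utterance.lower().split() if t not in stop_words]
        clusters.append(([utterance], ngrams(tokens)))

    for threshold in thresholds:
        merged: List[Tuple[List[str], Set[str]]] = []
        for members, grams in clusters:
            for target_members, target_grams in merged:
                if len(grams & target_grams) >= threshold:
                    target_members.extend(members)
                    target_grams.update(grams)
                    break
            else:  # no sufficiently similar cluster found
                merged.append((list(members), set(grams)))
        clusters = merged
    return [members for members, _ in clusters]
```

Run over utterances with no detected intent, the resulting groups give supervisors candidate
topics whose members could be imported as training sentences for a new or existing intent.
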
REFERENCES
 [1] Apache Lucene. https://lucene.apache.org/. Accessed: 2020-03-10.
 [2] Zemberek-NLP. https://github.com/ahmetaa/zemberek-nlp. Accessed:
     2020-02-20.
 [3] Rafael E. Banchs and Haizhou Li, ‘IRIS: a chat-oriented dialogue sys-
     tem based on the vector space model’, in Proceedings of the ACL 2012
     System Demonstrations, pp. 37–42, Jeju Island, Korea, (July 2012). As-
     sociation for Computational Linguistics.
 [4] Kenneth Mark Colby, Artificial Paranoia: A Computer Simulation of
     Paranoid Processes, Elsevier Science Inc., USA, 1975.
 [5] Daniel Jurafsky and James H. Martin, Speech and Language Process-
     ing: An Introduction to Natural Language Processing, Computational
     Linguistics, and Speech Recognition, Prentice Hall PTR, USA, 1st edn.,
     2000.
 [6] Vladimir I Levenshtein, ‘Binary codes capable of correcting deletions,
     insertions and reversals’, Soviet Physics Doklady, 10, 707, (February
     1966).
 [7] Maali Mnasri, ‘Recent advances in conversational NLP : Towards the
     standardization of chatbot building’, CoRR, abs/1903.09025, (2019).
 [8] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner,
     Christopher Clark, Kenton Lee, and Luke Zettlemoyer, ‘Deep contex-
     tualized word representations’, CoRR, abs/1802.05365, (2018).
 [9] Haşim Sak, Tunga Güngör, and Murat Saraçlar, ‘Turkish language re-
     sources: Morphological parser, morphological disambiguator and web
     corpus’, in Advances in Natural Language Processing, eds., Bengt
     Nordström and Aarne Ranta, pp. 417–427, Berlin, Heidelberg, (2008).
     Springer Berlin Heidelberg.
[10] Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett,
     Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill
     Dolan, ‘A neural network approach to context-sensitive generation of
     conversational responses’, in Proceedings of the 2015 Conference of the
     North American Chapter of the Association for Computational Linguis-
     tics: Human Language Technologies, pp. 196–205, Denver, Colorado,
     (May–June 2015). Association for Computational Linguistics.
[11] Ilya Sutskever, Oriol Vinyals, and Quoc V Le, ‘Sequence to sequence
     learning with neural networks’, in Advances in Neural Information Pro-
     cessing Systems 27, eds., Z. Ghahramani, M. Welling, C. Cortes, N. D.
     Lawrence, and K. Q. Weinberger, 3104–3112, Curran Associates, Inc.,
     (2014).
[12] Richard Wallace, The anatomy of A.L.I.C.E, 181–210, 01 2009.
[13] Joseph Weizenbaum, ‘Eliza a computer program for the study of natural
     language communication between man and machine’, Commun. ACM,
     9(1), 36–45, (January 1966).



</pre>