=Paper=
{{Paper
|id=Vol-2693/paper5
|storemode=property
|title=Domain-Independent, Task-Oriented Chatbot Creation and Conversation Policy Management Framework
|pdfUrl=https://ceur-ws.org/Vol-2693/paper5.pdf
|volume=Vol-2693
|authors=Tolga Çekiç,Yusufcan Manav,Enes Burak Dündar,Osman Fatih Kılıç,Onur Deniz,Seçil Arslan
|dblpUrl=https://dblp.org/rec/conf/ecai/CekicMDKDA20
}}
==Domain-Independent, Task-Oriented Chatbot Creation and Conversation Policy Management Framework==
Proceedings of the Workshop on Hybrid Intelligence for Natural Language Processing Tasks HI4NLP (co-located at ECAI-2020), Santiago de Compostela, August 29, 2020, published at http://ceur-ws.org

Tolga Çekiç<sup>1,2</sup>, Yusufcan Manav<sup>1,3</sup>, Enes Burak Dündar<sup>4</sup>, Osman Fatih Kılıç<sup>5</sup>, Onur Deniz<sup>6</sup> and Seçil Arslan<sup>7</sup>

<sup>1</sup> Equal Contribution

<sup>2</sup> YapıKredi Teknoloji, Turkey, tolga.cekic@ykteknoloji.com.tr

<sup>3</sup> YapıKredi Teknoloji, Turkey, yusufcan.manav@ykteknoloji.com.tr

<sup>4</sup> YapıKredi Teknoloji, Turkey, enesburak.dundar@ykteknoloji.com.tr

<sup>5</sup> YapıKredi Teknoloji, Turkey, osmanfatih.kilic@ykteknoloji.com.tr

<sup>6</sup> YapıKredi Teknoloji, Turkey, onur.deniz@ykteknoloji.com.tr

<sup>7</sup> YapıKredi Teknoloji, Turkey, secil.arslan@ykteknoloji.com.tr

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. In this paper, we present a chatbot creation framework that can help people with no technical expertise to design and create chatbots for any domain. This framework enables the creation of highly customizable chatbots that can range from simple question-answer systems to chatbots that can handle more complex dialogue flows. In order to be domain independent, we created a general Turkish language model using the ELMo architecture, and the intent detection models for chatbots are trained using embeddings generated by this language model. Additionally, in order to make conversations more seamless and cohesive, dialogue act classification is integrated into conversation policy management. The framework also includes an additional tool that allows monitoring of past chatbot conversations and provides analytic tools supported by clustering algorithms.

===1 Introduction===

Conversational systems, or chatbots as they are commonly called, are ubiquitous nowadays, as they are used for simple general conversation or for more specialized tasks such as customer services. With their popularity, much research is focused on making chatbots more effective in solving users’ problems and on ensuring that conversations with a chatbot are akin to conversations between humans.

While general conversation chatbots usually consider a few utterances from users without coming to a point, a task-oriented chatbot must sometimes keep information from a much earlier utterance and should try to steer the conversation toward a certain point, thus completing its appointed task, such as booking a ticket. While the two approaches have much in common in terms of the research and the techniques used, they present different challenges [7].

Research into chatbots started decades ago with the advent of ELIZA [13], which was designed to mimic psychotherapy and uses pre-programmed rules to generate answers. It receives user utterances and, by using certain words and part-of-speech tags, prepares a response by following predetermined rules. Other chatbots following a rule-based design similar to ELIZA have also been developed over the years with their own advancements, such as PARRY [4] and ALICE [12].

For chatbots that must perform more complex conversations, the rule-based approach can be limiting, since creating rules that cover more cases can become unmanageable. Thus new methods have been devised to create more effective chatbots. IRIS [3], a general conversation chatbot, uses data from many conversations and extracts the most appropriate responses to user utterances. For information retrieval, IRIS uses a vector space model. Gandhe and Traum also proposed a TF-IDF model to retrieve answers for a chatbot from a dataset of text dialogue scripts.

With the advancements in deep neural networks, machine learning based chatbots have become increasingly successful. Vinyals and Le introduced sequence to sequence (seq2seq) learning for conversation systems for generating dynamic responses to each user utterance [11]. Although such models are very successful in generating human-like sentences, Sordoni et al. point out that they are limited in context-sensitive conversations and in carrying over information from previous utterances [10].

While machine learning methods can help create human-like conversations for general conversation, task-oriented chatbots usually require more complex tools for conversation management. Since task-oriented chatbots are generally deployed commercially and interact with customers, their responses must be more precise and carefully constructed. Thus, the unpredictability of sentences generated by seq2seq models can be undesirable for such chatbots. Also, some information must be specifically collected from users and kept in order to complete the task. For instance, a chatbot that sells flight tickets must collect departure and arrival locations as well as the date in order to present an offer and sell a ticket. The intent-slot model is used to create such chatbots [5]. Intents are what a user wants to do with a chatbot, and slots are entities that must be collected or filled for the chatbot to complete the task related to the intent. In order to create chatbots with the intent-slot model, machine learning methods for intent detection and entity extraction are combined with other techniques such as state tracking and policy management.

Task-oriented chatbots with the intent-slot model can be used for many different domains, and they would have similar designs. In order to create a scalable system for developing task-oriented chatbots for multiple domains, new frameworks were devised. Frameworks such as Amazon Lex, Google Dialogflow and Microsoft Luis help create chatbots with little requirement for programming expertise.

In this paper we offer a novel chatbot creation framework equipped with tools to handle common problems that can be encountered by conversation systems. In order to have a powerful intent detection mechanism that can work in multiple domains, our framework uses a general ELMo based language model and a deep neural network classifier that uses ELMo embeddings [8]. Furthermore, an additional hybrid intent detection classifier combines rule-based intent detection with machine learning so as to find intents even if the training data is sparse or unevenly distributed. Our framework also has a recurrent neural network (RNN) based dialogue act classifier that is used in addition to intent detection and entity extraction to provide flexible policy management that can handle complex conversations. Additionally, an actively used chatbot may need to be updated to understand and perform new tasks over time. In order to find these new tasks effectively and to collect the related data for the machine learning models, we have developed an analytics tool to reinforce the capabilities of our chatbot generation framework. With this tool, new intents can be discovered or data sets for existing intents can be expanded.

===2 Dialogue Design===

Generally, chatbot content is gathered and structured by people who work in public relations or marketing. They mostly have little or no coding experience and need people with programming expertise to maintain the chatbot content for them. We wanted the people who supervise the chatbot content to also manage the dialogue design. Thus, the content design is decoupled from the technical parts that manage the conversation, which simplifies chatbot creation. We developed an interface that lets users create new topics and structure the content according to their needs, for example to edit, delete and create flows swiftly. To provide this user interface and ease the creation of chatbots, we streamlined the process and decided to use basic building blocks. These are:

1. Intent: Intents are the tasks users want to accomplish. They are the main building blocks of the chatbots in this framework, and they contain some or all of the following components. They can be used on their own or chained to each other to create flows for more complex tasks.

2. Entity: Entities are used if information needs to be collected from the user to complete a task associated with an intent. Two collection methods can be used: prompt, where a question is asked to the user and the chatbot waits for input, or choice, where the answer is selected from a predetermined set of choices. There are multiple built-in entity types to satisfy users’ needs, such as phone number and date, and chatbot content supervisors can also create their own custom entities.

3. Training Sentences: Training sentences are used in the training of the intent detection classifier. They are the possible sentences that customers use to state a specific intent.

4. Response Actions: This component determines the responses given by the chatbot for a specific intent. After these responses are given, the chatbot can either revert to its default state, waiting for user utterances to find an intent, or a follow-up intent can be set so that the chatbot prompts the user according to this new intent. Before the responses of the chatbot are determined, some other actions may need to be taken first; these actions are determined by the functions component.

5. Functions: Functions are actions taken by the chatbot to complete the required task of an intent. These actions generally use collected entities as inputs, and they can connect to external systems, for example when making a money transfer.

The organized and compact chatbot structure has shortened the time and eliminated the coding knowledge needed to create chatbots, while providing the capability to complete tasks independent of domains. Also, as can be seen from Figure 1, the domain model has configurations that satisfy the conversation flow needs of different domains.

Chatbot models are created by defining one or more intents according to the topics that have been determined by the content supervisors. While creating each intent, possible user utterances that can specify the intent are added as training sentences. If any entity needs to be collected for an intent, the corresponding entity types are inserted into the intent and their collection type is selected. After that, any integrations needed are added as functions. Finally, the response actions are added to the intents. There can be a single response action or multiple ones, and they may have different conditions, which provide dynamic responses depending on the users’ input to the entities.

Figure 1: Domain Model Structure

Each deployed chatbot model is uniquely identified in the conversation manager, so that users can go back to an older model if they want to withdraw their changes and continue on top of that older model. These uniquely identified models also make it possible to serve multiple models in a domain- and channel-agnostic way. Users from different domains are directed to their respective chatbot models and given an answer from that model.

===3 Intent Detection===

Task-oriented chatbots need to understand users to help them with their problems. Chatbots created with this framework can contain multiple intents. Smaller scale chatbots that are tailored to one specific task could be created with this framework, but in many domains more complex chatbots with multiple intents are desired. Hence, a powerful and flexible intent detection classifier is a must in task-oriented chatbots. We developed a two-stage hybrid structure for this task.

The first stage is a machine learning based intent classifier that uses ELMo contextual embeddings [8]. For Turkish, we have trained an ELMo language model from scratch using the Turkish Wikipedia dump and the Bogazici Web Corpus as data sets [9]. Our combined training data contains more than 500 million tokens. The character-based nature of ELMo is very helpful for obtaining better word representations for Turkish, since it is an agglutinative language.

The ML-based intent classifier is trained with the sample sentences in the chatbot model. Contextual embeddings of these sentences are extracted using ELMo, and those embeddings are then fed to a multi-layer perceptron with a softmax output layer. For chatbot usage, the three most probable intents of a message are retrieved using this model.

The rule-based intent classifier is the second part of the hybrid intent detection module. This part is used for intents with fewer than the minimum number of training sentences, or for specific intents that can be identified by very distinct keywords or phrases. For these intents, basic phrases are defined, and these patterns are searched for in the user utterances. The matched patterns are ranked according to their token counts, on the assumption that longer phrases are more specific in describing an intent. Some intents can be reliably determined at this stage. However, this method does not consider semantic or contextual information. It performs only pattern-based matching and can miss words with similar meanings that are not present in the list of predetermined phrases.

In the general flow of intent detection, the two methods are used in tandem. The candidate intents from the first stage are checked first: if any of them is returned with enough confidence, that intent is selected. If none of them has enough confidence, the second stage is used to check whether any of the phrase patterns match the user utterance. If a single intent is matched, that intent is selected. If zero or more than one intent is matched, then up to a configurable number of possible intents from both stages are presented to the user, who is asked to choose one of them.

===4 Entity Finder===

Task-oriented chatbots may sometimes need to extract information from user utterances to complete their tasks. The entities to be collected have many different types, and users can specify a given type of entity in different ways. For example, users can specify a date as “next Monday” or “06.01.2020”, and the entity finder should determine that both are entities that denote a date.

The first step of the entity finder is spell correction of the utterances. For spell correction in Turkish, we utilized Zemberek, a natural language processing library [2]. If a word is not in the dictionary, it is converted to the word with the nearest Levenshtein distance to the original one [6]. Then, numbers written out in words are converted to digits to simplify the extraction of entities with numeric parts.

Different entity types pose different extraction challenges. For basic types such as numbers and times, we use regular expressions that cover the possible notations users can write. To extract more complex types such as dates, we created a module that detects part-of-speech (POS) tags using the aforementioned Zemberek library and then extracts entities using those POS tags and regular expressions.

There can also be custom entity types, which have a finite number of values that are predetermined by chatbot content supervisors. Those entity types have choices, and different possible words or phrases can be added as synonyms. Those choices and their synonyms are then searched for in the user utterances in a fuzzy manner using Levenshtein distance. If a similar or exact pattern is found, the entity value is extracted.

Each entity that is required in its related intent has a prompt that asks the user specifically for a value. However, users may have provided a value for an entity without being prompted, in which case the chatbot need not ask for that entity again. Therefore, after finding the intent, the entity finder module starts to extract the entities related to that intent, even those it has not explicitly prompted for, to provide users with a more realistic conversation.

===5 Conversation Management===

Managing the conversation meticulously in task-oriented chatbots is important for helping users. Conversation management is about tracking the chatbot’s state as well as adapting to the changing purposes of users to create more natural conversations. Users may change their question while a chatbot is trying to collect entities for an intent, or a user may give negative feedback about an answer provided, and chatbots should handle those user utterances as in a human-to-human conversation rather than being stuck in the same state of the conversation.

The chatbot state is determined by the intent found and the chatbot’s tasks within this intent, such as collecting entities or responding. The chatbot states are:

1. Idle: Default state of the chatbot. Chatbots in this state wait for user utterances to detect an intent.

2. Slot Filling: If an entity is needed in an intent, the chatbot actively tries to collect entity values by asking questions and, if possible, giving choices to the user.

3. Confirmation: Some intents require user confirmation before taking an action. In this state the chatbot asks users for confirmation to proceed with the task and waits for their response.

4. Return Action: In this state the chatbot returns an answer to the user or takes action using functions inserted from the UI.

5. Next Intent: If the current intent will be followed by another intent in the return action state, the chatbot transitions to this state.

6. User Refine: If the chatbot is unsure about the intent of the user and has some possible intents, the user is asked to clarify whether they meant any of the possible intents found.

These states alter the actions taken by the chatbots to complete tasks, but they form a linear structure that may not work in some cases. Sometimes users change their questions or give negative feedback to a chatbot reply. To accommodate such user behaviors, intent detection alone is not enough. Besides the intent of an utterance, the dialogue act of the user utterance should also be extracted. A dialogue act is the function of an utterance in a conversation context, such as question, statement or command. We tailored a distinct dialogue act class set for the chatbot system by analyzing user behavior in utterances collected from real-life conversations. These are: Statement, Wh-Question, Yes-No Question, Answer Accepted, Answer Rejected, Not Understanding Feedback, Command, Greeting.

The dialogue act classifier uses the same ELMo based general Turkish language model used in intent detection for word representations. These word representations are fed to a bi-directional LSTM with an attention layer. At the end of the pipeline there is a multi-layer perceptron that decides the dialogue act class. The data set we used for training the dialogue act classifier was created by manually tagging the utterances received by our chatbots.

Figure 2 shows the architecture of the conversation management and analytics. The chatbot state, the dialogue act of the user and the intent found are fed to a policy manager to decide the next action of the chatbot in a dialogue. For example, if a user rejected an answer, giving that answer again to the next user utterance would be frustrating for the user. Also, the chatbot state is readjusted according to dialogue acts. For instance, if a user asks a question while the chatbot is in the slot filling state, the state is changed to idle and that user utterance is sent to the intent detection module to find the new intent of the user.

Figure 2: Conversation Management Architecture

{| class="wikitable"
|+ Table 1: Chatbots From Different Domains Created Using this Framework
! Domain !! Channels !! Intent Count !! Content
|-
| Banking || WhatsApp, Facebook Messenger, Yapı Kredi Web Page || 234 || Answers frequently asked questions in banking; also makes calculations and gives information to users. (Topics are coherent across the channels; only the responses are customised to the channels’ visualisation capacity.)
|-
| Help Desk || Help Desk Page Pilot || 346 || Helps bank employees with their questions and technical problems
|-
| Tourism || POC || 24 || Helps users to find a hotel or transportation, and to make and edit reservations
|-
| HR || POC || 36 || Helps employees to get answers about personal benefits such as leave
|}

===6 Analytics===

After a chatbot is created, its content must be maintained continuously with updated or new intents in order to respond to the changing demands of users. Content supervisors should analyse previous dialogues in existing chatbots to determine when this should be done. Such an analysis would take a considerable amount of time and workforce if done manually. In order to address these issues, we developed an analysis tool for chatbot conversations that employs unsupervised clustering algorithms and search capabilities.

Each utterance received by the chatbots is indexed with its metadata in Lucene, a search library written in Java that offers capabilities such as fuzzy, phrase and wildcard searches [1]. Using the indexed information, supervisors can search utterances thoroughly. They can look for utterances with no intent to see what sort of questions chatbots cannot answer. Supervisors can also search for a specific phrase or keyword to see user utterances around that topic. This allows supervisors to gain granular insights from the data, starting from the general topics.

Despite being helpful, examining each utterance individually is not enough for a complete analysis and may lead to wrong conclusions. Clustering methods are useful for inspecting a vast number of utterances, because they help humans look at the topic from a broader perspective by grouping similar utterances together.

We created an n-gram based clustering method for this task. Utterances to be clustered are spell corrected and then lemmatized to reduce the diversity of the n-grams caused by the agglutinative nature of Turkish. Next, stop words are eliminated, and then the n-grams are created. In the next step, the sentences are grouped together using the co-occurrence of these n-grams. This step iterates a number of times (which can be set by supervisors) with an increasing token co-occurrence threshold for merging clusters. The threshold is not constant because in the first iteration each sentence is its own cluster with a limited number of n-grams, so each co-occurrence carries more weight than in later steps with bigger clusters. A high threshold in the early iterations would prevent sentences from forming clusters, while a low threshold in the later iterations would pull everything towards one big cluster that covers all utterances. In the last stage, the created clusters are visualized. The clustered utterances can be imported as training sentences to intents in the model.

The clustering can be combined with the search: supervisors can select a set of utterances with the search and then execute the clustering algorithm over this set for a more specific analysis. Furthermore, this clustering algorithm can be triggered automatically for each domain at fixed time intervals. This step can be performed over the utterances with no intent to detect emerging topics.

===7 Conclusion===

In this paper, we presented our chatbot creation framework that can generate chatbots for any domain. Our framework is supported by state-of-the-art NLP techniques to better understand user messages and to be flexible and adaptable in generating replies, so as to better mimic human conversation behavior. Also, by making every step of chatbot generation configurable, chatbots with widely different behaviors can be generated and used.

Our framework has been in active use since December 2018, and chatbots for different domains have been created. The first chatbot was created as Yapı Kredi Bank’s customer service chatbot to serve customers on WhatsApp. Since its inception, this chatbot has had 765,000 conversations and replied to 2.1 million messages. With the help of our framework, the content managers also updated their models daily for a better bot coverage rate, increasing the intent count from 55 to 234. This chatbot model was also extended to other channels simply by adjusting the responses of the bot to each channel’s multimedia capabilities. The second one is a chatbot for the bank’s help desk that helps employees with their technical and business problems; it is now in the pilot phase, serving 100 branches of the bank with 346 intents. Other use cases are in the POC phase for the tourism and HR domains.

===8 Acknowledgements===

The authors would like to thank Atılberk Çelebi for his contributions to the work presented in this paper.

===References===

[1] Apache Lucene. https://lucene.apache.org/. Accessed: 2020-03-10.

[2] Zemberek-NLP. https://github.com/ahmetaa/zemberek-nlp. Accessed: 2020-02-20.

[3] Rafael E. Banchs and Haizhou Li, ‘IRIS: a chat-oriented dialogue system based on the vector space model’, in Proceedings of the ACL 2012 System Demonstrations, pp. 37–42, Jeju Island, Korea, (July 2012). Association for Computational Linguistics.

[4] Kenneth Mark Colby, Artificial Paranoia: A Computer Simulation of Paranoid Processes, Elsevier Science Inc., USA, 1975.

[5] Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall PTR, USA, 1st edn., 2000.

[6] Vladimir I Levenshtein, ‘Binary codes capable of correcting deletions, insertions and reversals’, Soviet Physics Doklady, 10, 707, (February 1966).

[7] Maali Mnasri, ‘Recent advances in conversational NLP: Towards the standardization of chatbot building’, CoRR, abs/1903.09025, (2019).

[8] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer, ‘Deep contextualized word representations’, CoRR, abs/1802.05365, (2018).

[9] Haşim Sak, Tunga Güngör, and Murat Saraçlar, ‘Turkish language resources: Morphological parser, morphological disambiguator and web corpus’, in Advances in Natural Language Processing, eds., Bengt Nordström and Aarne Ranta, pp. 417–427, Berlin, Heidelberg, (2008). Springer Berlin Heidelberg.

[10] Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan, ‘A neural network approach to context-sensitive generation of conversational responses’, in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 196–205, Denver, Colorado, (May–June 2015). Association for Computational Linguistics.

[11] Ilya Sutskever, Oriol Vinyals, and Quoc V Le, ‘Sequence to sequence learning with neural networks’, in Advances in Neural Information Processing Systems 27, eds., Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, pp. 3104–3112, Curran Associates, Inc., (2014).

[12] Richard Wallace, The anatomy of A.L.I.C.E, pp. 181–210, 01 2009.

[13] Joseph Weizenbaum, ‘Eliza a computer program for the study of natural language communication between man and machine’, Commun. ACM, 9(1), 36–45, (January 1966).
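
As an illustration of the two-stage intent selection described in Section 3, the following is a minimal sketch rather than the framework's actual code. The function names, the 0.7 confidence threshold, the default of three choices and the simple token-based phrase matching are all assumptions made for the example; the paper only states that the threshold logic exists and that the number of choices is configurable.

```python
import re
from typing import Dict, List, Tuple


def _tokens(text: str) -> List[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def match_patterns(utterance: str, patterns: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Rule-based stage: return (intent, token_count) for every intent with a
    phrase found in the utterance, ranked by phrase length (longest first)."""
    utt = _tokens(utterance)
    best: Dict[str, int] = {}
    for intent, phrases in patterns.items():
        for phrase in phrases:
            p = _tokens(phrase)
            n = len(p)
            if n and any(utt[i:i + n] == p for i in range(len(utt) - n + 1)):
                best[intent] = max(best.get(intent, 0), n)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


def select_intent(ml_candidates: List[Tuple[str, float]],
                  utterance: str,
                  patterns: Dict[str, List[str]],
                  threshold: float = 0.7,   # assumed value
                  max_choices: int = 3) -> Tuple[str, List[str]]:
    """Combine both stages. Returns ("selected", [intent]) or ("ask_user", choices)."""
    ranked = sorted(ml_candidates, key=lambda c: c[1], reverse=True)
    # Stage 1: accept the ML prediction if it is confident enough.
    if ranked and ranked[0][1] >= threshold:
        return "selected", [ranked[0][0]]
    # Stage 2: fall back to phrase patterns.
    hits = match_patterns(utterance, patterns)
    if len(hits) == 1:
        return "selected", [hits[0][0]]
    # Zero or several matches: offer candidates from both stages to the user.
    choices: List[str] = []
    for intent, _ in hits + ranked:
        if intent not in choices:
            choices.append(intent)
    return "ask_user", choices[:max_choices]
```

In this sketch the rule-based candidates are listed before the ML candidates when the user is asked to choose; the paper does not specify an ordering.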
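
The fuzzy lookup of custom entity choices and their synonyms described in Section 4 could look roughly like the sketch below. It is an assumption-laden illustration: the sliding-window comparison, the `max_distance` parameter and the function names are hypothetical, and the spell-correction and number-normalisation steps that precede entity extraction in the paper are omitted.

```python
from typing import Dict, List, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def find_custom_entity(utterance: str,
                       choices: Dict[str, List[str]],
                       max_distance: int = 1) -> Optional[str]:
    """Return the canonical choice whose name or synonym best matches a
    window of the utterance within max_distance edits, or None."""
    tokens = utterance.lower().split()
    best = None  # (distance, canonical value)
    for canonical, synonyms in choices.items():
        for surface in [canonical] + synonyms:
            n = len(surface.split())
            if n == 0:
                continue
            for i in range(len(tokens) - n + 1):
                window = " ".join(tokens[i:i + n])
                d = levenshtein(window, surface.lower())
                if d <= max_distance and (best is None or d < best[0]):
                    best = (d, canonical)
    return best[1] if best else None
```

A fixed absolute distance is the simplest choice here; a length-relative tolerance might suit long phrases better, but the paper does not say which was used.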
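
The state readjustment rule given at the end of Section 5 (a question asked during slot filling resets the chatbot to idle and re-runs intent detection) can be sketched as follows. The state and dialogue act names come from the paper, but the enum layout and the `adjust_state` function are hypothetical, and only this one rule is modelled; the real policy manager also takes the detected intent into account.

```python
from enum import Enum
from typing import Tuple


class State(Enum):
    IDLE = "idle"
    SLOT_FILLING = "slot_filling"
    CONFIRMATION = "confirmation"
    RETURN_ACTION = "return_action"
    NEXT_INTENT = "next_intent"
    USER_REFINE = "user_refine"


class DialogueAct(Enum):
    STATEMENT = "statement"
    WH_QUESTION = "wh_question"
    YES_NO_QUESTION = "yes_no_question"
    ANSWER_ACCEPTED = "answer_accepted"
    ANSWER_REJECTED = "answer_rejected"
    NOT_UNDERSTANDING = "not_understanding_feedback"
    COMMAND = "command"
    GREETING = "greeting"


QUESTION_ACTS = {DialogueAct.WH_QUESTION, DialogueAct.YES_NO_QUESTION}


def adjust_state(state: State, act: DialogueAct) -> Tuple[State, bool]:
    """Return (new_state, rerun_intent_detection) for one user utterance."""
    # A question during slot filling means the user changed topic:
    # drop back to idle and send the utterance to intent detection.
    if state is State.SLOT_FILLING and act in QUESTION_ACTS:
        return State.IDLE, True
    # Otherwise the state is left unchanged in this simplified sketch.
    return state, False
```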
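
To make the iterative n-gram co-occurrence clustering of Section 6 more concrete, here is a small sketch under several assumptions. Spell correction and lemmatization are skipped, n-grams are limited to unigrams and bigrams, and the greedy merge order, `start_threshold` and `step` parameters are hypothetical; the paper specifies only that the merge threshold increases over a configurable number of iterations.

```python
from typing import FrozenSet, List, Set


def ngrams(tokens: List[str], n_max: int = 2) -> Set[str]:
    """All n-grams of length 1..n_max as space-joined strings."""
    return {" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)}


def cluster_utterances(utterances: List[str],
                       stop_words: FrozenSet[str] = frozenset(),
                       iterations: int = 3,
                       start_threshold: int = 1,
                       step: int = 1) -> List[List[int]]:
    """Group utterance indices whose n-gram sets share enough n-grams.
    The required overlap grows by `step` after every iteration."""
    # Start with one cluster per utterance: (member indices, n-gram set).
    clusters = []
    for idx, text in enumerate(utterances):
        tokens = [t for t in text.lower().split() if t not in stop_words]
        clusters.append(([idx], ngrams(tokens)))

    threshold = start_threshold
    for _ in range(iterations):
        merged = []
        for members, grams in clusters:
            # Greedily merge into the first cluster with enough shared n-grams.
            for target_members, target_grams in merged:
                if len(grams & target_grams) >= threshold:
                    target_members.extend(members)
                    target_grams.update(grams)
                    break
            else:
                merged.append((list(members), set(grams)))
        clusters = merged
        threshold += step  # bigger clusters must overlap more to merge
    return [sorted(members) for members, _ in clusters]
```

The resulting index groups could then be shown to a supervisor, who decides which clusters to import as training sentences for a new or existing intent.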