
Task-oriented Conversational Agent Self-learning Based on Sentiment Analysis

Serena Leggeri

leggeri.1228424@studenti.uniroma1.it 1

Andrea Esposito

andrea.esposito@badgebox.com 0

Luca Iocchi

iocchi@diag.uniroma1.it 1

0 BadgeBox srl, Roma, Italy
1 Sapienza University of Rome, Italy


One of the biggest issues in creating a task-oriented conversational agent with natural language processing based on machine learning comes from the size and correctness of the training dataset. Data collection can take months or even years, and the resulting static resource may soon become out of date, thus requiring a significant amount of work to supervise it. To overcome these difficulties, we implemented an algorithm that improves learning efficiency based on the emotions and reactions arising from the conversation between a user and the bot, automatically and in real time. To this end, we have studied an error function that, as in any closed-loop control system, corrects the input to improve the output. The proposed method is based on both calibrating the interpretation given to the initial dataset and expanding the dictionary with new terms. Thanks to this innovative approach, the satisfaction of the interlocutors is higher than with algorithms that use a static dataset or semi-automatic self-learning rules.

Keywords: task-oriented conversational agent, supervised learning

Task-oriented conversational agents are software components based on artificial intelligence that are able to simulate an intelligent conversation with the user in a chat and offer a functional support service through the main messaging platforms such as Slack, Telegram and Facebook Messenger. Conversational agents are created for various purposes: customer care, the dissemination of news, offers and promotions, and support for the activation of a service. The strength of these solutions is that they are autonomous and available 24 hours a day to offer help to the user who requests it.

When a developer designs a task-oriented conversational agent, the main purpose is to make sure that it fulfills all the user requests on the specific topic for which it was designed, trying to find the most relevant answer to the question that was sent, without the intervention of human operators. Agents based on machine learning techniques make use of a training dataset. Initially, the dataset contains a finite number of contexts that describe the topic for which the bot was created, and for each context there is a finite number of sentences describing the user's intention. As the dataset is created to satisfy certain types of requests, it covers only a limited number of ways in which the user can ask a question and a limited number of answer types. In some cases, the answer may not be adequate to the question, and it is important to keep improving the efficacy of the agent during operation. However, this requires manual labelling of new samples to refine the learning process.

The idea of the method described in this paper is to exploit the analysis of user satisfaction to improve the effectiveness of the learning process of the agent. More specifically, we aim at the on-line and automatic generation of new labelled samples to be included in the training dataset in order to refine the model learned by the agent.

Starting from this idea, we have developed a method that extends the dataset automatically and in real time by inserting new terms and recalibrating those already present in the dataset, thus improving the recognition of the user's intentions. To do this, we analyzed the emotions generated in the users by the bot's answers.

The proposed approach has been deployed and validated on a real use case, coming from a commercial application of a chatbot acting as customer care for timesheet and employee management in a company [1]. The evaluation process also contains a comparison with other techniques. The results show that, when compared with other techniques not using such analysis, the proposed method can automatically extend the dataset in real time and improve the quality of the chatbot's answers. The proposed method is also faster in recognizing the contexts than the other techniques.

Although the deployment and experimental evaluation have focused on a particular real use case, the proposed method has no domain-specific components or assumptions, and thus we believe that it can be applied to other domains as well.

Related work

Over the course of time, numerous chatbots have been created to provide information, help make decisions, provide services or simply entertain [2].

Initially, the development of a bot was based on two fundamental components [3]:
- Natural Language Understanding module, used by the Dialogue Manager, that processes the user input to search for keywords through which to understand the action to be taken.
- Natural Language Generation module that generates answers from the information gathered by the Dialogue Manager.

Over time, we have witnessed a real evolution in the development of task-oriented conversational agents thanks to the availability of deep learning techniques [4] [5]. These agents are typically trained for several years, with human operators supervising them and verifying the correctness of the answers. In this way, they have become able to carry out operations that earlier assistants could not perform, such as making payments or booking a trip.

The most difficult challenge for a task-oriented conversational agent is to be able to incorporate the linguistic context (all that is said during the conversation and that gives meaning to each turn in the dialogue) and the physical context (for example user information, place, date and time of the conversation, etc.). A good exploitation of the linguistic context is possible only when the bot is trained on a good training dataset supervised by human experts in order to minimize errors.

In addition to task-oriented conversational agents built assuming a large amount of training data and the availability of human experts during all the training phases, it is interesting to study the development of agents with minimal requirements in terms of training data and human supervision. This goal is motivated by the needs of small companies developing chatbots, for which large amounts of data and human supervision cost too much compared to the commercial benefit of the chatbot product.

Therefore, our goal is to study the definition of a task-oriented conversational agent that understands user intents and performs actions starting from a limited dataset, improving its performance over time in a semi-automatic way.

Some approaches for machine learning classification exploiting both labeled and unlabeled data include:
- Co-training [6] [7], where a labeled dataset is used to train two classifiers A and B, while the unlabeled dataset is divided in two subsets: each subset is classified through one classifier (e.g., A) and the confident values are used to train the other classifier (e.g., B).
- Re-weighting [8] and Common Components Using EM [9] [10], which aim at finding a function using the labeled dataset to redefine the unlabeled dataset.

Although these approaches provide interesting ways of exploiting unlabeled data, they are not appropriate to our problem of creating agents from a limited dataset and with minimal human supervision. In the first case, we do not have a function to estimate the error, thus a scheduled human intervention would be required to check for it. In the other two cases, our labeled set is too small to define a real and effective function to apply to the unlabeled set.

The method proposed in this paper is inspired by the co-training approach, combined with an estimation of the error by sentiment analysis so that the bot can learn what is right and what is wrong. Moreover, we want to rely on the user's satisfaction without asking for explicit feedback, by analyzing the user's answers through sentiment analysis.

A novel contribution of this paper is thus the use of sentiment analysis to drive the on-line automatic learning of an agent.

Proposed method

Definition 1 (Intents). An intent is a semantic label representing an intention of the end-user.

For each intent, we have defined a set of sentences that represent it. Each sentence that describes an intent contains entities that are attributes specific to the given intent.

Definition 2 (Entities). Entities are the parameters of the intent that help in defining the specific user request.

An example taken from the dataset is shown below.

Example 1. Request: "I need days off from tomorrow to the day after tomorrow".

Intent: LEAVE_REQUEST
Entities:
- start_date: tomorrow
- end_date: day after tomorrow

In this example, the scenario of an employee requesting holidays is represented by the LEAVE_REQUEST intent and by the start_date and end_date entities.
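As an illustration only, Example 1 could be held in a small record like the one below. The class name `ParsedRequest` and its field names are hypothetical and are not taken from the implementation described in this paper; the values are written out by hand rather than produced by a parser.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ParsedRequest:
    """Interpretation of one user sentence (hypothetical structure)."""
    text: str                      # the raw user sentence
    intent: str                    # semantic label, see Definition 1
    entities: Dict[str, str] = field(default_factory=dict)  # see Definition 2


# Example 1 from the text, filled in manually.
example = ParsedRequest(
    text="I need days off from tomorrow to the day after tomorrow",
    intent="LEAVE_REQUEST",
    entities={"start_date": "tomorrow", "end_date": "day after tomorrow"},
)

print(example.intent, example.entities)
```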

An example of how the proposed method works in case of positive sentiment is shown below.

Example 2. Time off request

Request: "Hi"
{intent: [HELLO] detected: [Hi] sentiment: [neutral]}
Bot: "Hi, Dave"
Request: "I'm stuck in traffic, I'll be there soon"
{intent: [TIMEOFF_REQUEST] detected: [there, soon] new words: [stuck, traffic] sentiment: [neutral]}
Bot: "Ok, do you want to create a time off request?"
Request: "Yes, thank you!"
{intent: [CONFIRM] detected: [Yes, thank, you] sentiment: [positive]}

In this example the agent detects the correct intent from the words there and soon and enriches the dictionary with stuck and traffic. In the future, if these words are often used for a time off request, they will become characteristic of this intent (e.g., Dave will be able to write "I'm stuck in traffic" or "There is traffic" to request time off).

An example of how the proposed method works in case of negative sentiment is shown below.

Example 3. Time off request

Request: "Hi"
{intent: [HELLO] detected: [Hi] sentiment: [neutral]}
Bot: "Hi, Dave"
Request: "Tomorrow I'll be busy"
{intent: [LEAVE_REQUEST] detected: [tomorrow, busy] sentiment: [neutral]}
Bot: "Ok, do you want to create a leave request?"
Request: "No, that's not what I want!"
{intent: [NOT_CONFIRM] detected: [No, that's not, what, I, want] sentiment: [negative]}

In this example the agent detects an incorrect intent from the words tomorrow and busy. In the future, if the bot always receives a negative response to the request it proposes, the words found will no longer be characteristic of that intent and can be eliminated entirely.

Our approach is frame-based: the conversational agent extracts from the text the main information needed to fill the user's request and, if that information is not enough, it can directly ask for what is missing.

We have also defined a dictionary that the bot uses to translate the type of some words. For example, the terms "tomorrow" and "day after tomorrow" are assigned to the type date.
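The frame-based behaviour and the type dictionary can be sketched as follows. This is a minimal illustration under stated assumptions: the names `TYPE_DICTIONARY`, `REQUIRED_SLOTS`, `find_typed_terms` and `missing_slots` are hypothetical, the frame for a leave request is assumed to need a start date and an end date, and terms are matched by plain substring search.

```python
# Hypothetical type dictionary: surface term -> entity type.
TYPE_DICTIONARY = {
    "tomorrow": "date",
    "day after tomorrow": "date",
}

# Hypothetical frame: slots a leave request needs before it can be executed.
REQUIRED_SLOTS = {"LEAVE_REQUEST": ["start_date", "end_date"]}


def find_typed_terms(text):
    """Return (term, type) pairs found in the text, longest terms first.

    Matched terms are blanked out so that "tomorrow" is not counted again
    inside "day after tomorrow".
    """
    remaining = text.lower()
    found = []
    for term in sorted(TYPE_DICTIONARY, key=len, reverse=True):
        if term in remaining:
            found.append((term, TYPE_DICTIONARY[term]))
            remaining = remaining.replace(term, " ")
    return found


def missing_slots(intent, filled):
    """List the slots of the frame that the user has not provided yet."""
    return [slot for slot in REQUIRED_SLOTS[intent] if slot not in filled]


terms = find_typed_terms("Tomorrow I'll be busy")
to_ask = missing_slots("LEAVE_REQUEST", {"start_date": "tomorrow"})
print(terms, to_ask)
```

With only one date in the sentence, the frame is incomplete and the agent would ask for the slot returned by `missing_slots`.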

Notice that the concepts of intents and entities are domain independent, while of course the values associated with them must be provided by an expert of the chatbot application domain.

Classification of intents

The classification problem considered in our method is determining the intent and the entities associated with a given user sentence.

User sentences are represented as a bag of words [18], without considering the order of the words. To improve classification accuracy, we also use a vocabulary of word sequences built with an N-gram model [20].
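A minimal sketch of such a representation is given below. It assumes one common reading of the description, namely that unigrams and short word n-grams are counted together; the function names and the default `max_n=2` are illustrative choices, not details taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_features(text, max_n=2):
    """Count all 1-grams up to max_n-grams; positions in the sentence are discarded."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features


bag = bag_of_features("Tomorrow I'll be busy")
print(bag)
```

The bigrams keep a little local word order ("be busy") that a pure bag of words would lose.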

The classification algorithm is based on the Naive Bayes Text Classifier [21], a statistical technique that estimates the probability of an element belonging to a certain class. The Naive Bayes technique estimates the conditional probability of each word given the classification category, associating with every word that conveys the same meaning in the intents a numerical value that we will consider as a weight. The words that characterize an intent will have a greater weight because they are found only within that intent, whereas non-characterizing words occur in numerous intents.

More formally, let $Z_1, \ldots, Z_n$ be the words that form the user input; the classification process aims at retrieving the intent $I_n$ such that $P\{I_n \mid Z_1, \ldots, Z_n\}$ is maximum.
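For reference, the standard Naive Bayes expansion of this maximization is written out below. The symbols $\hat{I}$ (the selected intent) and $\mathcal{I}$ (the set of intents) are notation introduced here for the illustration, and the last step holds only under the usual assumption that the words are conditionally independent given the intent.

```latex
\begin{align}
\hat{I} &= \arg\max_{I \in \mathcal{I}} P\{I \mid Z_1, \ldots, Z_n\} \\
        &= \arg\max_{I \in \mathcal{I}} \frac{P\{I\}\, P\{Z_1, \ldots, Z_n \mid I\}}{P\{Z_1, \ldots, Z_n\}} \\
        &= \arg\max_{I \in \mathcal{I}} P\{I\} \prod_{j=1}^{n} P\{Z_j \mid I\}
\end{align}
```

The denominator is dropped in the last line because it does not depend on the intent being evaluated.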

Example 4. Given an intent Leave representing requests from a user regarding leaves, we would like sentences such as "I want go to holidays", "I'm tired, I need to rests", "I want holidays for this month", etc. to be classified as Leave.
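To make Example 4 concrete, the sketch below trains a small multinomial Naive Bayes classifier on the three sentences above. It is a textbook formulation with add-one smoothing, not the implementation used in this work: the class name is hypothetical, the HELLO sentences are invented only to provide a second class, and the per-word weights described in the text are replaced here by plain smoothed probabilities.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesIntentClassifier:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # intent -> word -> count
        self.sentence_counts = Counter()         # intent -> number of sentences
        self.vocabulary = set()

    def train(self, labelled_sentences):
        for text, intent in labelled_sentences:
            words = text.lower().split()
            self.word_counts[intent].update(words)
            self.sentence_counts[intent] += 1
            self.vocabulary.update(words)

    def log_score(self, words, intent):
        """log P(intent) + sum of log P(word | intent), smoothed."""
        total_sentences = sum(self.sentence_counts.values())
        score = math.log(self.sentence_counts[intent] / total_sentences)
        total_words = sum(self.word_counts[intent].values())
        for word in words:
            count = self.word_counts[intent][word]
            score += math.log((count + 1) / (total_words + len(self.vocabulary)))
        return score

    def classify(self, text):
        words = text.lower().split()
        return max(self.sentence_counts,
                   key=lambda intent: self.log_score(words, intent))


classifier = NaiveBayesIntentClassifier()
classifier.train([
    ("I want go to holidays", "LEAVE"),
    ("I'm tired, I need to rests", "LEAVE"),
    ("I want holidays for this month", "LEAVE"),
    ("hi there", "HELLO"),            # invented contrast class
    ("hello good morning", "HELLO"),  # invented contrast class
])
print(classifier.classify("I need holidays"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.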

Self-learning based on sentiment analysis

The main idea developed in this paper is to provide the agent with the ability to automatically collect feedback about its answers in order to improve its knowledge base. To this end, we experimented with the use of sentiment analysis. At the beginning, the agent acts according to a model derived from the initial training set, but during use the model is updated according to the self-learning process explained in this section. In particular, we have defined an error function for the agent exploiting the sentiment analysis derived from the dialogue between the user and the bot.

To detect the sentiment from user sentences, we have defined another classification problem from user input to three classes (Positive, Negative and Neutral) and again use a Naive Bayes approach to train this classifier on a specific dataset [22]. For every user sentence, we keep track of a local and a global sentiment score: the local score refers to the last sentence, while the global score is an average value across the dialogue. Furthermore, to improve the idea, we can define some particular intents $A_n$ that act as modifiers. For example, when the user corrects the bot with phrases like "I'm sorry I did not mean this", this is considered negative feedback, while phrases containing specific thanks, such as "Thank you! I was trying to do exactly this!", provide positive feedback.
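The bookkeeping of local and global scores can be sketched as below. The numeric encoding is an assumption made for the illustration (negative = -1, neutral = 0, positive = 1), as is the use of a plain arithmetic mean for the global score; the class name `SentimentTracker` is hypothetical.

```python
class SentimentTracker:
    """Keeps the last (local) score and the running mean (global) of a dialogue."""

    def __init__(self):
        self.local = 0.0
        self.total = 0.0
        self.count = 0

    def update(self, score):
        """Record the sentiment score of the latest user sentence."""
        self.local = score
        self.total += score
        self.count += 1

    @property
    def global_score(self):
        """Mean of all sentence scores so far (0.0 for an empty dialogue)."""
        return self.total / self.count if self.count else 0.0


# Sentiments of the three user turns in Example 2: neutral, neutral, positive.
tracker = SentimentTracker()
for score in (0.0, 0.0, 1.0):
    tracker.update(score)

print(tracker.local, tracker.global_score)
```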

Based on the result of the sentiment analysis, we can recalibrate the calculated weights $w_{ij}$ of the words for the Intent classifier and, if necessary, add new terms to the dataset. We have implemented a low-pass filter [28] that smooths high-frequency changes in the update function in order to mitigate possible errors.

The main algorithm for self-learning is summarized below.

Let:
1. $I_i$ be the i-th detected intent in the user input $U$;
2. $w_{ij}$ be the weight of the j-th word in $U$;
3. $c$ be a value between 0 and 1 that represents the sentiment of the user during the dialogue, computed from how close the words are to the negative or positive class;
4. $m_v$ and $M_v$ be constant values (in our experimental sessions we set $m_v = 0.1$ and $M_v = 0.3$);
5. $v$ be a variable set to $m_v$ by default, and set to $M_v$ if $I_i \in A$ (where $A$ is the set of known positive or negative intents);
6. $k$ be a constant (in our experimental sessions we set $k = 0.4$).

Then:
- If the sentiment analysis is positive, every word of the user input that is not yet in the vocabulary is added, and the weight of every word of the user input that is already present is recalculated according to the formula

  $$w_{ij} = w_{ij}(1 - v) + n_{ij} v \qquad (1)$$

  where $n_{ij} = w_{ij} + ck$.
- If the sentiment analysis is neutral, no changes are made.
- If the sentiment analysis is negative, then the weight of every word in the user input (if it is present) is recalculated according to formula (1), where $n_{ij} = w_{ij} - ck$. When $w_{ij}$ becomes negative, the word is removed from the detected intent.
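The update rule can be sketched in a few lines. Substituting the definition of the target value into formula (1) shows that each step changes a weight by exactly c·k·v, upwards for positive and downwards for negative sentiment, which is why v acts as the smoothing factor of the filter. The code is an illustration under stated assumptions: the function and constant names are hypothetical, and a new word is assumed to enter the vocabulary with weight 0 before the first update, since the text does not state its initial weight.

```python
M_V_LOW = 0.1   # default smoothing factor (the smaller constant in the text)
M_V_HIGH = 0.3  # smoothing factor when the intent is a known modifier intent
K = 0.4         # step constant k


def update_weights(weights, words, sentiment, c, is_modifier_intent=False):
    """Apply formula (1) in place to the word weights of the detected intent.

    weights: dict mapping word -> weight for the detected intent
    sentiment: "positive", "neutral" or "negative"
    c: sentiment strength between 0 and 1
    """
    if sentiment == "neutral":
        return weights
    v = M_V_HIGH if is_modifier_intent else M_V_LOW
    sign = 1.0 if sentiment == "positive" else -1.0
    for word in words:
        if word not in weights:
            if sentiment != "positive":
                continue
            weights[word] = 0.0  # assumed initial weight of a new word
        w = weights[word]
        target = w + sign * c * K                  # n_ij in the text
        weights[word] = w * (1.0 - v) + target * v  # formula (1)
        if weights[word] < 0.0:
            del weights[word]  # the word no longer characterizes the intent
    return weights


weights = {"there": 0.5, "soon": 0.02}
update_weights(weights, ["there", "stuck"], "positive", c=1.0)
update_weights(weights, ["soon"], "negative", c=1.0)
print(weights)
```

In this run "there" is reinforced, "stuck" is learned as a new word, and "soon" drops below zero and is removed.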

Experimental results

The task-oriented conversational agent described in this paper was created to be used as a virtual assistant to help people understand the use of BadgeBox [1] and to help them perform actions directly and easily. The experiments reported in this section have been performed during normal operations of the chatbot in a real operational setting. More specifically, to verify the effectiveness of self-learning through the use of sentiment analysis, four groups of six people were recruited to interact with the chatbot for 28 days. We selected users between 20 and 50 years old, with an almost equal gender distribution (13 males and 11 females). The groups were formed at random but with a fair distribution of genders and ages. We created 4 instances of the bot and invited each group to use the corresponding instance. Each day, at 18:00, we sent the users a survey about their use of the system and how useful it was for its purpose. The survey asked for a rating from -5 to 5 of how responsive the bot was to their requests. From these ratings we derived the users' satisfaction. None of the chosen people had worked on the development of the project, so none of the users were aware of the phrases present in the dataset. The four variants of the chatbot tested were:

1. a baseline without self-learning
2. a method using a random self-learning
3. supervision of a human who updates the dataset manually
4. the proposed self-learning method based on sentiment analysis

The experimental results are summarized in Table 1, where the columns have the following meaning:
- Day: day on which the experimental data are collected.
- Method: variant of the chatbot tested.
- Interactions: the number of requests made to the bot during the testing period.
- Satisfaction: the average satisfaction score of the user surveys.
- Identified intents: correctness of the bot answers compared to the user requests. We have checked them by analyzing the operations run on the application during the test.
- Variance: satisfaction variance.
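As a small illustration of how the Satisfaction and Variance columns can be obtained from the daily surveys, the sketch below uses invented ratings for one group of six users; they are not data from the experiment. Whether the population or the sample variance was used is not stated in the text, so the population variance here is an assumption.

```python
from statistics import mean, pvariance

# Invented survey ratings on the -5..5 scale (not experimental data).
scores = [3, 4, 2, 5, 3, 4]

satisfaction = mean(scores)    # average satisfaction of the group
variance = pvariance(scores)   # population variance (assumed choice)

print(satisfaction, variance)
```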

Table 1 shows the results obtained after 7, 14, 21 and 28 days of tests. We can see how the system proposed in this paper (Method 4) improves the level of satisfaction compared to the baseline methods (Methods 1 and 2) and has similar or superior recognition performance compared to the manually supervised one (Method 3). It is also interesting to notice that the self-learning method retains its advantage over the manually supervised system, probably because the manual supervision was applied not in real time but on a weekly basis. This result further confirms the benefits of on-line self-learning.

At the moment, we are running the test in a production environment and we are constantly monitoring how the novel approach is performing. After a month of testing, with about 1,500 users, we have observed an increase of about 10 percentage points in the correctness of the bot's answers with respect to the right intent (from 65% to 75.4%).

From the results obtained during these experiments, we can conclude that the agent is actually able to self-learn through sentiment analysis and achieves performance similar to that of agents that learn through human supervision, but with significantly less effort in the human resources necessary for training purposes.

Conclusions and future works

In this paper, we have presented a self-learning method for a chatbot that uses the analysis of the emotions in the responses sent by the user as an error function and as a self-learning balancer.

One of the difficulties we overcame was precisely the nature of the discursive data source, which cannot be defined in advance, leaving the domain of the unlabeled set unknown until it is written by the user.

With our implementation and our experimental results, we have shown that the sentiment analysis approach is able to improve the initial dataset in real time and automatically, choosing on the basis of the answers what to learn continuously and in a collaborative way among all the users who interact with the bot.

The results also show that our approach is in some cases better than, and otherwise similar to, the one supervised by a human expert, where answers and learning are carried out thanks to the intervention of a person who corrects and improves the dataset manually.

Consequently, the proposed method, in addition to significantly reducing maintenance costs, paves the way for many applications aimed at customer satisfaction and makes the software closer and more similar to the people who have to use it.

Learning slang and new specific words, and consequently improving and expanding the training set, makes the service to users more effective, since the bot covers more complex vocabularies and adapts to the target audience.

After the various tests performed, we have deployed the bot in a production application, tracking the number of actions performed and the degree of satisfaction. The system is still on-line and allows the software to be used via chat in a natural way that covers user requests more and more comprehensively.

Among future applications, we can think of working on the parameters to give a personality to the bot, making it more surly or more helpful, or adapting it to the user. We can also increase its learning ability by giving it the opportunity to learn new intents through the use of certain actions that expand its skills. Finally, we can also use this approach to evaluate and measure the skills of bots without self-learning.

References

1. BadgeBox, https://www.badgebox.com/en/.

2. Shawar, E. Atwell: Chatbots: are they really useful? Journal for Language Technology and Computational Linguistics, vol. 22, no. 1. GSCL German Society for Computational Linguistics, 2007, pp. 29-49.

3. Jurafsky, H. James: Speech and language processing an introduction to natural language processing, computational linguistics, and speech. Pearson Education, 2000.

4. Collobert, J. Weston: A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning. ACM, 2008, pp. 160-167.

5. LeCun, Y. Bengio, G. E. Hinton: Deep learning. In Nature: International weekly journal of science, vol. 521, no. 7553. Macmillan, 2015, pp. 436-444.

6. Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In Proceedings of the Workshop on Computational Learning Theory, pp. 92-100, Madison, WI, 1998.

7. Nigam, K., Ghani, R.: Analyzing the effectiveness and applicability of co-training. In Proceedings of Ninth International Conference on Information and Knowledge Management, 2000, pp. 86-93.

8. Crook, J., Banasik, J.: Sample selection bias in credit scoring models. International Conference on Credit Risk Modeling and Decisioning, Philadelphia, PA, 2002.

9. Ghahramani, Z., Jordan, M.I.: Learning from incomplete data. Technical Report 108, MIT Center for Biological and Computational Learning, 1994.

10. Miller, D., Uyar, S.: A mixture of experts classifier with learning based on both labeled and unlabeled data. Advances in Neural Information Processing Systems 9, 1997, pp. 571-578, MIT Press.

11. Zhang, T., Oles, F.J.: A probability analysis on the value of unlabeled data for classification problems. In Proceedings of Seventeenth International Conference on Machine Learning, 2000, pp. 1191-1198, Stanford, CA.

12. Seeger, M.: Learning with labeled and unlabeled data. Technical report, Institute for ANC, Edinburgh, UK, 2000. http://www.dai.ed.ac.uk/ seeger/papers.html.

13. Kremer, S., Stacey, D.: NIPS 2001 Workshop and Competition on unlabeled data for supervised learning, 2001. http://q.cis.guelph.ca/ skremer/NIPS2001/.

14. Karakoulas, G., Salakhutdinov, R.: Semi-supervised Mixture of Experts Classification. In Proceedings of the Fourth IEEE International Conference on Data Mining, 2003, pp. 138-145, Brighton, UK.

15. Goldman, S., Zhou, Y.: Enhancing supervised learning with unlabeled data. In Proceedings of the Seventeenth International Conference on Machine Learning, 2000, pp. 327-334, San Francisco, CA.

16. Provost, F., Fawcett, T.: Robust classification for imprecise environments. Machine Learning, 42, 2001, 203-231.

17. Tomoya Sakai, Marthinus Christoffel du Plessis, Gang Niu, Masashi Sugiyama: Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data.

18. George, S. Joseph: Text Classification by Augmenting Bag of Words (BOW) Representation with Co-occurrence Feature. IOSR Journal of Computer Engineering (IOSR-JCE), e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 16, Issue, Ver. V (Jan. 2014), pp. 34-38. www.iosrjournals.org

19. Goldberg Yoav: Neural Network Methods in Natural Language Processing. 1st edn. Morgan and Claypool Publishers, 2017.

20. Daniel Jurafsky, James H. Martin: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edn. Pearson, 2008.

21. Zhang, D. Li: Naive Bayes Text Classifier. Granular Computing, 2007. GRC 2007. IEEE International Conference on (2007).

22. Russo, Irene; Frontini, Francesca and Quochi, Valeria, 2016, OpeNER Sentiment Lexicon Italian - LMF, ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", National Research Council, in Pisa, http://hdl.handle.net/20.500.11752/ILC-73.

23. Paul Prasse, Christoph Sawade, Niels Landwehr, Tobias Scheffer: Learning to Identify Concise Regular Expressions that Describe Email Campaigns. Journal of Machine Learning Research 16 (2015) 3687-3720.

24. Jugal Kalita, Marc Moreno Lopez: Deep Learning applied to NLP. arXiv:1703.03091v1 [cs.CL] 9 Mar 2017.

25. Zhang, Y., Jin, R., Zhou, Z. H. (2010). Understanding bag-of-words model: A statistical framework. International Journal of Machine Learning and Cybernetics, 1(1-4), 43-52. DOI: 10.1007/s13042-010-0001-0

26. Jonathan J. Webster, Chunyu Kit: Tokenization as the initial phase in NLP. Proceeding COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 4, Pages 1106-1110.

27. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis, Proceeding HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, Pages 347-354.

28. Low-pass filter, https://en.wikipedia.org/wiki/Low-pass_filter.