Injecting Designers' Knowledge in Conversational Neural Network Systems

Giancarlo A. Xompero1, Cristina Giannone1, Fabio Massimo Zanzotto2[0000-0002-7301-3596], Andrea Favalli1, and Raniero Romagnoli1

1 Language Technology Lab, Almawave srl, Rome, Italy
[first name initial].[last name]@almawave.it
2 ART Group, University of Rome Tor Vergata, Rome, Italy
fabio.massimo.zanzotto@uniroma2.it

Abstract. Sequence-to-sequence neural networks are redesigning dialog managers for Conversational AI in industry. However, industrial applications impose two important constraints: training data are often scarce, and the behavior of dialog managers should be strictly controlled and certified. In this paper, we propose the Conversational Logic Injected Neural Network (CLINN). This novel network merges dialog managers "programmed" with logical rules and a sequence-to-sequence neural network. We experimented with the Restaurant topic of the MultiWOZ dataset. Results show that injected rules are effective when training data sets are scarce as well as when more data are available.3

3 Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Sequence-to-sequence neural networks are giving an unprecedented boost to dialog systems and to the adoption of Conversational AI in industry. Sequence-to-sequence dialog systems based on Recurrent Neural Networks (RNNs) have been used to train open-domain [9, 7] as well as task-oriented [12] dialog systems. These RNN-based dialog systems have reached interesting results given a sufficiently large set of training data. Transformer-based systems, instead, are less demanding, as they can be pre-trained on large datasets and then adapted to carry out specific task-oriented dialogs [4, 10, 2]. Due to this interesting performance, Conversational AI is becoming an integral part of business practice across industries4. More and more companies are adopting the advantages that dialog systems or chatbots bring to customer service, sales, and workplace assistance.

4 https://www.gartner.com/smarterwithgartner/chatbots-will-appeal-to-modern-workers/

However, the adoption of Conversational AI in industry imposes two important constraints on the design of dialog systems: (1) the scarcity of training data and (2) the need for strict control over the behavior of dialog systems. In fact, in industrial applications, the scarcity and, sometimes, the complete absence of pre-existing conversation data is the norm. Generally, the Wizard-of-Oz approach to data collection [11] is adopted to generate training data. This is an expensive process, and it generally does not provide high-quality datasets [8]. On the other hand, the need for strict control of dialog systems is generally addressed by dialog systems that can be "programmed" with explicit rules. Undoubtedly, these dialog systems offer the precise dialog control that business scenarios need and, at the same time, guarantee a satisfying experience for users in the covered cases. In this context, the conversational experience is designed by defining rules that depend on the dialog context and on the interpretation of user inputs [6]. Hand-crafted rules generally ensure more control over the conversation flow, but they do not guarantee the scalability and generalization offered by learning approaches. If dialog interactions are not explicitly modeled, the interaction may miserably fail.
In this paper, we propose to empower sequence-to-sequence neural networks with conversational logic instructions in order to satisfy the two industrial constraints on these sequence-to-sequence dialog systems. We adopt a neural dialog manager based on the Domain Aware Multi-Decoder network [14] and add to it explicit conversational logic instructions to keep the human in the loop [13]. The resulting Conversational Logic Injected Neural Network (CLINN) combines the generalization power of neural architectures with control over the specific conversational patterns defined by the designers. We experimented with the Restaurant topic of the MultiWOZ dataset [1], using two different sets of dialogs to allow conversational designers to generate explicit rules. Results show that the injected rules are effective when training data are scarce and, moreover, that the behaviors defined on specific conversational patterns are preserved.

2 Method and System

2.1 Domain Aware Multi-Decoder (DAMD) network

Fig. 1. Architecture of the Domain Aware Multi-Decoder (DAMD) network

In this study, we use an end-to-end dialog architecture that includes the concept of belief span [5]. The belief span is a sequence of symbols that expresses the belief state at each turn of the dialog. In particular, we rely on the pipeline realized by Zhang et al. [14], which consists of four seq-to-seq modules plus access to an external database (Fig. 1). The pipeline is applied at each turn of the dialog. Globally, it takes four inputs (Ut, Rt−1, Bt−1, At−1) and produces three outputs (Rt, Bt, At), where t is the current turn, Ut is the user utterance, Rt−1 and Rt are the previous and current system responses, Bt−1 and Bt are the previous and current belief state spans, and At−1 and At are the previous and the produced system actions.

The four modules behave as follows. The context encoder encodes the context of the turn (Ut, Rt−1) into a context vector ct. The belief span decoder decodes the previous belief span Bt−1 and, together with the context vector ct, produces the belief span Bt of the current turn. This Bt is used to query the database DB, and the answer DBt is concatenated with Bt to form the internal state St of the turn. Then, the action span decoder produces the current action At by taking into consideration the current state St and the previous action At−1. Finally, the response decoder emits the final response Rt, taking into consideration the current state St and the corresponding action At. In [14], multiple actions and multiple responses are produced to increase variability in dialogues and, for this reason, the framework is called multi-action data augmentation.

2.2 Injecting Hand-Crafted Knowledge in DAMD

The DAMD network offers a tremendous opportunity to inject external knowledge. In fact, the belief span decoder transforms the internal context vector ct and an explicit, symbolic previous belief span Bt−1 into an explicit belief span Bt. In the same way, the action span decoder takes as input an explicit, symbolic previous action At−1. As Bt−1 and At−1 are explicit, they can easily be controlled by an external, symbolic module.

Fig. 2. Injecting External Knowledge in DAMD with CLINN

We therefore propose an external knowledge injector module, our Conversational Logic Injected Neural Network (CLINN), that allows conversational designers to control the dialog flow with symbolic rules.
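To make this turn-level interface concrete, the following Python sketch mirrors the data flow just described. It is only an illustration, not the implementation of [14]: the names (damd_turn, TurnState), the span formats, and the "<db>" separator are hypothetical, the trained seq-to-seq modules are stood in for by plain callables, and the two optional overrides anticipate the external symbolic control introduced above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TurnState:
    """Explicit, symbolic quantities exchanged between turns (span formats are illustrative)."""
    belief_span: str   # B_{t-1}, e.g. "restaurant food italian area centre"
    action_span: str   # A_{t-1}, e.g. "[restaurant] [request] pricerange"
    response: str      # R_{t-1}, the previous system response


def damd_turn(
    user_utterance: str,
    prev: TurnState,
    encode_context: Callable[[str, str], List[float]],
    decode_belief: Callable[[List[float], str], str],
    query_db: Callable[[str], str],
    decode_action: Callable[[str, str], str],
    decode_response: Callable[[str, str], str],
    belief_override: Optional[str] = None,
    action_override: Optional[str] = None,
) -> TurnState:
    """One turn: (U_t, R_{t-1}, B_{t-1}, A_{t-1}) -> (R_t, B_t, A_t).

    The five callables stand in for the trained seq-to-seq modules and the
    database lookup; the two optional overrides are the hook an external,
    symbolic module (such as CLINN) can use to force B_t and/or A_t.
    """
    c_t = encode_context(user_utterance, prev.response)              # context vector c_t
    b_t = belief_override or decode_belief(c_t, prev.belief_span)    # belief span B_t
    db_t = query_db(b_t)                                             # database answer DB_t
    s_t = f"{b_t} <db> {db_t}"                                       # internal state S_t from B_t and DB_t
    a_t = action_override or decode_action(s_t, prev.action_span)    # action span A_t
    r_t = decode_response(s_t, a_t)                                  # system response R_t
    return TurnState(belief_span=b_t, action_span=a_t, response=r_t)


# Toy usage with dummy stand-ins for the neural modules.
if __name__ == "__main__":
    prev = TurnState(belief_span="", action_span="", response="Hello, how can I help?")
    turn = damd_turn(
        "I would like some cheap Italian food.",
        prev,
        encode_context=lambda u, r: [0.0],
        decode_belief=lambda c, b: "restaurant food italian pricerange cheap",
        query_db=lambda b: "3 matches",
        decode_action=lambda s, a: "[restaurant] [request] area",
        decode_response=lambda s, a: "Which area would you like to eat in?",
    )
    print(turn.action_span, "->", turn.response)
```

In the toy usage at the bottom, dummy lambdas replace the decoders; in a real system they would be the trained context encoder, belief span decoder, action span decoder, and response decoder.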
CLINN acts between turns: it takes the output and the input of the DAMD network at a given turn t and provides the input for the next step (Fig. 2). CLINN aims to control the next belief state Bt and the action At given the previous belief state Bt−1, the previous action At−1, and the current user utterance Ut.

We integrated the CLINN approach into a rule-based dialog management system [3]. The rules are derived from the state-machine diagram designed by the conversational designers when they defined the interaction experience in terms of tasks and behaviors of the conversational agent. Within the diagram, the conversation is defined in terms of system actions (i.e., the states) and of user inputs and belief spans (on the edges), i.e., the preconditions for changing state. These diagrams are a convenient way for designers to express the conversational behavior they want to mould5. In our setting, these diagrams become logical rules that fire when their preconditions are matched in the conversation turn. Designing the behaviors for all possible interactions is very hard and unfruitful; training a neural network can then be the solution. However, training a neural network requires a lot of data. Writing symbolic rules is a way to inject knowledge into CLINN and boost neural network learning.

5 For an exhaustive description of the dialogue modeling please refer to [3].

3 Experiments

3.1 Experimental Set-Up

We evaluated CLINN on the MultiWOZ dataset [1], as in Zhang et al. [14]. This widely used dataset was designed as a human-human task-oriented dialog dataset collected via the Wizard-of-Oz framework, in which one participant plays the role of the system. The dataset contains conversations on several domains in the area of touristic information (hotel, train, restaurant, taxi, ...). Each domain has a set of dialog acts in addition to some general acts such as greeting or goodbye. Users' and system's interactions are described in terms of these dialog acts.

We focused on the restaurant domain of the MultiWOZ dataset, which consists of 1,200 dialogs for the training set, 61 dialogs for the test set, and 50 dialogs for the validation set. We used two different settings for the training set: (1) a small set of 150 randomly selected dialogs; (2) the full set of 1,200 dialogs. These two settings are relevant to study the behavior of our system with few training examples.

In order to simulate the delivery of a conversational agent in a production environment, we modeled a state-transition diagram that describes the expected conversational behavior of the agent. The diagram was defined by observing some conversational examples in the training set. For the evaluation, we designed two different models using two sets of dialogs: the small model was designed using 5 training conversations, and the medium model was designed by adding 10 more conversation examples to the small one. From the diagram model we obtained two sets of rules: bs rules for the production of the belief state Bt and action rules for the production of the system action At. We used bs rules in two configurations, with or without a constraint on the previous action At−1, and action rules in two configurations, with or without a constraint on the belief Bt.

We evaluated CLINN and the DAMD architecture [14] to determine their ability to recreate the inner states, namely the action span At and the belief span Bt, as we aim to verify that our model can control the flow of the dialog states.
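As an illustration of how such diagram-derived rules could be encoded and applied between turns, here is a minimal Python sketch. The Rule fields, the substring and regular-expression preconditions on Bt−1, At−1, and Ut, and the example span formats are assumptions made for exposition, not the rule language of [3]: they only show a precondition firing and forcing the next belief span and/or action.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Rule:
    """One edge of the designers' state-transition diagram (encoding is illustrative).

    The rule fires when the previous action contains `when_action`, the previous
    belief span contains `when_belief`, and the user utterance matches the regex
    `utterance_pattern`; it then forces the next belief span and/or system action.
    """
    utterance_pattern: str
    when_action: Optional[str] = None
    when_belief: Optional[str] = None
    force_belief: Optional[str] = None
    force_action: Optional[str] = None

    def matches(self, prev_belief: str, prev_action: str, utterance: str) -> bool:
        return (
            (self.when_action is None or self.when_action in prev_action)
            and (self.when_belief is None or self.when_belief in prev_belief)
            and re.search(self.utterance_pattern, utterance.lower()) is not None
        )


def apply_rules(
    rules: Iterable[Rule], prev_belief: str, prev_action: str, utterance: str
) -> Tuple[Optional[str], Optional[str]]:
    """Return (belief_override, action_override) for the next turn; (None, None) if no rule fires."""
    for rule in rules:
        if rule.matches(prev_belief, prev_action, utterance):
            return rule.force_belief, rule.force_action
    return None, None  # no precondition matched: the neural decoders stay in control


# Hypothetical rule: if the system just asked for a price range and the user
# answers "cheap", force the belief update and request the area next.
RULES = [
    Rule(
        utterance_pattern=r"\bcheap\b",
        when_action="[request] pricerange",
        force_belief="restaurant food italian pricerange cheap",
        force_action="[restaurant] [request] area",
    ),
]

belief_override, action_override = apply_rules(
    RULES,
    prev_belief="restaurant food italian",
    prev_action="[restaurant] [request] pricerange",
    utterance="Something cheap, please.",
)
print(belief_override, "|", action_override)
```

When no precondition matches, the overrides stay None and the neural decoders remain in control of Bt and At; the first-match policy is a simplification, since a production rule engine would also handle conflicts and rule priorities.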
To evaluate the ability to replicate At, we used the F1 measure, that is, the harmonic mean of recall and precision of the produced actions with respect to the gold actions. For the belief span, we used the Joint Goal Accuracy, that is, the percentage of turns in a dialogue in which the user's informed joint goals are identified correctly. Joint goals are the turn goals accumulated up to the current dialog turn.

| System | Rule Set | Injection Type: Belief | Injection Type: Action | Action/Belief | Train Set | Test Set | Action Span F1 | Belief Span Joint Goal |
|--------|----------|------------------------|------------------------|---------------|-----------|----------|----------------|------------------------|
| DAMD   |          |                        |                        | gold | 150  | full    | 36.5 | 69.4 |
| CLINN  | small    |                        | no belief              | gold | 150  | full    | 39.8 | 71.9 |
| CLINN  | small    |                        | use belief             | gold | 150  | full    | 39.5 | 62.6 |
| CLINN  | small    | no action              |                        | gold | 150  | full    | 37.9 | 66.2 |
| CLINN  | small    | use action             |                        | gold | 150  | full    | 44.1 | 66.9 |
| DAMD   |          |                        |                        | gold | 1200 | full    | 42.2 | 75.9 |
| CLINN  | small    |                        | no belief              | gold | 1200 | full    | 37.2 | 78.1 |
| CLINN  | medium   |                        | no belief              | gold | 1200 | full    | 47.2 | 82.4 |
| DAMD   |          |                        |                        | gen  | 150  | full    | 37.3 | 40.6 |
| CLINN  | small    |                        | no belief              | gen  | 150  | full    | 39.6 | 54.3 |
| CLINN  | small    |                        | use belief             | gen  | 150  | full    | 39.4 | 42.1 |
| CLINN  | small    | no action              |                        | gen  | 150  | full    | 37.7 | 48.6 |
| CLINN  | small    | use action             |                        | gen  | 150  | full    | 45.3 | 48.9 |
| DAMD   |          |                        |                        | gen  | 1200 | full    | 42.9 | 64.0 |
| CLINN  | small    |                        | no belief              | gen  | 1200 | full    | 36.8 | 64.7 |
| CLINN  | medium   |                        | no belief              | gen  | 1200 | full    | 48.8 | 69.4 |
| DAMD   |          |                        |                        | gen  | 150  | reduced | 44.4 | 71.8 |
| DAMD   |          |                        |                        | gen  | 1200 | reduced | 41.4 | 71.1 |
| CLINN  | medium   |                        | no belief              | gen  | 150  | reduced | 48.7 | 74.6 |
| CLINN  | medium   |                        | no belief              | gen  | 1200 | reduced | 53.4 | 84.5 |

Table 1. Comparison of the performance of DAMD and the CLINN system with different configurations. The value gold or gen in the Action/Belief column denotes whether the previous action and belief spans are taken from the ground truth (gold) or generated by the system (gen).

3.2 Results and discussion

The first set of experimental results (Table 1, Test Set "full") shows that CLINN successfully injects symbolic rules into sequence-to-sequence neural networks when training data are scarce. CLINN outperforms DAMD in nearly all configurations on the Action Span F1 and in some configurations on the Belief Span joint goal. More importantly, CLINN obtains interesting results in situations of data scarcity. With the small training set of 150 dialogs, one configuration of CLINN outperforms DAMD by more than 7.5 points on the Action Span F1, both in the gold setting (44.1 vs. 36.5) and in the gen setting (45.3 vs. 37.3). The increase in the joint goal for the Belief Span is less impressive in the gold setting, where only one configuration, with rule injection type Action and no belief constraint, outperforms DAMD (71.9 vs. 69.4). The improvement of CLINN in the joint goal is instead more stable in the gen setting, and the difference between DAMD and the best system is more than 13 points (54.3 vs. 40.6). Moreover, CLINN is an effective model for including hand-crafted rules even when the training set is relatively large. We selected the best configuration obtained with the training set of 150 dialogs (injection type Action with no belief constraint) and experimented with 1,200 training dialogs. Using the larger medium rule set, CLINN outperforms DAMD on the action spans and on the joint goal of the belief span in both the gold and the gen setting.

The second set of experimental results (Table 1, Test Set "reduced") gives the important indication that CLINN can help in controlling the behavior of dialog systems in specific and critical situations.
The reduced test set is composed only of the conversations used to build the medium rule set (15 conversations). Although the DAMD model contains these conversations in its training set, its performance drops when the training set increases. CLINN, instead, improves its performance on both metrics when the training set increases. Hence, CLINN offers better stability on the critical dialogs that are used to design the rules. Together, the two sets of experiments demonstrate the applicability of CLINN to real industrial cases.

4 Conclusions

Critical industrial applications such as banking or medical applications impose important constraints on Conversational AI systems: data scarcity and the need for certified dialogs. We proposed the Conversational Logic Injected Neural Network, which allows logical rules to be included to control a sequence-to-sequence dialog manager. Our system shows a possible approach towards a more effective integration of neural-network Conversational AI in industrial applications.

References

1. Budzianowski, P., Wen, T.H., Tseng, B.H., Casanueva, I., Ultes, S., Ramadan, O., Gašić, M.: MultiWOZ - A large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 (2018)
2. Dinan, E., Roller, S., Shuster, K., Fan, A., Auli, M., Weston, J.: Wizard of Wikipedia: Knowledge-powered conversational agents. In: 7th International Conference on Learning Representations, ICLR 2019 (2019)
3. Giannone, C., Bellomaria, V., Favalli, A., Romagnoli, R.: Iride®: an Industrial Perspective on Production Grade End to End Dialog System. In: Proceedings of the Italian Conference on Computational Linguistics (CLiC-it). Bari (2019)
4. Henderson, M., Vulić, I., Gerz, D., Casanueva, I., Budzianowski, P., Coope, S., Spithourakis, G., Wen, T.H., Mrkšić, N., Su, P.H.: Training neural response selection for task-oriented dialogue systems. In: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference. pp. 5392-5404 (2019), http://arxiv.org/abs/1906.01543
5. Lei, W., Jin, X., Ren, Z., He, X., Kan, M.Y., Yin, D.: Sequicity: Simplifying task-oriented dialogue systems with single sequence-to-sequence architectures. In: ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). vol. 1, pp. 1437-1447. Association for Computational Linguistics (2018). https://doi.org/10.18653/v1/p18-1133, http://github.com/WING-NUS/sequicity
6. Lison, P., Kennington, C.: OpenDial: A toolkit for developing spoken dialogue systems with probabilistic rules. In: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - System Demonstrations. pp. 67-72 (2016). https://doi.org/10.18653/v1/p16-4012, http://www.opendial-toolkit.net
7. Serban, I.V., Sordoni, A., Bengio, Y., Courville, A., Pineau, J.: Building end-to-end dialogue systems using generative hierarchical neural network models. In: 30th AAAI Conference on Artificial Intelligence, AAAI 2016. pp. 3776-3783 (2016)
8. Shah, P., Hakkani-Tür, D., Tür, G., Rastogi, A., Bapna, A., Nayak, N., Heck, L.: Building a Conversational Agent Overnight with Dialogue Self-Play (2018), http://arxiv.org/abs/1801.04871
9. Sordoni, A., Bengio, Y., Vahabi, H., Lioma, C., Simonsen, J.G., Nie, J.Y.: A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. In: Proceedings of the International Conference on Information and Knowledge Management (CIKM 2015). pp. 553-562. Association for Computing Machinery (2015). https://doi.org/10.1145/2806416.2806493
10. Vlasov, V., Mosig, J.E.M., Nichol, A.: Dialogue Transformers (2019), http://arxiv.org/abs/1910.00486
11. Wen, T.H., Su, P.H., Budzianowski, P., Casanueva, I., Vulić, I.: Data Collection and End-to-End Learning for Conversational AI. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): Tutorial Abstracts (2019)
12. Williams, J.D., Asadi, K., Zweig, G.: Hybrid code networks: Practical and efficient end-to-end dialog control with supervised and reinforcement learning. In: ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). vol. 1, pp. 665-677 (2017). https://doi.org/10.18653/v1/P17-1062
13. Zanzotto, F.M.: Viewpoint: Human-in-the-loop Artificial Intelligence. Journal of Artificial Intelligence Research 64, 243-252 (2019). https://doi.org/10.1613/jair.1.11345
14. Zhang, Y., Ou, Z., Yu, Z.: Task-Oriented Dialog Systems that Consider Multiple Appropriate Responses under the Same Context. Proceedings of the AAAI Conference on Artificial Intelligence 34(05), 9604-9611 (2020). https://doi.org/10.1609/aaai.v34i05.6507, http://arxiv.org/abs/1911.10484