Towards Continual Learning in Interactive Digital Assistants for Process Automation (Short Paper)

Praveen Venkateswaran, Vinod Muthusamy, Yara Rizk and Vatche Isahagian

IBM Research AI, Cambridge, MA, USA

1. Introduction

Business process automation (BPA) has recently emerged as a multi-billion dollar industry, revolutionizing how businesses manage their processes¹. BPA develops solutions for automated decision making, data management, and workflow optimization. Many of these solutions leverage digital assistants that can speak the language of business users to make these automations more accessible to non-tech-savvy end-users [1, 2, 3]. In enterprise settings, conversational assistants may be used by teams of people to automate role-based tasks (e.g., in HR, sales, or accounting), and these teams may want to refine the assistants over time to meet changing business needs. However, existing commercial offerings of digital assistants for process automation from companies like IBM, Microsoft, and UiPath provide a fixed set of automation capabilities. Expanding their functionality is often restricted to domain experts with programming skills. Hence, most end-users are unable to customize or create new process automation capabilities to suit their process management needs. This, in turn, can result in user frustration and decreased adoption of process automation tools.

One possible approach to increase adoption is to empower end-users to teach these digital assistants, intuitively and naturally developing customized and improved automation capabilities from the automations that already exist within them. Unlike the personal chatbots in [4], digital assistants in business automation settings may be more role-driven. Hence, they would be used by multiple end-users who need to customize them based on company processes, roles, and team preferences.
In this paper, we posit that the next generation of digital assistants within enterprise settings will be iteratively taught in-situ through intuitive multi-modal interactions dependent on the type of task and skill of the teacher(s). These assistants should be characterized by their ability to continually learn, customizability by end-users with minimal coding expertise, evolution with changing business processes, and generalizability to new process automation tasks by understanding their context. We also discuss some of the challenges of realizing these assistants, focusing on challenges in machine learning, process management, and interactive task learning.

PMAI@IJCAI22: International IJCAI Workshop on Process Management in the AI era, July 23, 2022, Vienna, Austria
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

¹ https://www.gartner.com/en/newsroom/press-releases/2021-04-28-gartner-forecasts-worldwide-hyperautomation-enabling-software-market-to-reach-nearly-600-billion-by-2022

[Figure 1 panels: Traditional offline refinement (separate Use and Build phases); Iterative in-situ refinement (Mode 1: resume after failure); Iterative in-situ refinement (Mode 2: restart after failure).]

Figure 1: An improved iterative in-situ approach for process automation by teaching digital assistants.

2. The Next Generation of Digital Assistants

The traditional development cycle, in Figure 1 (left), splits the use and build phases of digital assistants for process automation.
Run-time failures encountered by users are logged and used by expert developers to eventually build and deploy new versions of the assistants with improved automation skills. This is a relatively long process and, given limited developer resources, developers may prioritize the creation of some skills and the resolution of certain issues over others, leaving users frustrated and disconnected.

We instead envision an iterative in-situ refinement of digital assistants, as in Figure 1 (right), where users are empowered to address automation shortcomings by teaching the assistant new capabilities. Three aspects of this approach differ from the traditional way that assistants are built. First, a failure while using a digital assistant interrupts the user’s task, and the system enters a teaching mode. These failures may occur because developer- or business-defined guardrails limit the execution bounds of the assistant, or because the user explicitly indicates dissatisfaction with the assistant’s actions. For example, an assistant used to book a trip may only know what to do for domestic trips, and a guardrail can be triggered when a user tries to book an international trip. Second, the user repairs a specific failure in the context of the task they were performing. Since teaching happens in the context of the failure, the user can address the issue quickly. We believe this is an easier way for non-developers to teach and refine the assistant. Third, the same user interacting with the assistant is responsible for teaching it how to handle the failure. This empowers users to evolve and customize the assistant according to their needs.

There can be two variants of the iterative in-situ refinement approach: Mode 1, where the user resumes their task after repairing the failure; and Mode 2, where the user restarts their task with the new version of the automation.
The latter may be appropriate if there have been no side effects so far, or if the automation can roll back or compensate for the steps taken so far.

This in-situ teaching mode must be intuitive for non-tech-savvy end-users, leveraging various input modalities tailored to the automation tasks and to users’ teaching abilities. For some tasks or users, it may be easier to “show” the assistant how to perform the task, which requires screen recording capabilities. Other tasks may be more easily taught through natural language instructions. While [5] discusses multi-modality in the context of basic interactions, many of the concepts can be adapted for our envisioned teaching mode. Furthermore, some users may be more articulate than others in communicating their automation needs; hence, the assistant must identify when more iterations with the user are required to fill information gaps.

3. Challenges

Realizing this next generation of digital assistants with multi-modal, iterative, in-situ teaching requires addressing various research challenges and integrating multiple technological advancements before successful deployment is possible.

Metacognition: An important prerequisite for learning in digital assistants is their ability to recognize and understand what they do and do not know [6]. End-users may trigger multiple automation skills, often in specific sequences, making it critical for assistants to identify gaps in their capabilities. Without this, users may have to repeatedly teach the same skill, or assistants may try to execute tasks that they are incapable of performing.

Natural language understanding: Understanding the underlying intentions of user inputs is critical for assistants to execute the right skills or determine when to get more information and learn. Analyzing inputs in isolation can result in misinterpretation since information may be provided through different modalities, combined across multiple inputs, or be missing altogether.
Prior work [7] limits end-user vocabulary, which is not practical for process automation settings where users have different backgrounds, business needs, and expertise. For instance, the terminology used to automate the same skill can differ between IT support and HR, and novice users may not articulate their needs as clearly as experts. Realizing learning-enabled automation assistants requires improving existing language models (e.g., BERT), including their ability to interpret domain-specific terminology and process model concepts, learn ontologies, and interpret task signatures and inter-dependencies.

Interactive and informative experience: The teaching process can result in assistants behaving differently from user expectations (e.g., learning incorrect skills or misinterpreting key information), which can prove costly. Hence, it is important to provide informative responses reflecting the assistant’s understanding at every stage of the teaching process. This includes “failing gracefully” by defining mechanisms to provide interpretable error messages and seek appropriate user feedback when users’ expectations are violated. The vast space of multi-modal responses makes generating the appropriate response a difficult problem.

Learning new skills: Automation skills vary in complexity, scope, and usage, which makes learning new skills challenging for assistants. Some skills may require incremental updates to existing skills, others learning a model from user-provided input/output examples, and others even synthesizing new code from specifications, making it important to identify what to learn and how to learn it. Large teams of users may teach overlapping skills, requiring assistants to disambiguate between them. Also, new skills must be immediately available for use. Analyzing interaction logs to provide capabilities such as auto-complete, and continually updating skills with changing business and user needs, are important capabilities that can make the assistant’s teaching model more intuitive.
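As a rough illustration of how the in-situ refinement loop of Section 2 could meet the requirement that taught skills be immediately available, the following Python sketch may help. It is not an implementation from this paper: the names Assistant, UnknownSkill, teach, run, and the teacher callback are hypothetical, the human teaching session is replaced by a plain function, and guardrail-triggered and user-triggered failures are both reduced to a single "unknown skill" check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Skill = Callable[[dict], str]            # a skill maps parameters to a result
Step = Tuple[str, dict]                  # (skill name, parameters)
Teacher = Callable[[str, dict], Skill]   # hypothetical stand-in for a human teacher


class UnknownSkill(Exception):
    """The assistant recognizes that it lacks the requested skill."""


@dataclass
class Assistant:
    skills: Dict[str, Skill] = field(default_factory=dict)
    version: int = 1
    failure_log: List[str] = field(default_factory=list)

    def knows(self, name: str) -> bool:
        # Minimal metacognition: an explicit record of what can be done.
        return name in self.skills

    def teach(self, name: str, skill: Skill) -> None:
        # A taught skill is registered at once, so it is usable immediately.
        self.skills[name] = skill
        self.version += 1

    def run(self, steps: List[Step], teacher: Optional[Teacher] = None,
            mode: str = "resume") -> List[str]:
        if mode not in ("resume", "restart"):
            raise ValueError(f"unknown mode: {mode}")
        results: List[str] = []
        i = 0
        while i < len(steps):
            name, params = steps[i]
            if not self.knows(name):
                self.failure_log.append(name)
                if teacher is None:
                    raise UnknownSkill(name)  # offline flow: log and stop
                self.teach(name, teacher(name, params))
                if mode == "restart":
                    # Mode 2: discard partial results, rerun from the start.
                    results, i = [], 0
                    continue
                # Mode 1: fall through and resume at the failed step.
            results.append(self.skills[name](params))
            i += 1
        return results


def book_domestic(params: dict) -> str:
    return f"booked domestic trip to {params['city']}"


def human_teacher(name: str, params: dict) -> Skill:
    # Assumption: the teaching session yields an executable skill.
    return lambda p: f"booked international trip to {p['city']}"


assistant = Assistant(skills={"book_domestic": book_domestic})
trip = [("book_domestic", {"city": "Boston"}),
        ("book_international", {"city": "Vienna"})]
outcome = assistant.run(trip, teacher=human_teacher, mode="resume")
```

In this sketch the restart mode simply re-executes earlier steps, which is only safe when those steps have no side effects or can be rolled back; a real assistant would also need to disambiguate overlapping skills taught by different users rather than overwrite them by name.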
Customization and generalization: Existing assistants often require users to provide individual inputs to execute each skill and its parameters, incurring significant overhead for repeated executions of large workflows. Assistants should let users teach customizations, such as macros that execute groups of skills with a single input, pre-filled process parameters, and modified process execution, and should store them in efficient representations (e.g., user profiles). However, since skills will be used by multiple users with different business needs, assistants should be able to generalize the taught instructions and extrapolate to similar tasks. This will enable them to perform an ever-increasing number of tasks, require fewer instructions, and learn continually from subsequent interactions. Furthermore, new skills may be taught to the assistant by multiple users (with different “teaching” abilities), which requires identifying similar instructions and generalizing from them. While previous work focused on generalization in specific domains [8], expanding the scope is challenging and requires identifying entities, their values, and the tasks in which they can be incorporated.

Catastrophic forgetting: Teaching digital assistants new tasks runs the risk of conflicts between new capabilities and old ones, or even of “forgetting” how to perform previously known tasks. Existing literature [9] has studied this phenomenon in applications like transfer learning, where neural networks evolved their learned weights to a point where they were no longer able to perform previously learned tasks. In essence, the assistant would be replacing previous knowledge with new knowledge as opposed to combining and augmenting its understanding of the tasks. This would create a frustrating experience for end-users, who may find themselves re-teaching the system previously learned tasks.
Hence, to create a digital assistant that can truly learn from user interactions, we need to address this problem.

References

[1] J. Sen, F. Ozcan, A. Quamar, G. Stager, A. Mittal, M. Jammi, C. Lei, D. Saha, K. Sankaranarayanan, Natural language querying of complex business intelligence queries, in: Proc. Int. Conf. Management of Data, 2019, pp. 1997–2000.
[2] S. Mani, N. Gantayat, R. Aralikatte, M. Gupta, S. Dechu, A. Sankaran, S. Khare, B. Mitchell, H. Subramanian, H. Venkatarangan, Hi, how can i help you?: Automating enterprise it support help desks, in: Proc. AAAI Conf. Artificial Intelligence, volume 32, 2018.
[3] Y. Rizk, V. Isahagian, S. Boag, Y. Khazaeni, M. Unuvar, V. Muthusamy, R. Khalaf, A conversational digital assistant for intelligent process automation, in: Int. Conf. Business Process Management, Springer, 2020, pp. 85–100.
[4] F. Daniel, M. Matera, V. Zaccaria, A. Dell’Orto, Toward truly personal chatbots: On the development of custom conversational assistants, in: Proc. 1st Int. Workshop on Software Engineering for Cognitive Services, 2018, pp. 31–36.
[5] J. O. Kephart, Multi-modal agents for business intelligence, in: Proc. 20th Int. Conf. Autonomous Agents and MultiAgent Systems, 2021, pp. 17–22.
[6] M. T. Cox, Metacognition in computation: A selected research review, Artificial Intelligence 169 (2005) 104–141.
[7] A. Azaria, S. Srivastava, J. Krishnamurthy, I. Labutov, T. Mitchell, An agent for learning new natural language commands, AAMAS 34 (2020) 1–27.
[8] M. Nicolescu, M. Mataric, Natural methods for robot task learning: Instructive demonstrations, generalization and practice, in: AAMAS, 2003, pp. 241–248.
[9] R. M. French, Catastrophic forgetting in connectionist networks, Trends in Cognitive Sciences 3 (1999) 128–135.