<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Improving Task-Oriented Dialogue Systems In Production with Conversation Logs</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
				<date type="published" when="2020-08">August 2020</date>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Alon</forename><surname>Jacovi</surname></persName>
							<email>alonjacovi@gmail.com</email>
						</author>
						<author>
							<persName><forename type="first">Ori</forename><forename type="middle">Bar</forename><surname>El</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Ofer</forename><surname>Lavi</surname></persName>
							<email>oferl@il.ibm.com</email>
						</author>
						<author>
							<persName><forename type="first">David</forename><surname>Boaz</surname></persName>
							<email>davidbo@il.ibm.com</email>
						</author>
						<author>
							<persName><forename type="first">Inbal</forename><surname>Ronen</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">IBM Research, Bar Ilan University</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution">IBM Research</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="institution">IBM Research</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="institution">IBM Research</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff4">
								<orgName type="institution">IBM Research</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Improving Task-Oriented Dialogue Systems In Production with Conversation Logs</title>
					</analytic>
					<monogr>
						<imprint>
							<date type="published" when="2020-08">August 2020</date>
						</imprint>
					</monogr>
					<idno type="MD5">975D0A34D4D6AF72B7A0449212AE573F</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T05:52+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>dialogue systems</term>
					<term>task oriented</term>
					<term>closed domain</term>
					<term>virtual agent</term>
					<term>rule based systems</term>
					<term>machine learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this work we propose a solution to a significant limitation of task-oriented dialogue systems: their inability to learn and improve over time during deployment. Although current popular task-oriented systems are implemented as rule-based execution graphs, the available solutions for improvement incorporate neural network modules, either fully or partially, despite the poor performance of neural architectures for the task-oriented use-case. We present an algorithm to modify the graph-based system directly, in a manner which improves the system automatically and is simultaneously easy for the system expert to understand. To our knowledge, this is the first method of this type towards automatically improving a dialogue system's coverage in production, without additional explicit labels. Though the system is still preliminary, our experiments already show promising results in its ability to usefully modify an existing dialogue system while improving its coverage.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>CCS CONCEPTS</head><p>• Computing methodologies → Learning from demonstrations; Rule learning; Discourse, dialogue and pragmatics; • Human-centered computing → Natural language interfaces.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Figure <ref type="figure">1</ref>: Example of an escalation log and how we adopt it in our solution. The dialogue system fails, causing an escalation to a human who resolves the case; the system then learns from the human's response for similar cases.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Dialogue systems, or virtual assistants, are automated systems for interacting with users through a natural language interface. Task-oriented<ref type="foot" target="#foot_0">1</ref> dialogue systems are not only concerned with maintaining coherent interaction with another party (e.g., chit-chat agents, or chatbots), but also with leading the interaction towards some goal <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b10">11]</ref>. These systems have a variety of useful applications, such as customer support <ref type="bibr" target="#b34">[35]</ref>, restaurant or hotel reservation <ref type="bibr" target="#b23">[24]</ref>, online shopping <ref type="bibr" target="#b33">[34]</ref>, and many others.</p><p>Recent advances in Natural Language Understanding (NLU), via neural networks, have shown promise to facilitate drastic improvements in such virtual task-oriented dialogue agents <ref type="bibr" target="#b0">[1]</ref>, as a major bottleneck in the past has been correct interpretation of the user's natural language utterances. However, the scope of these dialogue systems is still limited by their inability to handle new types of interactions after deployment (e.g., a new software product in IT support, or new categories in online shopping) <ref type="bibr" target="#b15">[16]</ref>.</p><p>The dominant task-oriented dialogue systems follow a rule-based architecture where machine learning NLU techniques interpret the user utterances (Figure <ref type="figure" target="#fig_0">2</ref>), with an execution graph backbone for the dialogue path management <ref type="bibr" target="#b7">[8]</ref>. Modelling such a system requires expertise in both the backbone system and the domain the system is planned to operate in (i.e., the concrete use case). This combined knowledge of both the use-case and the system engineering is rare, and requires training. 
Consequently, as the dialogue management system is rule-based, improving the system's performance based on post-deployment usage requires manual updates by such an expert as well.</p><p>Often, the dialogue management backbone is based on a dialogue graph (Figure <ref type="figure" target="#fig_2">3B</ref>). Each node in the graph represents a dialogue state, and each edge a possible transition from one state to another according to the user's utterances and the condition derived from them by the NLU system (Figure <ref type="figure" target="#fig_0">2</ref>). Changing the dialogue system's behavior involves altering the dialogue system's structure and transition table. But how can we acquire supervision for the changes necessary for these improvements?</p><p>Towards this end, we point to a key property of our use-case: the virtual assistants which are the topic of this work are deployed as part of customer support centers. They work in tandem with a fallback to human agents in cases of failure, as a way of maintaining a sufficient service level to customers (users). At any point during the interaction between the virtual assistant and the user, a failure can occur, either when the virtual assistant detects its inability to continue, or when the user directly requests escalation to a human agent. In these cases, the human agent will assume control of the interaction to properly assist the user. Naturally, a record of such interactions is collected during the deployment of the support system, and is used by an expert to manually modify and improve the automatic dialogue system. 
We refer to these records as escalation logs, detailing interactions where the dialogue system assumed initial control, subsequently failed, and control was escalated to a human agent to resolve the case (Figure <ref type="figure">1</ref>).</p><p>In this work, we propose to leverage these escalation logs to complete missing functionality in the dialogue system automatically, by introducing new nodes to the dialogue execution graph. A notable attribute of the dialogue systems discussed in this work, based on execution graphs, is their human-readability: because they are actively designed by humans, they remain easy to read and understand. Thus, modifying them automatically requires maintaining the system's human-readability by proposing modifications which are also rule-based. This enables the dialogue system developer to handle these updates thoughtfully, adapting and altering them as necessary. As these systems are designed to be deployed and serve a large sector, this allows the developer a sufficient degree of confidence in the automatic modifications to permit their usage in production. We address this aspect in the design of our algorithm and assess some readability measures of its results.</p><p>Figure <ref type="figure" target="#fig_0">2</ref>: A schema for one step (response to a user utterance) in the dialogue system <ref type="bibr" target="#b7">[8]</ref>. Following the user's utterance, the NLU system interprets it to derive various values and flags. This serves the dialogue system in deciding on the response.</p><p>The contributions of this paper are three-fold: First, we formulate the node-completion problem for the dialogue execution graph based on escalation logs; Next, we propose a method for automatically deriving node transition rules based on user-to-human escalation logs; Finally, we present an automatic evaluation setup in order to assess the quality of the suggested updates to the solution, which can also serve other future dialogue system methods in this area.</p><p>The rest of this paper is structured as follows: In Section 2 we provide background on different types of dialogue systems and scope the discussion to the more prevalent type we deal with in this paper. Then, in Section 2.2 we establish the importance of improving such dialogue systems based on post-deployment execution logs. In Section 3 we introduce our solution for automatically improving these systems by means of learning from logs, a solution which we provide implementation details for in Section 4. We evaluate our solution in Section 5 and sum up with a short discussion and conclusions in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">BACKGROUND: IMPROVING DIALOGUE SYSTEMS IN PRODUCTION</head><p>We give a brief overview on learning-based methodologies for improving and updating dialogue systems without manual annotation by an expert.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Terminology and Notation</head><p>Execution Graph Dialogue System (Figure <ref type="figure" target="#fig_2">3B</ref>). We focus on the prevalent dialogue systems where the system is a directed "execution graph", in which each node represents a binary decision function (or condition) and an action. The decision function, based on the current state of the environment (conversation), results in a decision on whether to perform the action. If so, a change in the environment is observed as a result of the action, and the execution flow proceeds to the children of the node, in a pre-defined order. If the condition is not satisfied, the action is not performed, and the execution flow proceeds to the next sibling of the current node. The action may be a communication with the user, or a concrete action performed to help the user, and the observable result will be the user's response to the action.</p><p>Escalation Logs (Figure <ref type="figure">1</ref>). The core supervision to drive learning in production is collected in escalation logs: logs of interactions where the deployed system assumed initial control of handling the case and subsequently failed to complete the goal of the interaction. This resulted in escalation of the case to a human agent, who properly handled the case to its conclusion. In this work, we propose a method to utilize the human agent's handling of the dialogue system's failure in order to improve the dialogue system.</p></div>
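The node semantics above can be sketched in a few lines of Python. This is our illustration only, not the paper's implementation; the node fields and the dialogue-state dictionary are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical minimal model of an execution-graph node: a boolean
# condition over the dialogue state plus an action (here, a canned reply).
@dataclass
class Node:
    name: str
    condition: Callable[[dict], bool]
    action: str
    children: List["Node"] = field(default_factory=list)

def step(node: Node, state: dict) -> Optional[Node]:
    """Try the children in their pre-defined order and return the first
    whose condition holds; None means no child can handle the state
    (i.e., an escalation point)."""
    for child in node.children:
        if child.condition(state):
            return child
    return None

# Tiny example graph: route by whether a known error code was reported.
root = Node("root", lambda s: True, "Is there an error message?")
root.children = [
    Node("known_error", lambda s: s.get("error_code") == 666,
         "Please restart your computer."),
    Node("other_error", lambda s: s.get("error_code") is not None,
         "Let me look that code up."),
]

assert step(root, {"error_code": 666}).name == "known_error"
assert step(root, {}) is None  # unhandled state -> escalate to a human
```

The sibling-order traversal mirrors the "first child whose condition is satisfied" behavior described later for Watson Assistant.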
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Motivation</head><p>In this section we elaborate on the core motivation behind this work -namely, the answer to the question: Why is it valuable to develop a method of updating dialogue systems after their deployment? We give two central answers, detailed below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1">Distribution Shift Over Time.</head><p>The main motivation is simple and uncontroversial: Even if the initial, manually designed dialogue system is perfect for its use case, as time goes by and new capabilities are required, we would like the system to be able to acquire them automatically. This motivation also shares common themes with the areas of lifelong machine learning <ref type="bibr" target="#b26">[27]</ref> and never-ending learning <ref type="bibr" target="#b6">[7]</ref>.</p><p>As an example, consider the case of a technical customer support virtual agent, which attempts to help incoming users with technical issues and requests regarding a specific software product. The virtual agent, although properly designed at deployment time, must be continuously augmented with additional information to reflect updates in the software product, as these updates introduce new capabilities and issues.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2">Reference Logs Are Naturally-Occurring.</head><p>Another key motivation relates to the ease of obtaining these reference escalation logs. Evidently, the system has been expertly designed to be used in some practical setting, and thus, it will be deployed. As a result, instances of escalated conversations where the bot has failed will be gathered. These reference conversations can be considered "free": they will exist during the production phase by default, and if they can be utilized, no additional effort is necessary to gather supervision for the improvement of the deployed system.</p><p>Unfortunately, as explained in Section 2, there is currently no method available for making use of this supervision to improve a non-neural dialogue system (the prevailing type of virtual agent in task-oriented settings). In other words, there exists a gap between the relative ease of obtaining reference supervision for the improvement of the currently deployed solution and the lack of available techniques to make use of it.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Related Work</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.1">Execution Graph Solutions.</head><p>An execution graph <ref type="bibr" target="#b17">[18]</ref> is one of the most popular methods for modeling task-oriented dialogue systems. The vast majority of solutions of this type are created manually by an expert <ref type="bibr" target="#b7">[8]</ref>, and to our knowledge, after being deployed, they are either static or manually updated by an expert. One notable exception is by Volkova et al. <ref type="bibr" target="#b30">[31]</ref>, which attempts to create an initial graph-based model by using explicit natural-language instructions on how the execution graph should act. This method can be used to update the graph by redoing the process with additional instructions. Additionally, <ref type="bibr" target="#b22">[23]</ref> have proposed a system designed for multi-domain sets of slot values in order to remain scalable to new domains of conversation (we elaborate on slots later).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.2">Neural End-to-End Solutions.</head><p>Recent advances in deep learning have caused a surge in proposed neural solutions for dialogue systems in the open-domain chit-chat setting <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b20">21]</ref>. Unfortunately, although these end-to-end models can be improved relatively easily using reference conversation logs, current solutions are ill-equipped to deal with the challenging setting of task-oriented conversation, where the automatic solution must achieve some purpose at the end of the interaction via a natural language interface and concrete actions, and with the insufficient quantity of data which can be gathered<ref type="foot" target="#foot_1">2</ref>. Typically these neural solutions involve a component that generates responses <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b31">32]</ref> or ranks and retrieves them from data <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b28">29,</ref><ref type="bibr" target="#b32">33]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.3">Hybrid Solutions.</head><p>As previously mentioned, neural models under-perform in task-oriented settings. However, the standout quality of these models is their ability to learn, by design, from reference conversation logs. As such, hybrid models have been proposed to combine the strengths of an execution graph backbone with a neural fall-back which can learn to adapt and improve after deployment. For example, Tammewar et al. <ref type="bibr" target="#b27">[28]</ref> propose a hybrid model in which every decision of the execution graph has a neural fall-back in case of no appropriate response.</p><p>Although these models are indeed able to learn from escalation logs after deployment, in truth the only component which is able to learn is the neural model. As mentioned before, these models are as yet unconvincing in their ability to uphold the task-oriented use-case, due to their inability to rigorously conform to completing the goal of the conversation, and their requirement of a significant amount of data to learn on any level.</p><p>Another alternative to the neural fall-back is a hybrid model that redirects the misunderstood utterance to a search engine and returns its result, relying on an up-to-date search index such as the search skill described in <ref type="bibr" target="#b25">[26]</ref>. However, a search user experience is substantially different from a conversational one.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Conclusion</head><p>We have discussed three possible solutions for task-oriented systems, and their ability to learn automatically from reference logs after deployment. Specifically, while execution graph-based models are the most robust solutions, they are also rigid and require manual updates by an expert to be continuously improved. Neural models go to the other extreme, and are able to learn freely at any point by optimizing their performance against reference logs; however, their overall performance at the task-oriented use-case is severely lacking in comparison to the execution graph based models.</p><p>In order to bridge the gap, hybrid models have been proposed to embody the best of both worlds, such that they employ an execution graph backbone and a neural fallback in case of failure. However, the only component which is able to learn and improve in these models is the neural component, which is of negligible value in the overall usefulness of the model, and so they suffer the same issues as the previous solutions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">OUR SOLUTION</head><p>We elaborate on our proposed solution in order to concretely improve an existing execution graph dialogue system, by using reference escalation logs, obtained after deployment of the existing virtual assistant.</p><p>The procedure is conceptually divided into five steps. At the end of the procedure, the algorithm recommends new edges and nodes (composed of decisions and actions) to be integrated into the execution graph currently in production. These new nodes can be integrated as-is into the execution graph, to be evaluated in a test environment, or they can be verified by an expert before being integrated in order to guarantee their relevance before deployment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Step 1: Gathering Failure Points</head><p>As mentioned in Section 2.2, to update the existing execution graph, we utilize escalation logs obtained following its deployment.</p><p>(1) The before-escalation section of the log describes the dialogue between the user and the dialogue system and ends at a failure point. A failure point in a conversation is the point where control is escalated to a human agent. This conversation corresponds to a single path in the dialogue execution graph, terminating at some node we refer to as the escalation node: a graph node from which some failure points escalated to a human agent. Figure <ref type="figure" target="#fig_2">3A</ref> illustrates a single escalated conversation. The dialogue system understood that the user wishes to transfer money and escalated to a human agent in the next node. (2) The after-escalation section of the log describes the interaction from the failure point on, occurring between the user and the human agent. Since this part of the conversation is external to the dialogue system, there is no path corresponding to it in the execution graph (Figure <ref type="figure" target="#fig_2">3B</ref>).</p><p>Our goal is to derive new nodes to attach to the execution graph at the escalation node, so that failure points corresponding to that node, occurring in multiple conversations, will be handled, or at minimum delayed by an additional step in the execution graph. For a single conversation we look at the execution path up to the escalation node, and at the first response of the human agent after the failure point. In order to generalize, we gather multiple conversations that were escalated at that specific escalation node. We thus obtain a set of conversations along with their matching paths up to the escalation node, and the appropriate response for each conversation as given by the human agent. We refer to these responses as gold responses.</p></div>
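The gathering step can be sketched as follows. The flat log schema (keys `node`, `path`, `gold`) is a hypothetical stand-in for whatever format a deployed support center actually records.

```python
from collections import defaultdict

# Hypothetical flat log schema for escalated conversations: the node the
# dialogue escalated from, the execution path up to it, and the human
# agent's first response after the failure point (the "gold response").
logs = [
    {"node": "transfer_money", "path": ["root", "transfer_money"],
     "gold": "Transfers above $10k need manager approval."},
    {"node": "transfer_money", "path": ["root", "transfer_money"],
     "gold": "I have raised your transfer limit."},
    {"node": "open_account", "path": ["root", "open_account"],
     "gold": "Which account type would you like?"},
]

def gather_failure_points(logs):
    """Group escalated conversations by escalation node, keeping each
    conversation's path and its gold response for the later steps."""
    groups = defaultdict(list)
    for log in logs:
        groups[log["node"]].append((log["path"], log["gold"]))
    return dict(groups)

groups = gather_failure_points(logs)
assert sorted(groups) == ["open_account", "transfer_money"]
assert len(groups["transfer_money"]) == 2
```

Each group then feeds the clustering and condition-learning steps independently, one escalation node at a time.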
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Step 2: Clustering Gold Responses Into Response Types</head><p>Given the collection of human agent responses we obtained in the previous step, it is necessary to divide this collection into categories: Although all of these conversations passed through the same escalation node in the execution graph, they have each possibly originated from different paths, and thus each of the human agents' responses may be different based on the context of the interaction.</p><p>For this reason, we cluster the human agent responses into response types based on semantic similarity. Figure <ref type="figure">4</ref> illustrates clustering of multiple conversations based on the agent's responses into 3 response types.</p><p>In the case of textual responses, we utilize a neural model to encode the text in a continuous embedding space <ref type="bibr" target="#b13">[14]</ref> for clustering. The clustering algorithm attempts to divide the human agents' responses into different response types.</p></div>
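A toy sketch of the clustering step: the fixed 2-D vectors below stand in for the neural sentence embeddings the paper uses, chosen so the two response types are obvious. The bandwidth value is an assumption for this toy data.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Four gold responses with stand-in 2-D embeddings (a real system would
# encode the text with a neural sentence encoder); responses 0-1 and 2-3
# are semantically close, so they should form two response types.
responses = ["Please restart your computer.",
             "Try rebooting the machine.",
             "Your refund was issued.",
             "The refund is on its way."]
embeddings = np.array([[0.0, 0.1], [0.1, 0.0],
                       [5.0, 5.1], [5.1, 5.0]])

# Mean Shift does not need the number of clusters in advance.
labels = MeanShift(bandwidth=1.0).fit_predict(embeddings)
assert labels[0] == labels[1] and labels[2] == labels[3]
assert labels[0] != labels[2]  # two distinct response types
```

A density-based algorithm like Mean Shift fits this setting because the number of response types at an escalation node is unknown up front.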
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Step 3: Affixing Actions to Response Types</head><p>Each response type is assigned a concrete representative action, such as a text message, value retrieval from a database, and/or miscellaneous actions. This representative action can be derived in one of several ways:</p><p>(1) The action can be chosen by some metric (such as quantity of similar occurrences in the cluster) from among the actions in the response type.</p><p>(2) The action can be chosen as the closest response to the centroid of the cluster (Figure <ref type="figure">5</ref>). (3) In the case of a text message, the response can be generated via some text generation component by utilizing the collection of text responses in the cluster for the generation process.<ref type="foot" target="#foot_2">3</ref></p></div>
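Option (2), choosing the response nearest the cluster centroid, can be sketched directly; the function name and the toy embeddings are ours.

```python
import numpy as np

def representative_action(embeddings, responses):
    """Option (2): pick the response whose embedding is nearest to the
    cluster centroid; its text becomes the recommended node's action."""
    centroid = embeddings.mean(axis=0)
    idx = int(np.argmin(np.linalg.norm(embeddings - centroid, axis=1)))
    return responses[idx]

emb = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 0.0]])
texts = ["restart it", "please restart your computer", "reboot now"]
# centroid is [0.4, 0.0]; the nearest embedding is index 1
assert representative_action(emb, texts) == "please restart your computer"
```

Using an actual cluster member (rather than a generated text) guarantees the action is something a human agent really said.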
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Step 4: Deriving Boolean Conditions</head><p>Our next goal is to derive boolean conditions that will correctly map a conversation to its response type, and trigger the chosen action.</p><p>In dialogue systems that use an execution graph as their dialogue management backbone this is equivalent to adding one node per response type with a decision function that takes the dialogue state and context as its input.</p><p>In Figure <ref type="figure" target="#fig_5">6</ref> we illustrate eight conversations clustered by the agent's response into three response types. Each cluster is marked by a different type of line (solid, dashed, dotted). Within each cluster, every conversation holds its own different dialogue state captured when the conversation passed through the escalation node. The table illustrates the state of each conversation represented as a set of features, together with the assigned cluster for each conversation. A decision function is then learned, taking the dialogue state as input to discriminate between the three clusters. In the illustration we can see three boolean conditions taking into account the payment amount and customer VIP flag to differentiate between the three response types based on the dialogue state. Note that the boolean conditions ignore the account number feature.</p></div>
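Learning such a decision function can be sketched with a one-vs-rest decision tree mirroring the illustration: payment amount and the VIP flag are informative, the account number is not. The data and feature layout are our assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Dialogue states as feature vectors [payment_amount, is_vip, account_no],
# labeled with the response type assigned by the clustering step.
X = np.array([[100, 0, 111], [200, 0, 222],    # type 0: small payments
              [9000, 0, 333], [9500, 0, 444],  # type 1: large payments
              [50, 1, 555], [80, 1, 666]])     # type 2: VIP customers
y = np.array([0, 0, 1, 1, 2, 2])

# One binary classifier per cluster against all others (one-vs-rest);
# here, the tree for response type 1.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, (y == 1).astype(int))
assert tree.predict([[9999, 0, 777]])[0] == 1  # large payment -> type 1
assert tree.predict([[100, 1, 888]])[0] == 0
```

The fitted tree splits on the payment amount alone, consistent with the observation that the account-number feature is ignored.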
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Step 5: Recommending New Nodes</head><p>At the final step of the procedure, various nodes are derived to model the responses of human agents at various failure points. This step attempts to rank these nodes so that only a confident subset of the suggested nodes will be recommended for integration in the deployed dialogue system. This is done for two reasons: (1) By choosing a specific number 𝑘 of nodes as the top-𝑘 nodes in the recommendation ranking, the balance between precision and recall can be controlled: it is up to the expert to prioritize quality of responses at the failure points versus the potential coverage of failures. (2) In the event that the expert is interested in verifying the suggested nodes before they are integrated in the deployed dialogue system, to guarantee their validity, the procedure must filter the nodes by confidence to alleviate the workload of the expert.</p><p>We consider the quality of the suggested nodes (and specifically their conditions) via several heuristics that conform to notions of human-readability <ref type="foot" target="#foot_3">4</ref> for two main purposes: (i) Decision functions that are easier to understand will be preferred, as the expert may still attempt to understand them and verify their functionality to gain confidence in their integration in the deployed product; (ii) The human-readability of the boolean conditions can be viewed as regularization to mitigate overfitting.</p></div>
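The filtering and top-𝑘 selection can be sketched as below; the function name, the minimum-size fraction, and the toy cluster sizes are all hypothetical choices for illustration.

```python
def recommend_nodes(clusters, k, min_frac):
    """Drop clusters smaller than a fraction of all responses (the
    outlier tail), then recommend the k largest; k is the expert's knob
    for trading response quality (precision) against coverage (recall)."""
    total = sum(len(c) for c in clusters)
    kept = [c for c in clusters if len(c) / total >= min_frac]
    return sorted(kept, key=len, reverse=True)[:k]

clusters = [["r"] * 6, ["r"] * 10, ["r"] * 1]  # sizes 6, 10, 1
top = recommend_nodes(clusters, k=2, min_frac=0.1)
assert [len(c) for c in top] == [10, 6]  # the singleton outlier is dropped
```

Ranking by cluster size is one simple confidence proxy; any metric that correlates with node quality could replace it.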
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6">Solution Summary</head><p>We propose a five-step procedure for improving an execution graph's ability to handle failure points by using escalation logs as the source of supervision. To our knowledge, this is the first method of this type towards automatically improving a dialogue system's coverage after deployment, without labels that require external feedback outside of the already available escalation logs, and without manual annotation by an expert. As mentioned, the procedure requires a collection of escalation logs and results in a set of new nodes to be integrated in the current dialogue system's execution graph. These nodes are ranked by some metric, and can be further verified by an expert with minimal overhead to guarantee their behavior for a deployed model. Once the new nodes are integrated, the execution graph will be able to progress an additional step beyond what were previously its failure points, thus increasing its coverage. Once the new execution graph is deployed, more escalation logs can be gathered to iteratively improve the system by repeating the procedure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">IMPLEMENTATION</head><p>To verify our suggested approach we implemented each of the five steps of our solution on top of IBM Watson Assistant (WA) <ref type="bibr" target="#b12">[13]</ref>. However, we stress that while we exemplify our approach on top of IBM Watson Assistant, it can be comfortably generalized to other popular competing execution graph systems, such as Google Dialogflow <ref type="bibr" target="#b24">[25]</ref> and Microsoft Bot Framework <ref type="bibr" target="#b4">[5]</ref>.</p><p>WA uses an execution graph as its dialogue management backbone. The graph is designed by the system's author such that at each visited node, the system interprets a user's utterance in the context of the current conversation using natural language understanding and chooses the appropriate transition to the next node based on the execution graph design and the current dialogue state.</p><p>The dialogue state is encoded with a set of contextual variables characterizing the user's intents (e.g., opening a new account), identity (e.g., account number or country of origin) and relevant details from the user utterances (dates, times, names, etc.). Some of these contextual variables are extracted by WA automatically from the user's utterances and others are "injected" from outside the system (e.g., the account number of the logged-in user). Additional variables can be calculated based on the values of existing ones during the conversation.</p><p>Each node in WA's execution graph contains a boolean condition over the set of contextual variables and an action (e.g., a system response). When the system arrives at a specific node during a conversation, the next action in the conversation is chosen to be the action attached to the first child node whose condition is satisfied.</p><p>Below we describe our implementation in accordance with the five steps of Section 3:</p><p>(1) Step 1: Gathering Failure Points. 
Escalation nodes in WA's execution graph are nodes from which dialogues were escalated to a human agent. These nodes are in fact sink nodes for all points in conversations that did not satisfy any condition of the children of the current node. For each escalation node we gather all conversations that were escalated at that node. (2) Step 2: Clustering Gold Responses Into Response Types. We first embed the agent's response following the escalation in a continuous space. For this purpose we use a BERT <ref type="bibr" target="#b11">[12]</ref> based embedding. Specifically, we use the [CLS] token output of applying the BERT model to the responses.</p><p>We then cluster the resulting vectors using the Mean Shift <ref type="bibr" target="#b9">[10]</ref> clustering algorithm<ref type="foot" target="#foot_4">5</ref>. (3) Step 3: Affixing Actions to Response Types. Each node in the execution graph is the combination of both an entry condition and an action to follow. Each cluster from the previous phase is associated with a centroid. For the recommended nodes' actions we use the human response of the nearest neighbor to the centroid inside the cluster. (4) Step 4: Deriving Boolean Conditions. Every point in the conversation is associated with a dialogue state constituting a feature vector defined by the values of its contextual variables. For each cluster obtained in step 2, we train a binary decision tree classifier over the dialogue state at the escalation node. The label of each conversation is 1 (positive) if the clustering associated it with the cluster, and 0 (negative) otherwise. Specifically, we used the implementation offered by scikit-learn <ref type="bibr" target="#b18">[19]</ref>. This decision tree is then converted into a boolean expression by collapsing sibling sub-trees as or and collapsing parent-children sub-trees as and. 
Optionally, the decision tree or boolean expression can be pruned or simplified to increase generalization and readability <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b16">17]</ref>. We implemented pruning using scikit-learn's min_samples_leaf parameter. Notably, the decision trees are trained to classify a given cluster against all other clusters, mitigating any issue with order-dependent movement along the execution graph. (5) Step 5: Recommending New Nodes. Clustering high-dimensional vectors is likely to result in a long tail of very small clusters pertaining to outlier responses. To mitigate this, we bound the minimum size of a cluster (as a percentage of the number of responses) to be considered for new node recommendation. Our recommendations consist of the 𝑘 nodes resulting from the 𝑘 largest clusters.</p></div>
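Steps 2-4 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production implementation: the two-dimensional embeddings are synthetic stand-ins for BERT [CLS] vectors, and the single contextual variable `user_authenticated` is hypothetical.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.tree import DecisionTreeClassifier

def tree_to_boolean(dt, feature_names):
    """Collapse a fitted decision tree into a boolean expression for the
    positive class: sibling sub-trees join with `or`, parent-child with `and`."""
    t = dt.tree_
    def recurse(node):
        if t.children_left[node] == -1:  # leaf: keep it only if it predicts 1
            return "TRUE" if t.value[node][0].argmax() == 1 else None
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        clauses = []
        for cond, child in ((f"{name} <= {thr:.2f}", t.children_left[node]),
                            (f"{name} > {thr:.2f}", t.children_right[node])):
            sub = recurse(child)
            if sub == "TRUE":
                clauses.append(cond)
            elif sub is not None:
                clauses.append(f"({cond} and {sub})")
        return " or ".join(clauses) if clauses else None
    return recurse(0)

# Synthetic stand-ins for BERT [CLS] embeddings of post-escalation agent responses
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.3, (20, 2)),   # one response type
                 rng.normal(5, 0.3, (20, 2))])  # another response type
responses = [f"agent reply {i}" for i in range(40)]

# Step 2: cluster gold responses into response types (Mean Shift needs no k)
ms = MeanShift(bandwidth=1.0).fit(emb)
labels, centroids = ms.labels_, ms.cluster_centers_

# Step 3: the recommended action is the human response nearest the centroid
c = labels[0]
members = np.where(labels == c)[0]
nearest = members[np.argmin(np.linalg.norm(emb[members] - centroids[c], axis=1))]
action = responses[nearest]

# Step 4: a binary classifier per cluster over the dialogue state, collapsed
# into a boolean node condition (one hypothetical contextual variable here)
state = np.array([[0]] * 20 + [[1]] * 20)
dt = DecisionTreeClassifier(min_samples_leaf=2).fit(state, labels == c)
condition = tree_to_boolean(dt, ["user_authenticated"])
```

On this toy data the condition collapses to a single clause over the one variable; on real dialogue states the or/and collapse over deeper trees yields the longer conditions discussed in Section 5.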
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">EVALUATION</head><p>Qualitative evaluation of dialogue systems, and particularly task-oriented systems, is a very challenging open problem <ref type="bibr" target="#b10">[11]</ref>. Deriu et al. <ref type="bibr" target="#b10">[11]</ref> emphasize the need for automated evaluation methods, as collecting human judgement of the quality of a dialogue system is laborious and costly. To this end, we devised an automated evaluation method for our solution which does not involve measuring the dialogue system's performance during deployment, but rather utilizes existing dialogue systems' logs, without escalations to human agents, for evaluating the method itself. Instead of adding a new node and evaluating its quality, we take an existing dialogue system as reference and destructively modify it by choosing a node (which we refer to as the simulated escalation node) and removing all its outgoing nodes (descendants) from the execution graph. We then use our method to predict the removed outgoing nodes, and compare the behavior of the system prior to the removal with its behavior after adding the predicted nodes. This lets us measure the quality of our recommendations in an automated manner. A "high quality" node should capture a previously unhandled case, properly act upon it, and be human-readable. Our automatic evaluation is based on the following observation: in the original (unmodified) graph, the removed nodes induce a partition of the conversations that went through the simulated escalation node. We call this partition the reference partition.</p><p>Similarly, the predicted nodes induce a partition on the same set of conversations. The nodes' conditions and execution order may not necessarily resemble the original ones, but the functionality of the system should be preserved. 
This preservation can be measured by the level of similarity between the two partitions: the one induced by the removed nodes and the one induced by the predicted nodes. Our simulated escalation node can thus be viewed as an escalation node in the human-agent escalation case. Once we remove the outgoing nodes, we consider only the conversation log before the escalation, ignoring the dialogue state and the continuation of the paths in the execution graph.</p><p>We also use the original node conditions to assess the quality of our recommendations, for example by comparing the length of the recommended conditions to that of the original ones in terms of the number of variables in the condition.</p><p>Our experiments include two evaluation methods: (1) Automatic evaluation of our solution to assess the quality of the partition and the readability of the conditions. We experiment with different hyperparameters and implementations of the components in our solution. This evaluation is performed on an internal dataset, using our method for simulating escalation nodes. (2) Human evaluation of the recommended conditions and clustering. This evaluation is performed on a public dataset.</p><p>The evaluation method proposed in this paper is standalone: it is contingent neither on the dialogue system nor on the particular embedding, clustering, and condition inference techniques.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Datasets</head><p>For our evaluation we use a different dataset for each evaluation method. For the automatic evaluation we use an internal real-life (non-public) dataset from the banking domain. The dataset includes 7605 real-world conversations of users with a WA dialogue system, without escalations to human agents, over a period of 10 days of operation. Each conversation includes an average of 6.05 turns between a user and the dialogue system. The execution graph includes 135 intents with 62 entities. It has 1528 nodes and an average depth of 2.59. The dataset covers several customer service issues, such as opening a new account and transferring money. We use this dataset by simulating escalation nodes as explained above. We consider only escalation nodes with at least 50 conversations passing through them. This results in a total of 39 escalation nodes with an average of 535.98 conversations passing through each of them (stdev: 760.12, min: 55, max: 3386) and an average of 2.46 child nodes each. The feature vector used for training the decision tree in step 4 of our solution includes 1070 features.</p><p>In order to experiment with a different type of data, which reflects a prevailing use case of task-oriented dialogue systems, we use the MultiWOZ dataset <ref type="bibr" target="#b5">[6]</ref>. The dataset contains 10,000 conversations of humans in multiple domains (including hotel, taxi and restaurant booking). Each conversation in MultiWOZ is labeled with contextual variables similar to those of WA. Moreover, each agent response is labeled with the actual agent action. For example, many agent responses ask the user, in different ways, to specify a certain area. All these responses are labeled as an "area" action in the dataset. Although MultiWOZ does not include a built-in backbone execution graph, we simulated the state of conversations by querying the agent action labels.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Experimental Setup</head><p>As we noted earlier, our solution is, to the best of our knowledge, the first to tackle the problem of improving dialogue systems' coverage in production without explicit external feedback. We thus have no baselines to compare our solution to.</p><p>Our solution contains (in step 4) a decision tree (DT) classifier. We compare it to reference solutions employing other classification models -Random Forest (RF) and the state-of-the-art XGBoost (XGB) <ref type="bibr" target="#b8">[9]</ref>. Note that neither of these models fits our complete solution, as they do not offer an interpretable mechanism from which node conditions can be derived. Nevertheless, we use these references as an unrealistic upper bound for the classification part.</p><p>To evaluate various aspects of our solution we experimented with different values of the hyper-parameter 𝜏 in the decision tree and random forest models, defining the minimum fraction of samples required to be at a leaf node.</p></div>
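For concreteness, the role of 𝜏 can be reproduced in scikit-learn, where a float `min_samples_leaf` is interpreted as a fraction of the training samples. The data below is synthetic and only illustrates how a larger 𝜏 constrains tree growth; it is a sketch, not the paper's experiment.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 6))      # binary contextual variables
y = (X[:, 0] & X[:, 1]).astype(int)        # underlying condition: var0 and var1
flip = rng.choice(200, size=10, replace=False)
y[flip] = 1 - y[flip]                      # 5% label noise

# A float min_samples_leaf is read as a fraction of the training set,
# so tau = 0.05 forces every leaf to hold at least 10 of the 200 samples.
depths = {tau: DecisionTreeClassifier(min_samples_leaf=tau, random_state=0)
               .fit(X, y).get_depth()
          for tau in (0.001, 0.05)}
```

With six binary features the tree depth is bounded by 6; tighter leaf-size constraints (larger 𝜏) block the small splits that chase label noise, which is the pruning effect described above.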
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Automatic Evaluation</head><p>In this section we detail an experimental setup for automatically evaluating our solution. These automatic methods allow a straightforward verification of the effectiveness of our solution.</p><p>We use the following evaluation metrics: (1) Adjusted Rand Index (ARI) <ref type="bibr" target="#b21">[22]</ref>. To evaluate the partition induced by our model's recommended conditions, we use the ARI between the recommended partition and the gold reference partition, which measures the level of similarity between the two clusterings. (2) Clustering Coverage. The ratio of failure points that were eventually mapped to one of the response type clusters. Note that our solution does not require that every failure point be mapped.</p><p>(3) #Child Nodes. Compares the number of recommended nodes to the original number of nodes in the execution graph. (4) COND-Length. Evaluates the readability of the conditions in our solution (relevant only for the decision tree model). We compare the length of the recommended node conditions to that of the original conditions in the execution graph, where length is the number of variables in the condition.</p></div>
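As a small illustration of metric (1), using scikit-learn's `adjusted_rand_score` (the conversation groupings below are invented for the example): ARI is 1.0 whenever the two partitions group the conversations identically, regardless of node identifiers, and drops towards 0 (chance level) as the groupings diverge.

```python
from sklearn.metrics import adjusted_rand_score

# Which child node each of 8 conversations reached in the original graph...
reference = [0, 0, 0, 1, 1, 2, 2, 2]
# ...and which recommended node it reaches after our modification.
predicted = [1, 1, 1, 0, 0, 2, 2, 2]   # same grouping, different node ids
print(adjusted_rand_score(reference, predicted))   # 1.0: identical partitions

# One conversation routed to a different group: similarity drops below 1
predicted_partial = [0, 0, 1, 1, 1, 2, 2, 2]
score = adjusted_rand_score(reference, predicted_partial)
```

Because ARI is label-permutation invariant, the recommended nodes need not mirror the original node order, only the induced grouping, which matches the evaluation goal stated above.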
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.1">Results</head><p>We evaluated different versions of our model and the reference classifiers mentioned in Section 5.2 over the banking dataset, as shown in Table <ref type="table" target="#tab_1">1</ref>.</p><p>Despite the decision tree being the weakest classification model in our comparison, it outperformed all other models in terms of ARI. Moreover, in contrast to the random forest model, the decision tree achieved consistently high ARI scores independently of 𝜏. The clustering coverage of all variants was above 0.9, with the decision tree model only slightly worse than the other models. Our decision tree solution also outperformed the other models in terms of the number of child nodes, being closest to the expected average number of nodes in the dataset, 2.46. Regarding the condition length measure, only a high value of 𝜏 achieved conditions with length close to the original length of the conditions. However, our experiments showed that this metric tended to have a high variance due to extreme outliers. These outliers were conditions corresponding to "outlier clusters" of all conversations that did not map to any of the other clusters. When discarding in step 5 all nodes with conditions of length ≥ 10 with 𝜏 = 0.01, our coverage of conversations decreased to 95% of the original clustering coverage. In this case the average condition length was only 1.88. As expected, the higher 𝜏, the more aggressive our pruning becomes, which results in fewer child nodes and shorter conditions, but also lower clustering coverage.</p><p>Figure <ref type="figure" target="#fig_6">7</ref> shows the distribution of the Adjusted Rand Index (ARI) for the decision tree with 𝜏 = 0.01. Our solution achieved high ARI scores for most of the escalation nodes. 
Note that in our scenario the number of child nodes is quite small (as can also be seen in Table <ref type="table" target="#tab_1">1</ref>) in comparison to the number of conversations that are clustered (at least 50). This sometimes results in low ARI scores (even a score of 0) and is a known drawback of ARI <ref type="bibr" target="#b29">[30]</ref>. Nevertheless, we use the ARI measure as it is the de facto standard for estimating the level of similarity between two clusterings. Note that our findings are consistent across all decision tree configurations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">Experimenting with human-to-human logs</head><p>Our proposed solution for suggesting conditions assumes a dialogue graph as a backbone model. We are aware that this is not the only possible dialogue system backbone representation and that human-to-human conversation logs may not reflect any backbone at all. Yet, we wanted both to evaluate our solution on human-to-human logs and to extend the method so we can learn such a backbone from them. We started with the modest task of recovering conditions for single nodes.</p><p>To this end, we used the MultiWOZ dataset, which contains both the dialogue utterances and context variables extracted by human annotators and aligned with each turn in the conversation. We simulated a single node by collecting all agent utterances asking for a specific detail, based on annotations supplied with the dataset. In particular, in the MultiWOZ hotel booking scenario, we use the action annotation "area" to collect all utterances where the agent asks about the booking area. We declare all turns in this collection as if they are assigned to the same simulated node, e.g., an "ask area" node. Our task then is to create additional nodes corresponding to actions taken in the turn following that node in different conversations, and to recover the conditions to be used for directing a dialogue system towards the correct action.</p><p>The actions taken consider the user's answer and the context of the conversation so far. For example, one action could be to ask for more constraints from the user, such as hotel grade; another could be to suggest a small set of specific hotels matching the user's constraints supplied so far; and a third could be to ask the user to relax a constraint because no hotels matching the constraints were found. 
Applying our solution, we cluster the agent responses into their types, and then discover, based on the context of the conversation and the user's response, the condition that would lead to each of these types.</p><p>We created such simulated nodes and found that clustering the agent responses resulted in a small number of clusters, but the corresponding conditions turned out to be long and hard to interpret. Inspecting them, we saw that they consist of conjunctions of clauses connected with an "or" operator. This reflects multiple, sometimes disjoint, paths reaching our simulated node, with very different contexts leading to the same node. We suspect this is due to the slot-filling nature of the MultiWOZ dataset, where different combinations of filled slots lead to the same question asking for a specific slot not yet filled. This result led us to add a calculated context feature, counting the number of hotels that satisfy all constraints set by the filled slots so far. Adding this feature yielded clear conditions that separate the conversations into distinct actions based on this feature.</p><p>While following our solution on this hotel booking use case resulted in hard-to-interpret conditions, the process taught us how to analyze conversations with respect to the context variables and come up with an extra variable that leads to an interpretable condition, albeit with some additional manual analysis. A complete, fully automated solution would probably employ means to detect such missing variables automatically, for example by analyzing agents' actions, which could be queries to an external back-end system.</p></div>
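The calculated context feature described above can be sketched as follows; the inventory and slot names are illustrative and are not taken from the MultiWOZ ontology.

```python
# Hypothetical hotel inventory; the field names are illustrative only.
HOTELS = [
    {"area": "north", "stars": 4, "parking": True},
    {"area": "north", "stars": 3, "parking": False},
    {"area": "south", "stars": 5, "parking": True},
]

def matching_hotels(filled_slots, inventory=HOTELS):
    """Calculated context feature: how many hotels satisfy every constraint
    set by the slots filled so far."""
    return sum(all(h.get(k) == v for k, v in filled_slots.items())
               for h in inventory)

# As more slots are filled, the count shrinks; the feature value is what
# the derived node conditions can branch on.
n = matching_hotels({"area": "north", "stars": 4})
```

A recommended node condition can then test this single variable, for instance branching to "ask the user to relax a constraint" when the count is 0 and to "suggest specific hotels" when it is small, which is exactly the kind of clear condition the added feature enabled.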
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">CONCLUSION AND FUTURE WORK</head><p>We presented a method to automatically improve goal-oriented dialogue systems after deployment. The method offers a way of ongoing learning, utilizing the data that is collected in the customer care center. We address a fundamental limitation of deployed task-oriented dialogue systems: these systems, while initially useful, cannot improve in production without manual updates by an expert. Previous methods have attempted to incorporate learning into the systems via neural network fall-backs, which have proven to be an ineffective band-aid solution, as neural models offer few guarantees about the correctness of their behavior and are seldom deployed in practice.</p><p>We propose a five-step procedure which can be employed on a deployed system and uses conversation logs collected at run time. These logs, which we name "escalation logs", include interactions where the dialogue system assumed initial control, subsequently failed, and control was escalated to a human agent to resolve the case. Our procedure yields an improved version of the system, whose modifications handle additional cases in which the original system failed to provide a satisfactory response.</p><p>Future Work. This research aims to help in real customer care environments in which human agents and virtual assistants work in tandem. We propose a first step towards relieving the need for manual expert annotation in improving the system. Future work on this topic will naturally involve a thorough evaluation in a production setting, where the system is deployed, improved, and evaluated for its quality in comparison to the previous version. This procedure can be repeated multiple times to iteratively improve the system.</p><p>The MultiWOZ dataset poses a real-life scenario of slot filling, in which a user needs to provide several slots of information before the system can respond. 
The system will then consider the entire context, e.g., all slots filled so far and the new value from the current user utterance, and evaluate the current state to decide on an action. This dependency between the system response and the anticipated result of its action (based on the slot values filled and the system state) makes the prevalent slot-filling case a challenging scenario for our clustering step, which needs to take into account not only the agent response but also the context of the conversation, the current user utterance and the state of the system. On top of the calculated feature we suggest in this paper, we plan an in-depth analysis of such cases in future work.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Step 1 of our solution (see Section 3.1).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :Figure 5 :</head><label>45</label><figDesc>Figure 4: Step 2 of our solution (see Section 3.2). Figure 5: Step 3 of our solution: Affixing Actions to Response Types.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Step 4 of our solution (see Section 3.4). "Decision function" refers to boolean condition.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Distribution of ARI over escalation nodes in Banking Dataset for the decision tree with 𝜏 = 0.01.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 :</head><label>1</label><figDesc>Automatic evaluation results for the Banking Dataset (averaged over all escalation nodes). The values in parentheses refer to the original graph.</figDesc><table><row><cell>Model</cell><cell>𝜏</cell><cell>ARI</cell><cell>Clustering Coverage</cell><cell>#Child Nodes (2.46)</cell><cell>COND-Length (1.40)</cell></row><row><cell>DT</cell><cell>0.001</cell><cell>0.68</cell><cell>0.95</cell><cell>2.79</cell><cell>8.05</cell></row><row><cell></cell><cell>0.01</cell><cell>0.71</cell><cell>0.95</cell><cell>2.58</cell><cell>6.32</cell></row><row><cell></cell><cell>0.05</cell><cell>0.64</cell><cell>0.92</cell><cell>2.02</cell><cell>1.64</cell></row><row><cell>RF</cell><cell>0.001</cell><cell>0.68</cell><cell>0.97</cell><cell>4.97</cell><cell>-</cell></row><row><cell></cell><cell>0.01</cell><cell>0.47</cell><cell>0.95</cell><cell>4.12</cell><cell>-</cell></row><row><cell></cell><cell>0.05</cell><cell>0.27</cell><cell>0.95</cell><cell>2.33</cell><cell>-</cell></row><row><cell>XGB</cell><cell>-</cell><cell>0.66</cell><cell>0.97</cell><cell>2.33</cell><cell>-</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Also referred to as "goal-oriented" or "closed-domain".</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">While out of the scope of this work, neural models indeed dominate the open-domain chit-chat settings which don't suffer from these constraints<ref type="bibr" target="#b0">[1]</ref>.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">Within the scope of this work, we do not consider the text generation case.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">Such heuristics may include the length of the decision function, the amount of nesting (such as "A or (B and C)"), the number of negation elements in the function, and so on.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">Although any other clustering algorithm is applicable, we chose Mean Shift since it does not require the number of clusters to be predefined.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Towards a Human-like Open-Domain Chatbot</title>
		<author>
			<persName><forename type="first">Daniel</forename><surname>Adiwardana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Minh-Thang</forename><surname>Luong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><forename type="middle">R</forename><surname>So</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jamie</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Noah</forename><surname>Fiedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Romal</forename><surname>Thoppilan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zi</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Apoorv</forename><surname>Kulshreshtha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gaurav</forename><surname>Nemade</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yifeng</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Quoc</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2001.09977</idno>
		<ptr target="https://arxiv.org/abs/2001.09977" />
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Conversational Contextual Cues: The Case of Personalization and History for Response Ranking</title>
		<author>
			<persName><forename type="first">Rami</forename><surname>Al-Rfou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marc</forename><surname>Pickett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Javier</forename><surname>Snaider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yun-Hsuan</forename><surname>Sung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Brian</forename><surname>Strope</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ray</forename><surname>Kurzweil</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1606.00372</idno>
		<ptr target="http://arxiv.org/abs/1606.00372" />
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">An efficient algorithm for optimal pruning of decision trees</title>
		<author>
			<persName><forename type="first">Hussein</forename><surname>Almuallim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">83</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="347" to="362" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">A retrieval-based dialogue system utilizing utterance and context embeddings</title>
		<author>
			<persName><forename type="first">Alexander</forename><surname>Bartl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gerasimos</forename><surname>Spanakis</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1710.05780</idno>
		<ptr target="http://arxiv.org/abs/1710.05780" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Microsoft Bot Framework</title>
		<author>
			<persName><forename type="first">Manisha</forename><surname>Biswas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Beginning AI Bot Frameworks</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="25" to="66" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">MultiWOZ -A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling</title>
		<author>
			<persName><forename type="first">Pawel</forename><surname>Budzianowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tsung-Hsien</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bo-Hsiang</forename><surname>Tseng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Iñigo</forename><surname>Casanueva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stefan</forename><surname>Ultes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Osman</forename><surname>Ramadan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Milica</forename><surname>Gasic</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.00278</idno>
		<ptr target="http://arxiv.org/abs/1810.00278" />
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Toward an Architecture for Never-Ending Language Learning</title>
		<author>
			<persName><forename type="first">Andrew</forename><surname>Carlson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Justin</forename><surname>Betteridge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bryan</forename><surname>Kisiel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Burr</forename><surname>Settles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Estevam</forename><forename type="middle">R</forename><surname>Hruschka</surname><genName>Jr</genName></persName>
		</author>
		<author>
			<persName><forename type="first">Tom</forename><forename type="middle">M</forename><surname>Mitchell</surname></persName>
		</author>
		<ptr target="http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/1879" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010</title>
				<editor>
			<persName><forename type="first">Maria</forename><surname>Fox</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">David</forename><surname>Poole</surname></persName>
		</editor>
		<meeting>the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010<address><addrLine>Atlanta, Georgia, USA</addrLine></address></meeting>
		<imprint>
			<publisher>AAAI Press</publisher>
			<date type="published" when="2010-07-11">July 11-15, 2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A Survey on Dialogue Systems: Recent Advances and New Frontiers</title>
		<author>
			<persName><forename type="first">Hongshen</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiaorui</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dawei</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jiliang</forename><surname>Tang</surname></persName>
		</author>
		<idno type="DOI">10.1145/3166054.3166058</idno>
		<ptr target="https://doi.org/10.1145/3166054.3166058" />
	</analytic>
	<monogr>
		<title level="j">SIGKDD Explorations</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="25" to="35" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">XGBoost: A Scalable Tree Boosting System</title>
		<author>
			<persName><forename type="first">Tianqi</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Carlos</forename><surname>Guestrin</surname></persName>
		</author>
		<idno type="DOI">10.1145/2939672.2939785</idno>
		<ptr target="https://doi.org/10.1145/2939672.2939785" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</title>
				<editor>
			<persName><forename type="first">Balaji</forename><surname>Krishnapuram</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Mohak</forename><surname>Shah</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Alexander</forename><forename type="middle">J</forename><surname>Smola</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Charu</forename><forename type="middle">C</forename><surname>Aggarwal</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Dou</forename><surname>Shen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Rajeev</forename><surname>Rastogi</surname></persName>
		</editor>
		<meeting>the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining<address><addrLine>San Francisco, CA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2016-08-13">August 13-17, 2016</date>
			<biblScope unit="page" from="785" to="794" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Mean Shift, Mode Seeking, and Clustering</title>
		<author>
			<persName><forename type="first">Yizong</forename><surname>Cheng</surname></persName>
		</author>
		<idno type="DOI">10.1109/34.400568</idno>
		<ptr target="https://doi.org/10.1109/34.400568" />
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Pattern Anal. Mach. Intell</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="790" to="799" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Survey on Evaluation Methods for Dialogue Systems</title>
		<author>
			<persName><forename type="first">Jan</forename><surname>Deriu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Álvaro</forename><surname>Rodrigo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Arantxa</forename><surname>Otegi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Guillermo</forename><surname>Echegoyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sophie</forename><surname>Rosset</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eneko</forename><surname>Agirre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mark</forename><surname>Cieliebak</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1905.04071</idno>
		<ptr target="http://arxiv.org/abs/1905.04071" />
		<imprint>
			<date type="published" when="2019">2019. 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</title>
		<author>
			<persName><forename type="first">Jacob</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ming-Wei</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kenton</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kristina</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<ptr target="http://arxiv.org/abs/1810.04805" />
		<imprint>
			<date type="published" when="2018">2018. 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Building Watson: An Overview of the DeepQA Project</title>
		<author>
			<persName><forename type="first">David</forename><surname>Ferrucci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eric</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jennifer</forename><surname>Chu-Carroll</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Gondek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aditya</forename><forename type="middle">A</forename><surname>Kalyanpur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Adam</forename><surname>Lally</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">William</forename><surname>Murdock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eric</forename><surname>Nyberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><surname>Prager</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nico</forename><surname>Schlaefer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chris</forename><surname>Welty</surname></persName>
		</author>
		<idno type="DOI">10.1609/aimag.v31i3.2303</idno>
		<ptr target="https://doi.org/10.1609/aimag.v31i3.2303" />
	</analytic>
	<monogr>
		<title level="j">AI Magazine</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="59" to="79" />
			<date type="published" when="2010-07">2010. Jul. 2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Distributed Representations of Sentences and Documents</title>
		<author>
			<persName><forename type="first">Quoc</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
		<ptr target="http://proceedings.mlr.press/v32/le14.html" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31th International Conference on Machine Learning, ICML 2014</title>
		<title level="s">JMLR Workshop and Conference Proceedings</title>
		<meeting>the 31st International Conference on Machine Learning, ICML 2014<address><addrLine>Beijing, China</addrLine></address></meeting>
		<imprint>
			<publisher>JMLR</publisher>
			<date type="published" when="2014-06-26">2014. 21-26 June 2014</date>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="page" from="1188" to="1196" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Iterative policy learning in end-to-end trainable task-oriented neural dialog models</title>
		<author>
			<persName><forename type="first">Bing</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ian</forename><surname>Lane</surname></persName>
		</author>
		<idno type="DOI">10.1109/ASRU.2017.8268975</idno>
		<ptr target="https://doi.org/10.1109/ASRU.2017.8268975" />
	</analytic>
	<monogr>
		<title level="m">IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017</title>
				<meeting><address><addrLine>Okinawa, Japan</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2017-12-16">2017. 2017. December 16-20, 2017</date>
			<biblScope unit="page" from="482" to="489" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems</title>
		<author>
			<persName><forename type="first">Bing</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gökhan</forename><surname>Tür</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dilek</forename><surname>Hakkani-Tür</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pararth</forename><surname>Shah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Larry</forename><forename type="middle">P</forename><surname>Heck</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/n18-1187</idno>
		<ptr target="https://doi.org/10.18653/v1/n18-1187" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018</title>
				<editor>
			<persName><forename type="first">Marilyn</forename><forename type="middle">A</forename><surname>Walker</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Ji</forename><surname>Heng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Amanda</forename><surname>Stent</surname></persName>
		</editor>
		<meeting>the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018<address><addrLine>New Orleans, Louisiana, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018-06-01">2018. June 1-6, 2018</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="2060" to="2069" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A New Method for Boolean Function Simplification</title>
		<author>
			<persName><forename type="first">Maher</forename><surname>Nabulsi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ahmad</forename><surname>Alkatib</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fatima</forename><surname>Quiam</surname></persName>
		</author>
		<idno type="DOI">10.14257/ijca.2017.10.12.13</idno>
		<ptr target="https://doi.org/10.14257/ijca.2017.10.12.13" />
	</analytic>
	<monogr>
		<title level="j">International Journal of Control and Automation</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="139" to="146" />
			<date type="published" when="2017">2017. 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A Survey on Chatbot Implementation in Customer Service Industry through Deep Neural Networks</title>
		<author>
			<persName><forename type="first">Mohammad</forename><surname>Nuruzzaman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Omar</forename><surname>Khadeer Hussain</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICEBE.2018.00019</idno>
		<ptr target="https://doi.org/10.1109/ICEBE.2018.00019" />
	</analytic>
	<monogr>
		<title level="m">15th IEEE International Conference on e-Business Engineering, ICEBE 2018</title>
				<meeting><address><addrLine>Xi&apos;an, China</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2018-10-12">2018. October 12-14, 2018</date>
			<biblScope unit="page" from="54" to="61" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine Learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Duchesnay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011. 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Retrospective and Prospective Mixture-of-Generators for Task-oriented Dialogue Response Generation</title>
		<author>
			<persName><forename type="first">Jiahuan</forename><surname>Pei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pengjie</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christof</forename><surname>Monz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maarten</forename><surname>De Rijke</surname></persName>
		</author>
		<idno>ArXiv abs/1911.08151</idno>
		<imprint>
			<date type="published" when="2019">2019. 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Learning End-to-End Goal-Oriented Dialog with Maximal User Task Success and Minimal Human Agent Use</title>
		<author>
			<persName><forename type="first">Janarthanan</forename><surname>Rajendran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jatin</forename><surname>Ganhotra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lazaros</forename><forename type="middle">C</forename><surname>Polymenakos</surname></persName>
		</author>
		<ptr target="https://transacl.org/ojs/index.php/tacl/article/view/1622" />
	</analytic>
	<monogr>
		<title level="j">TACL</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="375" to="386" />
			<date type="published" when="2019">2019. 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Objective Criteria for the Evaluation of Clustering Methods</title>
		<author>
			<persName><forename type="first">William</forename><forename type="middle">M</forename><surname>Rand</surname></persName>
		</author>
		<ptr target="http://www.jstor.org/stable/2284239" />
	</analytic>
	<monogr>
		<title level="j">J. Amer. Statist. Assoc</title>
		<imprint>
			<biblScope unit="volume">66</biblScope>
			<biblScope unit="page" from="846" to="850" />
			<date type="published" when="1971">1971. 1971</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Scalable Multi-Domain Dialogue State Tracking</title>
		<author>
			<persName><forename type="first">Abhinav</forename><surname>Rastogi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dilek</forename><surname>Hakkani-Tür</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Larry</forename><forename type="middle">P</forename><surname>Heck</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1712.10224</idno>
		<ptr target="http://arxiv.org/abs/1712.10224" />
		<imprint>
			<date type="published" when="2017">2017. 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Creating natural dialogs in the Carnegie Mellon Communicator system</title>
		<author>
			<persName><forename type="first">Alexander</forename><forename type="middle">I</forename><surname>Rudnicky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eric</forename><surname>Thayer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paul</forename><surname>Constantinides</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chris</forename><surname>Tchou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Shern</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kevin</forename><surname>Lenzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wei</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alice</forename><surname>Oh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Sixth European Conference on Speech Communication and Technology</title>
				<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Introduction to Google Dialogflow</title>
		<author>
			<persName><forename type="first">Navin</forename><surname>Sabharwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Amit</forename><surname>Agrawal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Cognitive Virtual Assistants Using Google Dialogflow</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="13" to="54" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Integrating with Advance Services</title>
		<author>
			<persName><forename type="first">Navin</forename><surname>Sabharwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sudipta</forename><surname>Barua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Neha</forename><surname>Anand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pallavi</forename><surname>Aggarwal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Developing Cognitive Bots Using the IBM Watson Engine</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="197" to="239" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Lifelong Machine Learning Systems: Beyond Learning Algorithms</title>
		<author>
			<persName><forename type="first">Daniel</forename><forename type="middle">L</forename><surname>Silver</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Qiang</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lianghao</forename><surname>Li</surname></persName>
		</author>
		<ptr target="http://www.aaai.org/ocs/index.php/SSS/SSS13/paper/view/5802" />
	</analytic>
	<monogr>
		<title level="m">Papers from the 2013 AAAI Spring Symposium</title>
		<title level="s">AAAI Technical Report</title>
		<meeting><address><addrLine>Palo Alto, California, USA</addrLine></address></meeting>
		<imprint>
			<publisher>AAAI</publisher>
			<date type="published" when="2013-03-25">2013. March 25-27, 2013</date>
			<biblScope unit="page" from="13" to="15" />
		</imprint>
	</monogr>
	<note>Lifelong Machine Learning</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Production Ready Chatbots: Generate if Not Retrieve</title>
		<author>
			<persName><forename type="first">Aniruddha</forename><surname>Tammewar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Monik</forename><surname>Pamecha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chirag</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Apurva</forename><surname>Nagvenkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Krupal</forename><surname>Modi</surname></persName>
		</author>
		<ptr target="https://aaai.org/ocs/index.php/WS/AAAIW18/paper/view/17357" />
	</analytic>
	<monogr>
		<title level="m">The Workshops of the The Thirty-Second AAAI Conference on Artificial Intelligence</title>
		<title level="s">AAAI Workshops</title>
		<meeting><address><addrLine>New Orleans, Louisiana, USA</addrLine></address></meeting>
		<imprint>
			<publisher>AAAI Press</publisher>
			<date type="published" when="2018-02-02">2018. February 2-7, 2018</date>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="739" to="745" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Multi-Representation Fusion Network for Multi-Turn Response Selection in Retrieval-Based Chatbots</title>
		<author>
			<persName><forename type="first">Chongyang</forename><surname>Tao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wei</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Can</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wenpeng</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dongyan</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rui</forename><surname>Yan</surname></persName>
		</author>
		<idno type="DOI">10.1145/3289600.3290985</idno>
		<ptr target="https://doi.org/10.1145/3289600.3290985" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining</title>
				<meeting>the Twelfth ACM International Conference on Web Search and Data Mining<address><addrLine>Melbourne VIC, Australia; New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="267" to="275" />
		</imprint>
	</monogr>
	<note>WSDM &apos;19</note>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance</title>
		<author>
			<persName><forename type="first">Nguyen</forename><forename type="middle">Xuan</forename><surname>Vinh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julien</forename><surname>Epps</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><surname>Bailey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Mach. Learn. Res</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="2837" to="2854" />
			<date type="published" when="2010-12">2010. Dec. 2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Lightly Supervised Learning of Procedural Dialog Systems</title>
		<author>
			<persName><forename type="first">Svitlana</forename><surname>Volkova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pallavi</forename><surname>Choudhury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chris</forename><surname>Quirk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bill</forename><surname>Dolan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Luke</forename><forename type="middle">S</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/P13-1164/" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013</title>
				<meeting>the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013<address><addrLine>Sofia, Bulgaria</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013-08">2013. August 2013</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1669" to="1679" />
		</imprint>
	</monogr>
	<note>Long Papers. The Association for Computer Linguistics</note>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<title level="m" type="main">A Network-based End-to-End Trainable Task-oriented Dialogue System</title>
		<author>
			<persName><forename type="first">Tsung-Hsien</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Milica</forename><surname>Gasic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nikola</forename><surname>Mrksic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lina</forename><forename type="middle">Maria</forename><surname>Rojas-Barahona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pei-Hao</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stefan</forename><surname>Ultes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Vandyke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Steve</forename><forename type="middle">J</forename><surname>Young</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1604.04562</idno>
		<ptr target="http://arxiv.org/abs/1604.04562" />
		<imprint>
			<date type="published" when="2016">2016. 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System</title>
		<author>
			<persName><forename type="first">Rui</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yiping</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hua</forename><surname>Wu</surname></persName>
		</author>
		<idno type="DOI">10.1145/2911451.2911542</idno>
		<ptr target="https://doi.org/10.1145/2911451.2911542" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval</title>
				<meeting>the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval<address><addrLine>Pisa, Italy; New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="55" to="64" />
		</imprint>
	</monogr>
	<note>SIGIR &apos;16</note>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Building Task-Oriented Dialogue Systems for Online Shopping</title>
		<author>
			<persName><forename type="first">Zhao</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nan</forename><surname>Duan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Peng</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ming</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jianshe</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhoujun</forename><surname>Li</surname></persName>
		</author>
		<ptr target="http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14261" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence</title>
		<editor>
			<persName><forename type="first">Satinder</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Shaul</forename><surname>Markovitch</surname></persName>
		</editor>
		<meeting>the Thirty-First AAAI Conference on Artificial Intelligence<address><addrLine>San Francisco, California</addrLine></address></meeting>
		<imprint>
			<publisher>AAAI Press</publisher>
			<date type="published" when="2017-02-04">2017. February 4-9, 2017</date>
			<biblScope unit="page" from="4618" to="4626" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">MOLI: Smart Conversation Agent for Mobile Customer Service</title>
		<author>
			<persName><forename type="first">Guoguang</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jianyu</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yang</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christoph</forename><surname>Alt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Robert</forename><surname>Schwarzenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Leonhard</forename><surname>Hennig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stefan</forename><surname>Schaffer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sven</forename><surname>Schmeier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Changjian</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Feiyu</forename><surname>Xu</surname></persName>
		</author>
		<idno type="DOI">10.3390/info10020063</idno>
		<ptr target="https://doi.org/10.3390/info10020063" />
	</analytic>
	<monogr>
		<title level="j">Information</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">02</biblScope>
			<biblScope unit="page">63</biblScope>
			<date type="published" when="2019">2019. 2019</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
