<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Intimacy-aware Style Control in Dialog Response Generation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Takuto</forename><surname>Miura</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Japan Advanced Institute of Science and Technology</orgName>
								<address>
									<postCode>9231211</postCode>
									<settlement>Nomi</settlement>
									<region>Ishikawa</region>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Kiyoaki</forename><surname>Shirai</surname></persName>
							<email>kshirai@jaist.ac.jp</email>
							<affiliation key="aff0">
								<orgName type="institution">Japan Advanced Institute of Science and Technology</orgName>
								<address>
									<postCode>9231211</postCode>
									<settlement>Nomi</settlement>
									<region>Ishikawa</region>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Natthawut</forename><surname>Kertkeidkachorn</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Japan Advanced Institute of Science and Technology</orgName>
								<address>
									<postCode>9231211</postCode>
									<settlement>Nomi</settlement>
									<region>Ishikawa</region>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Intimacy-aware Style Control in Dialog Response Generation</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">357DAFD9818A8DD86AD5EEC6B4A23D70</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Dialog System</term>
					<term>Speech Style</term>
					<term>Intimacy</term>
					<term>The 9th Linguistic and Cognitive Approaches to Dialog Agents Workshop</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>One of the crucial features in developing a dialog system is the choice of an appropriate speech style. This paper proposes a novel method for training a dialog model that can effectively control the style of a response. Specifically, the dialog model generates responses in a polite style when the user exhibits a low level of intimacy with the system and in a casual style when the user shows a high level of intimacy. Using a pre-trained language model (PLM) as a base dialog model, two loss functions are proposed for fine-tuning the PLM to generate responses in an appropriate style. One is the intimacy-aware word-level loss, which serves to ensure that the dialog model generates a polite or casual word when the user's level of intimacy is low or high. The other is the intimacy-aware sentence-level loss, which functions to increase the probability of the polite style of the generated utterance when the user's level of intimacy is low, and vice versa. The results of both automatic and human evaluations in the experiments demonstrate that the proposed method is more effective than the baselines in generating responses that align with the user's degree of intimacy. Furthermore, the proposed method exhibits comparable relevance and fluency to the PLM, indicating that the losses for the style control do not diminish the PLM's exceptional capacity for generating relevant and fluent responses.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Dialog systems that freely chat with users on a wide range of topics have attracted a great deal of attention in recent years <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b2">3]</ref>. These systems are required to hold comfortable conversations with users and build long-term friendly relationships with them <ref type="bibr" target="#b3">[4]</ref>. Humans adjust their speech style according to their social relationships with their partners and/or the level of intimacy they share with their partners <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref>. Such behavior is hereafter referred to as "style control". One such style control is the use of both polite and casual styles depending on the relationship with the partner <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref>. Polite styles are often used in a conversation with a boss or a teacher, while casual styles are often employed with a friend or a life partner. Style control should be considered in all conversations, whether between humans or between humans and dialog systems <ref type="bibr" target="#b9">[10]</ref>.</p><p>The goal of this research is to develop a dialog system that flexibly controls speech styles according to the user. Specifically, with respect to the user's intimacy with the dialog system, a response is generated in a polite style when the user's level of intimacy is low, and in a casual style when the level of intimacy is high. To achieve this, we propose a method to incorporate knowledge necessary for style control by fine-tuning a dialog model based on a pre-trained language model (PLM) that is capable of generating a variety of responses consistent with the dialog context. A new loss function for fine-tuning the dialog model is designed so that the model generates polite or casual responses when the level of intimacy is low or high, where the level of intimacy is estimated from the user's past utterances.</p><p>The contributions of this paper are summarized as follows:</p><p>• We develop a dialog system that estimates the user's level of intimacy and controls the polite and casual styles in generating responses accordingly.</p><p>• We propose an approach to incorporate knowledge for style control into an existing outstanding PLM-based dialog model.</p><p>• We demonstrate the effectiveness of the proposed method through both automatic and human evaluations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Several methods have been developed for the generation of responses in a specific style. Niu and Bansal defined the task of generating responses in a predefined style, such as polite or rude <ref type="bibr" target="#b10">[11]</ref>. <ref type="bibr">Gao et al.</ref> proposed a method that shared the latent space between conversational and stylistic modeling and developed a model that generated responses in a specified style while maintaining consistency with the dialog context <ref type="bibr" target="#b11">[12]</ref>. Zhu et al. extended Gao's model so that the representation of content and style was learned in different dimensions in latent space <ref type="bibr" target="#b12">[13]</ref>. Zheng et al. proposed a method for automatic construction of a dialog corpus consisting of utterances in a certain style, aiming to train a stylized dialog model <ref type="bibr" target="#b13">[14]</ref>. Specifically, they created a Seq2Seq model, which transforms sentences in an original dialog corpus into ones in the specified style, using texts written in that style. Tsai et al. evaluated three approaches to achieve both content and style fidelity: conditional learning, guided fine-tuning, and guided decoding <ref type="bibr" target="#b14">[15]</ref>. In conditional learning, special tokens about a style are added to the input of the dialog model. In guided fine-tuning, a style of an utterance is classified, and the classification result is added to the input of the dialog model. In guided decoding, the weights of the output of the decoder are determined based on the result of the style classification model. Saha et al. proposed a multitask learning method that predicts the speaker's personality and intention when training a dialog model <ref type="bibr" target="#b15">[16]</ref>. 
This approach is designed to control the style following the predicted state of the speaker.</p><p>Based on the aforementioned studies on maintaining a style in response generation, more recent methods have been developed to add the capability of style control to a well-developed existing dialog model. Sun et al. trained a dialog model using reinforcement learning, in which responses similar to the ground-truth response and including style-related tokens got a higher reward <ref type="bibr" target="#b16">[17]</ref>. The similarity between responses was measured by the cosine similarity of the sentence embeddings, while the style-specific tokens were identified by the pre-trained classification model. Li et al. retrieved a sentence similar to an utterance from a corpus of sentences written in a specific style and fed the retrieved sentence and the utterance into a dialog model to generate a stylized response <ref type="bibr" target="#b17">[18]</ref>. Since the retrieved style sentence might harm the generation of a response consistent with the context, they incorporated into the dialog model an encoder that removed features not pertinent to the context, extracting only the style features. This encoder was trained simultaneously with the dialog model. Yang et al. proposed loss functions using a language model that generated sentences in the specified style and a classification model that identified the style of a sentence for fine-tuning the PLM of the dialog model <ref type="bibr" target="#b18">[19]</ref>.</p><p>Although the above previous studies can generate natural stylized responses, they are limited to handling a single style. In contrast, our method enables the control of multiple styles according to the user's mental state.</p><p>Several studies focused on the emotional state of the user during a dialog. Skowron et al.
showed that interactive expression of emotions in response to the user's feelings can significantly contribute to enhancing the enjoyment of the chat and the emotional connection between the user and the system <ref type="bibr" target="#b19">[20]</ref>. D'Mello and Graesser developed an intelligent tutoring agent that responds empathetically or motivationally according to the user's cognitive and emotional states <ref type="bibr" target="#b20">[21]</ref>. This interactive agent dramatically improved the learning efficacy of students with limited domain knowledge. Thus, controlling the type of the system's response according to the user's internal state exerts a considerable influence. In this study, we treat the user's intimacy as the user's internal state and the speech style as the type of response.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Proposed Method</head><p>The proposed method trains a dialog model that can adapt its style to either polite or casual according to the user's level of intimacy, which is automatically estimated from the user's historical utterances. Figure <ref type="figure" target="#fig_0">1</ref> shows an overview of the proposed method. Let us suppose that a dialog model generates a response to a user for a given dialog context 𝑋 = {𝑆 1 , 𝑈 1 , • • • , 𝑆 𝑛 , 𝑈 𝑛 }, where a system (𝑆) and a user (𝑈 ) make an utterance alternately. Figure <ref type="figure" target="#fig_0">1</ref> exemplifies the case of n = 4. The intimacy estimation model employs the user's previous utterances 𝑋 𝑢 = {𝑈 1 , • • • , 𝑈 𝑛 } as input and determines whether the user's level of intimacy with the dialog system is high or low. The dialog model accepts the context 𝑋 and the estimated intimacy level as input and generates the response 𝑆 𝑛+1 in a casual style when the user's intimacy is high and in a polite style when the user's intimacy is low.</p><p>To train the above dialog model, we extend STYLEDGPT <ref type="bibr" target="#b18">[19]</ref>, a model that consistently generates responses in a specified style, obtained by fine-tuning a PLM that can generate versatile responses. To avoid impairing the exceptional response generation capability of the PLM, only the loss function in fine-tuning is modified while the architecture of the PLM remains intact. Indeed, Yang et al. demonstrated that STYLEDGPT performed well not only in its ability to produce utterances in the specified style but also in generating relevant and fluent responses <ref type="bibr" target="#b18">[19]</ref>. First, we provide an overview of STYLEDGPT in subsection 3.1 and then describe the details of the proposed method in the succeeding subsections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">STYLEDGPT</head><p>STYLEDGPT employs DialoGPT <ref type="bibr" target="#b21">[22]</ref> as a PLM and learns a model that consistently generates responses in a specified style by fine-tuning it. DialoGPT is a Seq2Seq model based on GPT-2 <ref type="bibr" target="#b22">[23]</ref> and has been pre-trained with a large amount of dialog data.</p><p>Word-Level Loss First, a style language model 𝑃 𝑠 (𝑇 ) is trained in advance. Using a style corpus, 𝐷 𝑠𝑡𝑦𝑙𝑒 , consisting of only texts in a given style, GPT-2 is trained as an autoencoder, i.e., the same sentence 𝑇 ∈ 𝐷 𝑠𝑡𝑦𝑙𝑒 is given as input and output for fine-tuning.</p><p>Let 𝑃 (𝑌 |𝑋) be a dialog model that returns a response 𝑌 for a given dialog context 𝑋. The loss is computed for each dialog sample (𝑋, 𝑌 ) in the training data 𝐷 𝑑𝑖𝑎𝑙𝑜𝑔 . 𝑌 is a sequence of words denoted by</p><formula xml:id="formula_0">𝑌 = {𝑦 1 , • • • , 𝑦 𝑚 }. Let p Y = {𝑝 𝑦 1 , • • • , 𝑝 𝑦𝑚 }</formula><p>be the distribution of the predicted probability of the next word given by the dialog model 𝑃 (𝑌 |𝑋). Also, let p ^Y = {𝑝 ^𝑦1 , • • • , 𝑝 ^𝑦𝑚 } be the distribution of the probability of predicting the next word given by the style language model 𝑃 𝑠 (𝑌 ) when the output 𝑌 of the dialog model is taken as input of the style language model. The distance between p Y and p ^Y is defined as word-level loss 𝐿 𝑤 as in Eq. <ref type="bibr" target="#b0">(1)</ref>.</p><formula xml:id="formula_1">𝐿 𝑤 = 𝑑(p Y ||p ^Y) def = 𝑚 ∑︁ 𝑖=1 𝐷 𝐾𝐿 (𝑝 𝑦 𝑖 ||𝑝 ^𝑦𝑖 )<label>(1)</label></formula><p>𝐷 𝐾𝐿 is the Kullback-Leibler (KL) divergence of the two probability distributions. This loss causes p Y to approach p ^Y, i.e., the dialog model is trained to produce utterances in the specified style.</p><p>Sentence-Level Loss First, a style discrimination model 𝑃 (𝑆|𝑇 ) is trained in advance. It identifies whether a sentence 𝑇 is written in a given style 𝑆. 
This model is trained on a dataset that consists of 𝐷 𝑠𝑡𝑦𝑙𝑒 , a corpus of sentences written in the specific style, as positive samples, and 𝐷 𝑑𝑖𝑎𝑙𝑜𝑔 , a general dialog corpus, as negative samples.</p><p>The loss is computed for each dialog sample (𝑋, 𝑌 ) ∈ 𝐷 𝑑𝑖𝑎𝑙𝑜𝑔 . Let 𝑌 ^be a response generated by the dialog model 𝑃 (𝑌 |𝑋) for the input 𝑋, and 𝑝(𝑆|𝑌 ^) be the probability that the style of 𝑌 ^coincides with the style 𝑆. Then, the sentence-level loss 𝐿 𝑠 is defined as in Eq. <ref type="bibr" target="#b1">(2)</ref>.</p><formula xml:id="formula_2">𝐿 𝑠 = − log 𝑝(𝑆|𝑌 ^)<label>(2)</label></formula><p>This loss causes the dialog model 𝑃 (𝑌 |𝑋) to produce utterances in the style 𝑆.</p><p>Negative Log-likelihood Loss The two losses mentioned above are designed to take into account the style of a response. Fine-tuning a model with only these losses may result in a lack of consistency between a context and a generated response. Therefore, the negative log-likelihood loss (Eq. ( <ref type="formula" target="#formula_3">3</ref>)) is also used, which is a common loss for training a dialog model. 𝑝(𝑌 |𝑋) is the probability that the dialog model generates a ground-truth response 𝑌 from 𝑋, where (𝑋, 𝑌 ) is a sample in 𝐷 𝑑𝑖𝑎𝑙𝑜𝑔 .</p><formula xml:id="formula_3">𝐿 𝑁 𝐿𝐿 = − log 𝑝(𝑌 |𝑋)<label>(3)</label></formula></div>
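To make the three STYLEDGPT losses concrete, the following is a minimal pure-Python sketch that operates on toy hand-made probability distributions rather than real GPT-2 logits; the function names (`kl_divergence`, `word_level_loss`, and so on) are ours, not STYLEDGPT's.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def word_level_loss(dialog_dists, style_dists):
    """Eq. (1): sum over response positions of the KL divergence between the
    dialog model's next-word distribution and the style LM's distribution."""
    return sum(kl_divergence(p, q) for p, q in zip(dialog_dists, style_dists))

def sentence_level_loss(p_style_given_response):
    """Eq. (2): negative log-probability that the generated response is in style S."""
    return -math.log(p_style_given_response)

def nll_loss(p_gold_response):
    """Eq. (3): negative log-likelihood of the ground-truth response."""
    return -math.log(p_gold_response)

# Toy example: a 2-word response over a 3-word vocabulary.
dialog_dists = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
style_dists = [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]]
l_w = word_level_loss(dialog_dists, style_dists)
l_s = sentence_level_loss(0.8)   # style classifier is fairly confident
l_nll = nll_loss(0.05)           # gold response has low model probability
```

Minimizing `l_w` pulls the dialog model's per-position distributions toward those of the style language model, while `l_s` directly rewards responses the style classifier accepts.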
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Loss for Style Control</head><p>We modify the word-level and sentence-level losses in STYLEDGPT to control the style of the response according to the user's level of intimacy.</p><p>First, an intimacy estimation model 𝑃 (𝐼|𝑋 𝑢 ) is trained. This model predicts 𝐼, the user's level of intimacy with a dialog system, given the user's past 𝑛 utterances (𝑋 𝑢 ) as input. In our model, 𝐼 is defined as either low or high. The intimacy estimation model is pre-trained using a dialog corpus annotated with the speaker's intimacy.</p><p>To handle both polite and casual styles in response generation, two style corpora are prepared. One is 𝐷 𝑝𝑜 𝑠𝑡𝑦𝑙𝑒 , which consists of polite-style sentences, and the other is 𝐷 𝑐𝑎 𝑠𝑡𝑦𝑙𝑒 , which consists of casual-style sentences.</p><p>Intimacy-aware Word-Level Loss First, the language models of the polite and casual styles, 𝑃 𝑝𝑜 (𝑇 ) and 𝑃 𝑐𝑎 (𝑇 ), are pre-trained using the corpora 𝐷 𝑝𝑜 𝑠𝑡𝑦𝑙𝑒 and 𝐷 𝑐𝑎 𝑠𝑡𝑦𝑙𝑒 , respectively. Next, the word-level loss of the polite style, 𝐿 𝑝𝑜 𝑤 , is computed as in Eq. ( <ref type="formula" target="#formula_1">1</ref>). It evaluates how likely a response is to be polite. Similarly, the word-level loss of the casual style, 𝐿 𝑐𝑎 𝑤 , is calculated. Finally, the intimacy-aware word-level loss, 𝐿 𝑖𝑛 𝑤 , is defined as the weighted sum of these two losses (Eq. ( <ref type="formula" target="#formula_4">4</ref>)). 𝑝(𝐼=low|𝑋 𝑢 ) and 𝑝(𝐼=high|𝑋 𝑢 ) are the weights for 𝐿 𝑝𝑜 𝑤 and 𝐿 𝑐𝑎 𝑤 , which are the probability that the user's level of intimacy is low and high, respectively.</p><formula xml:id="formula_4">𝐿 𝑖𝑛 𝑤 𝑑𝑒𝑓 = 𝑝(𝐼=low|𝑋 𝑢 ) • 𝐿 𝑝𝑜 𝑤 + 𝑝(𝐼=high|𝑋 𝑢 ) • 𝐿 𝑐𝑎 𝑤 (<label>4</label></formula><formula xml:id="formula_5">)</formula><p>This loss is expected to encourage the generation of more polite tokens when intimacy is low and more casual tokens when intimacy is high. 
Following the sentence-level loss of STYLEDGPT, the intimacy-aware sentence-level loss, 𝐿 𝑖𝑛 𝑠 , is defined as the weighted sum of the negative logarithms of the style probabilities, using the two intimacy probabilities 𝑝(𝐼=low|𝑋 𝑢 ) and 𝑝(𝐼=high|𝑋 𝑢 ) as weights (Eq. ( <ref type="formula" target="#formula_6">5</ref>)).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Intimacy-aware Sentence-Level Loss</head><formula xml:id="formula_6">𝐿 𝑖𝑛 𝑠 𝑑𝑒𝑓 = −𝑝(𝐼=low|𝑋 𝑢 ) • log 𝑝(𝑆=polite|𝑌 ) − 𝑝(𝐼=high|𝑋 𝑢 ) • log 𝑝(𝑆=casual|𝑌 )<label>(5)</label></formula><p>This loss is expected to train the dialog model to generate utterances in the polite style when the intimacy is low and in the casual style when the intimacy is high.</p><p>Training Objective Eq. ( <ref type="formula" target="#formula_7">6</ref>) shows the total loss, which is a weighted sum of the two losses concerning the style (𝐿 𝑖𝑛 𝑤 and 𝐿 𝑖𝑛 𝑠 ) and a general response loss (𝐿 𝑁 𝐿𝐿 ).</p><formula xml:id="formula_7">𝐿 = 𝛽 𝑤 • 𝐿 𝑖𝑛 𝑤 + 𝛽 𝑠 • 𝐿 𝑖𝑛 𝑠 + 𝛽 𝑁 𝐿𝐿 • 𝐿 𝑁 𝐿𝐿<label>(6)</label></formula><p>𝛽 𝑤 , 𝛽 𝑠 , and 𝛽 𝑁 𝐿𝐿 are hyperparameters representing the weight of each loss.</p></div>
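The intimacy-aware combination of Eqs. (4)-(6) can be sketched as follows; this is a toy illustration with scalar placeholder losses, and the default betas are the values reported in subsection 4.3 (the function names are ours).

```python
import math

def intimacy_aware_word_loss(p_low, l_w_polite, l_w_casual):
    """Eq. (4): weight the polite/casual word-level losses by the estimated
    intimacy probabilities p(I=low|X_u) and p(I=high|X_u) = 1 - p(I=low|X_u)."""
    return p_low * l_w_polite + (1 - p_low) * l_w_casual

def intimacy_aware_sentence_loss(p_low, p_polite_given_y, p_casual_given_y):
    """Eq. (5): intimacy-weighted sum of negative log style probabilities."""
    return -(p_low * math.log(p_polite_given_y)
             + (1 - p_low) * math.log(p_casual_given_y))

def total_loss(l_in_w, l_in_s, l_nll, beta_w=0.45, beta_s=0.45, beta_nll=0.1):
    """Eq. (6): weighted sum of the two style losses and the NLL loss."""
    return beta_w * l_in_w + beta_s * l_in_s + beta_nll * l_nll

# A user estimated to have low intimacy (p_low = 0.9): the polite losses dominate.
l_in_w = intimacy_aware_word_loss(0.9, l_w_polite=1.2, l_w_casual=3.4)
l_in_s = intimacy_aware_sentence_loss(0.9, p_polite_given_y=0.7, p_casual_given_y=0.3)
l = total_loss(l_in_w, l_in_s, l_nll=2.0)
```

When the intimacy probability swings toward "high", the same code smoothly shifts the training signal toward the casual-style losses, which is the intended soft style control.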
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Additional Input</head><p>In addition to incorporating the user's intimacy information into the loss functions, the user's level of intimacy is explicitly given in the input to the dialog model. Specifically, the level of intimacy is identified by 𝑃 (𝐼|𝑋 𝑢 ), and then the intimacy label is added to the input as follows:</p><p>• When 𝐼=low : &lt;l&gt; &lt;s&gt; context &lt;/s&gt;</p><p>• When 𝐼=high : &lt;h&gt; &lt;s&gt; context &lt;/s&gt;</p><p>&lt;l&gt; and &lt;h&gt; are special tokens indicating the low and high intimacy classes, respectively. &lt;s&gt; and &lt;/s&gt; are special tokens indicating the beginning and end of the dialog context. This additional input allows the dialog model to generate responses in an appropriate style that matches the identified level of intimacy.</p></div>
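The additional input can be sketched as a small string-building helper (the helper name is ours; in practice the special tokens would be added to the tokenizer's vocabulary rather than concatenated as plain text).

```python
def build_model_input(intimacy_label, context):
    """Prefix the dialog context with an intimacy token as in subsection 3.3:
    <l> for low intimacy, <h> for high; <s> ... </s> delimit the context."""
    prefix = "<l>" if intimacy_label == "low" else "<h>"
    return f"{prefix} <s> {context} </s>"
```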
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Sampling and Ranking</head><p>To enhance the ability of the dialog model to generate appropriately styled utterances, the sampling-and-rank decoding strategy <ref type="bibr" target="#b11">[12]</ref> is employed as in STYLEDGPT. First, the dialog model generates 𝑁 candidate responses using top-𝑘 sampling. Next, a style score and a content score are calculated for each candidate response, 𝑌 𝑖 , to assess the quality of 𝑌 𝑖 . The candidate responses are then re-ranked by the weighted sum of these scores, and the response with the highest score is chosen as the final output.</p><p>The style score Score 𝑠𝑡𝑦𝑙𝑒 (𝑌 𝑖 ) is a weighted sum of the style probabilities of 𝑌 𝑖 , as in Eq. ( <ref type="formula" target="#formula_8">7</ref>). The weights are the probabilities of low and high intimacy predicted from the history of the user's utterances 𝑋 𝑢 . A greater style score indicates that a response is generated in the polite (or casual) style when the user's level of intimacy is low (or high).</p><formula xml:id="formula_8">Score 𝑠𝑡𝑦𝑙𝑒 (𝑌 𝑖 ) 𝑑𝑒𝑓 = 𝑝(𝐼=low|𝑋 𝑢 ) • 𝑝(𝑆=polite|𝑌 𝑖 ) + 𝑝(𝐼=high|𝑋 𝑢 ) • 𝑝(𝑆=casual|𝑌 𝑖 )<label>(7)</label></formula><p>The content score Score 𝑐𝑜𝑛𝑡𝑒𝑛𝑡 (𝑌 𝑖 ) is defined as the probability that the dialog model 𝑃 (𝑌 |𝑋) outputs the response candidate 𝑌 𝑖 when the dialog context 𝑋 is the input, as shown in Eq. ( <ref type="formula" target="#formula_9">8</ref>). This score evaluates the relevance of 𝑌 𝑖 to 𝑋.</p><formula xml:id="formula_9">Score 𝑐𝑜𝑛𝑡𝑒𝑛𝑡 (𝑌 𝑖 ) 𝑑𝑒𝑓 = 𝑃 (𝑌 𝑖 |𝑋)<label>(8)</label></formula><p>The final score Score(𝑌 𝑖 ) is defined as in Eq. ( <ref type="formula" target="#formula_10">9</ref>). The hyperparameter 𝜔 determines the relative weighting of the two scores.</p></div>
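The re-ranking step can be sketched as follows; this is a toy illustration assuming the style and content probabilities have already been computed for each candidate, with 𝜔 = 0.5 as in subsection 4.3 (the function names and the candidate dictionary layout are ours).

```python
def style_score(p_low, p_polite, p_casual):
    """Eq. (7): intimacy-weighted style probability of a candidate response."""
    return p_low * p_polite + (1 - p_low) * p_casual

def rank_candidates(candidates, p_low, omega=0.5):
    """Re-rank the N sampled candidates by the final score
    (1 - omega) * style + omega * content and return the best one."""
    def final_score(c):
        s = style_score(p_low, c["p_polite"], c["p_casual"])
        return (1 - omega) * s + omega * c["p_content"]
    return max(candidates, key=final_score)

# Low-intimacy user: the polite candidate wins despite a lower content score.
candidates = [
    {"text": "casual reply", "p_polite": 0.1, "p_casual": 0.9, "p_content": 0.50},
    {"text": "polite reply", "p_polite": 0.9, "p_casual": 0.1, "p_content": 0.45},
]
best = rank_candidates(candidates, p_low=0.95)
```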
<div xmlns="http://www.tei-c.org/ns/1.0"><formula xml:id="formula_10">Score(𝑌 𝑖 ) 𝑑𝑒𝑓 = (1 − 𝜔) • Score 𝑠𝑡𝑦𝑙𝑒 (𝑌 𝑖 ) + 𝜔 • Score 𝑐𝑜𝑛𝑡𝑒𝑛𝑡 (𝑌 𝑖 )<label>(9)</label></formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Datasets</head><p>Dialog Corpus with Intimacy Level Our in-house dialog corpus annotated with intimacy labels was used to evaluate the proposed method. This corpus consists of recorded and transcribed dialogs of approximately ten minutes, conducted between two speakers. For each dialog, the intimacy labels of each of the two speakers to his/her dialog partner are annotated on a five-point scale. The statistics of the corpus are as follows: the number of subjects who participated in the conversations is 19, the number of conversations is 54, and the total number of utterances is 6,984. Hereafter, we refer to this corpus as the "Japanese Intimacy Dialog Corpus" or "JID corpus" for short. The 54 dialogs in the JID corpus were divided into three subsets: a training set of 33 dialogs, a validation set of 9, and a test set of 12. As mentioned in Section 3, the dialog model accepts the preceding dialog context of the user and the system, 𝑋 = {𝑆 1 , 𝑈 1 , • • • , 𝑆 𝑛 , 𝑈 𝑛 }, as input and generates the subsequent response 𝑆 𝑛+1 as output. Hereafter, the pair of a dialog context and its corresponding response, denoted by (𝑋, 𝑆 𝑛+1 ), will be referred to as an instance of response. One speaker in the corpus was designated as the system and the other as the user to extract a dialog context and response. The first 𝑛×2 utterances and the next utterance in a dialog were extracted as (𝑋, 𝑆 𝑛+1 ). This procedure was then repeated, with the utterance shifted one by one, to obtain multiple instances of responses. Finally, 4,032, 921, and 1,284 instances of responses were obtained as the training, validation, and test data, respectively.</p><p>We also used this corpus to train an intimacy estimation model. Let 𝑋 𝑢 = {𝑈 1 , • • • , 𝑈 𝑛 } be the user's utterance extracted from the dialog context 𝑋 in an instance of response, and let 𝐼 be the intimacy label for the dialog. 
The intimacy label was designated as "low" when the corresponding score in the JID corpus was 1 or 2, or "high" when the value was 3, 4, or 5. The intimacy estimation model, 𝑃 (𝐼|𝑋 𝑢 ), is a binary classification model that takes 𝑋 𝑢 as input and estimates the intimacy label 𝐼. The model was trained using samples (𝑋 𝑢 , 𝐼) in the training and validation data and its performance was evaluated using the test data.</p></div>
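The extraction of response instances and the binarization of the intimacy score can be sketched as below. Note that the paper says the window is "shifted one by one"; we read that as a stride of one utterance, which is an assumption, and the function names are ours.

```python
def extract_instances(utterances, n=4):
    """Slide over a transcribed dialog [S1, U1, S2, U2, ...] and extract
    (context X, response S_{n+1}) pairs, shifting the window by one
    utterance each time (n = 4 gives contexts of 8 utterances)."""
    window = 2 * n
    return [(utterances[i:i + window], utterances[i + window])
            for i in range(len(utterances) - window)]

def binarize_intimacy(score):
    """Map the five-point intimacy score to the binary training label:
    'low' for scores 1-2, 'high' for scores 3-5."""
    return "low" if score <= 2 else "high"

# Example: a 10-utterance dialog yields two response instances when n = 4.
dialog = [f"u{i}" for i in range(10)]
instances = extract_instances(dialog, n=4)
```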
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Style Corpus</head><p>Two style corpora are required to train style language models and a style discrimination model: 𝐷 𝑝𝑜 𝑠𝑡𝑦𝑙𝑒 and 𝐷 𝑐𝑎 𝑠𝑡𝑦𝑙𝑒 . The KeiCO corpus <ref type="bibr" target="#b8">[9]</ref> was used as 𝐷 𝑝𝑜 𝑠𝑡𝑦𝑙𝑒 . This corpus contains utterances using various types of honorific expressions in Japanese. Meanwhile, 𝐷 𝑐𝑎 𝑠𝑡𝑦𝑙𝑒 was constructed by extracting utterances from conversations between speakers who know each other in the BTSJ corpus <ref type="bibr" target="#b23">[24]</ref>. 𝐷 𝑝𝑜 𝑠𝑡𝑦𝑙𝑒 contains 10,007 utterances, while 𝐷 𝑐𝑎 𝑠𝑡𝑦𝑙𝑒 contains 13,351 utterances. To train the polite and casual style language models, 𝑃 𝑝𝑜 (𝑇 ) and 𝑃 𝑐𝑎 (𝑇 ), all utterances in 𝐷 𝑝𝑜 𝑠𝑡𝑦𝑙𝑒 and 𝐷 𝑐𝑎 𝑠𝑡𝑦𝑙𝑒 , respectively, were utilized. To train the style discrimination model 𝑃 ′ (𝑆|𝑇 ), a total of 23,248 utterances were used, comprising 9,957 utterances in 𝐷 𝑝𝑜 𝑠𝑡𝑦𝑙𝑒 and 13,301 utterances in 𝐷 𝑐𝑎 𝑠𝑡𝑦𝑙𝑒 . The remaining 100 utterances (50 utterances each) were used to evaluate the style discrimination model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Experimental Setting</head><p>The following methods, including our proposed methods, were compared in the experiment.</p><p>• DialoGPT <ref type="bibr" target="#b21">[22]</ref> is the dialog model based on GPT-2 <ref type="bibr" target="#b22">[23]</ref>, which has been pre-trained using a large amount of dialog data.</p><p>• S-GPT 𝑝𝑜 is a STYLEDGPT that always generates polite-style responses.</p><p>• S-GPT 𝑐𝑎 is a STYLEDGPT that always generates casual-style responses.</p><p>• Rule 𝑎𝑢𝑡𝑜 is a method that controls the style by heuristics. A response is generated by S-GPT 𝑝𝑜 when the intimacy estimation model identifies the user's level of intimacy as low, and by S-GPT 𝑐𝑎 when it is high.</p><p>• Rule 𝑔𝑜𝑙𝑑 switches between S-GPT 𝑝𝑜 and S-GPT 𝑐𝑎 based on the ground-truth label of the user's intimacy.</p><p>• I-S-GPT 𝑎𝑢𝑡𝑜 is Intimacy-aware STYLEDGPT, our proposed method.</p><p>• I-S-GPT 𝑔𝑜𝑙𝑑 is our proposed method, in which the ground-truth intimacy label is used instead of an estimate from the intimacy estimation model.</p><p>If the performance of the intimacy estimation model is inadequate, misclassification of the level of intimacy may hinder the training of the stylized dialog model. To verify the effectiveness of our approach to controlling the style of the response in terms of intimacy, I-S-GPT 𝑔𝑜𝑙𝑑 was also evaluated. It can be regarded as an ideal system that always correctly estimates the user's intimacy. In this method, in Eq. ( <ref type="formula" target="#formula_4">4</ref>) and ( <ref type="formula" target="#formula_6">5</ref>), the probability of the level of intimacy was approximated by the five-point intimacy score (𝐼𝑆) in the JID corpus as 𝑝(𝐼=low|𝑋 𝑢 ) ≃ 1 − 𝐼𝑆/5 and 𝑝(𝐼=high|𝑋 𝑢 ) ≃ 𝐼𝑆/5.
The additional input described in subsection 3.3 was also determined by the ground-truth intimacy score; that is, &lt;l&gt; is added when 𝐼𝑆 is 1 or 2, while &lt;h&gt; is added when 𝐼𝑆 is 3, 4, or 5.</p><p>A method using a Large Language Model (LLM) for style-controlled generation could be considered as a baseline. However, when a prompt is provided to ChatGPT to guess the user's level of intimacy and respond in an appropriate style, the generated responses are almost always polite. Therefore, a prompting-based LLM baseline is not included in this experiment.</p></div>
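The gold-label approximation used by I-S-GPT 𝑔𝑜𝑙𝑑 is simple enough to state directly in code; this sketch mirrors the formulas above, and the helper names are ours.

```python
def gold_intimacy_probs(intimacy_score):
    """I-S-GPT_gold: approximate the intimacy probabilities from the
    five-point JID score IS as p(I=low) = 1 - IS/5 and p(I=high) = IS/5."""
    p_high = intimacy_score / 5
    return 1 - p_high, p_high

def gold_intimacy_token(intimacy_score):
    """Additional-input token from the gold score: <l> for IS in {1, 2},
    <h> for IS in {3, 4, 5}."""
    return "<l>" if intimacy_score <= 2 else "<h>"
```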
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Implementation Details</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Style Language Model and Discrimination Model</head><p>The style language models 𝑃 𝑝𝑜 (𝑇 ) and 𝑃 𝑐𝑎 (𝑇 ) were obtained by fine-tuning GPT-2. The architecture of the style language model consists of an embedding layer, a transformer module, and a decoding layer of GPT-2. The pre-trained model was japanese-gpt2-medium <ref type="foot" target="#foot_0">1</ref> , which had been trained on a large-scale Japanese dialog dataset. The learning rate was set to 5e-4, the batch size to 4, and the number of epochs to 20. The Adam optimizer <ref type="bibr" target="#b24">[25]</ref> was used to fine-tune the model.</p><p>The style discrimination model 𝑃 ′ (𝑆|𝑇 ) was also obtained by fine-tuning the GPT-2 model. The architecture of the style discrimination model consists of an embedding layer, a transformer module, and a classification layer of GPT-2. The same pre-trained model used to train the style language model was fine-tuned using the Adam optimizer with the same hyperparameters. The style discrimination model was evaluated using the 100 utterances not used for training. Its accuracy was 64%.</p><p>Intimacy Estimation Model Bidirectional Encoder Representations from Transformers (BERT) <ref type="bibr" target="#b25">[26]</ref> was used to train the intimacy estimation model. The BERT base Japanese<ref type="foot" target="#foot_1">2</ref> , which had been trained on Japanese Wikipedia and Japanese CC-100, was used as a pre-trained model. This BERT model was fine-tuned using the JID corpus. As for the hyperparameters, the learning rate was set to 5e-6, the batch size to 1, and the number of epochs to 10. The Adam optimizer was used to fine-tune the model. The accuracy of the intimacy estimation model on the test set was 69%.</p><p>The low accuracy indicates that intimacy estimation is a difficult task. Our error analysis shows that there are few indicative words that are highly related to the speaker's intimacy.
For example, in the sentiment analysis task, "pleasant" and "happy" are indicative words for positive emotions, and "sad" and "unhappy" for negative emotions. However, such indicative words are rare in the intimacy estimation task. Another possible reason for the poor performance is the lack of training data. One possible direction is to apply semi-supervised learning to compensate for small amounts of labeled data with large amounts of unlabeled data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dialog Model</head><p>The dialog model described in subsection 4.2 was obtained by fine-tuning GPT-2. The same pre-trained model 1 used for training the style language models was utilized for fine-tuning the dialog model. As for the hyperparameters, the learning rate was set to 1e-18, the batch size to 1, and the number of epochs to 10. The Adam optimizer was used to fine-tune the model.</p><p>The parameters 𝛽 𝑤 , 𝛽 𝑠 , and 𝛽 𝑁 𝐿𝐿 in Eq. ( <ref type="formula" target="#formula_7">6</ref>) were set to 0.45, 0.45, and 0.1, respectively. These values were optimized on the validation data according to the StyCor criterion, which will be described in §4.4. As for the sampling-and-rank decoding strategy, the hyperparameters were set to the same values as those used in STYLEDGPT <ref type="bibr" target="#b18">[19]</ref>, specifically 𝑘 to 40, 𝑁 to 50, and 𝜔 in Eq. (9) to 0.5.</p><p>The length of a dialog context 𝑋 = {𝑆 1 , 𝑈 1 , • • • , 𝑆 𝑛 , 𝑈 𝑛 } was set to 8, i.e., the parameter 𝑛 was set to 4. In a preliminary experiment to evaluate the intimacy estimation model, the accuracy of the model was measured for different values of 𝑛. The highest accuracy was obtained when 𝑛 = 4. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Evaluation Criteria</head><p>Both automatic and human evaluations were carried out to assess the responses generated by the various methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Automatic Evaluation</head><p>In the automatic evaluation, the quality of the generated responses was evaluated from three perspectives: relevance, diversity, and style. The relevance was measured by BLEU <ref type="bibr" target="#b26">[27]</ref> and ROUGE <ref type="bibr" target="#b27">[28]</ref>. Specifically, the similarity between a generated response and a ground-truth response was evaluated using BLEU-1, BLEU-2, ROUGE-1, ROUGE-2, and ROUGE-L. The diversity was measured by Distinct-1 (Dist-1) and Distinct-2 (Dist-2), following the experiment by Li et al. <ref type="bibr" target="#b17">[18]</ref>. The style was evaluated using "Style Correlation" (StyCor). The StyCor metric is defined as the correlation between the probability of the casual style 𝑝(𝑆=casual|𝑌 ) and the ground-truth intimacy level<ref type="foot" target="#foot_2">3</ref>. This correlation is high when the predicted probability of the casual style and the intimacy level are both high or both low (i.e., the probability of the polite style is high and the intimacy is low). It evaluates the extent to which the dialog model can control the style so that it generates a response in the casual (or polite) style when the user's level of intimacy is high (or low).</p></div>
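As a concrete illustration, StyCor and Dist-n can be computed as below. This sketch assumes Pearson's correlation coefficient (the paper does not specify which coefficient is used) and normalizes the five-scale intimacy score to [0, 1] as stated in the footnote; the function names are illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def stycor(casual_probs, intimacy_scores, lo=1, hi=5):
    """Correlation between p(S=casual|Y) and intimacy normalized to [0, 1]."""
    norm = [(s - lo) / (hi - lo) for s in intimacy_scores]
    return pearson(casual_probs, norm)

def distinct_n(responses, n=1):
    """Dist-n: ratio of unique n-grams to all n-grams over the responses."""
    ngrams = [tuple(toks[i:i + n])
              for toks in (r.split() for r in responses)
              for i in range(len(toks) - n + 1)]
    return len(set(ngrams)) / len(ngrams)

# A casual probability that rises with intimacy yields a StyCor close to 1.
print(round(stycor([0.0, 0.25, 0.75, 1.0], [1, 2, 4, 5]), 3))  # 1.0
print(distinct_n(["good morning", "good night"], n=1))  # 3 unique / 4 tokens
```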
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Human Evaluation</head><p>The quality of the generated responses was also evaluated by human subjects. One hundred instances were randomly chosen from the test set of the JID corpus. For each instance, responses were generated by the methods described in subsection 4.2 for the dialog context 𝑋. The responses were then evaluated by the subjects according to the following three criteria:</p><p>• Style Control: Does the response use the appropriate style for the relationship between the two speakers? Annotators were also instructed to read the dialog context and infer the relationship between the speakers. • Relevance: Is the content of the response relevant to and consistent with the context? • Fluency: Is the response natural, fluent, and free of grammatical errors?</p><p>The quality of responses was evaluated by assigning a score of 3 (appropriate), 2 (neutral), or 1 (inappropriate) for each of the three perspectives. Ten native Japanese speakers participated in the human evaluation. The inter-annotator agreement was measured using Fleiss's kappa <ref type="bibr" target="#b28">[29]</ref>.</p></div>
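Fleiss's kappa, used here for inter-annotator agreement, can be computed from a table counting how many annotators assigned each category to each item. The sketch below is a standard textbook implementation, not code from the paper.

```python
def fleiss_kappa(counts):
    """Fleiss's kappa. counts[i][j] = number of annotators who assigned
    category j to item i; every item is assumed to be rated by the same
    number of annotators."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    total = n_items * n_raters
    # Observed agreement: mean pairwise agreement per item.
    p_bar = sum((sum(c * c for c in row) - n_raters)
                / (n_raters * (n_raters - 1))
                for row in counts) / n_items
    # Chance agreement from the marginal category proportions.
    p_e = sum((sum(row[j] for row in counts) / total) ** 2
              for j in range(len(counts[0])))
    return (p_bar - p_e) / (1 - p_e)

# Two items, three annotators each, perfect agreement: kappa = 1.0.
print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0
```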
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Results of Automatic Evaluation</head><p>Table <ref type="table" target="#tab_1">1</ref> shows the results of the automatic evaluation. Boldface indicates the best system for each criterion. The StyCor of our proposed method using ground-truth intimacy labels, I-S-GPT 𝑔𝑜𝑙𝑑 , was 0.366, significantly outperforming the baseline methods. In particular, the StyCor of I-S-GPT 𝑔𝑜𝑙𝑑 was much better than that of the rule-based method, Rule 𝑔𝑜𝑙𝑑 , which naively switched between polite and casual style generation using heuristics. These results indicate that our proposed method is superior at generating stylized responses based on the user's level of intimacy. When the user's intimacy was estimated automatically, however, the StyCor of I-S-GPT 𝑎𝑢𝑡𝑜 was 0.103, which was better than STYLEDGPT but worse than DialoGPT. The poor StyCor of I-S-GPT 𝑎𝑢𝑡𝑜 may be due to the low accuracy (69%) of the intimacy estimation model; this is also supported by the large difference between I-S-GPT 𝑎𝑢𝑡𝑜 and I-S-GPT 𝑔𝑜𝑙𝑑 . Our proposed method is thus highly dependent on the performance of the intimacy estimation model. As for the relevance, S-GPT 𝑐𝑎 achieved the best BLEU, while DialoGPT achieved the best ROUGE. Our methods I-S-GPT 𝑎𝑢𝑡𝑜 and I-S-GPT 𝑔𝑜𝑙𝑑 were slightly worse in BLEU and clearly worse in ROUGE than the best system, but comparable to the other baselines. As for the diversity, no significant difference in Dist-1 and Dist-2 was observed among the methods. These results show that the outstanding ability of the pre-trained dialog model (DialoGPT) to produce relevant and diverse responses was not substantially degraded by incorporating the style control techniques. In addition, no significant difference in relevance and diversity was found between I-S-GPT 𝑎𝑢𝑡𝑜 and I-S-GPT 𝑔𝑜𝑙𝑑 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Results of Human Evaluation</head><p>The automatic evaluation revealed that the StyCor scores of the methods that automatically estimated the level of intimacy (I-S-GPT 𝑎𝑢𝑡𝑜 and Rule 𝑎𝑢𝑡𝑜 ) were insufficiently high. These two methods were excluded from the human evaluation process to reduce the burden on the annotators.</p><p>Table <ref type="table" target="#tab_2">2</ref> shows the results of the human evaluation. The "Score" column indicates the average of scores assigned by the ten annotators. The "𝜅" column represents Fleiss's 𝜅, which indicates the agreement of scores between annotators. We also used Welch's test to verify whether there was a significant difference in scores between I-S-GPT 𝑔𝑜𝑙𝑑 and other methods. The "𝑝" column shows the 𝑝-value associated with this statistical test.</p></div>
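Welch's test, used above to compare scores between methods, does not assume equal variances in the two samples. A minimal sketch of the statistic and the Welch-Satterthwaite degrees of freedom follows; the p-value additionally requires the Student-t survival function (e.g. from scipy.stats) and is omitted here, and the sample data are illustrative.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # unbiased sample variances
    se2 = va / na + vb / nb            # squared standard error of the difference
    t = (mean(a) - mean(b)) / se2 ** 0.5
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical annotator scores for two methods on four items.
t, df = welch_t([3, 2, 3, 2], [2, 2, 1, 2])
print(round(t, 3), round(df, 1))  # t ≈ 1.964 with about 5.9 degrees of freedom
```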
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Style Control</head><p>The proposed method, I-S-GPT 𝑔𝑜𝑙𝑑 , achieved the highest score for style control. The 𝑝-values indicated that I-S-GPT 𝑔𝑜𝑙𝑑 was significantly better than the other methods, except for Rule 𝑔𝑜𝑙𝑑 . These results demonstrated that our proposed method was capable of generating responses in a more appropriate style. Rule 𝑔𝑜𝑙𝑑 was the second-best method, and both I-S-GPT 𝑔𝑜𝑙𝑑 and Rule 𝑔𝑜𝑙𝑑 were designed to control the style according to the level of intimacy. This confirms the validity of our approach of considering the user's level of intimacy to use the polite and casual styles appropriately. However, the 𝜅 for style control was 0.13, indicating that the inter-annotator agreement was relatively low.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head>Relevance</head><p>Although I-S-GPT 𝑔𝑜𝑙𝑑 was worse than the other methods in the automatic evaluation of relevance (as shown in Table <ref type="table" target="#tab_1">1</ref>), it achieved the highest score for relevance in the human evaluation. However, the difference was not statistically significant. At least, the ability of the proposed method to generate responses relevant to the dialog context was comparable to that of the other baselines.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fluency</head><p>As with the relevance score, the average score for fluency was the highest for the proposed method. However, a significant difference was found only between DialoGPT and I-S-GPT 𝑔𝑜𝑙𝑑 . The 𝜅 for fluency was higher than that for style control and relevance, indicating that the annotators were more consistent in evaluating the fluency of the responses.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head>Computational Time</head><p>Table <ref type="table" target="#tab_3">3</ref> shows a comparison of the average time required for response generation per utterance across all test samples. A server with an NVIDIA RTX A6000 (48 GB) was used for the time measurements. DialoGPT exhibited the shortest generation time, followed by S-GPT, I-S-GPT, and Rule. S-GPT took more time than DialoGPT because of the additional sampling-and-rank strategy. In addition, I-S-GPT and Rule were slower than S-GPT because they require additional processing for intimacy estimation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>This paper proposed a novel method of controlling the speech style of a dialog system according to the user's level of intimacy with the system. Starting from a PLM that serves as a strong dialog model, two loss functions were proposed to fine-tune it to generate responses in an appropriate style. In addition, a special token indicating the user's level of intimacy was added to the input of the dialog model. The results of the automatic and human evaluations demonstrated that our proposed method outperformed the baselines for style control, indicating that the method could generate responses in a polite style when intimacy was low and in a casual style when intimacy was high.</p><p>In the experiments, the accuracy of the intimacy estimation model was low, which caused a considerable decrease in the performance of the dialog model that relied on it. In the future, by improving the intimacy estimation model, we will enhance the style control ability of the dialog system under conditions where ground-truth intimacy labels are not available.</p><p>It is our position that this study does not give rise to any significant ethical concerns. Our approach only controls speech styles according to the internal state of a user; it does not introduce or exacerbate any ethical or social bias in a dialog system.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Overview of proposed method</figDesc><graphic coords="3,105.84,65.61,383.60,128.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Sentence-Level Loss First, we train a style discrimination model 𝑃 ′ (𝑆|𝑇 ) that identifies the style 𝑆 of a sentence 𝑇 , where 𝑆 is either polite or casual. The style discrimination model is trained in advance on training data where utterances in 𝐷 𝑝𝑜 𝑠𝑡𝑦𝑙𝑒 are samples of the polite class and those in 𝐷 𝑐𝑎 𝑠𝑡𝑦𝑙𝑒 are samples of the casual class. Let 𝑌̂ be the output of the dialog model 𝑃 (𝑌 |𝑋) for a given context 𝑋. Then, the style of 𝑌̂ is identified by the style discrimination model and 𝑝(𝑆=polite|𝑌̂) and 𝑝(𝑆=casual|𝑌̂) are obtained.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Results of Automatic Evaluation</figDesc><table><row><cell>Methods</cell><cell cols="5">Relevance</cell><cell cols="2">Diversity</cell><cell>Style</cell></row><row><cell></cell><cell>BLEU-1</cell><cell>BLEU-2</cell><cell>ROUGE-1</cell><cell>ROUGE-2</cell><cell>ROUGE-L</cell><cell>Dist-1</cell><cell>Dist-2</cell><cell>StyCor</cell></row><row><cell>DialoGPT</cell><cell>0.0798</cell><cell>0.0110</cell><cell>0.445</cell><cell>0.0617</cell><cell>0.0400</cell><cell>0.674</cell><cell>0.915</cell><cell>0.115</cell></row><row><cell>S-GPT 𝑝𝑜</cell><cell>0.0927</cell><cell>0.0118</cell><cell>0.393</cell><cell>0.0439</cell><cell>0.0244</cell><cell>0.648</cell><cell>0.897</cell><cell>0.0700</cell></row><row><cell>S-GPT 𝑐𝑎</cell><cell>0.0933</cell><cell>0.0128</cell><cell>0.392</cell><cell>0.0556</cell><cell>0.0274</cell><cell>0.643</cell><cell>0.894</cell><cell>0.0602</cell></row><row><cell>Rule 𝑎𝑢𝑡𝑜</cell><cell>0.0727</cell><cell>0.0082</cell><cell>0.428</cell><cell>0.0501</cell><cell>0.0195</cell><cell>0.666</cell><cell>0.910</cell><cell>0.109</cell></row><row><cell>Rule 𝑔𝑜𝑙𝑑</cell><cell>0.0739</cell><cell>0.0078</cell><cell>0.432</cell><cell>0.0477</cell><cell>0.0327</cell><cell>0.669</cell><cell>0.912</cell><cell>0.161</cell></row><row><cell>I-S-GPT 𝑎𝑢𝑡𝑜</cell><cell>0.0894</cell><cell>0.0103</cell><cell>0.372</cell><cell>0.0506</cell><cell>0.0230</cell><cell>0.660</cell><cell>0.900</cell><cell>0.103</cell></row><row><cell>I-S-GPT 𝑔𝑜𝑙𝑑</cell><cell>0.0715</cell><cell>0.0078</cell><cell>0.414</cell><cell>0.0455</cell><cell>0.0271</cell><cell>0.666</cell><cell>0.902</cell><cell>0.366</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Results of Human Evaluation. * means 𝑝 &lt; 0.05. ** means 𝑝 &lt; 0.01.</figDesc><table><row><cell>Model</cell><cell cols="3">Style Control</cell><cell cols="3">Relevance</cell><cell cols="3">Fluency</cell></row><row><cell></cell><cell>Score</cell><cell>𝜅</cell><cell>𝑝</cell><cell>Score</cell><cell>𝜅</cell><cell>𝑝</cell><cell>Score</cell><cell>𝜅</cell><cell>𝑝</cell></row><row><cell>DialoGPT</cell><cell>1.98</cell><cell>0.26</cell><cell>5e-5**</cell><cell>1.51</cell><cell>0.22</cell><cell>0.15</cell><cell>2.16</cell><cell>0.39</cell><cell>0.03*</cell></row><row><cell>S-GPT 𝑝𝑜</cell><cell>2.08</cell><cell>0.18</cell><cell>3e-4**</cell><cell>1.50</cell><cell>0.23</cell><cell>0.12</cell><cell>2.32</cell><cell>0.39</cell><cell>0.64</cell></row><row><cell>S-GPT 𝑐𝑎</cell><cell>2.05</cell><cell>0.19</cell><cell>1e-3**</cell><cell>1.51</cell><cell>0.26</cell><cell>0.14</cell><cell>2.27</cell><cell>0.34</cell><cell>0.32</cell></row><row><cell>Rule 𝑔𝑜𝑙𝑑</cell><cell>2.22</cell><cell>0.11</cell><cell>0.27</cell><cell>1.52</cell><cell>0.26</cell><cell>0.20</cell><cell>2.23</cell><cell>0.37</cell><cell>0.14</cell></row><row><cell>I-S-GPT 𝑔𝑜𝑙𝑑</cell><cell>2.29</cell><cell>0.13</cell><cell>-</cell><cell>1.62</cell><cell>0.28</cell><cell>-</cell><cell>2.36</cell><cell>0.35</cell><cell>-</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3</head><label>3</label><figDesc>Average Time of Response Generation Per Utterance (seconds)</figDesc><table><row><cell>DialoGPT</cell><cell>S-GPT 𝑝𝑜</cell><cell>S-GPT 𝑐𝑎</cell><cell>Rule 𝑎𝑢𝑡𝑜</cell><cell>I-S-GPT 𝑎𝑢𝑡𝑜</cell></row><row><cell>3.661</cell><cell>4.162</cell><cell>4.465</cell><cell>5.284</cell><cell>5.111</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://huggingface.co/rinna/japanese-gpt2-medium</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://huggingface.co/tohoku-nlp/bert-base-japanese-v2</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">The five-scale score is normalized to values between 0 and 1.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Alexa prize -state of the art in conversational AI</title>
		<author>
			<persName><forename type="first">C</forename><surname>Khatri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Venkatesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hedayatnia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gabriel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ram</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Prasad</surname></persName>
		</author>
		<idno type="DOI">10.1609/aimag.v39i3.2810</idno>
		<ptr target="https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/2810" />
	</analytic>
	<monogr>
		<title level="j">AI Magazine</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="page" from="40" to="55" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Dialogue System Live Competition: Identifying Problems with Dialogue Systems Through Live Event</title>
		<author>
			<persName><forename type="first">R</forename><surname>Higashinaka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Funakoshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Inaba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tsunomori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Takahashi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Akama</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
			<publisher>Springer</publisher>
			<biblScope unit="page" from="185" to="199" />
			<pubPlace>Singapore</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The second conversational intelligence challenge (convai2)</title>
		<author>
			<persName><forename type="first">E</forename><surname>Dinan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Logacheva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Malykh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Shuster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Urbanek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kiela</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Szlam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Serban</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lowe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The NeurIPS&apos;18 Competition: From Machine Learning to Intelligent Conversations</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="187" to="208" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Ram</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Prasad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Khatri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Venkatesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gabriel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Nunn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hedayatnia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nagar</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1801.03604</idno>
		<title level="m">Conversational AI: The science behind the alexa prize</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">An introduction to sociolinguistics</title>
		<author>
			<persName><forename type="first">R</forename><surname>Wardhaugh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Fuller</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
			<publisher>John Wiley &amp; Sons</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Generating natural language under pragmatic constraints</title>
		<author>
			<persName><forename type="first">E</forename><surname>Hovy</surname></persName>
		</author>
		<idno type="DOI">10.1016/0378-2166(87)90109-3</idno>
		<ptr target="https://doi.org/10.1016/0378-2166(87)90109-3" />
	</analytic>
	<monogr>
		<title level="j">Journal of Pragmatics</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="689" to="719" />
			<date type="published" when="1987">1987</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Indexical order and the dialectics of social life</title>
		<author>
			<persName><forename type="first">M</forename><surname>Silverstein</surname></persName>
		</author>
		<idno type="DOI">10.1016/S0271-5309(03)00013-2</idno>
	</analytic>
	<monogr>
		<title level="j">Language &amp; Communication</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="193" to="229" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Understanding Through Politeness -Translations of Japanese Honorific Speech to Finnish and English</title>
		<author>
			<persName><forename type="first">N</forename><surname>Aapakallio</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
		<respStmt>
			<orgName>University of Eastern Finland</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Construction and validation of a Japanese honorific corpus based on systemic functional linguistics</title>
		<author>
			<persName><forename type="first">M</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Kobayashi</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2022.dclrl-1.3" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Dataset Creation for Lower-Resourced Languages within the 13th Language Resources and Evaluation Conference, European Language Resources Association</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Sälevä</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Lignos</surname></persName>
		</editor>
		<meeting>the Workshop on Dataset Creation for Lower-Resourced Languages within the 13th Language Resources and Evaluation Conference, European Language Resources Association<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="19" to="26" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Improving user impression in spoken dialog system with gradual speech form control</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Kageyama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chiba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Nose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ito</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/W18-5026</idno>
		<ptr target="https://aclanthology.org/W18-5026" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Association for Computational Linguistics</title>
				<meeting>the 19th Annual SIGdial Meeting on Discourse and Dialogue, Association for Computational Linguistics<address><addrLine>Melbourne, Australia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="235" to="240" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Polite dialogue generation without parallel data</title>
		<author>
			<persName><forename type="first">T</forename><surname>Niu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bansal</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/Q18-1027" />
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="373" to="389" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Structuring latent spaces for stylized response generation</title>
		<author>
			<persName><forename type="first">X</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Galley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Brockett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dolan</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-1190</idno>
		<ptr target="https://aclanthology.org/D19-1190" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Inui</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Jiang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Ng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Wan</surname></persName>
		</editor>
		<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1814" to="1823" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Neural stylistic response generation with disentangled latent variables</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Y</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing</title>
				<meeting>the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing<address><addrLine>Bangkok, Thailand</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4391" to="4401" />
		</imprint>
	</monogr>
	<note>: Long Papers), Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Stylized dialogue response generation using stylized unpaired texts</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<publisher>AAAI Press</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="14558" to="14567" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Style control for schema-guided natural language generation</title>
		<author>
			<persName><forename type="first">A</forename><surname>Tsai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Oraby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Perera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-Y</forename><surname>Kao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Narayan-Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Chung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hakkani-Tur</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.nlp4convai-1.21</idno>
		<ptr target="https://aclanthology.org/2021.nlp4convai-1.21" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Papangelis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Budzianowski</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Liu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Nouri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Rastogi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y.-N</forename><surname>Chen</surname></persName>
		</editor>
		<meeting>the 3rd Workshop on Natural Language Processing for Conversational AI, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="228" to="242" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Stylistic response generation by controlling personality traits and intent</title>
		<author>
			<persName><forename type="first">S</forename><surname>Saha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Srihari</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.nlp4convai-1.16</idno>
		<ptr target="https://aclanthology.org/2022.nlp4convai-1.16" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th Workshop on NLP for Conversational AI, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">B</forename><surname>Liu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Papangelis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Ultes</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Rastogi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y.-N</forename><surname>Chen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Spithourakis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Nouri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">W</forename><surname>Shi</surname></persName>
		</editor>
		<meeting>the 4th Workshop on NLP for Conversational AI, Association for Computational Linguistics<address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="197" to="211" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Stylized knowledge-grounded dialogue generation via disentangled template rewriting</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Miao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Geng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Jiang</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.naacl-main.241</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</title>
				<meeting>the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics<address><addrLine>Seattle, Washington, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="3304" to="3318" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Stylized dialogue generation with feature-guided knowledge augmentation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Yan</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.findings-emnlp.475</idno>
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2023, Association for Computational Linguistics</title>
				<meeting><address><addrLine>Sentosa Gateway, Singapore</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="7144" to="7157" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">StyleDGPT: Stylized response generation with pre-trained language models</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.findings-emnlp.140</idno>
		<ptr target="https://aclanthology.org/2020.findings-emnlp.140" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">T</forename><surname>Cohn</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1548" to="1559" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">The good, the bad and the neutral: affective profile in dialog system-user communication</title>
		<author>
			<persName><forename type="first">M</forename><surname>Skowron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Theunis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sienkiewicz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th International Conference on Affective Computing and Intelligent Interaction - Volume Part I, ACII&apos;11</title>
				<meeting>the 4th International Conference on Affective Computing and Intelligent Interaction - Volume Part I, ACII&apos;11<address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="337" to="346" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Autotutor and affective autotutor: Learning by talking with cognitively and emotionally intelligent computers that talk back</title>
		<author>
			<persName><forename type="first">S</forename><surname>D&apos;Mello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Graesser</surname></persName>
		</author>
		<idno type="DOI">10.1145/2395123.2395128</idno>
		<ptr target="https://doi.org/10.1145/2395123.2395128" />
	</analytic>
	<monogr>
		<title level="j">ACM Trans. Interact. Intell. Syst</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">DIALOGPT: Large-scale generative pre-training for conversational response generation</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Galley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-C</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Brockett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dolan</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-demos.30</idno>
		<ptr target="https://aclanthology.org/2020.acl-demos.30" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Celikyilmaz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T.-H</forename><surname>Wen</surname></persName>
		</editor>
		<meeting>the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Language models are unsupervised multitask learners</title>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Child</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Luan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amodei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">OpenAI blog</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">9</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">BTSJ-Japanese Natural Conversation Corpus with Transcripts and Recordings</title>
		<editor>
			<persName><forename type="first">M</forename><surname>Usami</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2021-03">March 2021</date>
			<pubPlace>Japan</pubPlace>
		</imprint>
		<respStmt>
			<orgName>National Institute for Japanese Language and Linguistics</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
		<idno>CoRR abs/1412.6980</idno>
		<ptr target="https://api.semanticscholar.org/CorpusID:6628106" />
		<title level="m">Adam: A method for stochastic optimization</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
		<ptr target="https://aclanthology.org/N19-1423" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">BLEU: a method for automatic evaluation of machine translation</title>
		<author>
			<persName><forename type="first">K</forename><surname>Papineni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Roukos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ward</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-J</forename><surname>Zhu</surname></persName>
		</author>
		<idno type="DOI">10.3115/1073083.1073135</idno>
		<ptr target="https://doi.org/10.3115/1073083.1073135" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL &apos;02, Association for Computational Linguistics</title>
				<meeting>the 40th Annual Meeting on Association for Computational Linguistics, ACL &apos;02, Association for Computational Linguistics<address><addrLine>USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="311" to="318" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">ROUGE: A package for automatic evaluation of summaries</title>
		<author>
			<persName><forename type="first">C.-Y</forename><surname>Lin</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/W04-1013" />
	</analytic>
	<monogr>
		<title level="m">Text Summarization Branches Out, Association for Computational Linguistics</title>
				<meeting><address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="74" to="81" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Fleiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cohen</surname></persName>
		</author>
		<idno type="DOI">10.1177/001316447303300309</idno>
		<ptr target="https://cir.nii.ac.jp/crid/1360855569674739072" />
	</analytic>
	<monogr>
		<title level="j">Educational and Psychological Measurement</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="613" to="619" />
			<date type="published" when="1973">1973</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
