<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Improving Persona Consistency of Dialogue Generation by Constructing Negative Word Set</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Zhenfeng</forename><surname>Han</surname></persName>
							<email>zhenfenghan@tju.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="department">College of Intelligence and Computing</orgName>
								<orgName type="institution">Tianjin University</orgName>
								<address>
									<postCode>300350</postCode>
									<settlement>Tianjin</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sai</forename><surname>Zhang</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">College of Intelligence and Computing</orgName>
								<orgName type="institution">Tianjin University</orgName>
								<address>
									<postCode>300350</postCode>
									<settlement>Tianjin</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Xiaowang</forename><surname>Zhang</surname></persName>
							<email>xiaowangzhang@tju.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="department">College of Intelligence and Computing</orgName>
								<orgName type="institution">Tianjin University</orgName>
								<address>
									<postCode>300350</postCode>
									<settlement>Tianjin</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Improving Persona Consistency of Dialogue Generation by Constructing Negative Word Set</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">FFB31E4D8405C15496EE818635CCC0DD</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:14+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Consistent persona</term>
					<term>Unlikelihood</term>
					<term>ConceptNet</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Maintaining a consistent persona is essential for dialogue models. However, dialogue models can generate responses that are fluent yet inconsistent with their persona. We observe that inconsistent responses often contain words that are similar to, yet inconsistent with, the persona. In this poster, we propose a method that uses an unlikelihood loss to separate the semantics of similar but inconsistent words. To obtain such words, we leverage Word2Vec to construct a negative word set, and we use ConceptNet to remove consistent noise words from this set and to add antonyms. Experiments demonstrate that our method improves the persona consistency of dialogue generation.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>With the success of existing dialogue models at generating human-like responses, dialogue models are now expected to express their own personality. Zhang et al. <ref type="bibr" target="#b0">[1]</ref> introduced the persona-conditioned dialogue dataset PersonaChat for building persona-consistent dialogue models. However, even the best-performing generative models trained on PersonaChat, such as GPT2 <ref type="bibr" target="#b1">[2]</ref>, still generate fluent but inconsistent responses. The reason is that these models are trained with the standard maximum likelihood loss, which imposes no constraint on persona consistency.</p><p>Unlikelihood training is a technique originally developed to remove repetition from language model completions. Li et al. <ref type="bibr" target="#b2">[3]</ref> use unlikelihood to address the persona consistency issue of dialogue models. However, they only consider whole sentences and ignore keywords. We observe that most inconsistent responses are caused by similar but inconsistent words. As shown in fig. <ref type="figure">1</ref>, the response generated by the GPT2 model is inconsistent with the persona because of the word "20": "20" is similar to "26" but contradicts the fourth persona description.</p><p>In this poster, we construct a negative word set to separate the semantics of similar but inconsistent words. First, we obtain a coarse negative word set with Word2Vec. Second, we use ConceptNet <ref type="bibr" target="#b3">[4]</ref> to remove synonyms and add antonyms. Third, we apply the unlikelihood loss to the negative word set, which assigns low probabilities to inconsistent words. Experiments show that our method generates more consistent responses.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Persona</head><p>1. i am a doctor. 2. i am a man. 3. i have three dogs. 4. i am 26 years old.</p><p>Query: how old are you?</p><p>Response: i am 26 years old.</p><p>GPT2: i am 20 years old. (inconsistent)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 1:</head><p>The response generated by GPT2 is inconsistent with the persona.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Approach</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Problem Definition</head><p>Our task is to train a generative model that produces persona-consistent responses. Formally, given a set of persona texts 𝑃 = {𝑃 1 , 𝑃 2 , … , 𝑃 𝑚 } and a query 𝑄, the model generates a response 𝑅 that should be consistent with the persona. Here 𝑃 𝑖 , 𝑄, and 𝑅 are sentences consisting of words, e.g. 𝑅 = {𝑟 1 , 𝑟 2 , … , 𝑟 𝑛 }, where 𝑛 is the length of 𝑅.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Training Loss</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1.">Likelihood Loss</head><p>Likelihood training is commonly used in text generation models. Pre-trained generative models trained with the likelihood objective can generate fluent and meaningful responses. For a sample {𝑃, 𝑄, 𝑅}, maximum likelihood estimation (MLE) computes the loss:</p><formula xml:id="formula_0">𝐿 𝑀𝐿𝐸 = − log (𝑝 𝜃 (𝑅 | 𝑃, 𝑄)) = − |𝑅| ∑ 𝑖=1 log (𝑝 𝜃 (𝑟 𝑖 | 𝑃, 𝑄, 𝑅 &lt;𝑖 ))<label>(1)</label></formula><p>where 𝑟 𝑖 is the current word to be predicted, 𝑅 &lt;𝑖 are the words preceding 𝑟 𝑖 , and 𝑝 𝜃 (𝑟 𝑖 | 𝑃, 𝑄, 𝑅 &lt;𝑖 ) is the probability the model assigns to 𝑟 𝑖 conditioned on 𝑃, 𝑄, and 𝑅 &lt;𝑖 .</p></div>
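As a minimal sketch of Eq. 1, the MLE loss is just the sum of negative log-probabilities of the gold words. The probabilities below are toy values, not outputs of any real model:

```python
import numpy as np

def mle_loss(step_probs):
    """Token-level MLE loss (Eq. 1): negative log-probability of each gold
    word r_i, summed over the response. step_probs[i] is the toy value of
    p(r_i | P, Q, R_<i) for the gold word at position i."""
    return -sum(np.log(p) for p in step_probs)

# A 3-word response whose gold words received probabilities 0.5, 0.25, 0.8;
# the loss equals -log(0.5 * 0.25 * 0.8) = -log(0.1).
loss = mle_loss([0.5, 0.25, 0.8])
```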
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2.">Unlikelihood Loss</head><p>Likelihood training increases the probability of the true word and decreases the probabilities of all other words. Unlikelihood training, in contrast, explicitly decreases the probabilities of negative words. The unlikelihood (UL) loss is defined as:</p><formula xml:id="formula_1">𝐿 𝑈 𝐿 = − |𝑅| ∑ 𝑖=1 ∑ 𝑐∈𝐶 𝑖 log (1 − 𝑝 𝜃 (𝑐 | 𝑃, 𝑄, 𝑅 &lt;𝑖 ))<label>(2)</label></formula><p>where 𝐶 𝑖 is the negative word set of the current word 𝑟 𝑖 .</p></div>
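The double sum in Eq. 2 can be sketched as follows; the per-step probability tables and negative sets are illustrative stand-ins, not real model outputs:

```python
import numpy as np

def unlikelihood_loss(step_probs, negative_sets):
    """Unlikelihood loss (Eq. 2): at each position i, sum -log(1 - p(c | ...))
    over every candidate c in that position's negative word set C_i.
    step_probs[i] maps a candidate word to its (toy) model probability at
    step i; negative_sets[i] is C_i."""
    loss = 0.0
    for probs, C_i in zip(step_probs, negative_sets):
        for c in C_i:
            loss -= np.log(1.0 - probs[c])
    return loss

# One position: the gold word is "26", with negative set {"20", "25"}.
probs = [{"20": 0.3, "25": 0.1}]
loss = unlikelihood_loss(probs, [["20", "25"]])
```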
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Constructing Negative Word Set</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.1.">Negative Word Set</head><p>MLE uses the preceding context to predict the current word, so words with similar contexts end up with similar semantics. As a result, the model may generate similar but inconsistent words. One solution is to separate the semantics of similar words, which can be done with UL training. The core of UL training is constructing the negative word set. Unlike Welleck et al. <ref type="bibr" target="#b4">[5]</ref>, our negative word set contains words that are inconsistent with the current word conditioned on the persona. As shown in fig. <ref type="figure">1</ref>, the generated word "20" is inconsistent with "26" in the persona. The negative word set of "26" may contain words such as "20", "25", and "30".</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.2.">Word2Vec</head><p>The challenge is how to construct the negative word set used for the UL loss. Word2Vec, a method for learning word embeddings, follows the distributional hypothesis: words that occur in the same contexts tend to have similar meanings. Word2Vec uses the preceding and following context to predict the current word, which is similar to MLE. We learn word embeddings for all words in the dataset with Word2Vec and approximately treat the most similar words, measured by cosine similarity of their embeddings, as the negative word set. For example, the negative word set of "man" computed by Word2Vec contains words such as "male", "boy", and "girl".</p></div>
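The coarse negative set is simply a nearest-neighbor lookup under cosine similarity. A minimal sketch with hand-made 3-dimensional toy vectors (in practice the embeddings would be learned with Word2Vec on the dialogue corpus):

```python
import numpy as np

def top_similar(word, embeddings, k=3):
    """Return the k words whose embeddings are closest to `word` under
    cosine similarity -- the coarse negative word set of Sec. 2.3.2."""
    v = embeddings[word]
    sims = {}
    for w, u in embeddings.items():
        if w == word:
            continue
        sims[w] = np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u))
    return sorted(sims, key=sims.get, reverse=True)[:k]

# Toy embeddings, invented for illustration only.
toy = {
    "man":  np.array([0.9, 0.1, 0.0]),
    "male": np.array([0.8, 0.2, 0.1]),
    "boy":  np.array([0.7, 0.3, 0.0]),
    "girl": np.array([0.6, 0.4, 0.1]),
    "dog":  np.array([0.0, 0.1, 0.9]),
}
coarse_negatives = top_similar("man", toy)
# -> ["male", "boy", "girl"]: similar contexts, so similar vectors.
```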
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.3.">ConceptNet</head><p>We also observe noise words in the negative word set constructed by Word2Vec. Synonyms, hyponyms, and hypernyms have similar contexts but are consistent with the word, so they should be removed from the coarse negative word set. Fortunately, ConceptNet <ref type="bibr" target="#b3">[4]</ref>, a knowledge graph of common sense knowledge, provides these three relations for a word. For example, "male" is a synonym of "man", and "dog" is a hyponym of "pet". ConceptNet also provides antonyms, which can be added to the negative word set. For example, "man" and "woman" are a pair of antonyms: they have similar contexts but opposite semantics.</p></div>
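The refinement step can be sketched as plain set operations. The relation tables below are tiny hand-written stand-ins for real ConceptNet lookups:

```python
def refine_negative_set(word, coarse_set, synonyms, hyponyms, hypernyms, antonyms):
    """Refine a coarse negative word set (Sec. 2.3.3): drop candidates that
    are synonyms, hyponyms, or hypernyms of `word` (those are consistent
    with it), then add its antonyms."""
    consistent = (synonyms.get(word, set())
                  | hyponyms.get(word, set())
                  | hypernyms.get(word, set()))
    return (coarse_set - consistent) | antonyms.get(word, set())

# Hypothetical relations for "man"; real values would come from ConceptNet.
synonyms  = {"man": {"male"}}
hyponyms  = {"man": set()}
hypernyms = {"man": {"person"}}
antonyms  = {"man": {"woman"}}

refined = refine_negative_set("man", {"male", "boy", "girl"},
                              synonyms, hyponyms, hypernyms, antonyms)
# "male" (a synonym) is removed; "woman" (an antonym) is added.
```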
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Training Model</head><p>We use GPT2 <ref type="bibr" target="#b1">[2]</ref> as our base model because it shows strong performance in dialogue generation.</p><p>During the training phase, we combine the UL loss with the MLE loss as follows:</p><formula xml:id="formula_2">𝐿 = 𝐿 𝑀𝐿𝐸 + 𝐿 𝑈 𝐿<label>(3)</label></formula><p>where 𝐿 𝑀𝐿𝐸 promotes true words, training the model to assign them the highest probabilities, while 𝐿 𝑈 𝐿 penalizes negative words, so the model learns to rank negative words below true words. </p></div>
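Putting Eqs. 1-3 together, the joint objective is a simple sum of the two terms. Again all probabilities are toy values for illustration:

```python
import numpy as np

def total_loss(gold_probs, neg_probs_per_step):
    """Joint objective of Eq. 3: MLE loss on the gold words plus the
    unlikelihood loss on each step's negative word set."""
    l_mle = -sum(np.log(p) for p in gold_probs)
    l_ul = -sum(np.log(1.0 - p)
                for probs in neg_probs_per_step
                for p in probs)
    return l_mle + l_ul

# One-step example: the gold word has probability 0.5 and its two negative
# candidates have probabilities 0.2 and 0.1.
loss = total_loss([0.5], [[0.2, 0.1]])
```

In practice both terms would be computed from the same softmax output of the GPT2 model for each training batch.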
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Experiments</head><p>We evaluate our method on PersonaChat <ref type="bibr" target="#b0">[1]</ref>. For automatic evaluation, we employ a classification model to assess the persona consistency of the generated responses, labeling each response as consistent (Consi.), contradictory (Contr.), or neutral. We use perplexity (PPL) to measure the fluency of responses. For human evaluation, we randomly select 100 samples per method and ask three professional annotators to evaluate their quality. Annotators likewise label generated responses as consistent (Consi.), contradictory (Contr.), or neutral with respect to the persona, and rate fluency (Flue.) on a 3-point scale, with higher scores indicating better fluency.</p><p>Table <ref type="table" target="#tab_0">1</ref> shows that the model trained with the UL loss outperforms the base model on all metrics. The higher consistency ratio and lower contradiction ratio indicate that our method separates the semantics of similar but inconsistent words. Consistency improves further with ConceptNet, showing that knowledge such as synonyms provided by ConceptNet helps construct a higher-quality negative word set.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>In this poster, we propose a method that constructs a negative word set for unlikelihood training to separate the semantics of similar but inconsistent words. Experiments demonstrate that our method improves the persona consistency of dialogue generation. In future work, we are interested in designing more effective losses and constructing more appropriate data to further improve the persona consistency of dialogue generation.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Results of automatic (left) and human (right) evaluations.</figDesc><table><row><cell>Method</cell><cell>Consi.↑</cell><cell>Contr.↓</cell><cell>PPL↓</cell><cell>Consi.↑</cell><cell>Contr.↓</cell><cell>Flue.↑</cell></row><row><cell>MLE baseline</cell><cell>61.8%</cell><cell>14.2%</cell><cell>13.9</cell><cell>51.0%</cell><cell>22.7%</cell><cell>2.65</cell></row><row><cell>+UL</cell><cell>63.2%</cell><cell>12.3%</cell><cell>13.7</cell><cell>52.3%</cell><cell>21.0%</cell><cell>2.71</cell></row><row><cell>+UL+ConceptNet</cell><cell>63.9%</cell><cell>11.2%</cell><cell>13.7</cell><cell>52.6%</cell><cell>19.7%</cell><cell>2.72</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Personalizing dialogue agents: I have a dog, do you have pets too?</title>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Dinan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Urbanek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Szlam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kiela</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics</title>
		<title level="s">Association for Computational Linguistics</title>
		<meeting>the 56th Annual Meeting of the Association for Computational Linguistics<address><addrLine>Melbourne, Australia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="2204" to="2213" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Language models are unsupervised multitask learners</title>
		<author>
<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Child</surname></persName>
		</author>
		<imprint>
<date type="published" when="2019">2019</date>
			<publisher>OpenAI</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Don&apos;t say that! making inconsistent dialogue unlikely with unlikelihood training</title>
		<author>
			<persName><forename type="first">M</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Roller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Kulikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Welleck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Boureau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, Online, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="4715" to="4728" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Conceptnet 5.5: An open multilingual graph of general knowledge</title>
		<author>
			<persName><forename type="first">R</forename><surname>Speer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Havasi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31st AAAI Conference on Artificial Intelligence</title>
				<meeting>the 31st AAAI Conference on Artificial Intelligence<address><addrLine>San Francisco, California, USA</addrLine></address></meeting>
		<imprint>
			<publisher>AAAI Press</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="4444" to="4451" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Neural text generation with unlikelihood training</title>
		<author>
			<persName><forename type="first">S</forename><surname>Welleck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Kulikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Roller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Dinan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th International Conference on Learning Representations</title>
				<meeting>the 8th International Conference on Learning Representations<address><addrLine>Addis Ababa, Ethiopia, OpenReview</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
