<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving Persona Consistency of Dialogue Generation by Constructing Negative Word Set</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhenfeng Han</string-name>
          <email>zhenfenghan@tju.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sai Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaowang Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Intelligence and Computing, Tianjin University</institution>
          ,
          <addr-line>Tianjin, 300350</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Maintaining a consistent persona is essential for dialogue models. However, dialogue models can generate fluent responses that are inconsistent with their persona. We observed that such inconsistent responses often contain words that are similar to consistent words but contradict the persona. In this poster, we propose a method that uses an unlikelihood loss to separate the semantics of similar but inconsistent words. To obtain such words, we leverage Word2Vec to construct a negative word set, and we use ConceptNet to remove consistent noise words from the negative word set and to add antonyms. Experiments demonstrate that our method improves the persona consistency of dialogue generation.</p>
      </abstract>
      <kwd-group>
        <kwd>Consistent persona</kwd>
        <kwd>Unlikelihood</kwd>
        <kwd>ConceptNet</kwd>
        <kwd>Negative word set</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the success of existing dialogue models at generating human-like responses, dialogue
models are now also expected to express their own personality. Zhang et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] introduce a
persona-conditioned dialogue dataset, PersonaChat, to build persona-consistent dialogue models. However,
the best performing generative models trained on PersonaChat such as GPT2 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] still generate
fluent but persona-inconsistent responses. The reason is that dialogue models are trained with the standard
maximum likelihood loss, which lacks a persona-consistency constraint.
      </p>
      <p>
        Unlikelihood training is a technique originally developed to remove repetition from language model
completions. Li et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] use unlikelihood training to address the persona consistency issue of dialogue
models. However, they only consider whole sentences and ignore keywords. We observed
that most inconsistent responses are caused by similar but inconsistent words. As shown in
fig. 1, the response generated by the GPT2 model is inconsistent with the persona due to the word “20”.
The word “20” is similar to “26” but is inconsistent considering the fourth persona description.
      </p>
      <p>In this poster, we construct a negative word set to separate the semantics of similar but
inconsistent words. Firstly, we obtain a coarse negative word set with Word2Vec. Secondly, we use
ConceptNet [4] to remove synonyms and add antonyms. Thirdly, we apply the unlikelihood loss
over the negative word set, which assigns low probabilities to inconsistent words.
The experiments show that our method generates more consistent responses.</p>
      <sec id="sec-1-1">
        <title>Fig. 1: An example of an inconsistent response</title>
        <p>Persona: 1. i am a doctor. 2. i am a man. 3. i have three dogs. 4. i am 26 years old. Query: how old are you? Response: i am 26 years old. GPT2: i am 20 years old. (inconsistent)</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Approach</title>
      <sec id="sec-2-1">
        <title>2.1. Problem Definition</title>
        <p>Our task is to train a generative model that generates persona-consistent responses. Formally, given a set of persona texts P = {p_1, p_2, …, p_n} and a query q, the model generates a response r that should be consistent with the persona. Here each p_i, q, and r are sentences consisting of words, e.g., r = {r_1, r_2, …, r_T}, where T is the length of r.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Training Loss</title>
        <p>2.2.1. Likelihood Loss
Likelihood training is commonly used in text generation models. Pre-trained generative models trained with a likelihood loss can generate fluent and meaningful responses. For a sample {P, q, r}, likelihood training uses maximum likelihood estimation (MLE) to compute the loss:
L_MLE = − log p_θ(r ∣ P, q) = − ∑_{t=1}^{|r|} log p_θ(r_t ∣ P, q, r_{&lt;t})   (1)
where r_t is the current word to be predicted, r_{&lt;t} denotes the words preceding r_t, and p_θ(r_t ∣ P, q, r_{&lt;t}) is the probability the model assigns to r_t conditioned on P, q, and r_{&lt;t}.</p>
        <p>2.2.2. Unlikelihood Loss
Likelihood training increases the probability of the true word and decreases the probability of all other words. In contrast, unlikelihood training decreases the probability of negative words. The unlikelihood (UL) loss is defined as:
L_UL = − ∑_{t=1}^{|r|} ∑_{r⁻ ∈ N_t} log (1 − p_θ(r⁻ ∣ P, q, r_{&lt;t}))   (2)
where N_t is the negative word set of the current word r_t.</p>
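        <p>To make the objectives concrete, here is a minimal PyTorch sketch of the MLE loss (Eq. 1) and the UL loss (Eq. 2) over one response; combined_loss, negative_sets, and ul_weight are illustrative names rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, targets, negative_sets, ul_weight=1.0):
    # logits: (T, V) model scores per position; targets: (T,) gold word ids.
    log_probs = F.log_softmax(logits, dim=-1)
    # Eq. (1): average of -log p(r_t | P, q, r_prev) over positions.
    mle = F.nll_loss(log_probs, targets)
    probs = log_probs.exp()
    # Eq. (2): for each position t, penalize every word in its negative set N_t.
    ul = torch.zeros(())
    for t, neg_ids in enumerate(negative_sets):
        if neg_ids:
            p_neg = probs[t, neg_ids]
            ul = ul - torch.log((1.0 - p_neg).clamp_min(1e-6)).sum()
    return mle + ul_weight * ul
```

With empty negative sets the sketch reduces to plain MLE training.</p>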
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Constructing Negative Word Set</title>
        <p>2.3.1. Negative Word Set
MLE leverages the previous context of a word to predict the current word, so words with similar contexts end up with similar semantics. As a result, the model may generate similar but inconsistent words. One solution is to separate the semantics of similar words, which can be done by UL training. The core of UL training is constructing the negative word set. Unlike Welleck et al. [5], our negative word set contains words that are inconsistent with the current word conditioned on the persona. As shown in fig. 1, the generated word “20” is inconsistent with “26” in the persona. The negative word set of “26” may contain “20”, “25”, “30”, and so on.</p>
        <p>2.3.2. Word2Vec
The challenge is how to construct the negative word set used for the UL loss. Word2Vec, a method for learning word embeddings, follows the distributional hypothesis: words that occur in the same contexts tend to have similar meanings. Word2Vec leverages the previous and following context to predict the current word, which is similar to MLE. We learn word embeddings for all words in the dataset with Word2Vec, and approximately treat the words most similar to the current word, measured by cosine similarity of word embeddings, as its negative word set. For example, the negative word set of “man” computed by Word2Vec contains “male”, “boy”, and “girl”, among others.</p>
        <p>2.3.3. ConceptNet
We also observed noise words in the negative word set constructed by Word2Vec. Synonyms, hyponyms, and hypernyms have similar contexts but are consistent with the word, so we should remove them from the coarse negative word set. Fortunately, ConceptNet [4], a knowledge graph containing commonsense knowledge, provides these three relations for a word. For example, “male” is a synonym of “man”, and “dog” is a hyponym of “pet”. Besides, ConceptNet also provides antonyms, which can be added to the negative word set. For example, “man” and “woman” are a pair of antonyms: they have similar contexts but opposite semantics.</p>
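        <p>The construction steps above can be sketched as follows. This is a toy illustration: the cosine-similarity search stands in for a Word2Vec nearest-neighbour lookup, and the synonyms/antonyms dictionaries stand in for ConceptNet queries, so all names and data here are assumptions for demonstration.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def negative_word_set(word, embeddings, synonyms, antonyms, k=3):
    # Step 1: coarse candidates = k nearest neighbours by cosine similarity
    # (stands in for Word2Vec's most-similar lookup).
    others = [w for w in embeddings if w != word]
    others.sort(key=lambda w: cosine(embeddings[word], embeddings[w]), reverse=True)
    coarse = set(others[:k])
    # Step 2: remove consistent noise words (synonyms; hyponyms and
    # hypernyms would be filtered the same way).
    coarse -= synonyms.get(word, set())
    # Step 3: add antonyms, which are similar in context but opposite in meaning.
    coarse |= antonyms.get(word, set())
    return coarse
```

For “man”, the coarse neighbours {“male”, “boy”, “girl”} lose the synonym “male” and gain the antonym “woman”.</p>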
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Training Model</title>
        <p>
          We use GPT2 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] as our basic model because it shows strong performance in dialogue generation.
During the training phase, we combine the UL loss with the MLE loss as follows:
L = L_MLE + α · L_UL   (3)
where α is a weighting coefficient. L_MLE aims to promote true words, training the model to assign the highest probabilities to
such words. On the other hand, L_UL focuses on negative words, so that the model learns to
rank negative words lower than true words effectively.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <p>
        We verify our method on PersonaChat [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For automatic evaluation, we employ a classification
model to evaluate the persona-consistency of the generated responses. The results contain three
categories: consistent (Consi.), contradictory (Contr.), and neutral. We use perplexity (PPL)
to measure the fluency of responses. For human evaluation, we randomly select 100 samples
per method and ask three professional annotators to evaluate their quality.
Annotators also label generated responses as consistent (Consi.), contradictory (Contr.), or
neutral with respect to the persona. The fluency (Flue.) of responses is rated on a 3-point scale, with higher scores
indicating better fluency.
      </p>
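      <p>The automatic consistency metrics can be tabulated with a small helper; consistency_report and the label strings below are illustrative assumptions, not the classifier's actual interface.

```python
from collections import Counter

def consistency_report(labels):
    # labels: one classifier verdict per generated response, each being
    # "consistent", "contradictory", or "neutral".
    counts = Counter(labels)
    total = len(labels)
    # Return the fraction of responses in each category.
    return {k: counts.get(k, 0) / total for k in ("consistent", "contradictory", "neutral")}
```

The consistent and contradictory ratios reported in Table 1 correspond to the first two entries of such a report.</p>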
      <p>Table 1 shows that the model trained with the UL loss achieves better results on all metrics than
the base model. A higher consistent ratio and a lower contradictory ratio indicate that our method can
separate the semantics of similar but inconsistent words. The consistency of responses is further
improved by using ConceptNet, which means that knowledge such as synonyms provided by
ConceptNet is useful for constructing a higher-quality negative word set.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this poster, we propose a method to construct a negative word set for unlikelihood training to
separate the semantics of similar but inconsistent words. Experiments demonstrate that our method
can improve the persona consistency of dialogue generation. In future work, we are interested in
leveraging more efficient losses and constructing more appropriate data to further improve the persona
consistency of dialogue generation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Dinan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Urbanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Szlam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <article-title>Personalizing dialogue agents: I have a dog, do you have pets too?, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics</article-title>
          , Melbourne, Australia, Association for Computational Linguistics,
          <year>2018</year>
          , pp.
          <fpage>2204</fpage>
          -
          <lpage>2213</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          , et al.,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <source>OpenAI</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kulikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Welleck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Boureau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <article-title>Don't say that! making inconsistent dialogue unlikely with unlikelihood training</article-title>
          ,
          <source>in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , Online, Association for Computational Linguistics,
          <year>2020</year>
          , pp.
          <fpage>4715</fpage>
          -
          <lpage>4728</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] R. Speer, J. Chin, C. Havasi, ConceptNet 5.5: An open multilingual graph of general knowledge, in: Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, California, USA, AAAI Press, 2017, pp. 4444-4451.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] S. Welleck, I. Kulikov, S. Roller, E. Dinan, K. Cho, J. Weston, Neural text generation with unlikelihood training, in: Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, OpenReview.net, 2020.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>