<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>The Empathic Dialogue Generation Model Based on Emotion Cause Perception</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yun Su</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bozhen Fan</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haoran Bian</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Runhe Huang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yunhao Zhu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hosei University</institution>
          ,
          <addr-line>Tokyo, 1848584</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Northwest Normal University</institution>
          ,
          <addr-line>Lanzhou, 730070</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Current methods for empathetic dialogue generation often overlook the emotional causes that trigger changes in emotion. To address this issue, we present a novel framework that enhances empathetic response generation by identifying emotion causes within conversations. Our framework consists of two modules: one that comprehends emotions originating from both content and context, and another that features an emotion attention mechanism for empathy expression. Experimental results demonstrate that our proposed model is capable of perceiving emotion causes and can improve the quality of empathy expression.</p>
      </abstract>
      <kwd-group>
        <kwd>emotional conversation generation</kwd>
        <kwd>emotion cause detection</kwd>
        <kwd>empathetic response generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The perception and expression of emotion are very important to
dialogue generation. Emotional causes are the events that trigger
changes in the speaker's emotions. Failure to analyze emotional
causes can lead to poor emotional perception [1]. To address this
issue, we propose a framework that improves the generation of
empathetic responses by endowing the empathetic dialogue model
with the ability to reason about human emotions in conversations.</p>
      <p>Our framework comprises two components: an emotion reasoner
and a response generator. The experimental results show that, by
considering emotional causes, our proposed model outperforms the
compared methods in generating empathetic responses.</p>
    </sec>
    <sec id="sec-approach">
      <title>2. Approach</title>
      <p>Our model architecture is illustrated in Figure 1. It consists of
two modules: the emotion reasoner and the response generator. The
first module, the emotion reasoner, is responsible for predicting both
the words related to the emotion cause and the corresponding
emotion tag. The second module, the response generator, integrates the
information provided by the emotion reasoner to generate an
appropriate response.</p>
      <p>For emotion reasoning, two encoders, namely a semantic
encoder and an emotional encoder, are employed to understand the
conversation context from both a content and an emotional
perspective and to locate the words related to emotional causes.</p>
      <p>The semantic encoder is used to process the historical dialogue
input, which is denoted as C = [CLS, w_1, w_2, ..., w_n], where [CLS] is a
semantic classifier token. For each word w_i in the input, the semantic
encoder assigns a word embedding vector, a position embedding
vector, and a conversation state embedding vector, which capture
the semantic information, the location in the context, and the
interlocutor information of each word, respectively. The obtained final
context representations are denoted as C̃ = [c̃_0, c̃_1, c̃_2, ..., c̃_n].</p>
      <p>Similarly, the emotional encoder is used to process the emotional
words in the semantic context C, which are denoted as E = [CLS, e_1,
e_2, ..., e_m], where, similar to C, [CLS] is an emotion classifier token.
For each emotional word e_i in the input, the encoder assigns a word
embedding vector, a position embedding vector, and an emotional
state embedding vector, which capture the emotional information
associated with each word. The multi-resolution emotional context is
then represented as Ẽ = [ẽ_0, ẽ_1, ẽ_2, ..., ẽ_m].</p>
      <p>To perceive the emotional information in the dialogue context, a
linear layer with a softmax operation projects the concatenation of c̃_0
and ẽ_0 into an emotion category distribution over the
coarse-grained emotional labels to identify the emotion signal the user
expressed:
P(ε | C, E) = softmax(W_ε [c̃_0; ẽ_0])  (1)</p>
      <p>Emotion cause detection is a sequence labelling problem. Each
word in the sequence is labelled with an emotion cause-oriented
label y_i ∈ {0, 1}, indicating whether the word is related to the emotion
cause. The label is computed with a linear layer coupled with a
softmax function, which outputs the probability that the word is
related to the emotion cause:
P(y_i | c̃_i) = softmax(W_c c̃_i + b_c)  (2)
Note that the [CLS] token is always labeled with 1. The sequence
of emotion cause-oriented labels is later used to select the
emotion cause-related words in the input sequence for the response
generator to attend to.</p>
      <p>Finally, the outputs of the two encoders are combined into the
final dialogue representation [C̃; Ẽ]. At the same time, based on the
semantic context vector representation, the tag sequence
Y = [y_0, y_1, ..., y_n] is obtained through a fully connected layer, and
each word in the conversation context is assigned an emotion cause
tag y_i ∈ {0, 1}. This tag sequence marks whether each word in the
conversation is an emotion cause word that triggers the user's
emotional changes, so that the model can better understand the
user's emotion and the reasons behind it.</p>
      <p>The emotion expression process is based on the transformer
decoder. An emotion attention mechanism is placed after the
cross-attention mechanism so that the dialogue generation model can
better focus on the emotion cause words in the input. The decoder
then produces the target sequence T = [t_1, t_2, ..., t_k] from the
dialogue context.</p>
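      <p>The reasoner described above can be summarized in a minimal sketch. The following plain-Python code is our own illustrative reconstruction, not the authors' implementation; all function names, dimensions, and weights are assumptions. Token representations are formed by summing word, position, and state embeddings; an emotion category head applies a linear layer with softmax to the concatenated [CLS] vectors (Eq. 1); and a cause-tagging head labels each token with y_i ∈ {0, 1} (Eq. 2).</p>
      <preformat>
```python
# Illustrative sketch of the emotion reasoner's two heads (Eqs. 1 and 2).
# Names, dimensions, and toy weights are our own assumptions.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def linear(vec, weight, bias):
    # weight: list of rows, one row per output unit
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]

def embed(word_emb, pos_emb, state_emb):
    # Each token representation = word + position + state embedding (summed).
    return [w + p + s for w, p, s in zip(word_emb, pos_emb, state_emb)]

def emotion_distribution(c0, e0, W_eps, b_eps):
    # Eq. (1): emotion category distribution from the concatenated
    # [CLS] vectors of the semantic (c0) and emotional (e0) encoders.
    return softmax(linear(c0 + e0, W_eps, b_eps))  # list concat = [c0; e0]

def cause_tags(token_vecs, W_c, b_c):
    # Eq. (2): per-token probability of being an emotion cause word,
    # reduced to a hard tag y_i in {0, 1} by argmax.
    probs = [softmax(linear(h, W_c, b_c)) for h in token_vecs]
    return [int(p[1] > p[0]) for p in probs]
```
      </preformat>
      <p>In a trained model the weight matrices W_ε and W_c would of course be learned jointly with the encoders; the sketch only makes the data flow of the two heads concrete.</p>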
      <p>At the same time, to improve the model's emotion recognition
and semantic perception, we employ a generative adversarial network
only during training. The discriminator, inspired by [2], comprises
two parts: an emotion discriminator and a semantic discriminator.</p>
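      <p>The emotion attention placed after the cross-attention can likewise be illustrated with a minimal sketch, again our own assumption rather than the paper's code: a scaled dot-product attention whose scores are masked so that the decoder attends only to positions the reasoner tagged as emotion cause words (at least one position is assumed to carry tag 1).</p>
      <preformat>
```python
# Illustrative sketch of an emotion attention step that restricts the
# decoder's focus to emotion cause words. Not the authors' code.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def emotion_attention(query, context_vecs, cause_tags):
    """Scaled dot-product attention over cause-tagged positions only.

    Assumes at least one position has cause_tags[i] == 1.
    """
    d = len(query)
    scores = []
    for h, tag in zip(context_vecs, cause_tags):
        if tag == 1:
            scores.append(dot(query, h) / math.sqrt(d))
        else:
            scores.append(float("-inf"))  # mask out non-cause words
    weights = softmax(scores)  # exp(-inf) == 0.0, so masked weights vanish
    # Weighted sum of the context vectors.
    return [sum(w * h[i] for w, h in zip(weights, context_vecs))
            for i in range(d)]
```
      </preformat>
      <p>In a full transformer decoder this step would use learned query/key/value projections and multiple heads; the mask derived from the cause tags is the part specific to the design above.</p>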
    </sec>
    <sec id="sec-2">
      <title>3. Experiments</title>
    </sec>
    <sec id="sec-3">
      <title>4. Conclusion</title>
      <p>The paper introduces a new framework that can enhance
empathetic response generation by incorporating information about the
causes of emotions. The evaluations demonstrate that the proposed
models can generate more meaningful and empathetic responses
compared to other existing approaches. By integrating emotional
reasoning into conversation models, our framework has the
potential to significantly improve the quality of human-computer
interaction, particularly in scenarios where empathetic communication
is essential.</p>
      <p>Dataset To better capture the emotional content in user utterances,
two diferent dataset are used: Empathtic Dialogues with emotional 5. Acknowledgments
causes labels[3]. And the NRC Word-Emotion Association Lexicon This work was supported by the National Natural Science
Founda(EmoLex) [4]. EmpatheticDialogues provides coarse-grained emo- tion of China (No. 61862058 and 8226070356), and in part by the
tional labels for the dialogues, while EmoLex provides fine-grained China Scholarship Council(CSC).
emotional labels for individual words. The emotion cause is
identiifed at the discourse level in the dialogues using an existing emotion
cause detection model and label them accordingly in EmpatheticDia- References
logues. This approach allows us to better understand the emotional
context of the dialogues and provide more accurate emotional labels [1] H. Herjanto, M. Amin, F. Okumus, and C. Cobanoglu, (2022).
for the model training. Airline service: Low-cost-carriers (LCCs) failure and passenger</p>
      <p>Baselines To assess our model efectiveness in capturing and emotional experience. Tourism Review, 77(3), 945-963.
generating empathetic responses with subtle emotional nuances, [2] Q. Li, H. Chen, Z. Ren, P. Ren, Z. Tu, Z. Chen,
we compare our model’s performance against several baselines, EmpDG:Multi-resolution Interactive Empathetic Dialogue
Genincluding the MoEL model [5], which is an extension of the Trans- eration, Proc. 28th Int. Conf. Comput. Linguist. (2020). URL:
former model that combines response representations from diferent https://doi.org/10.48550/arXiv.1911.08698
decoders optimized for diferent emotions; the MIME model [6] is [3] J. Gao, Y. Liu, H. Deng, W. Wang, Y. Cao, J. Du, R. Xu,
another Transformer-based model that considers emotion clustering Improving Empathetic Response Generation by
Recognizand emotional mimicry, and introduces sampling stochasticity dur- ing Emotion Cause in Conversations, Find. Assoc.
Coming training; the EMPDG model [2] is a kind of empathic dialogue put. Linguist. Find. ACL EMNLP 2021. (2021) 807–819. URL:
generation model based on generative adversarial network. https://doi.org/10.18653/v1/2021.findings-emnlp.70.</p>
      <p>Evaluation Results As shown in table 1, our results have certain [4] S. M. Mohammad, and P. D. Turney, (2013).
Crowdsourcadvantages in the accuracy of emotion recognition and the PPL of ing a word–emotion association lexicon. Computational
indialogue, which shows that our reasoning process on emotional telligence, 29(3), 436-465. URL:
https://doi.org/10.1111/j.1467causes helps the model to perceive emotion better, and at the same 8640.2012.00460.x
time, produces a more sympathetic expression. At the same accuracy [5] Z. Lin, A. Madotto, J. Shin, P. Xu, and P. Fung, MOEL:
rate of emotion recognition, our model has more advantages in Mixture of empathetic listeners, EMNLP-IJCNLP 2019
the value of ppl, which shows that our model can better perceive 2019 Conf. Empir. Methods Nat. Lang. Process. 9th Int. Jt.
the subtle emotional reasons and respond accordingly with the Conf. Nat. Lang. Process. Proc. Conf. (2019) 121–132. URL:
same recognition efect. These automatic evaluation results suggest https://doi.org/10.48550/arXiv.1908.07687
that our approach is efective in generating empathetic responses [6] N. Majumder, P. Hong, S. Peng, J. Lu, and D. Ghosal, A. Gelbukh,
with subtle emotional nuances and diverse language. At the same R. Mihalcea, S. Poria, MIME: Mimicking emotions for
empatime, the results of the manual evaluation show that our empathy thetic response generation, in Proc. Conf. Empirical Methods
expression and fluency of sentences are also better. Natural Lang. Process. (EMNLP), (2020), pp. 8968–8979. URL:
https://doi.org/10.48550/arXiv.2010.01454</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>