<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Target-guided Knowledge-aware Recommendation Dialogue System: An Empirical Investigation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dongding Lin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jian Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wenjie Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computing, The Hong Kong Polytechnic University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>The target-guided recommendation dialogue system aims to make high-quality recommendations through interactive conversations proactively and naturally. Existing methods still struggle to incorporate background knowledge for coherent response generation, and to recommend appropriate items with respect to dialogue context, user preference and recommendation target. In this paper, we investigate the problem of target-guided knowledge-aware recommendation dialogue and design a dialogue generation system to alleviate the above-mentioned issues. Specifically, we employ pre-trained language models with multi-task learning to jointly learn response generation and goal prediction towards the target. We also present a knowledge-preserving encoding strategy to maintain the facts in background knowledge. Extensive experiments on two benchmark datasets show that our system significantly outperforms various competitive models in terms of both automatic and manual evaluations. We further provide analysis and discussions to demonstrate that our system is efective in leveraging both related knowledge and planned goals to generate fluent, informative and coherent responses towards the target of recommendation.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Recommendation Dialogue</kwd>
        <kwd>Background Knowledge</kwd>
        <kwd>Target Guiding</kwd>
        <kwd>Multi-task Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>knowledge helps a dialogue system better understand
user interests and make recommendations via coherent
Building a human-like dialogue system is one of the long- natural language communication. Since users may not
cherished goals in natural language processing (NLP) have a clear preference for the unfamiliar new items
[1]. Dialogue systems can be mainly used for chat- recommended, especially in many sociable
recommendating with users for entertainment, i.e., open-domain di- tion domains such as music, movies and news, it is also
alogues [2, 3], or accomplishing specific tasks, i.e., task- important for a dialogue system to proactively lead the
oriented dialogues [4, 5, 6]. Recent years, recommen- conversation to the recommendation target with high
dation dialogue systems [7, 8] have been recognized as user engagement and enjoyment.
an important special type of task-oriented dialogue sys- Recently, the emergence of the DuRecDial [15] dataset
tems with the aim of discovering user preferences and provided new insights towards the development of
targetmaking recommendations through conversations. The guided knowledge-aware recommendation dialogue
sysgrowing research interests mainly come from the bene- tems. As the example shown in Figure 1, the whole
userifts that dialogue provides an efective channel to han- bot dialogue is grounded on a user profile, background
dle the cold-start problem in recommendations while knowledge, and a goal sequence. The bot needs to take
recommendation-oriented tasks promotes technological both the user’s interests and the knowledge graph into
advance for dialogue systems [9]. consideration to decide an optimal goal path to achieve</p>
      <p>Many existing methods have focused on various as- the target of recommending and playing a music (i.e,
pects of both recommendation and conversation, includ- “Days of Friendship”). Here, the goal path is a sequence of
ing user preference modeling [10], conversation strategy goals, with each goal specifying a goal type (e.g.,
“Greet[11, 12], and dialogue generation [13, 14]. Today, incor- ing” or “Movie Recommendation”) and a goal topic (e.g.,
porating knowledge graphs (KG) has been recognized the movie “Orphans of the Zhao Family” or the music
as crucial to bridge the gap between recommendation “Ode to Joy”). It is challenging to build such an
efecreasoning and response generation [13]. Background tive recommendation dialogue system, as the planning of
goals which is required to ensure a natural and smooth
conversation must take into consideration of both the
user interests and the ultimate target goal. The joint
modeling of dialogue context, user profile and
recommendation target remains under-explored in the existing
work.</p>
      <p>In this paper, we investigate two key challenges in
recommendation dialogue systems: (1) how to efectively</p>
      <p>UNAIDS China</p>
      <p>Goodwill
Ambassador</p>
      <p>User Profile
Ren Xinfeng
Female
36-50
Huang Xiaoming, Yang Mi
Legend of the Grand Master
You Hate Me Like That.</p>
      <p>Four Seasons of the Earth.</p>
      <p>Knowledge Graph
Has a good reputation.</p>
      <p>comment
Orphans of
the Zhao</p>
      <p>Family
award
Chinese Film
Media Awards</p>
      <p>comment
As a fellow, I will
always support</p>
      <p>you.</p>
      <p>The theme song of the movie
"Young and Dangerous"</p>
      <p>award
act
XiHauoamnigng sing Ode to Joy</p>
      <p>sing comment
Days of
Friendship
comment
12th China Film</p>
      <p>Society Of
Performing Arts</p>
      <p>Award
It conveys a lot of
positive energy.</p>
      <p>The 19th top ten
Chinese Golden
Melody in 1996.</p>
      <p>Goal Conversation</p>
      <p>1. Bot: Good evening, Ms. Ren Xinfeng.</p>
      <p>Greeting 23.. BUoset:r: GHooowdisevitentoindga.y?</p>
      <p>4. User: I feel like the best day of the month because I got my salary today!</p>
      <p>Movie 5. Bot: iHsavhear,ylenti’csewaantdchhaasmaogvoieodanredpruetlaatxi.onT.hIerreeccoemntmoenned,"itOtropyhoaun.s of the Zhao Family",
Recommendation 6. User: I don’t want to talk about movies now. Sorry.</p>
      <p>7. Bot: Okay, how about its actor Huang Xiaoming? He’s a good actor and won the 12th</p>
      <p>China Film Society Of Performing Arts Award.</p>
      <p>Chat about 8. User: Really? I like him very much.
the Star 9. Bot: Many people like him. He is also the UNAIDS China Goodwill Ambassador. He is</p>
      <p>very kind-hearted.
10. User: Really? He is the perfect idol in my mind.
11. Bot: And Huang Xiaoming sings very well. This song "Ode to Joy" is recommended to</p>
      <p>you, which conveys a lot of positive energy.</p>
      <p>12. User: I'm not interested in this song, sorry.</p>
      <p>Music 13. Bot: What about the "Days of Friendship" he sang? The song won the 19th top ten
Recommendation 14. User: CRehainlleys?eIGtsoolduenndsMgeoloodd.y in 1996.</p>
      <p>15. Bot: Many people likes it. It’s very popular and worth listening to.
16. User: Yeah, I want to have a try.</p>
      <p>17. Bot: Should I play it for you?
Play Music 1198.. BUoste:r: OYeksa,yl,eitt’’sspallaryeaidt.y playing. Please enjoy it.</p>
      <p>20. User: Wow, it’s really nice!</p>
      <p>Goodbye 2212.. BUoste:r: HOakhaay,,gtohoendbIywe.on't disturb you listening to the song. Goodbye.
incorporate related facts in background knowledge and egy to better maintain background knowledge facts
user profile in dialogue generation, and (2) how to make in order to enhance the system ability to generate
appropriate goal planning to proactively lead the con- appropriate responses by incorporating background
versation. To this end, we employ multi-task learning to knowledge.
jointly learn response generation and goal prediction
towards the nfial target based on pre-trained language mod- (3) The evaluation results show that our system achieves
els. Specifically, we adopt ERNIE-GEN [ 16], an enhanced significant improvement compared to various
commulti-flow pre-training and fine-tuning framework for petitive models.
natural language generation, as our backbone model. In
addition, we also present a knowledge-preserving encod- 2. Related Work
ing strategy to maintain the background knowledge facts
for dialogue generation. Extensive experiments on two The two research lines that motivate our study are
conbenchmark datasets show that our system significantly versational recommendation systems (CRS) and
recomoutperforms various competitive models in terms of both mendation dialogue systems. We briefly introduce some
automatic and manual evaluations. We have submitted representative works as below.
our best model to Baidu Language and Intelligence
Challenge 2021 (LIC 20211), where we achieved the 4-th rank 2.1. Conversational Recommendation
among 862 teams. It reveals that our methods are
efective to generate informative, coherent and appropriate System
responses and to achieve the target of recommendation. A conversational recommendation system (CRS) is a
rec</p>
      <p>Overall, our contributions are three folds: ommendation system (RS) that provides personalized
rec(1) Towards building a target-guided recommendation di- ommendation through natural language conversations.
alogue system, we adopt multi-task learning to jointly Christakopoulou et al. [17] argued that asking
quesmodel goal planning and dialogue generation based tions benefit a RS , which can better understand user
on pre-trained language models. preference based on user feedback. To this end, they
suggested to move from the traditional RS to the CRS.
(2) We present a knowledge-preserving encoding strat- Lei et al. [11] proposed a three-stage framework called
Estimation-Action-Reflection (EAR) to fill the
interaction gap between conversation and recommendation in</p>
      <sec id="sec-1-1">
        <title>1https://aistudio.baidu.com/aistudio/competition/detail/67?is</title>
        <p>FromLuge=true
implicit ways. More explicitly, Lei et al. [12] leveraged set  and each goal , consists of a goal type and a
conversational recommendation as finding a path in a goal topic.  = {, }=1 represents a set of user
prouser-item-attribute graph interactively. To enhance se- files with each profile , in the format of ⟨, ⟩
mantic representations of products and related textual de- pair.  = {, }=1 is the response produced on the
scriptions of products, both Zhou et al. [18] and Rajdeep basis of the , , , and . Here,  and  denote
et al. [19] incorporated external knowledge graphs (KG) the sequence length of  and  respectively.
into CRS, which in turn led to better recommendations. Given explicit goals ′ = {1,  } (i.e., start goal
However, despite the improvement towards high-quality and target goal), a dialogue history ′ paired with the
recommendations, these methods have limited abilities related knowledge facts ′, and the user profile  ′, the
to generate natural and informative dialogues. objective of target-guided recommendation dialogue is to
decide an appropriate goal  at each turn to determine
2.2. Recommendation Dialogue System where the dialogue should go with the aim of proactively
leading the dialogue from the start goal to the target goal,
A recommendation dialogue system is a special type of and meanwhile generating a coherent and informative
task-oriented dialogue system, which is expected to en- response to achieve the goal .
courage natural human-machine interaction with a clear
target. To facilitate the research along this line, several 3.2. Model Architecture
recommendation dialogue datasets have been released,
including GoRecDial [8] and INSPIRED [20]. To further Backbone Model To tackle the issue of proactively
investigate whether the system can lead a multi-type di- planning goals for target-guided recommendation
diaalogue to approach the target of recommendation with logue, we jointly model goal prediction and dialogue
rich interaction behavior, Liu et al. [15] created a large- generation based on pre-trained language models, as
scale dialogue dataset, namely DuRecDial. Existing rec- shown in Figure 2 (a). Pre-trained language models have
ommendation dialogue approaches mainly focus on how been widely used in dialogue generation on a basis of
to efectively integrate interactive recommendation and pre-training fine-tuning framework, where they
generdialogue generation. Cai et al. [21] contributed two hi- ally concatenate diferent information sources such as
erarchical taxonomies for classifying user intents and knowledge facts and dialogue history as input, and
generrecommendation actions. To bridge the gap between rec- ate responses autoregressively. In this paper, we employ
ommendation reasoning and response generation, Ma et ERNIE-GEN [16], an enhanced multi-flow pre-training
al. [13] performed tree-structured reasoning on knowl- and fine-tuning framework for natural language
generaedge graphs, which can then be mapped to hierarchical tion, as our backbone framework. ERNIE-GEN bridges
dialogue acts to guide generation. More recently, Bai the discrepancy between training and inference with an
et al. [14] proposed a goal-oriented knowledge copy infilling generation mechanism using multi-flow
attennetwork to discern the knowledge facts that are highly tion. In light of the fact that entities and phrases are
orcorrelated to the dialogue, which assisted to generate ganized span by span, ERNIE-GEN adopts span attention
accurate knowledge-aware responses. mask matrices (see Figure 2 (b)) to determine whether</p>
        <p>In this paper, we aim to build a target-guided dialogue each token and each span can attend to each other. To
system towards recommendation. It requires the system better capture coherent semantic information of the
conto make high-quality recommendations by considering text, both word-by-word flow and span-by-span flow are
external knowledge and user preference. More impor- integrated together (see Figure 2 (c)), where the the
spantantly, the system should also be able to lead the conver- by-span generation flow aims to predict semantically
sation towards the target goal naturally by generating complete spans consecutively. In view of the fact that
appropriate responses. specific entities or spans (e.g., musics, movies, and news)
should be generated in the response as the recommended
items, we believe ERNIE-GEN is a good choice with the
3. Method advantages described above.</p>
        <sec id="sec-1-1-1">
          <title>3.1. Problem Definition</title>
          <p>Suppose a target-guided dialogue corpus is denoted as
 = {(, , , , )}=1, where  = {ℎ,}=1</p>
          <p>represents dialogue history with multiple turns,  =
{, }=1 is a set of background knowledge facts that
correspond to this conversation and each element ,
is formulated as a triplet.  = {, }=1 is a goal
sequence which is constructed upon the knowledge facts
Knowledge-preserving Encoding It is dificult for
existing pre-trained models to encode concatenated
background knowledge facts because it often exceeds the
encoding length limitation of the models. In particular,
according to our statistics, the concatenated background
knowledge facts of each dialogue in the DuRecDial [15]
dataset contains more than 1,700 tokens on average. It
substantially exceeds the encoding length limitation (i.e.,</p>
          <p>Feed
Forward</p>
          <p>Response</p>
          <p>Infilling Decoding
Multi-Flow Attention
in ERNIE-GEN</p>
          <p>Feed</p>
          <p>Forward</p>
          <p>Pre-trained
Language Model</p>
          <p>Supervised</p>
          <p>Training Data
(a) Overview of our system
Q K, V R S T U</p>
          <p />
          <p>Q K, V R S T U
R
S
T
U
R
S


T

U
Can be attended</p>
          <p>Cannot be attended</p>
          <p>Y!"#$</p>
          <p>Y%&amp;'(


 ×</p>
          <p>Feed Forward</p>
          <p>Feed Forward</p>
          <p>Multi-Flow Attention
Word-by-Word</p>
          <p>Flow
Contextual Flow share</p>
          <p>Span-by-Span</p>
          <p>Flow</p>
          <p>Contextual Flow
A)
+
,

+
P
%
*
+
,</p>
          <p>A+
+
,




(b) Span attention mask matrices
(c) Overview of Multi-Flow Attention
512) of many pre-trained language models including Multi-task Learning As described in Section 3.1, the
ERNIE-GEN [16]. To address this issue, we present a system should generate a coherent and informative
reknowledge-preserving encoding strategy to better main- sponse following an appropriate goal, which is decided by
tain background knowledge facts. First, all knowledge the system itself at each turn with the aim of proactively
triplets  in -th dialogue are sorted according to the to- leading the dialogue from the start goal to the target goal.
ken length of each triplet after being concatenated. Then, Intuitively, the goal planning process has an important
we put these knowledge triplets into a number of buckets efect on dialogue generation. To this end, we propose
{, }=1 with the short-length-first-in priority, where to add goal prediction at each turn as an auxiliary task,
, ⊆ ,  denotes the number of buckets. The ca- which are jointly fine-tuned with the dialogue
generapacity of each bucket , is tuned by a hyper-parameter tion task in a multi-task learning manner. Concretely, we
, which denotes that , contains no more than  divide the task of goal prediction into two sub-tasks, goal
tokens in total. We hope that after concatenating , type prediction and goal topic prediction. We feed the
with other information sources, the total input length hidden representation of ERNIE-GEN’s encoding output
fulfills the encoding length limitation (i.e., 512). To this to two individual fully-connected feed forward neural
end, the -th dialogue sample is split into  dialogue networks, followed by a softmax operator, both of which
samples. Note that the system will generate multiple are optimized using cross-entropy loss. As shown in
Figresponses during inference with this strategy. We adopt ure 2 (a), the fine-tuning objective during the training
a simple unsupervised strategy to select the “best” one. stage is to jointly optimize the goal type prediction loss
We calculate mutual F1 scores by treating one response ℒ, goal topic prediction loss ℒ, and the response
as the “ground-truth” and the others as the candidate generation loss ℒ. We minimize the following overall
generated results. The average F1 score of the candidate loss:
results will be regarded as the selection score for the ℒ =  1ℒ +  2ℒ + ℒ (1)
“ground-truth”. Therefore, each response will obtain a
corresponding selection score. We select the response
with the highest selection score as the final generated
response.
where  1,  2 are two hyper-parameters controlling the
impact of the goal type and the goal topic. Under the
supervision of goal planning in the training stage, the
system will learn to naturally generate coherent responses
so as to achieve goal transition during the inference stage.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Experiments</title>
      <sec id="sec-2-1">
        <title>4.1. Datasets</title>
        <p>We conduct extensive experiments on two
knowledgeaware recommendation dialogue datasets, i.e., DuConv
[22] and DuRecDial [15] that are accompanied with the
explicitly specified goals. We also use some other
dialogue datasets to enhance the fine-tuning process. All
datasets are in Chinese.
• DuConv: It consists of about 30k dialogues and 270k
utterances in the movie domain. Each dialogue
contains about 14 background knowledge triplets on
average. The goal sequence of each dialogue is an explicit
path “[] → _ → _” over the
knowledge graph, indicating how a dialogue is led from any
start point relevant to _ to the final _.
Here,  represents one entity in the background
knowledge.
• DuRecDial: It is composed of about 10k dialogues
and 156k utterances over multi-type domains,
including chit-chat, question answering (QA), and
music/movie/news recommendation, etc. Each dialogue
session consists of about 15 turns on average, with
about 22 background knowledge triplets and a
speciifed user profile (e.g., age, gender, preference) in the
format of ⟨, ⟩ pairs. The goal sequence is
constructed upon the knowledge and user profiles, with
each goal containing a goal type and a goal topic
(entity). There are altogether 21 goal types.
• Other Datasets: Since it is important to select
appropriate entities or phrases from background knowledge
facts for a recommendation dialogue system, we also
utilize additional large-scale dialogue datasets to help
ifne-tune our system because of their similar settings
for incorporating knowledge in dialogue generation.
The datasets include ESTC [23], Tencent [24], and
KdConv [25]. Both ESTC and Tencent datasets are
collected from open-domain conversations, with about
900k and 5.5M dialogues respectively. The KdConv
dataset covers conversations about movie, music, and
tourism, which has more than 3k dialogues. We will
discuss the efect of model performance with and
without using these datasets in Section 5.1.
dialogue history, goal topics, knowledge triplets as well
as the input sequence, we observe that it fulfills the
maximum input length (i.e., 512) of ERNIE-GEN in most cases.
For those few samples that exceed the length limitation,
we simply take the last 512 tokens (i.e., Chinese
characters) as input. For the DuRecDial dataset, it has an
average of about 15 dialogue turns and an average of about
22 knowledge triplets. After taking user profiles and goal
sequences into account, the average length of each
concatenated input sequence substantially exceeds the
maximum input length (i.e., 512) of ERNIE-GEN. Therefore, we
adopt the knowledge-preserving encoding strategy
described in Section 3.2 to better maintain the background
knowledge facts for each dialogue.</p>
      </sec>
      <sec id="sec-2-2">
        <title>4.3. Baselines</title>
        <p>We compare our system with baseline models and several
competitive methods as follows.
• Seq2Seq2 [26] is a generative baseline used in many
dialogue generation tasks. We concatenate dialogue
history, knowledge facts, and other sources (if any)
together as the input sequence and feed it to the vanilla
sequence-to-sequence (Seq2Seq) model with the
attention mechanism to generate responses.
• MGCG_R/G [15] include a retrieval-based model and
a generation-based model for multi-goal driven
conversation generation. They are presented as the second
baseline on the DuRecDial dataset.
• UniLM [27] is a unified pre-trained language model
that can be used for language generation by controlling
generation with specific self-attention masks.
• GPT-2 [28] is an autoregressive pre-trained language
model and has been successfully used in many
downstream language generation tasks. The pre-training on
large-scale text corpora makes it easy to be fine-tuned
for dialogue generation.
• GOKC3 [14] is a generation-based model with a
goaloriented knowledge discernment mechanism, which
discerns the knowledge facts that are highly correlated
to the dialogue goal and the dialogue context. Note
that GOKC is the publicly available state-of-the-art
model on both the DuConv dataset and the DuRecDial
dataset.</p>
      </sec>
      <sec id="sec-2-3">
        <title>4.2. Data Preprocessing</title>
      </sec>
      <sec id="sec-2-4">
        <title>4.4. Implementation Details</title>
        <sec id="sec-2-4-1">
          <title>To better understand the characteristics of diferent</title>
          <p>datasets, we conduct data analysis and preprocessing
ifrst. The statistics of DuConv and DuRecDial datasets
are reported in Table 1. For the DuConv dataset, it has
an average of 4.5 dialogue turns and an average of 14.2
knowledge triplets. After concatenating the multi-turn</p>
        </sec>
        <sec id="sec-2-4-2">
          <title>Our dialogue system is built on top of the oficial</title>
          <p>open-source code of ERNIE-GEN4. During training
(finetuning), both  1 and  2 are set to 1.0 and the batch size</p>
        </sec>
        <sec id="sec-2-4-3">
          <title>2https://opennmt.net/OpenNMT-py/. 3https://github.com/jq2276/Learning2Copy 4https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-gen</title>
          <p>is set to 8. We use Adam [29] optimizer with the initial • BLEU-1/2 scores: They are also calculated at the
charlearning rate of 1 × 10− 4, the 2 weight decay of 0.01 acter level, representing 1-gram and 2-gram overlaps
and the learning rate warm-up over the first 10% train- between the generated response and the gold response.
ing steps with linear decay. During generation, we adopt
beam search decoding algorithm with a beam size of 5. • Distinct (DIST)-1/2 scores: They are used to
evaluThe details are described below. ate the 1-gram diversity and 2-gram diversity of the
generated response.</p>
          <p>Fine-tuning We start fine-tuning from the pre-trained • Perplexity (PPL): It is widely used to estimate how
Chinese version of ERNIE 1.0 model [30], as it is compat- well a probability model predicts a sample. A low
ible with the ERNIE-GEN framework and its pre-trained perplexity indicates the model is good at predicting
model checkpoint can be directly loaded. We first fine- the sample.
tune our system on 3 large-scale dialogue datasets (as
described in Section 4.1) for 5 epochs. Due to the large Human Evaluation With human evaluation, we
ransize of the ESTC dataset and the Tencent dataset, we domly select 100 dialogue samples from the testset, and
randomly extract 400K dialogue samples from each origi- then invite 5 evaluators to independently assign the
ratnal dataset. We continue to fine-tune our system on the ing score for the output of each model following the
mettarget dialogue datasets (DuConv and DuRecDial) for 10 rics suggested in [15]. The score of each metric is ranged
epochs, with the bucket capacity  setting to 360. from 0 to 2. Furthermore, we also report the human
evaluation in the Baidu LIC 2021, where crowed-sourcing
Vocabulary Expansion We find that ERNIE-GEN annotators are invited to conduct about 10 multi-round
may generate unknown words (i.e., [UNK]), i.e., the conversations with each submitted system and to judge
words out of the vocabulary. Therefore, we add the ad- the dialogue quality. The metrics used in our evaluation
ditional tokens with high occurrence extracted from the and in Baidu LIC 2021 are in consistent, including:
datasets to expand the original vocabulary. The final
vocabulary size is 18,000, which can cover almost all the • Informativeness (Info.): It measures if the model
Chinese characters and common special tokens in the makes full use of knowledge facts in the generated
datasets. response.</p>
          <p>Deduplication We observe that our system tends to
generate repeated words or phrases sometimes, which is
a common issue that is still under exploration in natural • Knowledge accuracy (Know Acc.): It evaluates the
language generation. To make the generated response accuracy of the selected knowledge in the generated
look more fluent, we remove the consecutive repeated response.
words using regular expression rules.
• Coherence (Cohe.): It measures the overall fluency
of the whole dialogue generation.</p>
        </sec>
      </sec>
      <sec id="sec-2-5">
        <title>4.5. Evaluation Metrics</title>
        <p>Automatic Evaluation Following the common
practice [15, 14], we adopt the following automatic evaluation
metrics.
• F1 score: It indicates whether the model can generate
appropriate entities in the response.
• Recommendation success rate (Rec. Succ.): It
estimates how well the target recommendation goal is
achieved.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Results and Analysis</title>
      <sec id="sec-3-1">
        <title>5.1. Automatic Evaluation</title>
        <p>The automatic evaluation results on the DuConv dataset
and the DuRecDial dataset are reported in Table 2.
improvements on F1, BLEU-1, and BLEU-2 evaluation
metrics. When using external dialogue datasets, our
model achieves about 3%, 15.3%, and 26.7% improvements
accordingly. Our model with multi-task learning
further outperforms baseline methods on all metrics, which
Our model outperforms all the compared models, and demonstrates that joint modeling of goal planning and
achieves a significant improvement over most of the eval- dialogue generation is efective to help the system select
uation metrics. Specifically, on the DuConv dataset, the the appropriate knowledge from the background facts to
normalized models (i.e., norm retrieval and norm gen- facilitate generation. Besides, we also observe that the
eration) refer to using normalized data by replacing the perplexity of our model is much lower, indicating that
specific two goals in the knowledge path with “topic_a” our model is more likely to generate fluent responses. It
and “topic_b” respectively, following [22]. As shown in should be noted that based on its default setting, GOKC
Table 2, our model yields substantial improvement over actually assumes that the full goal sequence is provided
existing pre-trained models including UniLM and GPT-2 and thus does not require any goal planning [15].
Thereon both F1 and BLEU-1/2. It demonstrates that our model fore, for fair comparison we also report our evaluation
can generate more coherent and informative responses results using the available full goal sequence in Table 2.
in the -gram’s level. Compared to the state-of-the-art The results further show the efectiveness of multi-task
model GOKC, our model without using external dialogue learning for our system. Overall, our model achieves
datasets (w/o ext) still achieves about 0.13%, 8%, and 32% significant improvements over competitive methods in
improvements in terms of F1, BLEU-1, and BLEU-2, re- terms of all automatic evaluation metrics.
spectively. After using external dialogue datasets (w/
ext), our model further achieves 1.39%, 9.7%, and 35.6%
improvements of F1, BLEU-1, and BLEU-2 compared to 5.2. Human Evaluation
GOKC, which indicates that fine-tuning on large-scale The human evaluation results of baseline models and
task relevant dialogue datasets is efective to improve our model are presented in Table 3. As shown in
Tathe performance in the final target-guided knowledge- ble 3, our model obtains the highest human scores on
aware recommendation dialogue task. Note that the nor- both the DuConv and DuRecDial datasets, which shows
malized retrieval method achieves the highest DIST-1/2 the efectiveness to generate informative and coherent
scores. The retrieval-based methods that directly select responses with correct knowledge and consistent
inforresponses from a list of candidates is more likely to retain mation. Specifically, our model achieves significant
imthe diversity of the natural responses. provement in terms of knowledge accuracy, which
fur</p>
        <p>As shown in Table 2, our model also achieves superior ther verifies that fine-tuning on large-scale task relevant
performance than all baseline methods on the DuRec- dialogue datasets is efective to improve the ability of
Dial dataset. In particular, compared to the competi- our model to incorporate knowledge into generation. We
tive model GOKC, our model without using external observe that our model with multi-task learning obtains
dialogue datasets obtains about 2.8%, 15%, and 26.1%
much better recommendation success rate on the
DuRecDial dataset. This verifies that our joint modeling of goal
prediction and response generation enables the system
to make more accurate recommendations with respect
to the given goals and the user profile.</p>
        <p>We submitted our best model to the Baidu LIC 2021
and achieved the 4-th rank among 862 teams. The human
evaluation results on the leaderboard are shown in Table
4. Note that the human evaluation here is more
challenging due to two aspects: (1) The decision of the current
goal relies on the previously predicted goals, and (2) the
generation of the response at the current turn will be
further decided by the current goal. It is likely to cause
error accumulations for a model during multi-turn
conversations. Therefore, the evaluation results can better
reveal the abilities of diferent models to guide the
conversation to the target. As shown in Table 4, our system
is competitive compared to others. However, our model
performs inferior on the DuRecDial dataset in terms of
the recommendation success rate. It encourages us to
further improve goal planning strategies in future work.</p>
      </sec>
      <sec id="sec-3-2">
        <title>5.3. Discussion</title>
        <p>Analysis of Implementation Details We study the
contribution of each part in our system by conducting
experiments with several variants of our system. The
results are shown in Table 5. Here is our findings. (1) All
strategies are efective to improve dialogue generation
performance. (2) The knowledge-preserving encoding
strategy contributes significantly to dialogue generation
especially on the DuRecDial dataset where the input
source sequence is much longer. Compared to previous
methods that truncate tokens when the sequence exceeds
the encoding length limitation of pre-trained models, our
proposed encoding strategy better maintains the
knowledge facts. (3) After expanding the vocabulary, our model
achieves significant improvements over most of the
metFuture Research Direction In real world,
recommending new target items that possibly attract users is
meaningful since users often have no definite preference
for many unknown items. We are trying to achieve this
objective through the development of the target-guided
knowledge-aware recommendation dialogue system. We
understand that it is not suficient by simply modeling
the target and dialogue with multi-task learning as
investigated in this paper. We will leave the problem of
proactively planning goals step by step towards the target
goal as our future research direction.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusion</title>
      <p>In this paper, we explore target-guided knowledge-aware
recommendation dialogue based on the pre-training
finetuning framework, which aims to proactively lead the
conversation and learn to make high-quality
recommendations. We present a knowledge-preserving encoding
strategy and a multi-task learning approach to enable our
system to efectively recommend appropriate items and
to generate fluent and coherent responses. The
experimental results on two benchmark datasets demonstrate
the efectiveness and superiority of our system compared
to the other competitive models in terms of both
automatic and manual evaluations. We also discuss the
implementation details and our future research direction.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The work described in this paper was supported by
Research Grants Council of Hong Kong (PolyU/15207920,
PolyU/15207821), National Natural Science Foundation
of China (61672445, 62076212) and PolyU Internal Grants
(ZVVX, ZG7H, ZVQ0).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>mas</surname>
          </string-name>
          , W. Wang (Eds.),
          <source>The Thirteenth ACM Interna</source>
          [1]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Turing</surname>
          </string-name>
          ,
          <article-title>Computing machinery</article-title>
          and intelli- tional
          <source>Conference on Web Search and Data Mining</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          gence, in: Parsing the turing test,
          <year>2009</year>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>65</lpage>
          . (WSDM),
          <year>2020</year>
          , pp.
          <fpage>304</fpage>
          -
          <lpage>312</lpage>
          . [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , Se- [12]
          <string-name>
            <given-names>W.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          , L. Chen,
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>ceedings of the 57th Annual Meeting of the As-</article-title>
          Y. Liu,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Prakash</surname>
          </string-name>
          (Eds.),
          <source>The 26th ACM</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>sociation for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <source>SIGKDD Conference on Knowledge Discovery and</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          3696-
          <fpage>3709</fpage>
          .
          <source>Data Mining (KDD)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>2073</fpage>
          -
          <lpage>2083</lpage>
          . [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , [13]
          <string-name>
            <given-names>W.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Takanobu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huang</surname>
          </string-name>
          , Bridg-
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Artificial</given-names>
            <surname>Intelligence</surname>
          </string-name>
          ,
          <year>2019</year>
          , pp.
          <fpage>7281</fpage>
          -
          <lpage>7288</lpage>
          . arXiv:
          <year>2010</year>
          .
          <volume>10333</volume>
          (
          <year>2020</year>
          ). [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eric</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Charette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          , Learning
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>alogue, in: Proceedings of the 18th Annual SIG tion</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>dial Meeting on Discourse and Dialogue</source>
          ,
          <year>2017</year>
          , pp.
          <source>Artificial Intelligence</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>12535</fpage>
          -
          <lpage>12543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          37-
          <fpage>49</fpage>
          . [15]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-Y.</given-names>
            <surname>Niu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Che</surname>
          </string-name>
          , T. Liu, To[5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Madotto</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.-S. Wu</surname>
          </string-name>
          , P. Fung,
          <article-title>Mem2Seq: Efec- wards conversational recommendation over multi-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <article-title>tively incorporating knowledge bases into end-to- type dialogs</article-title>
          ,
          <source>in: Proceedings of the 58th Annual</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <article-title>of the 56th Annual Meeting of the Association for guistics (ACL</article-title>
          ),
          <year>2020</year>
          , pp.
          <fpage>1036</fpage>
          -
          <lpage>1049</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Computational</surname>
          </string-name>
          <article-title>Linguistics (ACL) (Volume 1: Long</article-title>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tian</surname>
          </string-name>
          , H. Wu,
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>Papers)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1468</fpage>
          -
          <lpage>1478</lpage>
          . H.
          <string-name>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>ERNIE-GEN: an enhanced multi-flow [</article-title>
          6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <article-title>Global-to-local memory pre-training and fine-tuning framework for natural</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <article-title>pointer networks for task-oriented dialogue, in: 7th language generation</article-title>
          , in: C.
          <string-name>
            <surname>Bessiere</surname>
          </string-name>
          (Ed.), Pro-
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <source>tions (ICLR)</source>
          ,
          <year>2019</year>
          .
          <source>Conference on Artificial Intelligence (IJCAI)</source>
          ,
          <year>2020</year>
          , [7]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cen</surname>
          </string-name>
          , H. Yang, pp.
          <fpage>3997</fpage>
          -
          <lpage>4003</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>Towards knowledge-based recommender</article-title>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Christakopoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Radlinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          , To-
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <article-title>dialog system</article-title>
          ,
          <source>in: Proceedings of the 2019</source>
          Confer
          <article-title>- wards conversational recommender systems</article-title>
          , in:
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>ence on Empirical Methods in Natural Language Proceedings of the 22nd ACM SIGKDD</source>
          Interna-
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <source>Processing and the 9th International Joint Con- tional Conference on Knowledge Discovery and</source>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>ference on Natural Language Processing (EMNLP- Data Mining</surname>
          </string-name>
          ,
          <year>2016</year>
          , pp.
          <fpage>815</fpage>
          -
          <lpage>824</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>IJCNLP)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1803</fpage>
          -
          <lpage>1813</lpage>
          . [18]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          , [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Balakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Crook</surname>
          </string-name>
          , Y.-L.
          <article-title>Improving conversational recommender systems</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <article-title>oriented dialogue</article-title>
          ,
          <source>in: Proceedings of the 2019 Con- 26th ACM SIGKDD Conference on Knowledge Dis-</source>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <article-title>ference on Empirical Methods in Natural Language covery and Data Mining (KDD</article-title>
          ),
          <year>2020</year>
          , pp.
          <fpage>1006</fpage>
          -
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <source>Processing and the 9th International Joint Con- 1014.</source>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <source>ference on Natural Language Processing</source>
          (EMNLP- [19]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sarkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Goswami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arcan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          , Sug-
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <source>IJCNLP)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1951</fpage>
          -
          <lpage>1961</lpage>
          .
          <article-title>gest me a movie for tonight: Leveraging knowl</article-title>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Manzoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>A survey edge graphs for conversational recommendation,</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <article-title>on conversational recommender systems</article-title>
          ,
          <source>ACM in: Proceedings of the 28th International Confer-</source>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>Computing</given-names>
            <surname>Surveys</surname>
          </string-name>
          (CSUR)
          <volume>54</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          . ence on Computational
          <source>Linguistics (COLING)</source>
          ,
          <year>2020</year>
          , [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Moon</surname>
          </string-name>
          , H. Liu,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          , User pp.
          <fpage>4179</fpage>
          -
          <lpage>4189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <article-title>memory reasoning for conversational recommen</article-title>
          - [20]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hayati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yu</surname>
          </string-name>
          , IN-
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          dation,
          <source>in: Proceedings of the 28th International SPIRED: Toward sociable recommendation dialog</source>
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>Conference on Computational</surname>
          </string-name>
          <article-title>Linguistics (COL- systems</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on</source>
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <source>ING)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>5288</fpage>
          -
          <lpage>5308</lpage>
          . Empirical Methods in
          <source>Natural Language Processing</source>
          [11]
          <string-name>
            <given-names>W.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hong</surname>
          </string-name>
          , M. Kan,
          <source>(EMNLP)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>8142</fpage>
          -
          <lpage>8152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Chua</surname>
          </string-name>
          ,
          <string-name>
            <surname>Estimation-</surname>
            action-reflection: Towards [21]
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Cai</surname>
          </string-name>
          , L. Chen,
          <article-title>Predicting user intents and sat-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          ommendations,
          <source>in: Proceedings of the 28th ACM</source>
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <surname>sonalization</surname>
          </string-name>
          ,
          <year>2020</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>42</lpage>
          . [22]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , R. Lian,
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          arXiv:
          <year>1906</year>
          .
          <volume>05572</volume>
          (
          <year>2019</year>
          ). [23]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          , Emo-
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name>
            <surname>Intelligence</surname>
          </string-name>
          ,
          <year>2018</year>
          , pp.
          <fpage>730</fpage>
          -
          <lpage>739</lpage>
          . [24]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Bi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          , S. Shi,
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <source>ings of the 2019 Conference on Empirical Methods</source>
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <source>in Natural Language Processing and the 9th Inter-</source>
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <string-name>
            <surname>cessing (EMNLP-IJCNLP)</surname>
          </string-name>
          ,
          <year>2019</year>
          , pp.
          <fpage>1866</fpage>
          -
          <lpage>1875</lpage>
          . [25]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , Kd-
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <source>in: Proceedings of the 58th Annual Meeting of the</source>
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <year>2020</year>
          , pp.
          <fpage>7098</fpage>
          -
          <lpage>7108</lpage>
          . [26]
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          , Sequence
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <source>Neural Information Processing Systems</source>
          <volume>27</volume>
          : Annual
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <source>tems</source>
          <year>2014</year>
          ,
          <year>2014</year>
          , pp.
          <fpage>3104</fpage>
          -
          <lpage>3112</lpage>
          . [27]
          <string-name>
            <given-names>L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <source>cessing Systems 32: Annual Conference on Neural</source>
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          <source>Information Processing Systems (NeurIPS)</source>
          ,
          <year>2019</year>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          pp.
          <fpage>13042</fpage>
          -
          <lpage>13054</lpage>
          . [28]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          <article-title>vised multitask learners</article-title>
          ,
          <source>OpenAI blog 1</source>
          (
          <year>2019</year>
          )
          <article-title>9</article-title>
          . [29]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          optimization, in: Y. Bengio, Y. LeCun (Eds.),
          <fpage>3rd</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          <source>tions (ICLR)</source>
          ,
          <year>2015</year>
          . [30]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          , H. Zhang,
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          <string-name>
            <surname>tion</surname>
          </string-name>
          , arXiv preprint arXiv:
          <year>1904</year>
          .
          <volume>09223</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>