

Survey of Holistic Conversational Recommender Systems

Chuang Li

lichuang@u.nus.edu

Hengchang Hu

hengchang.hu@u.nus.edu 1

Yan Zhang

eleyanz@nus.edu.sg 1

Min-Yen Kan

kanmy@comp.nus.edu.sg 1

Haizhou Li

haizhou.li@nus.edu.sg

0 NUS Graduate School for Integrative Sciences and Engineering, Singapore
1 National University of Singapore, Singapore
2 The Chinese University of Hong Kong, Shenzhen, China


Conversational recommender systems (CRS) generate recommendations through an interactive process. However, not all CRS approaches use human conversations as their source of interaction data; the majority of prior CRS work simulates interactions by exchanging entity-level information. As a result, claims of prior CRS work do not generalise to real-world settings where conversations take unexpected turns, or where conversational and intent understanding is not perfect. To tackle this challenge, the research community has started to examine holistic CRS, which are trained using conversational data collected from real-world scenarios. Despite their emergence, such holistic approaches are under-explored.

1. Introduction

Conversational Recommender Systems (CRS) integrate conversational and recommendation system technologies to help users achieve recommendation-related goals through conversational interactions [1]. In contrast to traditional recommendation systems, CRS support multiple rounds of interaction, allowing the system to make multiple attempts at recommendation.

In much prior work on CRS, the multiple rounds of interaction are simulated by entity-level interaction. For example, in Figure 1(a), the entity-level interaction process is illustrated by how the system selects the "Feature ID" of <Genre-Disney> from its feature list, and the simulated human response of <Yes> is directly returned to the system. Such a framing of the CRS task focuses on recommendation and decision-making strategies, which neglect the conversational element, such as possible inaccuracies in understanding the human language that makes up the conversation. Inaccurate conversation comprehension, gauging of intent and incorrect response generation [4, 5], as well as information inconsistency [6], are regular occurrences in human conversation, yet much research on CRS has simply abstracted away from these defining characteristics. This is due to its presumption that entity-level interaction is invariably accurate [3]. As a result, real-world situations pose significant challenges.

https://github.com/lichuangnus/CRS-Paper-List (C. Li); ORCID: 0009-0006-8112-3505 (C. Li); 0000-0001-7847-0641 (H. Hu)

Thus there is a dichotomy in CRS research. Most CRS do not assume actual human conversations for interaction, only simulating the interaction with entity-level information; other works relax this constraint and tackle conversational recommendation based on actual human conversations [8, 9].

Besides recommendation and decision strategy, these works also tackle the aforementioned conversational challenges in language understanding, generation, topic/goal planning and knowledge engagement. To distinguish these two forms of CRS research, we divide the current research works in CRS into standard CRS (the former, more prevalent form of prior CRS work) and what we term holistic CRS (which assumes a wider scoping of the CRS task), based on the input and output formats, as shown in Figure 3.

Research on holistic CRS is burgeoning, and it is timely to comprehensively survey such works to better organise and make sense of their contributions and gauge their potential future directions. This is needed to effectively utilise holistic CRS, which are trained on conversational data from real-world scenarios [10, 8], in practical contexts. Holistic CRS adopt real, conversation-level interaction and target multiple dialogue goals, as shown in Figure 1. Given the same entity pair <Genre-Disney> as the standard CRS in subfigure (a), the holistic system in subfigure (b) must generate questions like "What movie genre would you like for tonight?" and understand the related response correctly before using <Genre-Disney> for the recommendation. For the same question, the user may give unexpected answers like "Show me a new movie this year!", which is inconsistent with the movie genre. Moreover, holistic CRS are required to leverage the rich contextual information inferred from the conversations [11] and from the semantic context. For example, given the input "Emm" in the user's second response, a holistic CRS might infer that the previous recommendation was unsatisfactory, prompting it to make a new and different recommendation.

[Figure 1: (a) Standard CRS, entity-level interaction: the system asks about Feature IDs A05 (Genre-Action, answer: No) and B07 (Genre-Disney, answer: Yes), recommends item lists (201–208, 301–308) and checks whether each hits the target (Hit Target: No / Yes). (b) Holistic CRS, conversation-level interaction: "Good evening, how are you doing today?" / "Good, I am looking for a movie to watch together with my family." / "Would you prefer to try a new action movie as last time?" / "Emm, this time I want one that I can watch with my children.", followed by candidate responses for three dialogue goals: A "It is nice to watch movies with children." [Chitchat], B "What movie genre would you like for tonight?" [QA], C "No problem, how about Disney movies?" [Rec].]

The main challenges in the task of a holistic CRS include the following: How can we understand users' intentions with limited contextual information? How should we generate reasonable responses with high recommendation quality? When faced with different inferred conversation goals, which goal should be pursued now?

We systematically analyse the current holistic CRS work solving the above problems (§4), decomposing them into three components: 1) a backbone language model, and optional components incorporating 2) external knowledge and 3) external guidance. We follow this with an analysis of the datasets (§5) and evaluation methods (§6). We then investigate the key challenges and promising research trends in this area (§7). To the best of our knowledge, this is the first survey on CRS with a special focus on conversational ("holistic") approaches. Our contributions are:

1. We provide a clear landscape of the tasks, models and hierarchical structure of holistic CRS.
2. We summarise, analyze and critique the existing methods, datasets and evaluation methods for selected works in a well-structured manner.
3. We outline key challenges, constraints and future directions for holistic CRS.

2. Definition and Background

In Figure 3, we split the field of CRS research into two distinct branches, standard and holistic CRS, further delineating them into Types 0, 1 and 2 based on their input–output dynamics.

Type 0 standard CRS, limited to entity-level inputs and outputs, is restricted in scope of interaction; e.g., [2, 3].

Type 1 holistic CRS takes conversation as input and yields either entity-level recommendations or conversational responses, encompassing query interpretation and tailored linguistic outputs; e.g., [8, 12].

Type 2 holistic CRS is more expansive, accepting and producing unrestricted input and output formats, including conversations, knowledge and multimedia; e.g., [13, 14].

[Figure caption, partially recovered: … recommendation, decision and generation units while standard CRS only contain recommendation and decision units. Right: End-to-end holistic CRS with an encoder–decoder structure.]

Holistic CRS differ from standard CRS approaches in the following aspects: 1) The final goal of holistic CRS is to guide or convince users to accept the recommendation through multiple rounds of conversation. 2) Holistic CRS start from the conversations and end by generating either recommendation results or responses. 3) Holistic CRS methods are evaluated on both recommendation and language quality, using both automatic and human evaluation measures.
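To make the contrast with Figure 1(a) concrete, here is a minimal Python sketch of a Type 0, entity-level session. It is illustrative only: the toy catalogue, the fixed feature list, the top-k rule and the helper names (`Item`, `simulated_user`, `entity_level_session`) are our own hypothetical choices, not the method of [2, 3]. The simulated user's reply is an exact attribute lookup, which is exactly the "invariably accurate" assumption that holistic CRS cannot make.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple

# One logged turn: (asked attribute, simulated yes/no answer, recommended item IDs, hit?)
TurnLog = Tuple[str, bool, List[int], bool]


@dataclass(frozen=True)
class Item:
    item_id: int
    attributes: FrozenSet[str]  # e.g. frozenset({"Genre-Disney"})


def simulated_user(target: Item, attribute: str) -> bool:
    """Entity-level reply: an exact yes/no lookup on the hidden target item.

    This encodes the Type 0 assumption that the answer is never misunderstood.
    """
    return attribute in target.attributes


def entity_level_session(catalogue: Sequence[Item], target: Item,
                         feature_list: Sequence[str], k: int = 1) -> List[TurnLog]:
    """Ask one attribute per turn, filter candidates, recommend the top-k, check for a hit."""
    candidates = list(catalogue)
    log: List[TurnLog] = []
    for attribute in feature_list:
        answer = simulated_user(target, attribute)
        # Keep only items consistent with the (assumed perfect) answer.
        candidates = [it for it in candidates if (attribute in it.attributes) == answer]
        recs = [it.item_id for it in candidates[:k]]
        hit = target.item_id in recs
        log.append((attribute, answer, recs, hit))
        if hit:
            break
    return log


# Hypothetical toy catalogue; item 4 is the user's hidden target.
catalogue = [
    Item(1, frozenset({"Genre-Action"})),
    Item(2, frozenset({"Genre-Action", "Year-2023"})),
    Item(3, frozenset({"Genre-Disney"})),
    Item(4, frozenset({"Genre-Disney", "Year-2023"})),
    Item(5, frozenset({"Genre-Drama"})),
]
log = entity_level_session(catalogue, catalogue[3],
                           ["Genre-Action", "Genre-Disney", "Year-2023"])
for turn in log:
    print(turn)
```

On this toy data the first two recommendations miss and the third hits the target. A holistic CRS would instead have to extract both the attribute and the user's stance from free-form replies such as "Emm, this time I want one that I can watch with my children.", where a lookup like `simulated_user` is no longer available.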

[Figure 3: Hierarchy of CRS research: Type 2 holistic CRS (unrestricted inputs and outputs), Type 1 holistic CRS (conversational inputs and outputs), Type 0 standard CRS (restricted inputs and outputs).]

2.1. Task Definition

In a task-oriented dialogue system, we restrict our consideration to the scenario where a single system interacts with one individual user, denoted by $u$, and a set of pre-determined items, represented by $\mathcal{I}$. Each dialogue contains $T$ turns of conversation, denoted as $C = \{[s_t, r_t]\}_{t=1}^{T}$, where each turn contains a single utterance $s_t$ from the system and its associated response $r_t$ from the user. The user's entity-level interaction history up to the $t$-th turn is denoted as $E_t$, and the dialogue history with the past $t$ turns is denoted $H_t = \{c^{(1)}, \ldots, c^{(t)}\}$, where $c^{(k)} = [s_k, r_k]$. Some methods provide knowledge or external guidance, which we denote as $K$. The target function for holistic CRS is expressed in two parts: generating 1) the next item prediction $i_{t+1}$ and 2) the next system response $s_{t+1}$. In summary, at the $t$-th turn, given the user's interaction history and contextual history, CRS generate either an entity-level recommendation result $i_{t+1}$ or a conversation-level system response $s_{t+1}$, as shown in Formula 1:

$$P^{*} = \prod_{t=1}^{T} P\left(i_{t+1}, s_{t+1} \mid E_t, H_t, K\right) \tag{1}$$

2.2. Structure of CRS

… particularly in their handling of conversational data.

… focusing on Types 1 and 2 of our hierarchy. Our primary sources comprise leading NLP and Information Retrieval (IR) conferences and journals, as exemplified by premier … "conversational recommender systems". Matching work … Works centred on Type 0 standard CRS, given their lack of conversational aspects, are intentionally omitted.

[Figure 4: Components of holistic CRS: 1. Language Models; 2. Knowledge (structured, unstructured); 3. Guidance (recommendation, topic or goal, temporal).]

4. Main Approaches & Discussion

Current holistic CRS approaches are primarily structured around three main components, as illustrated in Figure 4: 1) Language Models (LMs); 2) Knowledge; and 3) Guidance. A majority of holistic CRS systems hinge on LMs (§4.1), encompassing machine learning, deep learning and pre-trained language models (PLMs), for foundational dialogue operations. However, these LMs often fall short in recommendation and commonsense reasoning. To bridge this gap, additional external knowledge (§4.2) and guidance (§4.3) are integrated, either independently or jointly. This section delineates the evolutionary path of their development, offering insights into their limitations and potential avenues for future progress.

4.1. Language Models

LMs serve as the backbone for holistic CRS in recommendation response generation, evolving from machine learning [10] and deep learning [8, 18] to PLMs [15, 12, 19]. The most popular LMs for response generation are HRED-based sequential models and transformer-based PLMs. These language models adopt an end-to-end training framework, enabling them to be trained simultaneously on conversation and recommendation tasks [8, 18].

Recent advancements in natural language processing (NLP) highlight the efficacy of PLMs like BERT and GPT [20, 21] in language generation and commonsense reasoning. Although these PLMs are not inherently optimized for CRS, researchers have explored their capabilities for holistic CRS tasks like recommendation and response generation. Penha and Hauff [15] evaluated BERT's innate ability for recommendation using text-format probes for item or genre prediction without fine-tuning. In another line of work, Hayati et al. enhanced conversational tasks by adapting PLMs to produce varied recommendation responses incorporating social strategies, like encouragement or persuasion [12, 19].

Taking a multifaceted approach, Deng et al. segmented recommendation response generation into multiple tasks, including goal or topic planning, item recommendation and response generation. Despite these distinct tasks, they pre-trained a PLM end-to-end, underscoring the connection between holistic CRS and LMs and validating the effectiveness of the end-to-end training paradigm.

Discussion. While PLMs can generate context-specific recommendation responses, they often fall short of meeting the dual requirements of recommendation accuracy and language quality, owing to limitations in two phases: 1) pre-training and 2) online training.

The inherent limitation of PLMs stems from their design for universal application. In contrast, recommendation tasks are focused and specific to certain domains [8, 23]. The implicit knowledge derived from general pre-training is insufficient to support them in making high-quality recommendations. Pre-training LMs with explicit task-specific knowledge is a solution, but it comes with high costs and complications [24, 22]. Transferring such knowledge across diverse domains or user groups for real-world applications still poses a considerable challenge.

Holistic CRS rely heavily on online training, enabled by conversational interactions with benchmark datasets (§5). However, the restricted knowledge available in those datasets poses a formidable challenge for PLMs to generate quality recommendation responses, necessitating a model capable of integrating additional knowledge or guidance to facilitate preference tracking and response generation.

4.2. External Knowledge

Inherent limitations regarding the implicit knowledge stored in PLMs are addressed in holistic CRS by integrating external knowledge. This enhances their capabilities in prediction, reasoning and explanation. Knowledge-augmented methods often utilize graph convolutional networks (GCNs) [25] or relational graph convolutional networks (R-GCNs) [26] to extract knowledge representations from structured sources like knowledge graphs (KGs) or unstructured ones such as reviews. This representation is then incorporated into PLMs through semantic alignment or knowledge fusion techniques, enabling the production of refined recommendations [27, 28, 29]. We now delve into holistic CRS approaches that leverage both structured and unstructured knowledge sources.

4.2.1. Structured knowledge

Knowledge Graphs (KGs) are a prevalent source of structured knowledge. However, to be employed for holistic CRS tasks, they need to be transformed into an appropriate representation before the knowledge and textual features can be integrated.

KGs are typically represented by triplets comprising entities and relationships; e.g., <Movie A-Genre-Disney>, where nodes representing item entities (Movie A) are connected to non-item entities (Disney) via edges that indicate relationships (Genre). In knowledge-enhanced CRS, the entities mentioned in conversations are first matched with entities in external KGs. Subsequently, graph propagation is performed to encode the KG's structural and relational information into knowledge representations [30]. Techniques like GCN and R-GCN are employed in this stage to recurrently update node representations based on their neighbouring nodes. With the obtained knowledge representations, there are two main research directions in applying KGs to holistic CRS, which we denote as 1) node-level entity prediction and 2) edge-level path reasoning [31].

Node-level entity prediction in holistic CRS enhances response generation by incorporating additional item entities from the KG [30, 32]. In this usage, LMs extract knowledge representations from the KG and convert them into item-specific vocabularies, which are then integrated into recommendation responses. As a result, such responses are more fluent and informative, aligning closely with the original conversations and consistent with the user's interests [30, 32, 33].

Edge-level path reasoning provides a better approach than node-level entities for interpreting users' preferences and dynamic shifts in interests through the knowledge representation [34, 35, 31, 36]. Strict 2-hop KG reasoning was first proposed to interpret the user's preference through two steps (e.g., Movie A ⇒ Actor 1 ⇒ Movie B). For instance, given the user's watching history of Movies A and B, the model can infer the user's preference for Actor 1 and subsequently confirm its inference through conversation. However, due to its rule-based setting, 2-hop reasoning works well only when users have clearly defined and straightforward preferences [35]. In situations where users demonstrate shifting interests, a multi-hop or tree-structured reasoning method is more suitable, translating implicit preference paths in KGs into explicit explanations in dialogues [34, 37, 38].

Well-constructed KGs enhance comprehensive knowledge representation in entity-level item selection and conversation-level preference reasoning or interpretation [31, 38]. However, due to the static nature of KGs, inferring the latest features of an item from structured knowledge sources poses significant challenges.

4.2.2. Unstructured knowledge

In unstructured knowledge sources (e.g., reviews or documents), a text retriever is employed to extract relevant textual segments from external documents. These segments are subsequently either transformed into nodes or edges of a new KG or merged into an existing KG [39, 29, 40, 41, 42]. The resultant KG can then be transferred into knowledge representations [41, 42, 43]. This method allows unstructured knowledge to supplement static knowledge graphs with contemporary information, allowing holistic CRS to be more versatile.

Knowledge Fusion and Semantic Alignment serve as the primary strategies to bridge the entity and semantic spaces in graph reasoning, leveraging both structured and unstructured knowledge resources. Knowledge Fusion integrates graph embeddings from KGs with text embeddings from LMs, enhancing both entity recommendations and conversational preference interpretations [30, 28]. Recently, Zhou et al. demonstrated a method that surpasses the performance of current fusion methods for entities and dialogues. They address the semantic gap between conversations and external knowledge with fine-grained semantic alignment techniques that align word-level semantic graphs with entity-level KGs [44, 45, 46]. Similarly, for models utilizing unstructured knowledge bases, contrastive learning strategies bridge the semantic gap across embeddings in dialogues, KGs and document reviews, potentially leveraging a spectrum of such knowledge resources [28].

Discussion. The existing knowledge sources for holistic CRS are constrained in item space. However, as LMs become more robust, the reliance on conventional knowledge sources might decrease, while the necessity for guidance in other modalities may increase. Specifically, specialized knowledge (such as user profile representation and user–item relationship extraction) is likely to become crucial. The advent of powerful large language models (LLMs) serving as LMs reduces reliance on external knowledge sources, potentially making the use of external sources redundant [47, 48]. The integration of external knowledge within LMs should start by evaluating a model's capabilities before knowledge incorporation, such as examining the capability of PLMs in processing content-based recommendations [47, 49]. Recognizing the limitations of LMs before introducing the appropriate knowledge sources is a key issue in the advancement of holistic CRS.

4.3. External Guidance

Holistic CRS using external guidance train models for supplementary tasks (recommendation, topic/goal planning and temporal feature representation), in contrast to knowledge-enhanced models, which fuse knowledge into PLMs. Results from these tasks serve as auxiliary guidance for LMs during recommendation response generation. Some models align both external knowledge and guidance, adopting a hybrid strategy that capitalizes on both dimensions for more robust response generation.

Recommendation guidance utilises approaches akin to template-based generation methods, decoupling conversation generation from recommendation result generation. LMs are conditioned to separately produce 1) dialogues with placeholders that align with the original context and 2) suggested items or attributes consistent with the user's history [50, 51, 52, 32, 53]. These placeholders are later substituted with the corresponding recommendations.

Topic or goal guidance enhances the LM's proficiency in topic or goal planning. Although reinforcement learning techniques are predominantly employed in traditional CRS for action or goal planning, they are challenging to adapt as a representation for LMs [3, 18, 22].

Topic-guided systems start by building topic graphs, capturing or predicting specific target topics like "action movie" or "Disney movie". LMs subsequently use these graphs to guide recommendation response generation [54, 55, 33]. Goal-guided systems create hierarchical goal-type graphs derived from existing KGs and dialogues. The goal-planning module of the LMs is then trained on diverse dialogue goals, encompassing "QA", "recommendation", "greeting" or "chitchat" [9, 49, 56, 22]. These objectives also influence the dialogue policy and decision-making processes within holistic CRS.

Temporal guidance in CRS incorporates temporal features to formulate a time-aware representation, emphasizing the explicit and dynamic shift in users' preferences [57, 58]. Unlike traditional sequential recommendation systems that have access to users' historical profiles, holistic CRS often lack this depth of historical data. To address this gap, temporal features discern between historical dialogue sessions and the ongoing dialogue session, thereby capturing the multifaceted nature of users' preferences [59]. This differentiation allows the modelling of historical user preferences while continuing to gather fresh preferences from active interactions. Additionally, such features aid in the construction of user profiles based on past behaviours, facilitating the retrieval of similar user profiles based on their relevance and enhancing preference modelling in a time-aware collaborative manner [60, 58]. In a distinct approach, Xu et al. put forth the idea of a user temporal KG, which contains both offline user knowledge in historical conversations and online knowledge in current or future conversation sessions. Representing a leap beyond traditional static knowledge graphs, temporal KGs have garnered significant interest [60, 37]. In the context of holistic CRS, dynamic reasoning utilizing temporal KGs represents an innovative and burgeoning research domain [37, 61, 38, 46].

Discussion. Present methodologies for integrating external knowledge or guidance largely involve training LMs to interpret fed knowledge or representations, rather than guiding them to independently explore and extract the required information from external resources. This method, akin to "spoon-feeding" LMs with knowledge or guidance, contrasts with the envisioned future for holistic CRS. In our view, LMs should be provided with a knowledge "buffet", empowering autonomous gathering of necessary information and prioritising reasoning over interpretation [62].

5. Datasets

In the realm of holistic CRS, the interaction between users and systems has led to the collection of several benchmark datasets. While some surveys have primarily summarized data from an item space perspective [1], our focus is to dive deeper into the publicly available holistic CRS datasets. Our intention is to understand datasets beyond traditional boundaries, expounding specifically on two dimensions: entity information and language quality [8, 12, 54, 18, 31, 63].

[Table 1: Statistics of the holistic CRS datasets ReDial, TG-ReDial*, DuRecDial*, INSPIRED, OpenDialKG, GoRecDial and MultiWOZ.]
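As a minimal sketch of the entity-information dimension, the Python function below computes scale (number of conversations and distinct items) and informativeness (turns per conversation, item mentions per turn and the share of turns that mention at least one item) for a dataset. The record format, where each turn carries a pre-extracted list of mentioned item IDs, and the toy data are hypothetical; the datasets in Table 1 each use their own schemas and entity-linking conventions.

```python
from typing import Dict, List, TypedDict


class Turn(TypedDict):
    speaker: str
    text: str
    items: List[str]  # item IDs mentioned in this turn (assumed pre-extracted)


def dataset_statistics(conversations: List[List[Turn]]) -> Dict[str, float]:
    """Entity-space statistics: scale and informativeness of a CRS dataset."""
    turns = [turn for conv in conversations for turn in conv]
    if not turns:
        raise ValueError("dataset contains no turns")
    mentions = sum(len(turn["items"]) for turn in turns)
    informative = sum(1 for turn in turns if turn["items"])
    return {
        "conversations": len(conversations),
        "items": len({item for turn in turns for item in turn["items"]}),
        "turns_per_conversation": len(turns) / len(conversations),
        "mentions_per_turn": mentions / len(turns),
        "informative_turn_ratio": informative / len(turns),
    }


# Hypothetical two-conversation toy dataset.
toy: List[List[Turn]] = [
    [
        {"speaker": "user", "text": "I am looking for a movie to watch with my family.", "items": []},
        {"speaker": "system", "text": "How about Movie A?", "items": ["Movie A"]},
        {"speaker": "user", "text": "Emm, we have seen Movie A and Movie B.", "items": ["Movie A", "Movie B"]},
        {"speaker": "system", "text": "Then try Movie C.", "items": ["Movie C"]},
    ],
    [
        {"speaker": "user", "text": "Good evening!", "items": []},
        {"speaker": "system", "text": "Good evening, how are you doing today?", "items": []},
    ],
]
stats = dataset_statistics(toy)
print(stats)
```

On the toy data this yields 2 conversations, 3 distinct items, 3.0 turns per conversation and an informative-turn ratio of 0.5. Normalising mentions by turns, rather than counting them per conversation, keeps the measure comparable across datasets whose conversations differ in length.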

5.1. Statistical analysis

Table 1 presents a statistical analysis of various datasets, detailing each dataset in terms of both entity and linguistic characteristics. In terms of entity space, the scale of a dataset is measured by the number of conversations and items it contains, while its informativeness is measured by the number of conversation turns and the number of mentions of specific items within them. Interestingly, our analysis reveals that a longer conversation does not necessarily contain more item mentions. Rather, we believe that ensuring a consistent frequency of item mentions is paramount for the recommendation system's learning efficacy [64].

From the perspective of language, most datasets are compiled predominantly from English data and focus on the movie domain. Recent datasets indicate a decline in the ratio of informative turns. This trend aligns with real-world conversational patterns, where interactions are becoming conversations that contain a growing amount of general or chit-chat content [12, 19]. This observation reinforces our belief that an optimal dataset should capture authentic human behaviour and not merely translate entity-centric data into dialogues. The data also suggest that positive turns, ones that provide constructive or affirmative feedback, are more valuable for recommendations than negative ones [65, 66]. In sum, what matters is not merely the volume of training data, but the quality, authenticity and informativeness of the conversations therein.

5.2. Limitations

The objective of holistic CRS datasets is to accurately emulate real-world scenarios and offer labelled information for efficient learning. However, our evaluation reveals three primary limitations in the existing datasets. First, some datasets diverge from real-world conversations, which impedes the quality of learned interactions [18]. A notable example is the game setting, where the dialogue's objective is to guess a target item; this disrupts the natural flow of conversation, as seekers are already privy to the target item's identity. Second, a significant proportion of datasets predominantly focus on the movie domain [8, 30], potentially damaging the generalizability of conclusions drawn in CRS research. Third, current datasets do not offer sufficient labels outside the confines of the item space [8, 12]. Addressing these shortcomings will be pivotal for productive future research in holistic CRS.

6. Evaluation Methods

CRS generate both recommendation results and responses. Their evaluation requires appropriate mechanisms to assess the quality of both the recommended items and the resulting dialogue as a whole. Existing evaluation methods examine recommendation accuracy (as in traditional recommendation systems) and language quality (as in NLP language modelling) separately, using both metrics and human evaluation. We compile the frequency of these methods from the works in §4 as Table 2.

[Table 2: Evaluation methods used in the surveyed works, in three panels (Recommendation Accuracy Metrics; Metrics; Human Evaluation), each listing the number of papers (# Papers).]

6.1. Recommendation Evaluation

Recommendation evaluation metrics fall into three categories: point-wise accuracy methods (e.g., RMSE), decision support methods (e.g., F1) and ranking-based methods (e.g., Recall@K). The evaluation metrics for holistic CRS are similar to those in standard CRS, where they mostly evaluate the recommendation at the item level. However, for holistic CRS, it is equally important to evaluate the recommendation performance separately at the conversation level, in order to ensure information consistency in response generation [32].

6.2. Language Evaluation

While most recommendation results can be evaluated with metrics, human evaluation remains the gold standard for language generation quality. Metric-based approaches, as auxiliary solutions, provide a fast and simple evaluation of holistic CRS. Language evaluation metrics such as Distinct n-gram, BLEU and Perplexity evaluate language quality in terms of diversity and fluency.

Human evaluation provides a fair comparison of different models from the viewpoint of users and in a double-blind way [51, 10]. It is relatively fast and convenient for human annotators to provide a high-quality evaluation in terms of fluency and informativeness. However, as human evaluation may be limited to one or a few turns of the whole conversation, it is challenging for annotators to fully examine coherence and consistency, which generally require a full understanding of the dialogue [6].

Unlike recommendation systems, which merely compare item rankings with respect to the target item, in holistic CRS implicit features like personality, persuasion and encouragement also contribute to the success of a recommendation [12]. Evaluating a system based on user experience remains challenging. It is imperative to introduce automatic assessment methods for both system-generated quality and user-centric experiences [17, 67, 68, 69].

7. Challenges & Future Trends

Having detailed the development of holistic CRS, we now highlight current challenges and suggest future directions to round out our overview.

Language generation quality and style. Current holistic CRS methods do not meet the requirements for practical application due to their inferior language quality scores in human evaluation, even when compared to retrieval-based methods [70, 51, 71]. Successful recommendation responses need to supplement explicit prediction results by accounting for implicit features like social strategy and language styles (e.g., encouragement and informativeness [12, 65, 66]). As recommendation outcomes often draw from an external or enriched knowledge structure, future research should focus on 1) elevating language quality to garner positive user feedback [72], and 2) emphasizing preferred language styles to enhance user acceptance [73].

User-centric holistic CRS. Holistic CRS have made strides towards user-centricity by facilitating conversational feedback between the user and the system. Nonetheless, their feedback and recommendation spectrum is still restricted. To enhance their efficacy, future versions of holistic CRS should prioritize personalised experiences for individual users by harnessing multi-modal data from item categories and user profiles. Moreover, attending to users' personal feedback and latent preferences is key to building a superior user modelling framework, resulting in more pertinent recommendations [74]. Additionally, incorporating other LMs or AI-generated content (AIGC) into recommendation feedback could also be a promising avenue [75, 76].

Unified model for holistic CRS. Large Language Models (LLMs) have significantly advanced task-oriented dialogue systems, allowing for integrated handling of various tasks in a conversational manner [77, 78]. In the realm of recommendation systems, some research has adopted a two-phase training approach (pre-training and fine-tuning), leveraging text for recommendations, reasoning and explanation [61, 79, 80]. Yet, while there is a push to integrate PLMs into CRS tasks using a text-to-text paradigm, the broader holistic CRS research domain has not achieved a standardized problem framework, which would enable seamless integration with task-specific models and swift adaptation to similar tasks across different domains [32, 44, 24]. LLMs, on their own, cannot address every CRS challenge. Current holistic CRS models lean heavily on complex ensemble architectures that merge LMs with external knowledge or guidance. As such, crafting a unified model framework with consistent problem definitions remains a pivotal research avenue [32, 44].

8. Conclusion

Despite the rising interest in standard conversational recommendation systems, which are restricted to entity-level input and output, our study reveals the necessity and current neglect of holistic CRS, which encompass all forms of input and output, catering for real-world situations. In this paper, we systematically describe the important components of holistic CRS, including 1) language models, 2) knowledge resources and 3) external guidance. To the best of our knowledge, our survey is the first systematic review specifically dedicated to holistic CRS with conversational approaches, and it further summarizes common datasets, evaluation methods and challenges. Existing works point to a number of promising future directions from the above perspectives. Through a clear landscape of holistic CRS, we hope to attract more attention to exploring a more natural and realistic setting in this challenging but promising area.

References

[1] D. Jannach, A. Manzoor, W. Cai, L. Chen, A survey on conversational recommender systems, ACM Comput. Surv. 54 (2021). doi:10.1145/3453154.
[2] Y. Zhang, X. Chen, Q. Ai, L. Yang, W. B. Croft, Towards conversational search and recommendation: System ask, user respond, in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 177–186.
[3] W. Lei, X. He, Y. Miao, Q. Wu, R. Hong, M.-Y. Kan, T.-S. Chua, Estimation-action-reflection: Towards deep interaction between conversational and recommender systems, in: Proceedings of the 13th International Conference on Web Search and Data Mining, WSDM '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 304–312. doi:10.1145/3336191.3371769.
[4] J. Ni, T. Young, V. Pandelea, F. Xue, E. Cambria, Recent advances in deep learning based dialogue systems: A systematic survey, Artificial Intelligence Review (2022) 1–101.
[5] Y. Dai, H. Li, Y. Li, J. Sun, F. Huang, L. Si, X. Zhu, Preview, attend and review: Schema-aware curriculum learning for multi-domain dialogue state tracking, in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Association for Computational Linguistics, Online, 2021, pp. 879–885. URL: https://aclanthology.org/2021.acl-short.111. doi:10.18653/v1/2021.acl-short.111.
[6] Z. Jiang, F. F. Xu, J. Araki, G. Neubig, How can we know what language models know?, Transactions of the Association for Computational Linguistics 8 (2020) 423–438.
[7] C. Gao, W. Lei, X. He, M. de Rijke, T.-S. Chua, Advances and challenges in conversational recommender systems: A survey, AI Open 2 (2021) 100–126. URL: https://www.sciencedirect.com/science/article/pii/S2666651021000164. doi:10.1016/j.aiopen.2021.06.002.
[8] R. Li, S. Kahou, H. Schulz, V. Michalski, L. Charlin, C. Pal, Towards deep conversational recommendations, in: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, Curran Associates Inc., Red Hook, …
[11] … A survey of joint intent detection and slot filling models in natural language understanding, ACM Computing Surveys 55 (2022) 1–38.
[12] S. A. Hayati, D. Kang, Q. Zhu, W. Shi, Z. Yu, Inspired: Toward sociable recommendation dialog systems, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 8142–8152. URL: https://www.aclweb.org/anthology/2020.emnlp-main.654.
[13] T. Yu, Y. Shen, H. Jin, Towards hands-free visual dialog interactive recommendation, in: AAAI, 2020.
[14] S. Uppal, S. Bhagat, D. Hazarika, N. Majumder, S. Poria, R. Zimmermann, A. Zadeh, Multimodal research in vision and language: A review of current and emerging trends, Information Fusion 77 (2022) 149–171. URL: https://www.sciencedirect.com/science/article/pii/S1566253521001512. doi:10.1016/j.inffus.2021.07.009.
[15] G. Penha, C. Hauff, What does BERT know about books, movies and music? Probing BERT for conversational recommendation, in: Proceedings of the 14th ACM Conference on Recommender Systems, RecSys '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 388–397. doi:10.1145/3383313.3412249.
[16] F. Radlinski, C. Boutilier, D. Ramachandran, I. Vendrov, Subjective attributes in conversational recommendation systems: challenges and opportunities, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, 2022, pp. 12287–12293.
[17] D. Jannach, Evaluating conversational recommender systems: A landscape of research, Artificial Intelligence Review 56 (2023) 2365–2400.
[18] D. Kang, A. Balakrishnan, P. Shah, P. A. Crook, Y. Boureau, J. Weston, Recommendation as a communication game: Self-supervised bot-play for goal-oriented dialogue, CoRR abs/1909.03922 (2019). URL: http://arxiv.org/abs/1909.03922. arXiv:1909.03922.

NY, USA, 2018, p. 9748–9758. [19] A. Manzoor, D. Jannach, Inspired2: An improved [9] Z. Liu, H. Wang, Z.-Y. Niu, H. Wu, W. Che, T. Liu, To- dataset for sociable conversational recommendawards conversational recommendation over multi- tion, arXiv preprint arXiv:2208.04104 ( 2022 ). type dialogs, in: Proceedings of the 58th An- [20] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: nual Meeting of the Association for Computa- Pre-training of deep bidirectional transformers for tional Linguistics, Association for Computational language understanding, in: Proceedings of the Linguistics, 2020, pp. 1036–1049. URL: https:// 2019 Conference of the North American Chapaclanthology.org/2020.acl-main.98. doi:10.18653/ ter of the Association for Computational Linguisv1/2020.acl-main.98. tics: Human Language Technologies, Volume 1 [10] K. Christakopoulou, F. Radlinski, K. Hofmann, To- (Long and Short Papers), Association for Comwards conversational recommender systems, in: putational Linguistics, Minneapolis, Minnesota, Proceedings of the 22nd ACM SIGKDD interna- 2019, pp. 4171–4186. URL: https://aclanthology.org/ tional conference on knowledge discovery and data N19-1423. doi:10.18653/v1/N19-1423. mining, 2016, pp. 815–824. [21] Y. Zhang, S. Sun, M. Galley, Y.-C. Chen, C. Brockett, [11] H. Weld, X. Huang, S. Long, J. Poon, S. C. Han, X. Gao, J. Gao, J. Liu, B. Dolan, Dialogpt: Largescale generative pre-training for conversational re- alKG: Explainable conversational reasoning with sponse generation, arXiv preprint arXiv:1911.00536 attention-based walks over knowledge graphs, in: ( 2019 ). Proceedings of the 57th Annual Meeting of the As[22] Y. Deng, W. Zhang, W. Xu, W. Lei, T.-S. Chua, sociation for Computational Linguistics, AssociaW. Lam, A unified multi-task learning framework tion for Computational Linguistics, Florence, Italy, for multi-goal conversational recommender sys- 2019, pp. 845–854. URL: https://aclanthology.org/ tems, ACM Trans. Inf. Syst. 41 ( 2023 ). 
doi:10.1145/ P19-1081. doi:10.18653/v1/P19-1081. 3570640. [32] L. Wang, H. Hu, L. Sha, C. Xu, K. Wong, D. Jiang, [23] C. Yang, Y. Hou, Y. Song, T. Zhang, J.-R. Wen, W. X. Finetuning large-scale pre-trained language Zhao, Modeling two-way selection preference for models for conversational recommendation person-job fit, in: Proceedings of the 16th ACM with knowledge graph, CoRR abs/2110.07477 Conference on Recommender Systems, RecSys ’22, (2021). URL: https://arxiv.org/abs/2110.07477. Association for Computing Machinery, New York, arXiv:2110.07477.

NY, USA, 2022, p. 102–112. doi:10.1145/3523227. [33] J. Zhang, Y. Yang, C. Chen, L. He, Z. Yu, KERS: 3546752. A knowledge-enhanced framework for recommen[24] S. Geng, S. Liu, Z. Fu, Y. Ge, Y. Zhang, Recommen- dation dialog systems with multiple subgoals, dation as language processing (rlp): A unified pre- in: Findings of the Association for Computatrain, personalized prompt predict paradigm (p5), tional Linguistics: EMNLP 2021, Association for in: Proceedings of the 16th ACM Conference on Computational Linguistics, Punta Cana, DominiRecommender Systems, RecSys ’22, Association for can Republic, 2021, pp. 1092–1101. URL: https: Computing Machinery, New York, NY, USA, 2022, //aclanthology.org/2021.findings-emnlp.94. doi:10. p. 299–315. doi:10.1145/3523227.3546767. 18653/v1/2021.findings-emnlp.94. [25] T. N. Kipf, M. Welling, Semi-supervised classifi- [34] J. Zhou, B. Wang, R. He, Y. Hou, CRFR: Improvcation with graph convolutional networks, CoRR ing conversational recommender systems via flexabs/1609.02907 (2016). URL: http://arxiv.org/abs/ ible fragments reasoning on knowledge graphs, 1609.02907. arXiv:1609.02907. in: Proceedings of the 2021 Conference on Em[26] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. v. d. Berg, pirical Methods in Natural Language ProcessI. Titov, M. Welling, Modeling relational data with ing, Association for Computational Linguistics, graph convolutional networks, in: European se- 2022, pp. 4324–4334. URL: https://aclanthology. mantic web conference, Springer, 2018, pp. 593–607. org/2021.emnlp-main.355. doi:10.18653/v1/2021. [27] W.-C. Kang, J. McAuley, Self-attentive sequential emnlp-main.355.

recommendation, 2018. URL: http://arxiv.org/abs/ [35] W. Ma, R. Takanobu, M. Tu, M. Huang, Bridg1808.09781. arXiv:1808.09781 [cs]. ing the gap between conversational reason[28] T. Zhang, Y. Liu, P. Zhong, C. Zhang, H. Wang, ing and interactive recommendation, CoRR C. Miao, KECRS: Towards knowledge-enriched abs/2010.10333 ( 2020 ). URL: https://arxiv.org/abs/ conversational recommendation system, 2010.10333. arXiv:2010.10333. 2021. URL: http://arxiv.org/abs/2105.08261. [36] H. Xu, S. Moon, H. Liu, B. Liu, P. Shah, P. S. Yu, arXiv:2105.08261 [cs]. User memory reasoning for conversational recom[29] Y. Zhou, K. Zhou, W. X. Zhao, C. Wang, P. Jiang, mendation, arXiv preprint arXiv:2006.00184 ( 2020 ).

H. Hu, C2-crs: Coarse-to-fine contrastive learning [37] H. Xu, S. Moon, H. Liu, B. Liu, P. Shah, B. Liu, for conversational recommender system, in: Pro- P. Yu, User memory reasoning for conversaceedings of the Fifteenth ACM International Con- tional recommendation, in: Proceedings of the ference on Web Search and Data Mining, 2022, pp. 28th International Conference on Computational 1488–1496. Linguistics, International Committee on Compu[30] Q. Chen, J. Lin, Y. Zhang, M. Ding, Y. Cen, H. Yang, tational Linguistics, Barcelona, Spain (Online), J. Tang, Towards knowledge-based recommender 2020, pp. 5288–5308. URL: https://aclanthology. dialog system, in: Proceedings of the 2019 Confer- org/2020.coling-main.463. doi:10.18653/v1/2020. ence on Empirical Methods in Natural Language coling-main.463.

Processing and the 9th International Joint Con- [38] W. Li, W. Wei, X. Qu, X.-L. Mao, Y. Yuan, W. Xie, ference on Natural Language Processing (EMNLP- D. Chen, TREA: Tree-structure reasoning schema IJCNLP), Association for Computational Linguis- for conversational recommendation, in: Proceedtics, Hong Kong, China, 2019, pp. 1803–1813. URL: ings of the 61st Annual Meeting of the Association https://aclanthology.org/D19-1189. doi:10.18653/ for Computational Linguistics (Volume 1: Long v1/D19-1189. Papers), Association for Computational Linguis[31] S. Moon, P. Shah, A. Kumar, R. Subba, OpenDi- tics, Toronto, Canada, 2023, pp. 2970–2982. URL: acl-main.305. doi:10.18653/v1/2020.acl-main. minican Republic, 2021, pp. 1913–1918. URL: https: 305. //aclanthology.org/2021.emnlp-main.145. doi:10. [58] J. Zou, E. Kanoulas, P. Ren, Z. Ren, A. Sun, C. Long, 18653/v1/2021.emnlp-main.145.

Improving conversational recommender systems [66] Y. Wu, C. Macdonald, I. Ounis, Multimodal convia transformer-based sequential modelling, in: versational fashion recommendation with posiProceedings of the 45th International ACM SIGIR tive and negative natural-language feedback, in: Conference on Research and Development in Infor- Proceedings of the 4th Conference on Conversamation Retrieval, SIGIR ’22, Association for Com- tional User Interfaces, CUI ’22, Association for puting Machinery, New York, NY, USA, 2022, p. Computing Machinery, New York, NY, USA, 2022. 2319–2324. doi:10.1145/3477495.3531852. doi:10.1145/3543829.3543837. [59] S. Li, R. Xie, Y. Zhu, X. Ao, F. Zhuang, Q. He, [67] R. Lowe, M. D. Noseworthy, I. V. Serban, User-centric conversational recommendation with N. Angelard-Gontier, Y. Bengio, J. Pineau, Tomulti-aspect user modeling, in: Proceedings wards an automatic turing test: Learning to of the 45th International ACM SIGIR Confer- evaluate dialogue responses, CoRR abs/1708.07149 ence on Research and Development in Infor- (2017). URL: http://arxiv.org/abs/1708.07149. mation Retrieval, 2022, pp. 223–233. URL: http: arXiv:1708.07149. //arxiv.org/abs/2204.09263. doi:10.1145/3477495. [68] C. Zhang, L. F. D’Haro, R. E. Banchs, T. Friedrichs, 3532074. arXiv:2204.09263 [cs]. H. Li, Deep am-fm: Toolkit for automatic dialogue [60] L. Wang, S. Joty, W. Gao, X. Zeng, K.-F. Wong, evaluation, in: Conversational Dialogue Systems Improving conversational recommender system for the Next Decade, Springer, 2021, pp. 53–69. via contextual and time-aware modeling with less [69] S. Zhang, K. Balog, Evaluating conversational recdomain-specific knowledge, 2022. URL: http://arxiv. ommender systems via user simulation, in: Proorg/abs/2209.11386. arXiv:2209.11386 [cs]. ceedings of the 26th ACM SIGKDD International [61] F. Sun, J. Liu, J. Wu, C. Pei, X. Lin, W. Ou, P. 
Jiang, Conference on Knowledge Discovery amp; Data Bert4rec: Sequential recommendation with bidirec- Mining, KDD ’20, Association for Computing Mational encoder representations from transformer, chinery, New York, NY, USA, 2020, p. 1512–1520. CoRR abs/1904.06690 ( 2019 ). URL: http://arxiv.org/ doi:10.1145/3394486.3403202.

abs/1904.06690. arXiv:1904.06690. [70] A. Manzoor, D. Jannach, Generation-based vs [62] B. Peng, M. Galley, P. He, H. Cheng, Y. Xie, Y. Hu, retrieval-based conversational recommendation: A Q. Huang, L. Liden, Z. Yu, W. Chen, et al., Check user-centric comparison, in: Proceedings of the your facts and try again: Improving large language 15th ACM Conference on Recommender Systems, models with external knowledge and automated RecSys ’21, Association for Computing Machinery, feedback, arXiv preprint arXiv:2302.12813 ( 2023 ). New York, NY, USA, 2021, p. 515–520. doi:10.1145/ [63] Z. Liu, H. Wang, Z.-Y. Niu, H. Wu, W. Che, DuRec- 3460231.3475942.

Dial 2.0: A bilingual parallel corpus for conversa- [71] S. Zhang, K. Balog, Evaluating conversational tional recommendation, in: Proceedings of the recommender systems via user simulation, in: 2021 Conference on Empirical Methods in Natu- Proceedings of the 26th ACM SIGKDD Internaral Language Processing, Association for Compu- tional Conference on Knowledge Discovery & tational Linguistics, Online and Punta Cana, Do- Data Mining, 2020, pp. 1512–1520. URL: http: minican Republic, 2021, pp. 4335–4347. URL: https: //arxiv.org/abs/2006.08732. doi:10.1145/3394486. //aclanthology.org/2021.emnlp-main.356. doi:10. 3403202. arXiv:2006.08732 [cs]. 18653/v1/2021.emnlp-main.356. [72] D. Jannach, A. Manzoor, End-to-end learning for [64] H. Hu, X. He, J. Gao, Z.-L. Zhang, Modeling person- conversational recommendation: A long way to alized item frequency information for next-basket go?, in: IntRS@ RecSys, 2020, pp. 72–76. recommendation, in: Proceedings of the 43rd In- [73] P. P. Rau, Y. Li, D. Li, Efects of communiternational ACM SIGIR Conference on Research cation style and culture on ability to accept and Development in Information Retrieval, 2020, recommendations from robots, Computpp. 1071–1080. ers in Human Behavior 25 (2009) 587–595. [65] V. Bursztyn, J. Healey, N. Lipka, E. Koh, D. Downey, URL: https://www.sciencedirect.com/science/ L. Birnbaum, “it doesn’t look good for a date”: article/pii/S0747563208002367. doi:https: Transforming critiques into preferences for conver- //doi.org/10.1016/j.chb.2008.12.025, sational recommendation systems, in: Proceedings including the Special Issue: State of the Art of the 2021 Conference on Empirical Methods in Research into Cognitive Load Theory. Natural Language Processing, Association for Com- [74] D. Pramod, P. 
Bafna, Conversational recommender putational Linguistics, Online and Punta Cana, Do- systems techniques, tools, acceptance, and adopREDIAL [8] TG-ReDial* [54] DuRecDial* [9, 63] GoRecDial [18] OpenDialKG [31] INSPIRED [12, 19] MultiWoz [81]

Description First CRS dataset collected from crowd workers using a paired mechanism, where one person acts as a recommender and the other person acts as a movie seeker. Crowd workers are free to generate dialogues that meet the basic quality instructions.

A Chinese CRS datasets with topic-guided dialogues. Using real watching records of real online users to create diferent topic threads that further generate conversations.

A bi-lingual CRS datasets with additional annotation of users’ profile, dialogue goals(QA, chitchat, recommendation) and knowledge. It is collected in Chinese with paired mechanisms and translated into the English version.

A goal-driven CRS dataset where the recommender aims to look for the target items by chatting with the seeker. A pair mechanism is adopted and candidate items are provided for each conversation.

A dialogue dataset on movie and book domain with annotated knowledge graphs and relation paths related to each conversation.

First CRS dataset proposed to create dialogues with diferent social strategies and preference elicitation strategies using the paired mechanism. Crowd workers are asked to finish 3 pre-task personality tests and a post-task survey with demographic questions.

A large transcript of human-to-human dialogues among 7 domains, eg: hotels, restaurants, attractions, taxis, trains, hospitals, police. It contains a large corpus of multi-domain dialogues with labelled dialogue states.

https://aclanthology.org/ 2023 . acl-long . 167 . [48]

He ,

Xie ,

Jha ,

Steck ,

Liang ,

Feng , B. P. [39]

Liao ,

Takanobu ,

Ma ,

Yang ,

Huang , Majumder,

Kallus , J. McAuley , Large language

travel, CoRR abs/ 1907 .00710 ( 2019 ). URL: http: arXiv preprint arXiv:2308.10053 ( 2023 ).

//arxiv.org/abs/ 1907 .00710. arXiv: 1907 . 00710 . [49]

Liu ,

Zhou , H. Liu,

Wang ,

Z.-Y.

Niu , [40]

Lu ,

Bao ,

Song ,

Ma , S. Cui,

Wu ,

He ,

Wu ,

Che , T. Liu,

Xiong , Graph-grounded

ommendation , 2021 . URL: http://arxiv.org/abs/2106. ( 2022 ) 1 - 1 . doi: 10 .1109/TKDE. 2022 . 3147210 , con-

00957. arXiv: 2106 .00957 [cs]. ference Name: IEEE Transactions on Knowledge [41]

Li ,

Peng ,

Shen ,

Mao ,

Liden ,

Yu , and Data Engineering.

Gao , Knowledge-grounded dialogue genera- [50]

Manzoor ,

Jannach , Generation-based vs .

Proceedings of the 2022 Conference of the North user-centric comparison , in: Fifteenth ACM Con-

putational Linguistics: Human Language Tech- pp. 515 - 520 . URL: https://dl.acm.org/doi/10.1145/

nologies , Association for Computational Linguis- 3460231 .3475942. doi: 10 .1145/3460231.3475942.

tics , Seattle, United States, 2022 , pp. 206 - 218 . [51]

Manzoor ,

Jannach , Towards retrieval-based

URL: https://aclanthology.org/ 2022 .naacl-main. 15 . conversational recommendation, Information Sys-

doi:10 .18653/v1/ 2022 .naacl-main. 15 . tems ( 2022 ) 102083 . [42]

Yang , C. Han,

Li ,

Zuo ,

Yu , Improving con- [52]

Wang ,

Zhou ,

J.-R.

Wen ,

W. X.

Zhao , Towards

ings of the Association for Computational Linguis- ceedings of the 28th ACM SIGKDD Conference

tics: NAACL 2022 , Association for Computational on Knowledge Discovery and Data Mining , KDD

Linguistics , Seattle, United States, 2022 , pp. 38 - 48 . '22, Association for Computing Machinery, New

URL: https://aclanthology.org/ 2022 .findings-naacl. York, NY, USA, 2022 , p. 1929 - 1937 . doi: 10 .1145/

4. doi: 10 .18653/v1/ 2022 .findings-naacl. 4 . 3534678.3539382. [43]

Zhang ,

Xin ,

Li ,

Liu ,

Ren ,

Chen , [53]

Liang ,

Hu ,

Xu ,

Miao ,

He ,

Chen ,

mendation, in: Proceedings of the Sixteenth ACM in: Proceedings of the 2021 Conference on Em-

Mining , 2023 , pp. 231 - 239 . ing, Association for Computational Linguistics, [44]

Zhou ,

W. X.

Zhao ,

Bian ,

Zhou ,

J.-R.

Wen , Online and

Punta

Cana , Dominican Republic,

Yu , Improving conversational recommender sys- 2021 , pp. 7821 - 7833 . URL: https://aclanthology.

tems via knowledge graph based semantic fusion , org/ 2021 .emnlp-main. 617 . doi: 10 .18653/v1/ 2021 .

in: Proceedings of the 26th ACM SIGKDD Interna- emnlp-main.617.

tional Conference on Knowledge Discovery & Data [54]

Zhou ,

W. X.

Zhao ,

Wang ,

J.-R.

Wen ,

Mining , 2020 , pp. 1006 - 1014 . Towards topic-guided conversational recommender [45] J. Wu , B.

Yang , D.

Li , L.

Deng , A semantic relation- system, 2020 .

aware deep neural network model for end-to-

end [55] L.

Liao , R.

Takanobu , Y.

Ma , X.

Yang , M.

Huang , T.-S.

puting 132 ( 2023 ) 109873 . in multiple domains, IEEE Transactions on Knowl [46]

Zhou ,

Wang ,

Huang ,

Zhao ,

Huang , edge and Data Engineering 34 ( 2022 ) 2485 - 2496 .

He ,

Hou , Aligning recommendation and con- doi:10.1109/TKDE. 2020 . 3008563 .

versation via dual imitation , in: Proceedings of [56]

Lin ,

Wang ,

Li , Target-guided knowledge-

the 2022 Conference on Empirical Methods in Nat- aware recommendation dialogue system: An em-

tational Linguistics , Abu Dhabi, United Arab Emi- KaRS & ComplexRec Workshop. CEUR-WS, 2021 .

rates , 2022 , pp. 549 - 561 . URL: https://aclanthology. [57]

Zeng ,

Li ,

Wang ,

Mao , K.-F. Wong ,

org/2022.emnlp-main.36 . Dynamic online conversation recommendation , [47]

Petroni ,

Rocktäschel ,

Lewis , A . Bakhtin, in : Proceedings of the 58th Annual Meeting of

knowledge bases? , arXiv preprint arXiv: 1909 .01066 Association for Computational Linguistics, 2020 ,

( 2019 ). pp. 3331 - 3341 . URL: https://aclanthology.org/ 2020 .

Applications 203 ( 2022 ) 117539 . tics, Brussels, Belgium, 2018 , pp. 5016 - 5026 . [75] H.-C. Kuo , Y.-N. Chen , Zero-shot prompting for im- URL: https://aclanthology.org/D18-1547.

plicit intent prediction and recommendation with doi : 10 .18653/v1/ D18 - 1547.

commonsense reasoning , in: Findings of the [82]

Iovine ,

Narducci , M. de Gemmis , A dataset

2023, Association for Computational Linguistics, systems ., in: CLiC-it, 2019 .

Toronto , Canada, 2023 , pp. 249 - 258 . URL: https: [83]

Lu ,

Bao ,

Ma , X. Han, Y . Wu,

Cui ,

He , AU-

//aclanthology.org/ 2023 .findings-acl.17. GUST : an automatic generation understudy for syn [76]

Wang ,

Lin ,

Feng ,

He , T.-S. Chua, thesizing conversational recommendation datasets,

generation recommender paradigm , arXiv preprint Linguistics: ACL 2023 , Association for Compu-

arXiv:2304.03516 ( 2023 ). tational Linguistics , Toronto, Canada, 2023 , pp. [77] T. B. Brown , B.

Mann , N.

Ryder , M. Subbiah, 10538 - 10549 . URL: https://aclanthology.org/ 2023 .

Kaplan ,

Dhariwal ,

Neelakantan , P. Shyam, findings-acl. 670 .

D. M. Ziegler , J.

Wu , C.

Winter , C.

Hesse , M. Chen, 9 . Appendix

Amodei , Language models are few-shot learners, 9.1 . Datasets for CRS

in: Proceedings of the 34th International Confer- We provide a detailed description of each dataset in Table

ence on Neural Information Processing Systems , 3. From the perspective of language, each dataset has

NIPS'20 , Curran Associates Inc., Red

Hook

, NY, a diferent focus . GoRecDial [ 18] uses a game setting

USA , 2020 . to guide the dialogues while TG-Redial [54] uses topic [78]

Thoppilan ,

D. D.

Freitas ,

Hall , N. Shazeer, to guide the crowd workers . That guidance are utilized

Du ,

Li ,

Lee ,

H. S.

Zheng ,

Ghafouri , alKG [31] pairs each conversation with a corresponding

Menegali ,

Huang ,

Krikun ,

Lepikhin , KG path while DuRecDial [9] further includes user pro-

Doshi ,

R. D.

Santos ,

Duke ,

Soraker , B. Zeven- models . INSPIRED [12] emphasizes more on the so-

Aroyo ,

Rajakumar ,

Butryna , M. Lamm, egy. MultiWoz [81] collects mainly human-to-human

plications, CoRR abs/2201 .08239 ( 2022 ). URL: https: life. Other datasets that are not publicly available are not

//arxiv.org/abs/2201.08239. arXiv: 2201 .08239. included in this survey [ 36 , 82 , 83] [79]

Chen ,

Liu ,

Gao ,

Jiao ,

Zhang , Y. Ji, Hitter:

beddings, CoRR abs/ 2008 .12813 ( 2020 ). URL: https:

//arxiv.org/abs/ 2008 .12813. arXiv: 2008 . 12813 . [80]

Li ,

Wang ,

Li ,

Fu ,

Shen ,

Shang ,

tion , arXiv preprint arXiv:2305.13731 ( 2023 ). [81]

Budzianowski ,

T.-H.

Wen ,

B.-H.

Tseng ,

elling, in: Proceedings of the 2018 Conference