=Paper=
{{Paper
|id=Vol-2431/paper11
|storemode=property
|title=Towards a Taxonomy of User Feedback Intents for Conversational Recommendations
|pdfUrl=https://ceur-ws.org/Vol-2431/paper11.pdf
|volume=Vol-2431
|authors=Wanling Cai,Li Chen
|dblpUrl=https://dblp.org/rec/conf/recsys/CaiC19
}}
==Towards a Taxonomy of User Feedback Intents for Conversational Recommendations==
Wanling Cai and Li Chen
Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
cswlcai@comp.hkbu.edu.hk, lichen@comp.hkbu.edu.hk

===ABSTRACT===
Understanding users' feedback on recommendations in natural language is crucially important for helping the system refine its understanding of the user's preferences and provide more accurate recommendations in subsequent interactions. In this paper, we report the results of an exploratory study on a human-human dialogue dataset centered around movie recommendations. In particular, we manually labeled a set of over 200 dialogues at the utterance level, and then conducted a descriptive analysis of them from both seekers' and recommenders' perspectives. The results reveal not only seekers' feedback intents and the types of preferences they expressed, but also the reactions of human recommenders that finally led to a successful recommendation. A taxonomy of feedback intents is established along with the results, which could be constructive for improving conversational recommender systems.

CCS CONCEPTS: • Human-centered computing → Empirical studies in interaction design; User models; • Information systems → Recommender systems.

KEYWORDS: Dialogue-based conversational recommender systems; user feedback; intent taxonomy

ACM RecSys 2019 Late-breaking Results, 16th-20th September 2019, Copenhagen, Denmark. Copyright ©2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===INTRODUCTION===
In recent years, dialogue systems have become increasingly popular in our daily life, with applications in various domains such as education, healthcare, e-commerce, and business. They often mimic human-like behavior to converse with users and address their chit-chatting or information-seeking requirements [9]. Given that users often explicitly request recommendations when they communicate with a task-oriented dialogue system [9], more effort has been put into integrating recommendation approaches into such systems, giving rise to the so-called Dialogue-based Conversational Recommender System (DCRS) [3]. However, most existing systems provide one-shot recommendations, with the focus on selecting the most informative questions to ask users [3, 8]. The dialogue often ends when the system produces one recommendation or a list of recommendations to the user (see the Related Work section). But in reality, users may not get the desired recommendation within a single turn, in which case it becomes important to allow users to freely provide feedback on the recommendation, so that the system could help them find the desired item in the subsequent interactions. Our work is motivated by the real dialogue that can occur between two persons [6]. For example, if a seeker does not like the movies recommended by the recommender, s/he can give feedback such as "I don't like any of those movies, too much talking" to refine her/his preferences. The user feedback issue has been studied in the broader area of recommender systems, such as critiquing-based recommender systems that elicit users' feedback in graphical user interfaces [2], but little work has been done on user feedback in natural language. Since language-based feedback can come in diverse, free styles, it is meaningful to investigate how users express it (e.g., what intents they may have and what kinds of preferences they want to convey), which should be constructive for developing more dedicated preference elicitation and intent prediction strategies for DCRS. Therefore, we manually labeled a set of human-human dialogues (over 200) centered around movie recommendations [6] with our established taxonomy of user feedback intents, and analyzed the utterances starting from the point when a seeker did not like one recommendation till s/he accepted another one. The results analysis reveals not only the seeker's intents and preferences, but also the human recommender's responses that eventually helped the seeker find a satisfactory item. It is hence inspiring for boosting the human-like aspect of current dialogue systems.

===RELATED WORK===
Current dialogue-based conversational recommender systems (DCRS) have mostly focused on question generation and selection before giving the recommendation. For instance, an active learning and bandit learning based conversational framework was proposed in [3], which aims to adjust the question selection strategy in real time. [8] trained a deep policy network to decide when the system should conduct facet preference elicitation. However, little work on DCRS has explicitly studied the feedback issue that occurs when the user is not satisfied with the current recommendation. In the broader area, critiquing-based recommender systems [2] have been developed to elicit users' feedback in graphical user interfaces (GUI), for which several major types of critiquing are supported, such as user-initiated critiquing and system-suggested critiques. However, this kind of system limits the way users can post their feedback since their interactions are restricted to traditional GUI elements (e.g., menu, form, button). The advantage of dialogue systems is that the interaction is not limited to a pre-defined procedure or a fixed set of attributes. But to the best of our knowledge, few studies have investigated users' goals, intents, and ways of expressing preferences when they interact with DCRS [4], not to mention their feedback on recommendations.
===DIALOGUE-BASED RECOMMENDATION DATASET===
In this section, we present our data selection, taxonomy definition, and data annotation procedure.

====Data Selection====
The original ReDial dataset (https://redialdata.github.io/website/) contains 11,348 human-human dialogues [6]. We first filtered out dialogues with fewer than 3 conversation turns, where one conversation turn denotes a consecutive utterance-response pair: the utterance is from the seeker and the response is from the recommender. We also removed dialogues with inconsistent answers from seekers and recommenders regarding the post-conversation reflective questions, because such inconsistency may be due to carelessness or dialogue ambiguity [6]. We then selected the dialogues containing at least two movies suggested by the recommender, among which one was not liked by the seeker while another, subsequent recommendation was accepted by her/him. This process was mainly to capture the seeker's feedback on a recommendation in case s/he was not satisfied with it, as well as how the human recommender responded to the seeker and helped her/him find a satisfactory item later. As a result, we got 225 dialogues (see Table 1 for the statistics of our selected dialogue data).

Table 1: Statistics of our selected dialogue data (from ReDial [6])
{| class="wikitable"
! Items !! Dialogue data
|-
| # Conversations || 225
|-
| # Human seekers || 111 (# utterances: 1,537)
|-
| # Human recommenders || 134 (# utterances: 1,565)
|-
| # Movies suggested || 1,096
|-
| # Turns per dialogue || mean=6.64, min=3, max=19
|-
| # Words per utterance || mean=10.83, min=1, max=72
|}
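To make the selection steps described before Table 1 concrete, the following is a minimal Python sketch of the filtering procedure. The dialogue field names (turns, suggested movies, post-conversation answers) are illustrative placeholders, not the actual ReDial schema or our processing scripts.

<syntaxhighlight lang="python">
# Minimal sketch of the dialogue selection procedure described above.
# Field names are illustrative placeholders, not the actual ReDial schema.

def select_dialogues(dialogues, min_turns=3):
    selected = []
    for d in dialogues:
        # 1) Keep dialogues with at least `min_turns` seeker-recommender turn pairs.
        if len(d["turns"]) < min_turns:
            continue
        # 2) Drop dialogues where seeker and recommender gave inconsistent
        #    answers to the post-conversation reflective questions.
        if any(d["seeker_answers"].get(m) != d["recommender_answers"].get(m)
               for m in d["suggested_movies"]):
            continue
        # 3) Keep dialogues in which at least one suggested movie was rejected
        #    by the seeker and a later suggestion was accepted.
        labels = [d["seeker_answers"].get(m) for m in d["suggested_movies"]]
        if "disliked" in labels and "liked" in labels[labels.index("disliked") + 1:]:
            selected.append(d)
    return selected
</syntaxhighlight>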
====Taxonomy for User Feedback Intents====
Based on a literature survey, we first established an initial taxonomy to classify user feedback on recommendations, which basically covers all of the known feedback types, such as the three types of feedback modality (i.e., similarity-based, quality-based, quantity-based) in critiquing-based recommender systems [2], the session-aware intents (i.e., add filter condition, see-more, negation) in task-oriented dialogue systems [9], and the follow-up query strategies (i.e., refine, reformulate, and start over) when users ask for recommendations [4]. Then, we refined the taxonomy by applying the open coding and theme identification approaches [7] to our dialogue data. Four new categories (i.e., Inquire, Seen, Provide Details, Ask) were added through the keywords-in-context method, and some existing categories were modified or merged into seven categories (i.e., Reject, Critique-Add, Critique-Compare, Critique-Feature, Restate, Restate with Further Constraints, Restate with Clarification) based on the real dialogues through the constant comparison method. We went through the standard classification procedure (i.e., propose-annotate-refine) three times, and finally came up with the taxonomy of user feedback intents (see Table 2). The data annotation work is described next.

====Data Annotation====
Two annotators were involved in the labeling work. They were instructed to carefully read the taxonomy table before they started. For each utterance, the annotator was encouraged to choose all labels that s/he thinks can represent the seeker's intents. They first independently labeled 143 random dialogues. The inter-rater agreement across their intent labels is 0.87 (measured with Fuzzy Kappa [5]), which indicates satisfactory annotation quality and consistency. They then labeled the remaining dialogues, and met to discuss and resolve disagreements.
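For context, Fuzzy Kappa [5] is a chance-corrected agreement measure designed for one-to-many (multi-label) coding. To our understanding it follows the same general skeleton as Cohen's kappa, with observed and expected agreement redefined over fuzzy label overlaps; the exact fuzzified terms are given in [5], so the formula below shows only that shared skeleton:

<math>\kappa = \frac{p_o - p_e}{1 - p_e}</math>

where <math>p_o</math> is the observed agreement between the two annotators and <math>p_e</math> is the agreement expected by chance.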
===DATA ANALYSIS & RESULTS===
====Seeker Feedback Intents and Preference Expression====
Feedback Intent Distribution. The feedback intent distribution is shown in Table 2, where we can see that Reject, Seen, Critique-Feature, Provide Details, and Inquire occur more frequently than the others, which suggests that the seeker may tend to explicitly express her/his negative opinions on a recommendation, and attempt to explain why s/he dislikes it as well as provide more preference information to the recommender. Relatively, some seekers are also inclined to critique the recommendation by adding further constraints, or to start a new query if they feel it is difficult to receive a satisfactory result with the current query.

Table 2: A taxonomy of user feedback intents during the interaction with a dialogue-based recommender, and the intent distribution in our dataset
{| class="wikitable"
! User Feedback Intent (Code) !! Description !! Example !! Percentage
|-
| Reject (REJ) || Seeker dislikes the recommended item. || "I hated that movie. I did not even crack a smile once." || 19.2%
|-
| Seen (SEE) || Seeker has seen the recommended item before. || "I have seen that one and enjoyed it." || 16.3%
|-
| Critique-Feature (CRI-F) || Seeker makes a critique on specific features of the current recommendation. || "That's a bit too scary for me." || 11.8%
|-
| Provide Details (PRO) || Seeker provides detailed preferences for the item s/he is looking for. || "I usually enjoy movies with Seth Rogen and Jonah Hill." || 11.7%
|-
| Inquire (INQ) || Seeker wants to know more about the recommended item. || "I haven't seen that one yet. What's it about?" || 10.9%
|-
| Critique-Add (CRI-A) || Seeker adds further constraints on top of the current recommendation. || "I would like something more recent." || 8.5%
|-
| Start Over (STO) || Seeker starts a new query. || "Anything that I can watch with my kids under 10." || 5.2%
|-
| Neutral Response (NRE) || Seeker does not indicate her/his preferences for the current recommendation. || "I have actually never seen that one." || 5.1%
|-
| Critique-Compare (CRI-C) || Seeker requests something similar to the current recommendation. || "Den of Thieves (2018) sounds amazing. Any others like that?" || 2.9%
|-
| Answer (ANS) || Seeker answers the question issued by the recommender. || "Maybe something with more action." (Q: "What kind of fun movie you look for?") || 2.8%
|-
| Ask (ASK) || Seeker asks the recommender's personal opinions. || "I really like Reese Witherspoon. How about you?" || 1.6%
|-
| Restate with Further Constraints (RES-CO) || Seeker restates her/his query with further constraints. || "Do you have something that is a thriller but not too scary?" || 1.6%
|-
| Restate (RES) || Seeker completely restates her/his query. || "Maybe I am not being clear. I want something that is in the theater now." || 1.5%
|-
| Restate with Clarification (RES-CL) || Seeker restates her/his query with clarification. || "I'm fine with any sort of horrors, jump scares, clowns, etc." || 0.4%
|-
| Others (OTH) || The utterance cannot be categorized into any other categories. || "Sorry about the weird typing." || 0.4%
|}

Intent Co-occurrence. We find that 40.5% of utterances contain more than one intent label. The undirected graph of feedback intent co-occurrence, weighted by co-occurrence frequency, is shown in Figure 1. It can be seen that Reject often co-occurs with Critique-Feature, Critique-Add, Seen, Provide Details, and Start Over, which may explain the reasons why some seekers reject a recommendation: because it does not satisfy their preferences for some specific features, misses values on some constraints they have not stated, or was already seen by the seeker. Besides, rather than critiquing the current recommendation, some seekers try to provide more detailed preferences, or start a new query when they reject an item.

[Figure 1: Seeker feedback intent co-occurrence. Edges with larger weight (co-occurrence frequency greater than 10) are drawn as solid lines; edges with lower weight as dashed lines.]
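A co-occurrence graph such as the one in Figure 1 can be derived directly from the multi-label annotations. The sketch below shows one plausible way to count weighted intent pairs in Python; the list-of-label-sets input format is an assumption for illustration, not our actual annotation export.

<syntaxhighlight lang="python">
from collections import Counter
from itertools import combinations

# Assumed input format: one set of intent codes per annotated seeker utterance.
utterance_labels = [
    {"REJ", "CRI-F"},
    {"REJ", "SEE"},
    {"INQ"},
    {"REJ", "CRI-F"},
]

cooccurrence = Counter()
for labels in utterance_labels:
    for a, b in combinations(sorted(labels), 2):
        cooccurrence[(a, b)] += 1  # weight of the undirected edge (a, b)

# Edges with weight above a threshold could be rendered as solid lines,
# mirroring the presentation in Figure 1.
print(cooccurrence.most_common())
</syntaxhighlight>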
Preference Expression. We then analyzed how seekers actually express their preferences in the feedback, inspired by [4], which classifies user queries into three levels of goals: objective, subjective, and navigational. (Objective refers to the user's criteria on the item's attributes (e.g., movie genre), subjective involves the user's emotional or opinionated preferences (e.g., "happy movie"), and navigational is in relation to a movie the user refers to (e.g., "Star Wars movies").) We refined this classification scheme by linking these goals to the concepts that the seeker may mention [1]: entity (like a movie or a series of movies, which can carry a subjective or navigational goal), attribute (with an objective or subjective goal), and purpose (the general uses of the item, e.g., "Anything that I can watch with my kids under 10?"). The results show that seekers most frequently express their preferences at the attribute level, much more often than they mention entity and purpose concepts (see Figure 2). Moreover, they like to express subjective opinions on an entity when they mention it, but have more objective criteria for attributes (slightly higher than the proportion of attribute-level subjective goals).

[Figure 2: Seeker preference expression on the three concepts: entity, attribute, and purpose (y-axis: percentage of utterances, N=462; each bar split into objective, subjective, and navigational goals). Note that our dataset consists of 462 utterances, which start from the point when a seeker did not like one recommendation till s/he accepted another one.]

====Recommender Actions====
From the human recommender's perspective, we investigated what actions s/he may carry out in response to the seeker's feedback. We first identified five major types of actions (see Table 3), and then asked annotators to label all recommenders' responses. From Table 3, we can see that, in nearly half of the cases, the recommender tends to recommend one or more other items when the seeker rejects the current one. In the other cases, the recommender tries to explain why the new recommendation would be good for the seeker, respond to the seeker's requests, answer the seeker's explicit question, or ask for the seeker's preferences.

Table 3: Recommender reactions to seeker feedback, and the action distribution in our dataset
{| class="wikitable"
! Action (Code) !! Description !! Percentage
|-
| Recommend (REC) || Recommender provides one or more recommendations. || 43.8%
|-
| Explain (EXP) || Recommender explains why the item is recommended. || 30.0%
|-
| Respond (RES) || Recommender responds to any other queries by the seeker. || 12.4%
|-
| Answer (ANS) || Recommender answers the question from the seeker. || 10.2%
|-
| Request (REQ) || Recommender requests the seeker's preferences. || 3.1%
|}

===FUTURE WORK===
In this work, we established a taxonomy of user feedback intents and analyzed a set of human-human dialogues centered around movie recommendations. As the next step, we plan to label more dialogues to further validate the taxonomy. We also want to perform a temporal analysis so as to reveal the frequent conversation patterns that may occur between seekers and recommenders. Based on the findings from our analysis, we intend to develop a dedicated user intent prediction model to predict users' intents given their utterances, which we believe is an important component that could help a DCRS track users' current states, refine their preference model, and then select an appropriate action to respond to users.
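As an illustration of what such an intent prediction component might look like, a simple multi-label text classifier could be trained on the annotated utterances. The sketch below is a hypothetical baseline using scikit-learn, not the model we plan to develop, and the training examples shown are illustrative rather than drawn from our dataset.

<syntaxhighlight lang="python">
# Hypothetical baseline sketch for multi-label intent prediction over
# seeker utterances; example data is illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

utterances = [
    "I hated that movie. I did not even crack a smile once.",
    "That's a bit too scary for me.",
    "I have seen that one and enjoyed it.",
    "I would like something more recent.",
]
labels = [{"REJ"}, {"CRI-F"}, {"SEE"}, {"CRI-A"}]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)  # binary indicator matrix over intent codes

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(utterances, y)

# Predict intent codes for a new seeker utterance.
predicted = mlb.inverse_transform(model.predict(["Any others like that one?"]))
print(predicted)
</syntaxhighlight>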
===REFERENCES===
[1] Joyce Yue Chai, Malgorzata Budzikowska, Veronika Horvath, Nicolas Nicolov, Nanda Kambhatla, and Wlodek Zadrozny. 2001. Natural Language Sales Assistant - A Web-Based Dialog System for Online Sales. In Proceedings of the Thirteenth Conference on Innovative Applications of Artificial Intelligence. 19–26.
[2] Li Chen and Pearl Pu. 2012. Critiquing-based Recommenders: Survey and Emerging Trends. User Modeling and User-Adapted Interaction 22, 1-2 (April 2012), 125–150. https://doi.org/10.1007/s11257-011-9108-6
[3] Konstantina Christakopoulou, Filip Radlinski, and Katja Hofmann. 2016. Towards Conversational Recommender Systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). 815–824. https://doi.org/10.1145/2939672.2939746
[4] Jie Kang, Kyle Condiff, Shuo Chang, Joseph A. Konstan, Loren Terveen, and F. Maxwell Harper. 2017. Understanding How People Use Natural Language to Ask for Recommendations. In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys '17). 229–237. https://doi.org/10.1145/3109859.3109873
[5] Andrei P. Kirilenko and Svetlana Stepchenkova. 2016. Inter-Coder Agreement in One-to-Many Classification: Fuzzy Kappa. PLOS ONE 11, 3 (March 2016), 1–14. https://doi.org/10.1371/journal.pone.0149787
[6] Raymond Li, Samira Ebrahimi Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, and Chris Pal. 2018. Towards Deep Conversational Recommendations. In Advances in Neural Information Processing Systems 31. 9748–9758.
[7] Gery W. Ryan and H. Russell Bernard. 2003. Techniques to Identify Themes. Field Methods 15, 1 (2003), 85–109.
[8] Yueming Sun and Yi Zhang. 2018. Conversational Recommender System. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR '18). 235–244. https://doi.org/10.1145/3209978.3210002
[9] Zhao Yan, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou, and Zhoujun Li. 2017. Building Task-oriented Dialogue Systems for Online Shopping. In Thirty-First AAAI Conference on Artificial Intelligence.