How Do Different Levels of User Control Affect Cognitive Load and Acceptance of Recommendations?

Yucheng Jin, Bruno Cardoso, Katrien Verbert
Department of Computer Science, KU Leuven, Leuven, Belgium
yucheng.jin@cs.kuleuven.be, bruno.cardoso@cs.kuleuven.be, katrien.verbert@cs.kuleuven.be

Joint Workshop on Interfaces and Human Decision Making for Recommender Systems, Como, Italy. ©2017. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

ABSTRACT
User control has been recognised as an important feature in recommender systems, as it allows users to steer the recommendation process. The most typical user controls relate to providing ratings, editing user data, and adjusting the weights of the algorithm. The cognitive load of the user may increase when using more advanced user controls. We divided common user controls into three levels (high, middle, and low) and conducted a study (N=90) to investigate how different levels of user control affect cognitive load and quality of recommendations. We designed a visualisation on top of a music recommender system that incorporates three levels of control. The study results show that high level control tends to produce the best recommendations, while requiring the highest cognitive load. However, only participants with rich experience in recommender systems are likely to tweak such high level control, while the majority of participants still prefers low and middle level control. We validated the robustness of our findings with three different algorithms.

ACM Classification Keywords
H.5.m. Information Interfaces and Presentation (e.g. HCI): Miscellaneous

Author Keywords
User control; Cognitive load; Acceptance of recommendations.

INTRODUCTION
Recommender systems are ubiquitous today and we can find them in many application domains. Their recommendation algorithms and powerful big data technologies allow applications to provide high quality recommendations to users, increasing their acceptance potential and, in turn, leading to improved user satisfaction and perceived effectiveness. Extensive research has been conducted in the past decades to develop and enhance algorithmic techniques such as content-based filtering, collaborative filtering, knowledge-based filtering and hybridisations. However, many researchers have argued that other factors beyond accuracy may influence the user experience with recommender-based platforms [22, 17].

Recently, user-centred research has gained a lot of attention in the field of recommender systems, and various metrics of user experience assessment have been proposed [21, 16], including diversity, serendipity, trust, transparency, and controllability. Enhancing the user experience from these perspectives requires effective user interaction with the system. Much of the existing literature proposes to address the well-known "black-box" issue by providing visualisations that expose the recommender algorithm to the user. Such visualisations empower the user to inspect the recommendation process and further tune the system to receive better recommendations.

The metric of controllability is of particular relevance to this work: it indicates how much the system supports the user in configuring the recommendation process to improve the recommendations. It has been regarded as an important index for evaluating the overall user experience of recommender systems, as lower levels of user control negatively influence the perceived quality of recommendations [10]. For example, a system that keeps recommending hotels to a user who has recently booked a hotel may annoy the user if it does not provide a mechanism to reject recommendations or adjust her preferences. To address this problem, a variety of recommender systems have components to rate recommendations, modify user data, and adjust various settings of the recommender engine itself, such as parameter weights [8].
However, user interfaces may become difficult to understand when they contain many control components [3]. Therefore, we assume that the level of user control may influence the cognitive load of the user when using the system.

To investigate this hypothesis, we used the Spotify API (https://developer.spotify.com/web-api) to design a music recommender system and to explore how different levels of user control influence the cognitive load of system use. We visualise recommendations in a column-based diagram and use colour to link related items in each column, which is suitable for representing the relationship between user data and recommendations. The recommender system integrates three recommender algorithms. The first one is based on the top seeds (top artists, top tracks and top genres) generated by the user. The second one is an item-item collaborative filtering algorithm that lists the top tracks of artists who are related to followed artists. The third one is a hybrid algorithm that combines these two algorithms.

Usually, measuring cognitive load relies on self-reported data or on the analysis of physiological data. The self-reporting approach uses questionnaires such as NASA-TLX (https://humansystems.arc.nasa.gov/groups/tlx) to ask users about their experience after performing tasks. In turn, the physiological approach usually analyses EEG and eye-tracking data to predict cognitive load during the tasks. Both approaches have their strengths and weaknesses. Although using physiological data can provide real-time information, it is difficult to set up for online studies. Therefore, we use a classic cognitive load questionnaire, the NASA-TLX, to assess cognitive load on six aspects: mental demand, physical demand, temporal demand, performance, effort, and frustration. In addition, we investigate the effects of different levels of user control on the acceptance of recommendations by asking users to rate recommended songs.

The interactive recommendation framework proposed by He et al. [11] defines three main components in interactive recommenders: user data and context, medium, and recommendations. We therefore define a level of user control for each component in Table 1.

Control level   Recommender components   Explanation
Low level       Recommendations          Sort and rate the recommendations
Middle level    User data                Select which user data will be used in the recommender engine and check additional info of user data
High level      Medium data              Modify the weight of the selected or generated data in the recommender engine

Table 1. Three levels of user control are defined in our study.

Our study aims to provide the groundwork for developing high-quality recommender systems that offer sufficient user control while demanding acceptable cognitive load. Specifically, we investigate the following questions:

RQ1: Do different levels of user control have an impact on the cognitive load of using recommender systems and, if so, what is the impact?

RQ2: Do different levels of user control have an effect on acceptance of recommendations?

RQ3: Will different recommender algorithms influence the answers to RQ1 and RQ2?

Andjelkovic et al. [3] have already shown that users spend more effort with systems offering higher levels of user control than with systems offering lower levels. However, to the best of our knowledge, no comprehensive work has yet investigated to what extent varying levels of user control influence the cognitive load of using recommender systems and the perceived quality of their recommendations. With regard to related work, our contributions are the following:

1. We define three levels of user control (low, middle, high) based on the estimated work load of tweaking each level of control.

2. By leveraging the metaphors of "processing" and "production", we design and develop an interactive music recommender with a drag and drop user interface to help the user understand the recommendation process.

3. We conduct a user study to investigate the user cognitive load and the perceived quality of recommendations under the three defined levels of user control. We also validate our findings with three recommender algorithms.

4. Based on our findings, we discuss possible ways to balance levels of user control and required cognitive load in the recommendation process. In addition, we demonstrate what kind of users are more likely to benefit from each level of user control.

This paper is organised as follows: we first introduce related work covering interactive recommenders that support user control, and research on the cognitive load of recommender visualisations. We then describe the system design of our recommender system. The next section introduces the design of the study, followed by the results of the user study. Finally, we conclude with a discussion of study findings and limitations.
RELATED WORK

User Control in Recommender Systems
Many HCI researchers [25, 18] count controllability as one of the most prominent factors that influence the overall user experience with recommender systems. Current user control research focuses on rating recommendations, revising the user profile, and adjusting recommendation parameters such as weights [11]. User control has been an integral part of research on interactive recommender systems. Previous work shows a positive effect of user control on user satisfaction [20, 10] and perceived quality [22] of recommendations. We review several typical systems that increase user involvement in various stages of the recommendation process, through different levels of user control.

TasteWeights [5], LinkedVis [6] and SetFusion [20] use sliders to revise user profile data and adjust the weights of the recommender engine components, thereby improving recommendation accuracy and user experience. As a result, users gain insight into how their actions affect the recommendations in real-time. Some systems [19, 4, 14] use the distance between data nodes and the active user to represent the weight of the selected node, which allows users to modify recommendation preferences by adjusting the distances. PARIS-Ad [12] investigates the effects of user control on targeted advertising: it allows the user to adjust her profile with drop-down lists and check-lists, and visualises the recommendation process in a flowchart.
MusiCube [24] refines the recommendations by asking the user to rate as many of the resulting items as possible. All these systems demonstrate that user control has a prominent impact on the accuracy and effectiveness of recommendations. However, it is not clear whether these findings hold across varying levels of user control. We therefore intend to compare recommendation ratings in different experimental tasks entailing different levels of user control.

Cognitive Load
The construct of "cognitive load" is usually used to measure how many cognitive resources are taken up by activities that facilitate learning [9]. In general, cognitive load is measured through a post-study self-assessment questionnaire, or through the analysis of physiological data collected during task execution. The NASA task-load index (NASA-TLX) is one of the most widely used questionnaires to measure cognitive load, along six dimensions: mental demand, physical demand, temporal demand, own perception of performance, effort and frustration. Although it is not designed to measure cognitive load in real-time, it is easy to apply and reliable in many conditions.

The information visualisation community has adopted various kinds of physiological data to measure the cognitive load of using different visualisation techniques [27]. Typically, researchers analyse eye tracking [1] and brain activity [2] data to estimate the cognitive load while performing tasks. However, even though physiological methods provide the means to estimate cognitive load in real-time, the cost of hardware such as eye trackers and electroencephalography (EEG) systems, and of the professional training needed to analyse the produced data, are substantial barriers to the widespread adoption of this approach.

Previous work has demonstrated various ways of decreasing cognitive load while improving the performance of interactive recommender systems. Schnabel et al. [9] use shortlists as digital short-term memory: since users do not need to keep the considered items in their minds, the cognitive load is reduced. Quiroga et al. [23] pointed out that information filtering and building profiles on users' organisational behaviour is essential to reduce cognitive load.

Although related work does not reveal the relation between levels of user control and cognitive load, Andjelkovic et al. [3] observed in their music recommender that additional control over new aspects such as avatars might increase cognitive load. In addition, Yalçın et al. [26] presented the Cognitive Exploration Framework, providing guidelines to reduce cognitive load in six defined stages of cognitive activity in visual data exploration.

In our study, we not only aim to provide effective user control to enhance the user experience with recommender systems, but also investigate how different levels of user control affect cognitive load and recommendation quality. Moreover, we provide groundwork for designing user-centred recommender systems that also adapt to different levels of user cognitive load.

SYSTEM DESIGN AND INTERACTIONS

Recommendation Algorithms
In order to validate our research findings with different recommender approaches, we implemented three different algorithms to generate music recommendations using the Spotify API.

Seed based algorithm
The Spotify API provides a recommender service that generates a play-list-style listening experience based on three types of seeds: artists, tracks and genres. We use the active user's top artists, tracks and genres as input seeds. It is worth noting that the top artists and tracks are calculated by affinity, a measure of expected user preference for a particular track or artist based on her listening history. The number of songs recommended through the use of a particular seed depends on the weight of the seed's type and the priority of the seed among the seeds of the same type.
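To make the recommendation flow concrete, the sketch below shows how such a seed-based recommender could be built on the public Spotify Web API. This is a minimal illustration under our own assumptions, not the authors' implementation: the endpoints (/me/top/artists, /me/top/tracks, /recommendations) are real Web API calls, but the apportioning rule (each seed type contributes songs in proportion to its slider weight) and all function names are ours.

```python
import requests

API = "https://api.spotify.com/v1"

def top_seeds(token, limit=5):
    """Fetch the active user's affinity-ranked top artists and tracks,
    and derive genre seeds from the genres of those top artists."""
    headers = {"Authorization": f"Bearer {token}"}
    artists = requests.get(f"{API}/me/top/artists", headers=headers,
                           params={"limit": limit}).json()["items"]
    tracks = requests.get(f"{API}/me/top/tracks", headers=headers,
                          params={"limit": limit}).json()["items"]
    genres = sorted({g for a in artists for g in a["genres"]})[:limit]
    return artists, tracks, genres

def seed_recommendations(token, artists, tracks, genres,
                         type_weights=None, total=20):
    """Build a playlist in which each seed type contributes a number of
    songs proportional to its (slider-controlled) weight."""
    headers = {"Authorization": f"Bearer {token}"}
    type_weights = type_weights or {"artists": 1.0, "tracks": 1.0, "genres": 1.0}
    wsum = sum(type_weights.values())
    seed_params = [("artists", "seed_artists", [a["id"] for a in artists]),
                   ("tracks", "seed_tracks", [t["id"] for t in tracks]),
                   ("genres", "seed_genres", genres)]
    playlist = []
    for kind, param, ids in seed_params:
        n = round(total * type_weights[kind] / wsum)  # songs owed to this seed type
        if n == 0 or not ids:
            continue
        resp = requests.get(f"{API}/recommendations", headers=headers,
                            params={param: ",".join(ids[:5]),  # the API caps seeds at 5
                                    "limit": n}).json()
        playlist.extend(resp["tracks"])
    return playlist[:total]
```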
Artist based algorithm
The artist-based algorithm uses an item-item collaborative filtering approach. First, the algorithm reads the list of artists the user follows. The Spotify API then allows us to find artists related to a followed artist by calculating the similarity between them, based on analyses of the Spotify community's listening history. The top 20 tracks of these related artists are returned, and the number of recommendations contributed by an artist is proportional to the weight of that artist.

Hybrid based algorithm
The hybrid based algorithm combines the seed based algorithm and the artist based algorithm. The same weight is assigned to both algorithms.
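Continuing the previous sketch (and reusing its imports and seed_recommendations helper), the artist-based and hybrid variants might look as follows. The proportional allocation and the equal split are stated in the paper; the traversal of followed artists and the helper names are our assumptions.

```python
def artist_based_recommendations(token, artist_weights, total=20, top_related=5):
    """Item-item CF sketch: recommend top tracks of artists related to the
    user's followed artists, in proportion to each related artist's weight."""
    headers = {"Authorization": f"Bearer {token}"}
    followed = requests.get(f"{API}/me/following", headers=headers,
                            params={"type": "artist"}).json()["artists"]["items"]
    wsum = sum(artist_weights.values()) or 1.0
    playlist = []
    for artist in followed:
        related = requests.get(f"{API}/artists/{artist['id']}/related-artists",
                               headers=headers).json()["artists"][:top_related]
        for rel in related:
            # tracks contributed are proportional to the slider weight
            n = round(total * artist_weights.get(rel["id"], 0) / wsum)
            if n == 0:
                continue
            top = requests.get(f"{API}/artists/{rel['id']}/top-tracks",
                               headers=headers,
                               params={"market": "from_token"}).json()
            playlist.extend(top["tracks"][:n])
    return playlist[:total]

def hybrid_recommendations(token, seed_args, artist_weights, total=20):
    """Hybrid sketch: each algorithm fills half of the playlist (equal weight)."""
    half = total // 2
    return (seed_recommendations(token, *seed_args, total=half)
            + artist_based_recommendations(token, artist_weights, total=total - half))
```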
User Interface and Visualisations
The user interface of the recommender was designed using the metaphors of "processing" and "production". It consists of three parts:

(a) The recommendation source view works as a warehouse of source data, such as top artists, top tracks, top genres, and followed artists, generated from past listening history.

(b) The recommendation processor shows areas in which source items can be dropped from part (a). The dropped data are bound to UI controls such as sliders or sortable lists for weight adjustment. It also contains an additional info view to inspect details of selected data items. In addition, a pair of radio buttons allows the user to switch between different algorithms.

(c) The recommendations view shows the recommended results in a play-list style.

Figure 1. Visualisation of the seed based algorithm. (a) The recommendation source shows available top artists, tracks and genre tags. (b) The recommendation processor enables users to adjust the weight of the input data type and individual data items. (c) Play-list style recommendations.

Visualisation of the seed based algorithm
As presented in Figure 1(a), we use three distinct colours to represent the types of recommendation source data as visual cues (yellow for artists, green for tracks, and blue for genres). Additional source data for a particular type is loaded by clicking the "+" icon next to the title of the source data type. Likewise, we use the same colour schema to encode the data type sliders and selected source data (Figure 1(b)), and the recommendations (Figure 1(c)). As a result, the visual cues show the relation among the data in the three steps of the recommendation process. When users click on a particular data item in the recommendation processor, the corresponding recommended items are highlighted, and an additional info view displays its details.

Figure 2. Visualisation of the artist based algorithm. (a) The recommendation source panel shows available followed artists. (b) The recommendation processor enables users to adjust the weight of related artists of selected followed artists. (c) Play-list style recommendations.

Visualisation of the artist based algorithm
To emphasise the concept of artist relations, this visualisation only contains artist data items, represented by the corresponding artists' portraits in addition to their names (Figure 2(a)). When users drag an artist and drop it in the selected artists block, the top five related artists of the dropped artist are shown, each with a slider to adjust its weight (Figure 2(b)). Similar to the first visualisation, recommendations are highlighted when users click on a particular artist in the recommendation processor (Figure 2(c)) to depict their relation.

Interactions and User Controls
Our system offers several interactions to support our three levels of user control.

Low level of user control
At this level, users can sort the recommendation results by preference through a drop-down menu. Although ratings normally have no immediate effect on recommendations, we still regard recommendation feedback as a kind of low level user control. The star rating widget beside each song title allows users to rate the songs in the recommendation list (Figure 1(c), Figure 2(c)).

Middle level of user control
In general, manipulating source data and checking details compose the middle level of user control. A drag and drop interface allows users to intuitively add a new source data item to update the recommendations (Figure 1(a), Figure 2(a)). When a preferred source item is dropped into the recommendation processor, a progress animation plays until the processing ends. Users can also remove a dropped data item from the processor by clicking the corresponding "x" icon. Moreover, by selecting an individual item, users can inspect its details: artists are accompanied by their name, an image, popularity, genres, and number of followers; tracks are shown with their name, album cover, popularity, and an audio clip; and genres are accompanied by their top related artists and tracks.

High level of user control
The high level of user control allows users to tweak the underlying algorithm as a basis to further manipulate the recommendation process. To support this level of control, multiple UI components were developed to adjust the weight associated with a type of data items, or the weight associated with an individual data item. In the seed based algorithm, users are able to specify their preferences for each data type by manipulating a slider per data type. By sorting the list of dropped data items, users can set the weight of each item in this list (Figure 1(b)). Similarly, in the artist based algorithm visualisation, the weight of a related artist can be manipulated by moving its associated slider (Figure 2(b)). A sketch of how such widget states could be translated into engine weights is shown below.
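The paper does not specify how slider positions and list order are mapped onto engine weights, so the following is just one plausible reading, assuming sliders are normalised into type weights and a sortable list's positions are converted into rank-based shares:

```python
def weights_from_widgets(type_sliders, sorted_items):
    """Translate high-level control widgets into weights (one plausible mapping).

    type_sliders : {"artists": 0-100, "tracks": 0-100, "genres": 0-100}
    sorted_items : {type: [item ids ordered by the user, first = top priority]}
    """
    slider_sum = sum(type_sliders.values()) or 1
    type_weights = {k: v / slider_sum for k, v in type_sliders.items()}

    item_weights = {}
    for kind, items in sorted_items.items():
        n = len(items)
        rank_sum = n * (n + 1) / 2  # 1 + 2 + ... + n
        for pos, item_id in enumerate(items):
            # earlier in the list -> larger share of the type's weight
            item_weights[item_id] = type_weights[kind] * (n - pos) / rank_sum
    return type_weights, item_weights

# Example: tracks slider at 60, artists 30, genres 10, two sorted track seeds.
tw, iw = weights_from_widgets({"artists": 30, "tracks": 60, "genres": 10},
                              {"tracks": ["t1", "t2"]})
# tw["tracks"] == 0.6; iw == {"t1": 0.4, "t2": 0.2}
```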
EVALUATION
We evaluated our system by conducting a study on Amazon Mechanical Turk (MTurk) with 107 participants, all active users of Spotify. 17 participants were rejected because of repetitive and invalid answers. In the end, we had 90 valid participants (48 female, 42 male), with ages ranging from 20 to 48 years (mean age = 29.8 years, SD = 7.51, median = 28). 86.67% of the participants were familiar with recommender systems. We paid $1 for each completed study. The average study completion time was around 33 minutes (SD = 7.23, median = 33).

Evaluation Design
We designed a within-subjects study to investigate the effects of different levels of user control on cognitive load and acceptance of recommendations. Therefore, we created three experimental tasks T1, T2, and T3 corresponding to the different levels of user control.

T1: Users were only allowed to interact with recommendations by sorting them (low level control) in a list. In the end, they rated each song in the list of recommended items.

T2: Users were asked to interact with recommendations by sorting them (low level control) and modifying the recommendation source (middle level control). Finally, they rated each song.

T3: Users were asked to interact with recommendations by sorting them (low level control), modifying the recommendation source (middle level control), and tweaking the parameters of the algorithms (high level control). Once again, participants were asked to rate each song.

We split the 90 participants equally into three groups to validate the results with three different settings of recommender algorithms: the seed-based algorithm (Setting 1), the artist-based algorithm (Setting 2), and a hybrid of the two algorithms with equal weight (Setting 3). Participants of each group tested one algorithm setting with the three experimental tasks. The order of the three tasks was mixed to avoid learning effects.

Evaluation Procedure
The participants were asked to watch a task tutorial; only the features of the particular setting were shown in this video. After interacting with the visualisation, participants were asked to rate the top-20 recommended songs that resulted from their interaction, and to fill out the NASA-TLX questionnaire to measure their cognitive load. Users had to complete this questionnaire in each of the three experimental tasks. At the end of the task, they were asked to fill out a questionnaire based on a part of ResQue to evaluate the perceived quality of the recommender with all levels of user control. To assess the validity of the responses, we included contradictory questions in this questionnaire. In addition, user interactions with the different components of the visualisation were recorded in a log.

RESULTS
To analyse cognitive load, we calculated the score from participant responses to the NASA-TLX questionnaire, which ranges from 0 to 100; the higher the score, the higher the cognitive load. Since we intend to measure the overall accuracy of a recommendation list, we apply Breese's R-Score "utility" metric [7] to calculate a utility score. The rating score for a song ranges from 1 to 5, and the default score is 1. We also analyse responses to the ResQue-based questionnaire, and report the results separately for each recommender algorithm.
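For reference, Breese's R-Score is a half-life utility computed over the ranked list [7]. Writing $r_{u,j}$ for the rating user $u$ gave the song at rank $j$, $d$ for the default (neutral) rating, and $\alpha$ for the half-life rank, the metric is:

```latex
R_u = \sum_{j} \frac{\max(r_{u,j} - d,\ 0)}{2^{(j-1)/(\alpha-1)}},
\qquad
R = 100 \cdot \frac{\sum_u R_u}{\sum_u R_u^{\max}}
```

Here $R_u^{\max}$ is the utility of the ideal ordering for user $u$, so $R$ is normalised to a 0-100 scale. The paper's 1-5 ratings with a default score of 1 suggest $d = 1$, but the half-life $\alpha$ is not reported; likewise, the NASA-TLX score is commonly taken as the unweighted mean of the six subscales, although the paper does not state whether the weighted variant was used. These parameter choices are therefore our assumptions.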
Cognitive load

Setting 1: seed based
Descriptive statistics show that participants had the highest cognitive load in T3 (M=57.14), followed by T2 (M=46.11) and T1 (M=31.43). We performed a one-way repeated measures ANOVA to test for significance. There was a significant effect for cognitive load, F(2, 58) = 44.47, p < .001. Bonferroni-corrected pairwise comparisons (sig. level = .016) revealed that T3 required significantly higher cognitive load than T2 (p < .001) and T1 (p < .001), and T2 required significantly higher cognitive load than T1 (p < .001).

Setting 2: artist based
Descriptive statistics show that participants in T3 (M=50.32) had the highest cognitive load, followed by T2 (M=38.57) and T1 (M=30.24). To test for significance, we performed a one-way repeated measures ANOVA. To compensate for violations of the sphericity assumption (Mauchly's W(df=2) = .721, p = .010), the significance levels were corrected with Greenhouse-Geisser. The corrected result shows a significant effect for cognitive load, F(1.56, 45.36) = 15.42, p < .001. Bonferroni-corrected pairwise comparisons (sig. level = .016) revealed that T3 required significantly higher cognitive load than T2 (p = .001) and T1 (p < .001), and T2 required significantly higher load than T1 (p = .009).

Setting 3: hybrid
Descriptive statistics show that T3 (M=52.14) required the highest cognitive load, followed by T2 (M=45.87) and T1 (M=34.44). To test for significance, we performed a one-way repeated measures ANOVA. The test shows a significant effect for cognitive load, F(2, 58) = 8.54, p = .001. Bonferroni-corrected pairwise comparisons (sig. level = .016) revealed that both T3 (p < .001) and T2 (p = .001) required significantly higher cognitive load than T1.

In general, T3 required significantly higher cognitive load than T1 in all three settings, but the differences between T2 and T1 and between T3 and T2 were not always significant.
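The statistical pipeline reported here (one-way repeated measures ANOVA, Mauchly's sphericity test with a Greenhouse-Geisser correction, and Bonferroni-corrected pairwise comparisons) can be reproduced along the following lines with the pingouin library; the file name and column layout are hypothetical.

```python
import pandas as pd
import pingouin as pg

# long format: one row per participant x task, with columns pid, task, tlx
df = pd.read_csv("tlx_setting2.csv")  # hypothetical file

# Mauchly's test of sphericity (reported as W and p in the paper)
print(pg.sphericity(df, dv="tlx", within="task", subject="pid"))

# one-way repeated measures ANOVA; correction=True adds the
# Greenhouse-Geisser corrected degrees of freedom and p-value
print(pg.rm_anova(df, dv="tlx", within="task", subject="pid", correction=True))

# Bonferroni-corrected pairwise comparisons between T1, T2 and T3
print(pg.pairwise_tests(df, dv="tlx", within="task", subject="pid",
                        padjust="bonf"))
```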
Acceptance of recommendations

Setting 1: seed based
Descriptive statistics show that the list of recommendations in T3 (M=3.49) was rated higher than in T2 (M=2.95) and T1 (M=2.08). A one-way repeated measures ANOVA was conducted to examine significance. A significant effect was found for user rating, F(2, 58) = 25.04, p < .001. Bonferroni-corrected pairwise comparisons (sig. level = .016) revealed that recommendations in T3 were rated significantly higher than those in T2 (p = .003) and T1 (p = .001), and recommendations in T2 were rated significantly higher than those in T1 (p = .001).

Setting 2: artist based
Descriptive statistics show that the list of recommendations in T3 (M=3.54) was rated higher than in T2 (M=2.92) and T1 (M=2.41). A one-way repeated measures ANOVA shows a significant effect for user rating, F(2, 58) = 14.68, p < .001. Bonferroni-corrected pairwise comparisons (sig. level = .016) revealed that recommendations in T3 were rated significantly higher than in T2 (p < .001) and T1 (p = .001).

Setting 3: hybrid
Descriptive statistics show that the list of recommendations in T3 (M=3.27) was rated higher than in T2 (M=3.21) and T1 (M=2.59). A one-way repeated measures ANOVA shows a significant effect for user rating, F(2, 58) = 7.80, p < .001. Bonferroni-corrected pairwise comparisons (sig. level = .016) revealed that the lists of recommendations in T3 (p = .002) and T2 (p = .004) were rated significantly higher than in T1.

Comparing the findings across settings, the list of recommendations in T3 was always rated significantly higher than in the other tasks.

Figure 3. User responses to the ResQue based questionnaire in the three settings.

Overall user experience
The left bar chart (Figure 3) plots users' attitudes towards the various controls of the recommender systems. Participants seem to enjoy using a drag-and-drop interface to manipulate the recommendation process. The system also allows users to express their preferences easily. In general, users like to give feedback and modify their data. However, it seems that only a part of the participants would like to control more components of the system. It is worth noting that 91.1% of the participants who would like to tweak the high level control have experience with recommender systems, and 95.6% of them enjoy listening to music online.

The chart on the right side illustrates the users' positive responses to our system in terms of other user experience aspects such as novelty, diversity and confidence. Users indicated that using our system was fun and that they easily became familiar with it. Despite these merits, some users were not sure they would use this system frequently to listen to music.

Log file data
Since we intend to know how often users interact with each user control, we also analysed the interaction data. We report the percentage of interactions with each level of control in T3, where all levels of control are present (Table 2). More than half of the interactions relate to low level controls, and around a quarter of the clicks relate to the middle level. Only a small part of the clicks were done with high level controls.

Settings    Low level   Middle level   High level
Setting 1   60.5%       27.9%          11.6%
Setting 2   54.4%       28.3%          17.3%
Setting 3   51.9%       26.9%          21.2%

Table 2. Percentage of interactions with each level of control in task 3.

DISCUSSION
In this section, we discuss the results presented in the previous section, thereby answering the research questions and evaluating the proposed hypothesis.

Overall, the results of the NASA-TLX show that a higher level of user control tends to increase cognitive load (RQ1). Specifically, we see that the high level of user control incurred significantly higher cognitive load than the low level in all three settings. Previous work [12, 10, 20] has reported that user control improves the accuracy of recommendations. Furthermore, the results of the user ratings indicate that the level of control has a significant influence on the acceptance of recommendations (RQ2). The poor result in T1 may stem from the unmodifiable data used to bootstrap the system. In our data, we can also observe that the high level of user control increases the quality of recommendations in all three settings. The effects of levels of control on cognitive load and acceptance of recommendations are not statistically significant in some settings when we compare the high level control to the middle level, and the middle level control to the low level. By comparing the mean value of each result, our findings can be validated with different algorithms (RQ3). Besides, it seems that participants had difficulty understanding what they could control and were less engaged while performing T3 in Setting 2. A possible explanation is that the current visualisation does not clearly plot the relations among artists, suggesting that a network graph could be a better option.

The log file data also suggest that users are more likely to tweak the low level and the middle level controls. By looking at the user profiles, we find that participants with rich experience with recommender systems and online music tend to tweak the high level user control more frequently. The majority of participants prefers to have only low and middle level user control. This may depend on users' personal characteristics and domain knowledge [15]. In addition, a drag-and-drop UI seems to allow users to interact with the system intuitively. In spite of the merits of our system, users hesitate to use it for listening to music. A potential reason is that many users prefer to listen to and discover music on mobile devices with simple interactions rather than on large screens with complex interactions [13].

CONCLUSION
We defined three levels of user control to investigate the effects of levels of control on cognitive load and acceptance of recommendations. We designed and implemented a music recommender with three distinct settings of recommender algorithms. An online study was performed to answer the research questions. We conclude with the following findings:

• By incorporating a higher level of user control, cognitive load tends to increase.

• By incorporating a higher level of user control, the recommendations are more likely to be accepted.

• Our research findings are generalisable to different recommender algorithms.

Our study has three main limitations: first, although we excluded unqualified users by setting contradictory questions in the questionnaires, the validity of the study results may still suffer from inattentive or "spamming" users. Second, the research findings should be validated in other application domains. Third, the research findings are based on the specific user control mechanisms implemented in the study system. Our future work will focus on adapting the user interface of recommender systems to address the individual needs and preferences of users.

Acknowledgements
The research has been partially financed by the KU Leuven Research Council (grant agreement no. C24/16/017).

REFERENCES
1. Erik W Anderson. 2012. Evaluating visualisation using cognitive measures. In Proceedings of the 2012 BELIV Workshop: Beyond Time and Errors - Novel Evaluation Methods for Visualization. ACM, 5.
2. Erik W Anderson, Kristin C Potter, Laura E Matzen, Jason F Shepherd, Gilbert A Preston, and Cláudio T Silva. 2011. A user study of visualisation effectiveness using EEG and cognitive load. In Computer Graphics Forum, Vol. 30. Wiley Online Library, 791–800.
3. Ivana Andjelkovic, Denis Parra, and John O'Donovan. 2016. Moodplay: interactive mood-based music discovery and recommendation. In Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization. ACM, 275–279.
4. Fedor Bakalov, Marie-Jean Meurs, Birgitta König-Ries, Bahar Sateli, René Witte, Greg Butler, and Adrian Tsang. 2013. An approach to controlling user models and personalization effects in recommender systems. In Proceedings of the 2013 International Conference on Intelligent User Interfaces. ACM, 49–56.
5. Svetlin Bostandjiev, John O'Donovan, and Tobias Höllerer. 2012. TasteWeights: a visual interactive hybrid recommender system. In Proceedings of the 6th ACM Conference on Recommender Systems. ACM, 35–42.
6. Svetlin Bostandjiev, John O'Donovan, and Tobias Höllerer. 2013. LinkedVis: exploring social and semantic career recommendations. In Proceedings of the 2013 International Conference on Intelligent User Interfaces. ACM, 107–116.
7. John S Breese, David Heckerman, and Carl Kadie. 1998. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 43–52.
8. Robin Burke. 2002. Hybrid recommender systems: survey and experiments. User Modeling and User-Adapted Interaction 12, 4 (2002), 331–370.
9. Paul Chandler and John Sweller. 1991. Cognitive load theory and the format of instruction. Cognition and Instruction 8, 4 (1991), 293–332.
10. F Maxwell Harper, Funing Xu, Harmanpreet Kaur, Kyle Condiff, Shuo Chang, and Loren Terveen. 2015. Putting users in control of their recommendations. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM, 3–10.
11. Chen He, Denis Parra, and Katrien Verbert. 2016. Interactive recommender systems: a survey of the state of the art and future research challenges and opportunities. Expert Systems with Applications 56 (2016), 9–27.
12. Yucheng Jin, Karsten Seipp, Erik Duval, and Katrien Verbert. 2016. Go with the flow: effects of transparency and user control on targeted advertising using flow charts. In Proceedings of the International Working Conference on Advanced Visual Interfaces. ACM, 68–75.
13. Mohsen Kamalzadeh, Christoph Kralj, Torsten Möller, and Michael Sedlmair. 2016. TagFlip: active mobile music discovery with social tags. In Proceedings of the 21st International Conference on Intelligent User Interfaces. ACM, 19–30.
14. Antti Kangasrääsiö, Dorota Glowacka, and Samuel Kaski. 2015. Improving controllability and predictability of interactive recommendation interfaces for exploratory search. In Proceedings of the 20th International Conference on Intelligent User Interfaces. ACM, 247–251.
15. Bart P Knijnenburg, Niels JM Reijmer, and Martijn C Willemsen. 2011. Each to his own: how different users call for different interaction methods in recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems. ACM, 141–148.
16. Bart P Knijnenburg, Martijn C Willemsen, Zeno Gantner, Hakan Soncu, and Chris Newell. 2012. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction 22, 4-5 (2012), 441–504.
17. Joseph A Konstan and John Riedl. 2012. Recommender systems: from algorithms to user experience. User Modeling and User-Adapted Interaction 22, 1 (2012), 101–123.
18. Jakob Nielsen. 1999. Designing Web Usability: The Practice of Simplicity. New Riders Publishing.
19. John O'Donovan, Barry Smyth, Brynjar Gretarsson, Svetlin Bostandjiev, and Tobias Höllerer. 2008. PeerChooser: visual interactive recommendation. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1085–1088.
20. Denis Parra and Peter Brusilovsky. 2015. User-controllable personalization: a case study with SetFusion. International Journal of Human-Computer Studies 78 (2015), 43–67.
21. Pearl Pu, Li Chen, and Rong Hu. 2011. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems. ACM, 157–164.
22. Pearl Pu, Li Chen, and Rong Hu. 2012. Evaluating recommender systems from the user's perspective: survey of the state of the art. User Modeling and User-Adapted Interaction 22, 4 (2012), 317–355.
23. Luz M Quiroga, Martha E Crosby, and Marie K Iding. 2004. Reducing cognitive load. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS'04) - Track 5, Volume 5. IEEE Computer Society.
24. Yuri Saito and Takayuki Itoh. 2011. MusiCube: a visual music recommendation system featuring interactive evolutionary computing. In Proceedings of the 2011 Visual Information Communication - International Symposium. ACM, 5.
25. Ben Shneiderman. Designing the User Interface. Pearson Education India.
26. M Adil Yalçın, Niklas Elmqvist, and Benjamin B Bederson. 2016. Cognitive stages in visual data exploration. In Proceedings of the Beyond Time and Errors on Novel Evaluation Methods for Visualization. ACM, 86–95.
27. Johannes Zagermann, Ulrike Pfeil, and Harald Reiterer. 2016. Measuring cognitive load using eye tracking technology in visual computing. In BELIV '16: Proceedings of the Sixth Workshop on Beyond Time and Errors on Novel Evaluation Methods for Visualization. 78–85.