How Do Different Levels of User Control Affect Cognitive Load and Acceptance of Recommendations?

Yucheng Jin, Bruno Cardoso, Katrien Verbert
Department of Computer Science, KU Leuven, Leuven, Belgium
yucheng.jin@cs.kuleuven.be, bruno.cardoso@cs.kuleuven.be, katrien.verbert@cs.kuleuven.be

Joint Workshop on Interfaces and Human Decision Making for Recommender Systems, Como, Italy. ©2017. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

ABSTRACT
User control has been recognised as an important feature in recommender systems, as it allows users to steer the recommendation process. The most typical user controls relate to providing ratings, editing user data, and adjusting the weights of the algorithm. The cognitive load of the user may increase when using more advanced user controls. We divided common user controls into three levels (high, middle, and low) and conducted a study (N=90) to investigate how different levels of user control affect cognitive load and quality of recommendations. We designed a visualisation on top of a music recommender system that incorporates three levels of control. The study results show that high level control tends to produce the best recommendations, while requiring the highest cognitive load. However, only participants with rich experience in recommender systems are likely to tweak such high level control, while the majority of participants still prefers low and middle level control. We validated the robustness of our findings with three different algorithms.

ACM Classification Keywords
H.5.m. Information Interfaces and Presentation (e.g. HCI): Miscellaneous

Author Keywords
User control; Cognitive load; Acceptance of recommendations.

INTRODUCTION
Recommender systems are ubiquitous today and we can find them in many application domains. Their recommendation algorithms and powerful big data technologies allow applications to provide high quality recommendations to users, increasing their acceptance potential and, in turn, leading to improved user satisfaction and perceived effectiveness. Extensive research has been conducted in the past decades to develop and enhance algorithmic techniques such as content-based filtering, collaborative filtering, knowledge-based filtering and hybridisations. However, many researchers have argued that other factors beyond accuracy may influence the user experience with recommender-based platforms [22, 17].

Recently, user-centred research has gained a lot of attention in the field of recommender systems, and various metrics of user experience assessment have been proposed [21, 16], including diversity, serendipity, trust, transparency, and controllability. Enhancing the user experience from these perspectives requires effective user interaction with the system. Much of the existing literature proposes to address the well-known "black-box" issue by providing visualisations that expose the recommender algorithm to the user. Such visualisations empower the user to inspect the recommendation process and further tune the system to receive better recommendations.

The metric of controllability is of particular relevance to this work: it indicates how much the system supports the user in configuring the recommendation process to improve the recommendations. It has been regarded as an important index for evaluating the overall user experience of recommender systems, as lower levels of user control negatively influence the perceived quality of recommendations [10]. For example, a system that keeps recommending hotels to a user who has recently booked a hotel may annoy the user if it does not provide a mechanism to reject recommendations or adjust her preferences. To address this problem, a variety of recommender systems have components to rate recommendations, modify user data, and adjust various settings of the recommender engine itself, such as parameter weights [8].
However, user interfaces may become difficult to understand when they contain many control components [3]. Therefore, we assume that the level of user control may influence the cognitive load of the user when using the system.

To investigate this hypothesis, we used the Spotify API (https://developer.spotify.com/web-api) to design a music recommender system and to explore how different levels of user control influence the cognitive load of system use. We visualise recommendations in a column-based diagram and use colour to link related items in each column, which is suitable for representing the relationship between user data and recommendations. The recommender system integrates three recommender algorithms. The first one is based on the top seeds (top artists, top tracks and top genres) generated by the user. The second one is an item-item collaborative filtering algorithm that lists the top tracks of artists who are related to followed artists. The third one is a hybrid algorithm that combines these two algorithms.

Usually, measuring cognitive load relies on self-reported data or on the analysis of physiological data. The self-reporting approach uses questionnaires such as NASA-TLX (https://humansystems.arc.nasa.gov/groups/tlx) to ask users about their experience after performing tasks. In turn, the physiological approach usually analyses EEG and eye-tracking data to predict cognitive load during the tasks. Both approaches have their strengths and weaknesses. Although using physiological data can provide real-time information, it is difficult to set up for online studies. Therefore, we use a classic cognitive load questionnaire, the NASA-TLX, to assess cognitive load on six aspects: mental demand, physical demand, temporal demand, performance, effort, and frustration. In addition, we investigate the effects of different levels of user control on the acceptance of recommendations by asking users to rate recommended songs.

The interactive recommendation framework proposed by He et al. [11] defines three main components in interactive recommenders: user data and context, medium, and recommendations. We therefore define a level of user control for each component in Table 1.

Control level   Recommender components   Explanation
Low level       Recommendations          Sort and rate the recommendations
Middle level    User data                Select which user data will be used in the recommender engine and check additional info of user data
High level      Medium data              Modify the weight of the selected or generated data in the recommender engine

Table 1. Three levels of user control are defined in our study.

Our study aims to provide the groundwork for developing high-quality recommender systems that offer sufficient user control while demanding acceptable cognitive load. Specifically, we investigate the following questions:

RQ1: Do different levels of user control have an impact on the cognitive load of using recommender systems and, if so, what is the impact?

RQ2: Do different levels of user control have an effect on acceptance of recommendations?

RQ3: Will different recommender algorithms influence the answers to RQ1 and RQ2?

Andjelkovic et al. [3] have already shown that users spend more effort with systems offering higher levels of user control than with systems offering lower levels. However, to the best of our knowledge, no comprehensive work has yet investigated to what extent varying levels of user control influence the cognitive load of using recommender systems and the perceived quality of their recommendations. With regard to related work, our contributions are the following:

1. We define three levels of user control (low, middle, high) based on the estimated work load of tweaking each level of control.

2. By leveraging the metaphors of "processing" and "production", we design and develop an interactive music recommender with a drag and drop user interface to help the user understand the recommendation process.

3. We conduct a user study to investigate the user cognitive load and the perceived quality of recommendations under the three defined levels of user control. We also validate our findings with three recommender algorithms.

4. Based on our findings, we discuss possible ways to balance levels of user control and required cognitive load in the recommendation process. In addition, we demonstrate what kind of users are more likely to benefit from each level of user control.

This paper is organised as follows: we first introduce related work covering interactive recommenders that support user control, and research on the cognitive load of recommender visualisations. We then describe the system design of our recommender system. The next section introduces the design of the study, followed by the results of the user study. Finally, we conclude with a discussion of study findings and limitations.
RELATED WORK

User Control in Recommender Systems
Many HCI researchers [25, 18] count controllability as one of the most prominent factors that influence the overall user experience with recommender systems. Current user control research focuses on rating recommendations, revising the user profile, and adjusting recommendation parameters such as weights [11]. User control has been an integral part of research on interactive recommender systems. Previous work shows a positive effect of user control on user satisfaction [20, 10] and perceived quality [22] of recommendations. We review several typical systems that increase user involvement in various stages of the recommendation process, through different levels of user control.

TasteWeights [5], LinkedVis [6] and SetFusion [20] use sliders to revise user profile data and adjust the weights of the recommender engine components, thereby improving recommendation accuracy and user experience. As a result, users gain insight into how their actions affect the recommendations in real-time. Some systems [19, 4, 14] use the distance between data nodes and the active user to represent the weight of the selected node, which allows users to modify recommendation preferences by adjusting the distances. PARIS-Ad [12] investigates the effects of user control on targeted advertising: it allows the user to adjust her profile with drop-down lists and check-lists, and visualises the recommendation process in a flowchart.
MusiCube [24] refines the recommendations by asking the user to rate as many of the resulting items as possible. All these systems demonstrate that user control has a prominent impact on the accuracy and effectiveness of recommendations. However, it is not clear whether these findings hold across varying levels of user control. We therefore intend to compare recommendation ratings in different experimental tasks entailing different levels of user control.

Cognitive Load
The construct of "cognitive load" is usually used to measure how many cognitive resources are taken up by activities that facilitate learning [9]. In general, cognitive load is measured through a post-study self-assessment questionnaire, or through the analysis of physiological data collected during task execution. The NASA task-load index (NASA-TLX) is one of the most widely used questionnaires to measure cognitive load, along six dimensions: mental demand, physical demand, temporal demand, own perception of performance, effort and frustration. Although it is not designed to measure cognitive load in real-time, it is easy to apply and reliable in many conditions.

The information visualisation community has adopted various kinds of physiological data to measure the cognitive load of using different visualisation techniques [27]. Typically, researchers analyse eye tracking [1] and brain activity [2] data to estimate the cognitive load while performing tasks. However, even though physiological methods provide the means to estimate cognitive load in real-time, the cost of hardware such as eye trackers and electroencephalography (EEG) systems, and of the professional training needed to analyse the produced data, are substantial barriers to the widespread adoption of this approach.

Previous work has demonstrated various ways of decreasing cognitive load while improving the performance of interactive recommender systems. Schnabel et al. [9] use shortlists as digital short-term memory: since users do not need to keep the considered items in their minds, the cognitive load is reduced. Quiroga et al. [23] pointed out that information filtering and building profiles on users' organisational behaviour is essential to reduce cognitive load.

Although related work does not reveal the relation between levels of user control and cognitive load, Andjelkovic et al. [3] observed in their music recommender that additional control over new aspects such as avatars might increase cognitive load. In addition, Yalçın et al. [26] presented the Cognitive Exploration Framework, providing guidelines to reduce cognitive load in six defined stages of cognitive activity in visual data exploration.

In our study, we not only aim to provide effective user control to enhance the user experience with recommender systems, but also investigate how different levels of user control affect cognitive load and recommendation quality. Moreover, we provide groundwork for designing user-centred recommender systems that also adapt to different levels of user cognitive load.

SYSTEM DESIGN AND INTERACTIONS

Recommendation Algorithms
In order to validate our research findings with different recommender approaches, we implemented three different algorithms to generate music recommendations using the Spotify API.

Seed based algorithm
The Spotify API provides a recommender service that generates a play-list-style listening experience based on three types of seeds: artists, tracks and genres. We use the active user's top artists, tracks and genres as input seeds. It is worth noting that the top artists and tracks are calculated by affinity, a measure of expected user preference for a particular track or artist based on her listening history. The number of songs recommended through the use of a particular seed depends on the weight of the seed's type and the priority of the seed among the seeds of the same type.
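To make the recommendation flow concrete, the sketch below shows how such a seed-based recommender could be built on the public Spotify Web API. This is a minimal illustration under our own assumptions, not the authors' implementation: the endpoints (/me/top/artists, /me/top/tracks, /recommendations) are real Web API calls, but the apportioning rule (each seed type contributes songs in proportion to its slider weight) and all function names are ours.

```python
import requests

API = "https://api.spotify.com/v1"

def top_seeds(token, limit=5):
    """Fetch the active user's affinity-ranked top artists and tracks,
    and derive genre seeds from the genres of those top artists."""
    headers = {"Authorization": f"Bearer {token}"}
    artists = requests.get(f"{API}/me/top/artists", headers=headers,
                           params={"limit": limit}).json()["items"]
    tracks = requests.get(f"{API}/me/top/tracks", headers=headers,
                          params={"limit": limit}).json()["items"]
    genres = sorted({g for a in artists for g in a["genres"]})[:limit]
    return artists, tracks, genres

def seed_recommendations(token, artists, tracks, genres,
                         type_weights=None, total=20):
    """Build a playlist in which each seed type contributes a number of
    songs proportional to its (slider-controlled) weight."""
    headers = {"Authorization": f"Bearer {token}"}
    type_weights = type_weights or {"artists": 1.0, "tracks": 1.0, "genres": 1.0}
    wsum = sum(type_weights.values())
    seed_params = [("artists", "seed_artists", [a["id"] for a in artists]),
                   ("tracks", "seed_tracks", [t["id"] for t in tracks]),
                   ("genres", "seed_genres", genres)]
    playlist = []
    for kind, param, ids in seed_params:
        n = round(total * type_weights[kind] / wsum)  # songs owed to this seed type
        if n == 0 or not ids:
            continue
        resp = requests.get(f"{API}/recommendations", headers=headers,
                            params={param: ",".join(ids[:5]),  # the API caps seeds at 5
                                    "limit": n}).json()
        playlist.extend(resp["tracks"])
    return playlist[:total]
```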
Artist based algorithm
The artist-based algorithm uses an item-item collaborative filtering approach. First, the algorithm reads the list of artists the user follows. The Spotify API then allows us to find artists related to a followed artist by calculating the similarity between them, based on analyses of the Spotify community's listening history. The top 20 tracks of these related artists are returned, and the number of recommendations contributed by an artist is proportional to the weight of that artist.

Hybrid based algorithm
The hybrid based algorithm combines the seed based algorithm and the artist based algorithm. The same weight is assigned to both algorithms.
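Continuing the previous sketch (and reusing its imports and seed_recommendations helper), the artist-based and hybrid variants might look as follows. The proportional allocation and the equal split are stated in the paper; the traversal of followed artists and the helper names are our assumptions.

```python
def artist_based_recommendations(token, artist_weights, total=20, top_related=5):
    """Item-item CF sketch: recommend top tracks of artists related to the
    user's followed artists, in proportion to each related artist's weight."""
    headers = {"Authorization": f"Bearer {token}"}
    followed = requests.get(f"{API}/me/following", headers=headers,
                            params={"type": "artist"}).json()["artists"]["items"]
    wsum = sum(artist_weights.values()) or 1.0
    playlist = []
    for artist in followed:
        related = requests.get(f"{API}/artists/{artist['id']}/related-artists",
                               headers=headers).json()["artists"][:top_related]
        for rel in related:
            # tracks contributed are proportional to the slider weight
            n = round(total * artist_weights.get(rel["id"], 0) / wsum)
            if n == 0:
                continue
            top = requests.get(f"{API}/artists/{rel['id']}/top-tracks",
                               headers=headers,
                               params={"market": "from_token"}).json()
            playlist.extend(top["tracks"][:n])
    return playlist[:total]

def hybrid_recommendations(token, seed_args, artist_weights, total=20):
    """Hybrid sketch: each algorithm fills half of the playlist (equal weight)."""
    half = total // 2
    return (seed_recommendations(token, *seed_args, total=half)
            + artist_based_recommendations(token, artist_weights, total=total - half))
```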
User Interface and Visualisations
The user interface of the recommender was designed using the metaphors of "processing" and "production". It consists of three parts:

(a) The recommendation source view works as a warehouse of source data, such as top artists, top tracks, top genres, and followed artists, generated from past listening history.

(b) The recommendation processor shows areas in which source items can be dropped from part (a). The dropped data are bound to UI controls such as sliders or sortable lists for weight adjustment. It also contains an additional info view to inspect details of selected data items. In addition, a pair of radio buttons allows the user to switch between different algorithms.

(c) The recommendations view shows the recommended results in a play-list style.

Figure 1. Visualisation of the seed based algorithm. (a) The recommendation source shows available top artists, tracks and genre tags. (b) The recommendation processor enables users to adjust the weight of the input data type and individual data items. (c) Play-list style recommendations.

Visualisation of the seed based algorithm
As presented in Figure 1(a), we use three distinct colours to represent the types of recommendation source data as visual cues (yellow for artists, green for tracks, and blue for genres). Additional source data for a particular type is loaded by clicking the "+" icon next to the title of the source data type. Likewise, we use the same colour schema to encode the data type sliders and selected source data (Figure 1(b)), and the recommendations (Figure 1(c)). As a result, the visual cues show the relation among the data in the three steps of the recommendation process. When users click on a particular data item in the recommendation processor, the corresponding recommended items are highlighted, and an additional info view displays its details.

Figure 2. Visualisation of the artist based algorithm. (a) The recommendation source panel shows available followed artists. (b) The recommendation processor enables users to adjust the weight of related artists of selected followed artists. (c) Play-list style recommendations.

Visualisation of the artist based algorithm
To emphasise the concept of artist relations, this visualisation only contains artist data items, represented by the corresponding artists' portraits in addition to their names (Figure 2(a)). When users drag an artist and drop it in the selected artists block, the top five related artists of the dropped artist are shown, each with a slider to adjust its weight (Figure 2(b)). Similar to the first visualisation, recommendations are highlighted when users click on a particular artist in the recommendation processor (Figure 2(c)) to depict their relation.

Interactions and User Controls
Our system offers several interactions to support our three levels of user control.

Low level of user control
At this level, users can sort the recommendation results by preference through a drop-down menu. Although ratings normally have no immediate effect on recommendations, we still regard recommendation feedback as a kind of low level user control. The star rating widget beside each song title allows users to rate the songs in the recommendation list (Figure 1(c), Figure 2(c)).

Middle level of user control
In general, manipulating source data and checking details compose the middle level of user control. A drag and drop interface allows users to intuitively add a new source data item to update the recommendations (Figure 1(a), Figure 2(a)). When a preferred source item is dropped into the recommendation processor, a progress animation plays until the processing ends. Users can also remove a dropped data item from the processor by clicking the corresponding "x" icon. Moreover, by selecting an individual item, users can inspect its details: artists are accompanied by their name, an image, popularity, genres, and number of followers; tracks are shown with their name, album cover, popularity, and an audio clip; and genres are accompanied by their top related artists and tracks.

High level of user control
The high level of user control allows users to tweak the underlying algorithm as a basis to further manipulate the recommendation process. To support this level of control, multiple UI components were developed to adjust the weight associated with a type of data items, or the weight associated with an individual data item. In the seed based algorithm, users are able to specify their preferences for each data type by manipulating a slider per data type. By sorting the list of dropped data items, users can set the weight of each item in this list (Figure 1(b)). Similarly, in the artist based algorithm visualisation, the weight of a related artist can be manipulated by moving its associated slider (Figure 2(b)). A sketch of how such widget states could be translated into engine weights is shown below.
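The paper does not specify how slider positions and list order are mapped onto engine weights, so the following is just one plausible reading, assuming sliders are normalised into type weights and a sortable list's positions are converted into rank-based shares:

```python
def weights_from_widgets(type_sliders, sorted_items):
    """Translate high-level control widgets into weights (one plausible mapping).

    type_sliders : {"artists": 0-100, "tracks": 0-100, "genres": 0-100}
    sorted_items : {type: [item ids ordered by the user, first = top priority]}
    """
    slider_sum = sum(type_sliders.values()) or 1
    type_weights = {k: v / slider_sum for k, v in type_sliders.items()}

    item_weights = {}
    for kind, items in sorted_items.items():
        n = len(items)
        rank_sum = n * (n + 1) / 2  # 1 + 2 + ... + n
        for pos, item_id in enumerate(items):
            # earlier in the list -> larger share of the type's weight
            item_weights[item_id] = type_weights[kind] * (n - pos) / rank_sum
    return type_weights, item_weights

# Example: tracks slider at 60, artists 30, genres 10, two sorted track seeds.
tw, iw = weights_from_widgets({"artists": 30, "tracks": 60, "genres": 10},
                              {"tracks": ["t1", "t2"]})
# tw["tracks"] == 0.6; iw == {"t1": 0.4, "t2": 0.2}
```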
EVALUATION
We evaluated our system by conducting a study on Amazon Mechanical Turk (MTurk) with 107 participants, all active users of Spotify. 17 participants were rejected because of repetitive and invalid answers. In the end, we had 90 valid participants (48 female, 42 male), with ages ranging from 20 to 48 years (mean age = 29.8 years, SD = 7.51, median = 28). 86.67% of the participants were familiar with recommender systems. We paid $1 for each completed study. The average study completion time was around 33 minutes (SD = 7.23, median = 33).

Evaluation Design
We designed a within-subjects study to investigate the effects of different levels of user control on cognitive load and acceptance of recommendations. Therefore, we created three experimental tasks T1, T2, and T3 corresponding to the different levels of user control.

T1: Users were only allowed to interact with recommendations by sorting them (low level control) in a list. In the end, they rated each song in the list of recommended items.

T2: Users were asked to interact with recommendations by sorting them (low level control) and modifying the recommendation source (middle level control). Finally, they rated each song.

T3: Users were asked to interact with recommendations by sorting them (low level control), modifying the recommendation source (middle level control), and tweaking the parameters of the algorithms (high level control). Once again, participants were asked to rate each song.

We split the 90 participants equally into three groups to validate the results with three different settings of recommender algorithms: the seed-based algorithm (Setting 1), the artist-based algorithm (Setting 2), and a hybrid of the two algorithms with equal weight (Setting 3). Participants of each group tested one algorithm setting with the three experimental tasks. The order of the three tasks was mixed to avoid learning effects.

Evaluation Procedure
The participants were asked to watch a task tutorial; only the features of the particular setting were shown in this video. After interacting with the visualisation, participants were asked to rate the top-20 recommended songs that resulted from their interaction, and to fill out the NASA-TLX questionnaire to measure their cognitive load. Users had to complete this questionnaire in each of the three experimental tasks. At the end of the task, they were asked to fill out a questionnaire based on a part of ResQue to evaluate the perceived quality of the recommender with all levels of user control. To assess the validity of the responses, we included contradictory questions in this questionnaire. In addition, user interactions with the different components of the visualisation were recorded in a log.

RESULTS
To analyse cognitive load, we calculated the score from participant responses to the NASA-TLX questionnaire, which ranges from 0 to 100; the higher the score, the higher the cognitive load. Since we intend to measure the overall accuracy of a recommendation list, we apply Breese's R-Score "utility" metric [7] to calculate a utility score. The rating score for a song ranges from 1 to 5, and the default score is 1. We also analyse responses to the ResQue-based questionnaire, and report the results separately for each recommender algorithm.
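For reference, Breese's R-Score is a half-life utility computed over the ranked list [7]. Writing $r_{u,j}$ for the rating user $u$ gave the song at rank $j$, $d$ for the default (neutral) rating, and $\alpha$ for the half-life rank, the metric is:

```latex
R_u = \sum_{j} \frac{\max(r_{u,j} - d,\ 0)}{2^{(j-1)/(\alpha-1)}},
\qquad
R = 100 \cdot \frac{\sum_u R_u}{\sum_u R_u^{\max}}
```

Here $R_u^{\max}$ is the utility of the ideal ordering for user $u$, so $R$ is normalised to a 0-100 scale. The paper's 1-5 ratings with a default score of 1 suggest $d = 1$, but the half-life $\alpha$ is not reported; likewise, the NASA-TLX score is commonly taken as the unweighted mean of the six subscales, although the paper does not state whether the weighted variant was used. These parameter choices are therefore our assumptions.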
Cognitive load

Setting 1: seed based
Descriptive statistics show that participants had the highest cognitive load in T3 (M=57.14), followed by T2 (M=46.11) and T1 (M=31.43). We performed a one-way repeated measures ANOVA to test for significance. There was a significant effect for cognitive load, F(2, 58) = 44.47, p < .001. Bonferroni-corrected pairwise comparisons (sig. level = .016) revealed that T3 required significantly higher cognitive load than T2 (p < .001) and T1 (p < .001), and T2 required significantly higher cognitive load than T1 (p < .001).

Setting 2: artist based
Descriptive statistics show that participants in T3 (M=50.32) had the highest cognitive load, followed by T2 (M=38.57) and T1 (M=30.24). To test for significance, we performed a one-way repeated measures ANOVA. To compensate for violations of the sphericity assumption (Mauchly's W(df=2) = .721, p = .010), the significance levels were corrected with Greenhouse-Geisser. The corrected result shows a significant effect for cognitive load, F(1.56, 45.36) = 15.42, p < .001. Bonferroni-corrected pairwise comparisons (sig. level = .016) revealed that T3 required significantly higher cognitive load than T2 (p = .001) and T1 (p < .001), and T2 required significantly higher load than T1 (p = .009).

Setting 3: hybrid
Descriptive statistics show that T3 (M=52.14) required the highest cognitive load, followed by T2 (M=45.87) and T1 (M=34.44). To test for significance, we performed a one-way repeated measures ANOVA. The test shows a significant effect for cognitive load, F(2, 58) = 8.54, p = .001. Bonferroni-corrected pairwise comparisons (sig. level = .016) revealed that both T3 (p < .001) and T2 (p = .001) required significantly higher cognitive load than T1.

In general, T3 required significantly higher cognitive load than T1 in all three settings, but the differences between T2 and T1 and between T3 and T2 were not always significant.
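The statistical pipeline reported here (one-way repeated measures ANOVA, Mauchly's sphericity test with a Greenhouse-Geisser correction, and Bonferroni-corrected pairwise comparisons) can be reproduced along the following lines with the pingouin library; the file name and column layout are hypothetical.

```python
import pandas as pd
import pingouin as pg

# long format: one row per participant x task, with columns pid, task, tlx
df = pd.read_csv("tlx_setting2.csv")  # hypothetical file

# Mauchly's test of sphericity (reported as W and p in the paper)
print(pg.sphericity(df, dv="tlx", within="task", subject="pid"))

# one-way repeated measures ANOVA; correction=True adds the
# Greenhouse-Geisser corrected degrees of freedom and p-value
print(pg.rm_anova(df, dv="tlx", within="task", subject="pid", correction=True))

# Bonferroni-corrected pairwise comparisons between T1, T2 and T3
print(pg.pairwise_tests(df, dv="tlx", within="task", subject="pid",
                        padjust="bonf"))
```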
Acceptance of recommendations

Setting 1: seed based
Descriptive statistics show that the list of recommendations in T3 (M=3.49) was rated higher than in T2 (M=2.95) and T1 (M=2.08). A one-way repeated measures ANOVA was conducted to examine significance. A significant effect was found for user rating, F(2, 58) = 25.04, p < .001. Bonferroni-corrected pairwise comparisons (sig. level = .016) revealed that recommendations in T3 were rated significantly higher than those in T2 (p = .003) and T1 (p = .001), and recommendations in T2 were rated significantly higher than those in T1 (p = .001).

Setting 2: artist based
Descriptive statistics show that the list of recommendations in T3 (M=3.54) was rated higher than in T2 (M=2.92) and T1 (M=2.41). A one-way repeated measures ANOVA shows a significant effect for user rating, F(2, 58) = 14.68, p < .001. Bonferroni-corrected pairwise comparisons (sig. level = .016) revealed that recommendations in T3 were rated significantly higher than in T2 (p < .001) and T1 (p = .001).

Setting 3: hybrid
Descriptive statistics show that the list of recommendations in T3 (M=3.27) was rated higher than in T2 (M=3.21) and T1 (M=2.59). A one-way repeated measures ANOVA shows a significant effect for user rating, F(2, 58) = 7.80, p < .001. Bonferroni-corrected pairwise comparisons (sig. level = .016) revealed that the lists of recommendations in T3 (p = .002) and T2 (p = .004) were rated significantly higher than in T1.

Comparing the findings across settings, the list of recommendations in T3 was always rated significantly higher than in the other tasks.

Figure 3. User responses to the ResQue based questionnaire in the three settings.

Overall user experience
The left bar chart (Figure 3) plots users' attitudes towards the various controls of the recommender systems. Participants seem to enjoy using a drag-and-drop interface to manipulate the recommendation process. The system also allows users to express their preferences easily. In general, users like to give feedback and modify their data. However, it seems that only a part of the participants would like to control more components of the system. It is worth noting that 91.1% of the participants who would like to tweak the high level control have experience with recommender systems, and 95.6% of them enjoy listening to music online.

The chart on the right side illustrates the users' positive responses to our system in terms of other user experience aspects such as novelty, diversity and confidence. Users indicated that using our system was fun and that they easily became familiar with it. Despite these merits, some users were not sure they would use this system frequently to listen to music.

Log file data
Since we intend to know how often users interact with each user control, we also analysed the interaction data. We report the percentage of interactions with each level of control in T3, where all levels of control are present (Table 2). More than half of the interactions relate to low level controls, and around a quarter of the clicks relate to the middle level. Only a small part of the clicks were done with high level controls.

Settings    Low level   Middle level   High level
Setting 1   60.5%       27.9%          11.6%
Setting 2   54.4%       28.3%          17.3%
Setting 3   51.9%       26.9%          21.2%

Table 2. Percentage of interactions with each level of control in task 3.

DISCUSSION
In this section, we discuss the results presented in the previous section, thereby answering the research questions and evaluating the proposed hypothesis.

Overall, the results of the NASA-TLX show that a higher level of user control tends to increase cognitive load (RQ1). Specifically, we see that the high level of user control incurred significantly higher cognitive load than the low level in all three settings. Previous work [12, 10, 20] has reported that user control improves the accuracy of recommendations. Furthermore, the results of the user ratings indicate that the level of control has a significant influence on the acceptance of recommendations (RQ2). The poor result in T1 may stem from the unmodifiable data used to bootstrap the system. In our data, we can also observe that the high level of user control increases the quality of recommendations in all three settings. The effects of levels of control on cognitive load and acceptance of recommendations are not statistically significant in some settings when we compare the high level control to the middle level, and the middle level control to the low level. By comparing the mean value of each result, our findings can be validated with different algorithms (RQ3). Besides, it seems that participants had difficulty understanding what they could control and were less engaged while performing T3 in Setting 2. A possible explanation is that the current visualisation does not clearly plot the relations among artists, suggesting that a network graph could be a better option.

The log file data also suggest that users are more likely to tweak the low level and the middle level controls. By looking at the user profiles, we find that participants with rich experience with recommender systems and online music tend to tweak the high level user control more frequently. The majority of participants prefers to have only low and middle level user control. This may depend on users' personal characteristics and domain knowledge [15]. In addition, a drag-and-drop UI seems to allow users to interact with the system intuitively. In spite of the merits of our system, users hesitate to use it for listening to music. A potential reason is that many users prefer to listen to and discover music on mobile devices with simple interactions rather than on large screens with complex interactions [13].

CONCLUSION
We defined three levels of user control to investigate the effects of levels of control on cognitive load and acceptance of recommendations. We designed and implemented a music recommender with three distinct settings of recommender algorithms. An online study was performed to answer the research questions. We conclude with the following findings:

• By incorporating a higher level of user control, cognitive load tends to increase.

• By incorporating a higher level of user control, the recommendations are more likely to be accepted.

• Our research findings are generalisable to different recommender algorithms.

Our study has three main limitations: first, although we excluded unqualified users by setting contradictory questions in the questionnaires, the validity of the study results may still suffer from inattentive or "spamming" users. Second, the research findings should be validated in other application domains. Third, the research findings are based on the specific user control mechanisms implemented in the study system. Our future work will focus on adapting the user interface of recommender systems to address the individual needs and preferences of users.

Acknowledgements
The research has been partially financed by the KU Leuven Research Council (grant agreement no. C24/16/017).

REFERENCES
1. Erik W Anderson. 2012. Evaluating visualisation using cognitive measures. In Proceedings of the 2012 BELIV Workshop: Beyond Time and Errors - Novel Evaluation Methods for Visualization. ACM, 5.
2. Erik W Anderson, Kristin C Potter, Laura E Matzen, Jason F Shepherd, Gilbert A Preston, and Cláudio T Silva. 2011. A user study of visualisation effectiveness using EEG and cognitive load. In Computer Graphics Forum, Vol. 30. Wiley Online Library, 791–800.
3. Ivana Andjelkovic, Denis Parra, and John O'Donovan. 2016. Moodplay: interactive mood-based music discovery and recommendation. In Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization. ACM, 275–279.
4. Fedor Bakalov, Marie-Jean Meurs, Birgitta König-Ries, Bahar Sateli, René Witte, Greg Butler, and Adrian Tsang. 2013. An approach to controlling user models and personalization effects in recommender systems. In Proceedings of the 2013 International Conference on Intelligent User Interfaces. ACM, 49–56.
5. Svetlin Bostandjiev, John O'Donovan, and Tobias Höllerer. 2012. TasteWeights: a visual interactive hybrid recommender system. In Proceedings of the 6th ACM Conference on Recommender Systems. ACM, 35–42.
6. Svetlin Bostandjiev, John O'Donovan, and Tobias Höllerer. 2013. LinkedVis: exploring social and semantic career recommendations. In Proceedings of the 2013 International Conference on Intelligent User Interfaces. ACM, 107–116.
7. John S Breese, David Heckerman, and Carl Kadie. 1998. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 43–52.
8. Robin Burke. 2002. Hybrid recommender systems: survey and experiments. User Modeling and User-Adapted Interaction 12, 4 (2002), 331–370.
9. Paul Chandler and John Sweller. 1991. Cognitive load theory and the format of instruction. Cognition and Instruction 8, 4 (1991), 293–332.
10. F Maxwell Harper, Funing Xu, Harmanpreet Kaur, Kyle Condiff, Shuo Chang, and Loren Terveen. 2015. Putting users in control of their recommendations. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM, 3–10.
11. Chen He, Denis Parra, and Katrien Verbert. 2016. Interactive recommender systems: a survey of the state of the art and future research challenges and opportunities. Expert Systems with Applications 56 (2016), 9–27.
12. Yucheng Jin, Karsten Seipp, Erik Duval, and Katrien Verbert. 2016. Go with the flow: effects of transparency and user control on targeted advertising using flow charts. In Proceedings of the International Working Conference on Advanced Visual Interfaces. ACM, 68–75.
13. Mohsen Kamalzadeh, Christoph Kralj, Torsten Möller, and Michael Sedlmair. 2016. TagFlip: active mobile music discovery with social tags. In Proceedings of the 21st International Conference on Intelligent User Interfaces. ACM, 19–30.
14. Antti Kangasrääsiö, Dorota Glowacka, and Samuel Kaski. 2015. Improving controllability and predictability of interactive recommendation interfaces for exploratory search. In Proceedings of the 20th International Conference on Intelligent User Interfaces. ACM, 247–251.
15. Bart P Knijnenburg, Niels JM Reijmer, and Martijn C Willemsen. 2011. Each to his own: how different users call for different interaction methods in recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems. ACM, 141–148.
16. Bart P Knijnenburg, Martijn C Willemsen, Zeno Gantner, Hakan Soncu, and Chris Newell. 2012. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction 22, 4-5 (2012), 441–504.
17. Joseph A Konstan and John Riedl. 2012. Recommender systems: from algorithms to user experience. User Modeling and User-Adapted Interaction 22, 1 (2012), 101–123.
18. Jakob Nielsen. 1999. Designing Web Usability: The Practice of Simplicity. New Riders Publishing.
19. John O'Donovan, Barry Smyth, Brynjar Gretarsson, Svetlin Bostandjiev, and Tobias Höllerer. 2008. PeerChooser: visual interactive recommendation. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1085–1088.
20. Denis Parra and Peter Brusilovsky. 2015. User-controllable personalization: a case study with SetFusion. International Journal of Human-Computer Studies 78 (2015), 43–67.
21. Pearl Pu, Li Chen, and Rong Hu. 2011. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems. ACM, 157–164.
22. Pearl Pu, Li Chen, and Rong Hu. 2012. Evaluating recommender systems from the user's perspective: survey of the state of the art. User Modeling and User-Adapted Interaction 22, 4 (2012), 317–355.
23. Luz M Quiroga, Martha E Crosby, and Marie K Iding. 2004. Reducing cognitive load. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS'04) - Track 5, Volume 5. IEEE Computer Society.
24. Yuri Saito and Takayuki Itoh. 2011. MusiCube: a visual music recommendation system featuring interactive evolutionary computing. In Proceedings of the 2011 Visual Information Communication - International Symposium. ACM, 5.
25. Ben Shneiderman. Designing the User Interface. Pearson Education India.
26. M Adil Yalçın, Niklas Elmqvist, and Benjamin B Bederson. 2016. Cognitive stages in visual data exploration. In Proceedings of the Beyond Time and Errors on Novel Evaluation Methods for Visualization. ACM, 86–95.
27. Johannes Zagermann, Ulrike Pfeil, and Harald Reiterer. 2016. Measuring cognitive load using eye tracking technology in visual computing. In BELIV '16: Proceedings of the Sixth Workshop on Beyond Time and Errors on Novel Evaluation Methods for Visualization. 78–85.