Proceedings of the ACM RecSys 2010 Workshop on User-Centric Evaluation of Recommender Systems and Their Interfaces (UCERSTI), Barcelona, Spain, Sep 30, 2010. Published by CEUR-WS.org, ISSN 1613-0073, online: ceur-ws.org/Vol-612/paper3.pdf. Copyright © 2010 for the individual papers by the papers' authors; copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors: Knijnenburg, B.P., Schmidt-Thieme, L., Bollen, D.

A User-Centric Evaluation Framework of Recommender Systems

Pearl Pu
Human Computer Interaction Group
Swiss Federal Institute of Technology (EPFL)
CH-1015 Lausanne, Switzerland
Tel: +41-21-6936081
pearl.pu@epfl.ch

Li Chen
Department of Computer Science
Hong Kong Baptist University
224 Waterloo Road, Hong Kong
Tel: +852-34117090
lichen@comp.hkbu.edu.hk

ABSTRACT
User experience research is increasingly attracting researchers' attention in the recommender system community. Existing works in this area have suggested a set of criteria detailing the characteristics that constitute an effective and satisfying recommender system from the user's point of view. To combine these criteria into a more comprehensive framework which can be used to evaluate the perceived qualities of recommender systems, we have developed a model called ResQue (Recommender systems' Quality of user experience). ResQue consists of 13 constructs and a total of 60 question items. It aims to assess the perceived qualities of recommenders, such as their usability, usefulness, interface and interaction qualities, users' satisfaction with the systems, and the influence of these qualities on users' behavioral intentions, including their intention to purchase the products recommended to them, return to the system in the future, and tell their friends about the system. This model thus identifies the essential qualities of an effective and satisfying recommender system and the essential determinants that motivate users to adopt the technology. The related questionnaire can be further adapted for a custom-made user evaluation or combined with objective performance measures. We also propose a simplified version of the model with 15 questions which can be employed as a usability questionnaire for recommender systems.

Categories and Subject Descriptors
H.1.2 [User/Machine Systems]: Human factors; H.5.2 [User Interfaces]: evaluation/methodology, user-centered design.

General Terms
Measurement, Experimentation, Human Factors.

Keywords
Quality measurement, usability evaluation, recommender systems, quality of user experience, e-commerce recommender, post-study questionnaire, evaluation of decision support.

1. INTRODUCTION
A recommender system is a web technology that proactively suggests items of interest to users based on their objective behavior or their explicitly stated preferences. It is no longer a fanciful website add-on, but a necessary component. According to the 2007 ChoiceStream survey (2007 ChoiceStream Personalization Survey, ChoiceStream, Inc.), 45% of users are more likely to shop at a website that employs recommender technology. Furthermore, a higher percentage (69%) of users in the highest spending category are more likely to desire the support of recommendation technology.

Characterizing and evaluating the quality of user experience and users' subjective attitudes toward the acceptance of recommender technology is an important issue which merits attention from researchers and practitioners in both the web technology and human factors fields. This is because recommender technology is becoming widely accepted as an important component that provides benefits to users and enhances the website's revenue. For users, the benefits include more efficiency in finding preferential items, more confidence in making a purchase decision, and a potential chance to discover something new. For the marketer, this technology can significantly increase users' likelihood to buy the items recommended to them and enhance their overall satisfaction and loyalty, raising the chances that they return to the site and recommend it to their friends. Thus, evaluating users' perception of a recommender system can help developers and marketers understand more precisely whether users actually experience and appreciate the intended benefits. This will, in turn, help improve the various aspects of the system and more accurately predict the adoption of a particular recommender.

So far, previous research work on recommender system evaluation has mainly focused on algorithm accuracy [9,1], especially objective prediction accuracy [25,26]. More recently, researchers began examining issues related to users' subjective opinions [30,13] and developing additional criteria to evaluate recommender systems [18,33]. In particular, they suggest that user satisfaction does not always correlate with high recommender accuracy. Increasingly, researchers are investigating user experience issues such as identifying determinants that influence users' perception of recommender systems [30], effective preference elicitation methods [19], techniques that motivate users to rate items that they have experienced [2], methods that generate diverse and more satisfying recommendation lists [33], explanation interfaces [31], trust formation with recommenders [6], and design guidelines for enhancing a recommender's interface layout [22]. However, the field lacks a general definition and evaluation framework of what constitutes an effective and satisfying recommender system from the user's perspective.

Our present work aims to review existing usability-oriented evaluation research in the field of recommender systems to identify essential determinants that motivate users to adopt this technology. We then apply well-known usability evaluation models, including TAM [7] and SUMI [15], in order to develop a more balanced framework.
The final model, which we call ResQue, consists of 13 constructs and a total of 60 question items categorized into four main dimensions: the perceived system qualities, users' beliefs as a result of these qualities, their subjective attitudes, and their behavioral intentions. The structure and criteria of our framework are derived on the basis of three essential characteristics of recommender systems: 1) being an interaction-driven application and a critical part of online e-commerce services, 2) providing information filtering technology and suggesting recommended items, and 3) providing decision support technology for the users.

The main contribution of this paper is the development of a well-balanced evaluation framework for measuring the perceived qualities of a recommender and predicting users' behavioral intentions as a result of these qualities. It is thus a forecasting model that helps us understand users' motivation in adopting a certain recommender. Secondly, the framework aims to help designers and researchers easily perform a usability and user acceptance test during any stage of the design and deployment phase of a recommender. These usability tests can be performed either on a stand-alone basis or as a post-study questionnaire. The model can be further combined with measurements that address other perceived qualities of a recommender, such as security and robustness issues. For those who are interested in a quick usability evaluation, we also propose a simplified version of the model with 15 questions.

2. EVALUATION WORK FROM USERS' POINT OF VIEW
Swearingen and Sinha [30] conducted a user study on eleven recommender systems in order to understand and discover influential factors, other than algorithm accuracy, that affect users' perception. The main results are that transparent system logic, recommendation of familiar items, and sufficient supporting information for recommended items are crucial in influencing users' favorable perception of a system. They also highlighted that trust and willingness to purchase should be noted. In addition, the users' appreciation of online recommendations was compared with that of recommendations from their friends, defining the notion of relative accuracy.

McNee et al. [20] pointed out that accuracy metrics alone and the commonly employed leave-one-out procedure are very limited in evaluating recommender systems. User satisfaction does not always correlate with high recommender accuracy. Metrics are needed to determine good and useful recommendations, such as the serendipity, salience, and diversity of the recommended items.

Tintarev and Masthoff provided a comprehensive survey of the explanation functionality used in ten academic and eight commercial recommenders [31]. They derived seven main aims of the explanation facility which can help a recommender significantly enhance users' satisfaction: transparency (explains why recommendations were generated), scrutability (the ability for the user to critique the system), trust (increase users' confidence in the system), effectiveness (help users make good decisions), persuasiveness (convince users to try or buy items recommended to them), efficiency (help users make decisions faster) and satisfaction (increase the ease of use and enjoyment). These aims are very similar to the set of criteria that we have developed in ResQue, except that we focus more on the system as a whole rather than just the explanation component.

Ozok et al. [22] explored recommender systems' usability and user preferences from both the structural (how recommender systems should look) and content (what information recommender systems should contain) perspectives. A two-layer interface usability evaluation model including both micro- and macro-level interface evaluations was proposed, followed by a Survey on Usability of E-Commerce Recommender Systems (SUERS). The survey was administered to 131 college-aged online shoppers to measure and rank the importance of structural and content aspects of recommender systems from the shoppers' perspectives. The main result was a set of 14 design guidelines. The micro level of the guidelines provided suggestions specific to the recommended product, such as which attributes (name, price, image, description, rating, etc.) to include in the interface. The macro level provided suggestions concerning when, where and how the recommended products should be displayed. The development process of the model was limited, as the authors did not go through an iterative process of evaluating and refining the model. Instead, it was purely based on a literature survey of the quite limited past work on subjective evaluations of recommender systems. Most importantly, it failed to explain how usability issues influence users' behavioral intentions, such as their intention to buy the items recommended to them, whether they will continue using the system, and whether they will recommend the system to their friends.

Jones and Pu [13] presented the first significant user study that aimed to understand users' initial adoption of recommender technology and their subjective perceptions of the system. Study results show that a simple interface design, a small amount of initial effort required by the system to get to know the users, and perceived qualities such as the subjective accuracy, novelty and enjoyability of the recommended items are the key design factors that significantly enhance a website's ability to attract users.

3. MODEL DEVELOPMENT
A measurement model consists of a set of constructs, the participating questions for each construct, the scale's dimensions, and a procedure for conducting the questionnaire. Psychometric questionnaires such as the one proposed in this paper require the validation of the questions used, data gathering, and statistical analysis before they can be used with confidence. The current model and its constructs were based on our past work investigating various interface and interaction issues between users and recommenders. In over 10 user studies, we have carefully and progressively developed and employed user satisfaction questionnaires to evaluate recommenders' perceived qualities such as ease of use, perceived usefulness and users' satisfaction and behavioral intentions [4,5,6,12,13,14,23,24]. This past research has given us a unique opportunity to synthesize and organize the accumulation of existing questionnaires and develop a well-balanced framework.

In the model development process, we also compare our constructs with those used in TAM and SUMI, two well-known and widely adopted measurement frameworks.

TAM (Technology Acceptance Model) seeks to understand a set of perceived qualities of a system and users' intention to adopt the system as a result of these qualities, thus explaining not only the desirable outcome of a system but also users' motivation.
The original TAM listed three constructs: the perceived ease of use of a system, its perceived usefulness and users' intention to use the system. However, TAM was also criticized for its over-simplicity and generality. Venkatesh et al. [32] formulated an updated version of TAM, called the Unified Theory of Acceptance and Use of Technology. In this more recent theory, four key constructs (performance expectancy, effort expectancy, social influence, and facilitating conditions) were presented as direct determinants of usage intentions and behaviors.

SUMI (Software Usability Measurement Inventory) is a psychometric evaluation model developed by Kirakowski and Corbett [15] to measure the quality of software from the end-user's point of view. The model consists of 5 constructs (efficiency, affect, helpfulness, control, learnability) and 50 questions. It is widely used to help designers and developers assess the quality of use of a software product or prototype and can assist with the detection of usability flaws and the comparison between software products.

By adapting our past work to the TAM and SUMI models, we have identified 4 essential constructs of ResQue for a successful recommender system to fulfill from the users' point of view: 1) user perceived qualities of the system, 2) user beliefs as a result of these qualities in terms of ease of use, usefulness and control, 3) their subjective attitudes, and 4) their behavioral intentions. Figure 1 depicts the detailed schema of the constructs of ResQue and some of the scales for each construct.

[Figure 1: Constructs of an Evaluation Framework on the Perceived Qualities of Recommenders (ResQue).]
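Since the figure itself could not be reproduced here, the following sketch restates its structure in machine-readable form. The grouping follows the four dimensions and the scales named in Sections 3.1-3.4; the identifier names are our own, not part of the model.

```python
# A machine-readable restatement of the ResQue schema sketched in Figure 1.
# The grouping follows the four dimensions described in Sections 3.1-3.4;
# the identifier names themselves are ours, not part of the model.
RESQUE_SCHEMA = {
    "perceived_system_qualities": [
        "quality_of_recommended_items",  # accuracy, familiarity, novelty, ...
        "interaction_adequacy",
        "interface_adequacy",
    ],
    "beliefs": [
        "perceived_ease_of_use",
        "perceived_usefulness",
        "control_transparency",
    ],
    "attitudes": [
        "overall_satisfaction",
        "confidence_and_trust",
    ],
    "behavioral_intentions": [
        "intention_to_use_the_system",
        "purchase_intention",
        "continuance_and_frequency",
        "recommendation_to_friends",
    ],
}
```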
When administering the questionnaires, we assume that the recommender system being evaluated is part of an online system. To make the evaluation more focused on the recommender component, we often give subjects a specific task, "find an ideal product to buy/experience from an online site", where the recommender in question is a constituent component.

In the following sections, the meaning of each scale as well as its subscales is defined and explained; the sample questions that can be used in a questionnaire are given in the appendix at the end of the paper. It is a common practice in questionnaire development to vary the tone of items to control potential response biases: typically, some of the items elicit agreement and others elicit disagreement. For some of the items, therefore, we also suggest reverse scale questions. A 5-point Likert scale from "strongly disagree" (1) to "strongly agree" (5) is recommended to characterize users' responses.
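To make the scoring convention concrete, the following minimal sketch (ours, not part of the ResQue specification) recodes reverse scale answers as 6 minus the raw value, the usual rule for a 1-5 Likert scale, and averages a construct's items into a single score.

```python
def recode(raw: int, reverse: bool) -> int:
    """Recode one 1-5 Likert answer; reverse scale items become 6 - raw."""
    return 6 - raw if reverse else raw

def construct_score(answers: list[tuple[int, bool]]) -> float:
    """Average the recoded (raw_value, is_reverse_scale) answers of one construct."""
    recoded = [recode(raw, rev) for raw, rev in answers]
    return sum(recoded) / len(recoded)

# Example: three perceived accuracy items, the third one reverse scale
# ("I am not interested in the items recommended to me", answered with 2).
print(construct_score([(4, False), (5, False), (2, True)]))  # -> 4.33...
```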
3.1 Perceived System Qualities
This construct refers to the functional and informational aspects of a recommender and how the perceived qualities of these aspects influence users' beliefs about the ease of use, usefulness and control/transparency of a system. A recommender system is not simply part of a website, but more importantly a decision support tool. We focus on three essential dimensions: the quality of the recommended items, the interaction adequacy, and the interface adequacy as the recommender helps users reach a purchase decision.

3.1.1 Quality of Recommended Items
The items proposed by a recommender can be considered one of the main features of the system. Quality here refers to the information quality and genuine usefulness of the suggested items. Presented as a collection, the recommended items are often labeled and placed in a certain area of the recommender page. Some systems also propose grouping them into meaningful subareas to increase users' comprehension of the list and enable them to more effectively reach decisions [4]. In our earlier work, we have found strong correlations between the following qualities of the recommended items and users' intention to use the system.

Perceived accuracy is the degree to which users feel the recommendations match their interests and preferences. It is an overall assessment of how well the recommender has understood the users' preferences and tastes. This subjective measure is significantly easier to obtain than the measure of objective accuracy that we used in our earlier work [23]. Our studies show that they are strongly correlated [6]. In other words, if users respond well to this question, it is likely that the underlying algorithm is accurate in predicting users' interests. In addition, it is useful to use relative accuracy to compare the difference between the recommendations a user may get from a system and those from friends [28]. It can serve as a useful complement to perceived accuracy because it implicitly sets up friends' recommendation quality as a baseline.

Familiarity describes whether or not users have previous knowledge of, or experience with, the items recommended to them. Swearingen and Sinha [30] indicated that users like and prefer to get recommendations of previously experienced items because their presence reinforces trust in the recommender system. However, users can be frustrated by too much familiarity. Therefore, it is important to know whether or not a recommender has achieved the proper balance of familiarity and novelty from the users' perspective.

Novelty (or discovery) is the extent to which users receive new and interesting recommendations. The core concept of novelty is related to the recommender's ability to educate users and help them discover new items [24]. In [20], a similar concept, called "serendipity", was suggested. Herlocker [11] argued that novelty is different from serendipity, because novelty only covers the concept of "new" while serendipity means not only "new" but also "surprising". However, in an actual user evaluation procedure, this meticulous distinction between the two words will cause confusion for users. Therefore, we suggest novelty and discovery as two similar questions. More user trials will be needed to further delineate the serendipity question.

The attractiveness of the recommended items refers to whether or not recommended items are capable of stimulating users' imagination and evoking a positive emotion of interest or desire. Attractiveness is different from accuracy and novelty. An item can be accurate and novel, but not necessarily attractive; a novel item is different from anything a user has ever experienced, whereas an attractive item stimulates the user in a positive manner. This concept is similar to the salience factor in [20]. While judging novelty requires a user to think about the distinguishing factors of an item, attractiveness brings to mind the outstanding quality of an item and has a more emotional tone to it.

The enjoyability of recommended items refers to whether users have enjoyed experiencing the items suggested to them. It was found to have a significant correlation to users' intention to use and return to the system [13]. This is the only scale that assesses a user's actual experience of a recommender. In many online study scenarios, it is not possible to immediately measure enjoyability unless users are asked to answer a questionnaire after a few weeks, when they have actually received and experienced the item. In testing music or film recommenders, it is possible to allow users to answer this question if they are given the opportunity to listen to a song excerpt or watch a movie trailer.

Diversity measures the diversity level of items in the recommendation list. As the recommendation list is the first piece of information users will encounter before they examine the details of an individual recommendation, users' impression of this list is important for their perception of the whole system. At this stage, it has been found that a low diversity level might disappoint users and could cause them to leave the recommender [13]. McGinty and Smyth [17] proposed integrating diversity with similarity in order to adaptively select the appropriate strategy (recommending either similar or diverse items) given each individual user's past behavior and current needs. The literature also suggests that a recommendation list as a complete entity should be judged for its diversity, rather than treating each recommendation as an isolated item [33].
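As an objective counterpart to this subjective scale, list-level diversity is often quantified as the average pairwise dissimilarity within one recommendation list, in the spirit of the topic diversification work in [33]. The sketch below is ours; the feature-set representation and the Jaccard distance are illustrative assumptions, not part of ResQue.

```python
from itertools import combinations

def intra_list_diversity(items: list[set[str]]) -> float:
    """Average pairwise Jaccard distance over all pairs in one recommendation list.

    Each item is represented as a set of content features (e.g. genres).
    The list as a whole, not any single item, is the unit being judged.
    """
    def jaccard_distance(a: set[str], b: set[str]) -> float:
        return 1.0 - len(a & b) / len(a | b)

    pairs = list(combinations(items, 2))
    if not pairs:  # a list of 0 or 1 items carries no diversity signal
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Example: a three-movie list described by genre sets.
movies = [{"sci-fi", "action"}, {"romance", "drama"}, {"action", "drama"}]
print(round(intra_list_diversity(movies), 2))  # -> 0.78
```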
Context compatibility evaluates whether or not the recommendations take general or personal context requirements into account. For example, for a movie recommender, the necessary context information may include a user's current mood, different occasions for watching the movie, whether or not other people will be present, and whether the recommendation is timely. A good recommender system should be able to formulate recommendations considering the different kinds of contextual factors that are likely to take effect.

3.1.2 Interaction Adequacy
Besides issues related to the quality of recommended items, the system's ability to present recommendations, to allow for user feedback, and to explain why recommendations are made in support of purchasing decisions also weighs heavily on users' overall perception of a recommender. Thus, three main interaction mechanisms are usually suggested in various recommenders: initial preference elicitation, preference revision, and the system's ability to explain its results. Behavior-based recommenders do not require users to explicitly indicate their preferences, but collect such information via users' browsing and purchasing history. For rating- and preference-based recommenders, this process requires a user to rate a set of items or state their preferences on desired items in a graphical user interface [23]. Some conversational recommenders provide explicit mechanisms for users to provide feedback in the form of critiques [6]. The simplest critiques indicate whether the recommended item is good or bad, while more sophisticated ones show users a set of alternative items that take into account users' desire for these items and the potential superior values they offer, such as a better price or more popularity [6].

The final interaction quality being measured is the system's ability to explain the recommended results. Herlocker et al. [10], Sinha and Swearingen [30] and Tintarev and Masthoff [31] demonstrated that a good explanation interface could help inspire users' trust and satisfaction by giving them information to personally justify recommendations, increasing user involvement and educating users on the internal logic of the system [10, 31]. In addition, Tintarev and Masthoff [31] defined in detail the possible aims of explanation facilities: transparency, scrutability, trust, effectiveness, persuasiveness, efficiency, and satisfaction. Pu and Chen extensively investigated design guidelines for developing explanation-based recommender interfaces [4]. They found that organization interfaces are particularly effective in promoting users' satisfaction with the system, convincing them to buy the items recommended to them, and bringing them back to the store in the future.

3.1.3 Interface Adequacy
Interface design issues related to recommenders have also been extensively investigated in [10, 20, 31, 22]. Most of the existing work is concerned with how to optimize the recommender page layout to achieve the maximum visibility of the recommendations, e.g. whether to use image, text, or a combination of the two. A detailed set of design guidelines was investigated and proposed in [22]. In our current model, we mainly emphasize users' subjective evaluations of a recommender interface in terms of its information sufficiency and the adequacy and clarity of its labels and layout.

3.2 Beliefs
3.2.1 Perceived Ease of Use
Perceived ease of use, also known as efficiency in SUMI and perceived cognitive effort in our existing work [6,14], measures users' ability to quickly and correctly accomplish tasks with ease and without frustration. We also use it to refer to decision efficiency, i.e. the extent to which a recommender system helps users find their preferred items quickly. Although task completion and learning time can be measured objectively, it can be difficult to distinguish the actual task completion time from the measured task time for various reasons. Users may be exploring the website and discovering information unrelated to the assigned task. This is especially true if a system is entertaining and educational, and its interface and content are very appealing. It is also possible that the user perceives that he/she has spent less time while the measured task completion time is in fact high. Therefore, evaluating perceived ease of use may be more appropriate than using the objective task completion time to measure a system's ease of use.

Besides the overall perceived ease of use, perceived initial effort should also be taken into account, given the new user problem. Perceived initial effort is the effort users perceive they contribute to the system before they get the first set of recommendations. The initial effort could be spent on rating items [19], specifying preferences, or answering personality quizzes [12]. Theoretically speaking, recommender systems should try to minimize the effort users expend for a good recommendation [30].
Easy to learn, known as "learnability" in SUMI, initially appears to be an inadequate dimension, since most recommenders require a minimal amount of learning by design. However, since some users may not initially notice the recommended items or know exactly what they are intended for, especially without clear labels or explicit explanations on the interface, the learning aspect should be included to measure how easily users discover the recommended items. In addition, some recommenders, such as critiquing-based recommenders, do allow users to provide feedback to increase the personalization of the recommender. In this case, the learning construct measures how easy it is for users to alter their personal profile information in order to receive different recommendations.

3.2.2 Perceived Usefulness
Perceived usefulness of a recommender (called perceived competence in our previous work) is the extent to which a user finds that using the recommender system would improve his/her performance, compared with previous experiences without the help of a recommender [4]. This element requests users' opinion as to whether or not the system is useful to them. Since recommenders used in e-commerce environments mainly assist users in finding relevant information to support their purchase decision, we further qualify usefulness in two aspects: decision support and decision quality.

Recommender technology provides decision support to users in the process of selecting preferential items, for example when making a purchase in an e-commerce environment. The objective of decision technologies in general is to overcome the limits of users' bounded rationality and to help them make more satisfying decisions with a minimal amount of effort [29]. Recommender systems specifically help users manage an overwhelming flood of information and make high-quality decisions under limited time and knowledge constraints. Decision support thus measures the extent to which users feel assisted by the recommender system.

In addition to the efficiency of decision making, the quality of the decision (decision quality) also matters. The quality of a system-facilitated decision can be assessed by the confidence criterion, which is the level of a user's certainty in believing that he/she has made a correct choice with the assistance of a recommender.

3.2.3 Control and Transparency
User control measures whether users feel in control of their interaction with the recommender. The concept of user control includes the system's ability to allow users to revise their preferences, to customize received recommendations, and to request a new set of recommendations. This aspect weighs heavily in the overall user experience of the system. If the system does not provide a mechanism for a user to reject recommendations that he/she dislikes, the user will be unable to stop the system from continuously recommending items, which might cause him/her to be disappointed with the system.

Transparency determines whether or not a system allows users to understand its inner logic, i.e. why a particular item is recommended to them. A recommender system can convey its inner logic to the user via an explanation interface [4,10,30,31]. To date, many researchers have emphasized that transparency has a certain impact on other critical aspects of users' perception. Swearingen and Sinha [30] showed that the more transparent a recommended product is, the more likely users would be to purchase it. In addition, Simonson [27] suggested that the perceived accuracy of a recommendation depends on whether or not the user sees a correspondence between the preferences expressed in the measurement process and the recommendation presented by the system.
3.3 Attitudes
Attitude is a user's overall feeling towards a recommender, which is most likely derived from his/her experience while interacting with the recommender. An attitude is generally believed to be more long-lasting than a belief. Users' attitudes towards a recommender are highly influential on their subsequent behavioral intentions. Many researchers regard positive attitudes, including users' satisfaction with and trust of a recommender, as important factors. Evaluating overall satisfaction determines what users think and feel while using a recommender system. It gives users an opportunity to express their preferences and opinions about a system in a direct way. Confidence inspiring refers to the recommender's ability to inspire confidence in users, or its ability to convince users of the information or products recommended to them. Trust indicates whether or not users find the whole system trustworthy. Studies show that consumer trust is positively associated with consumers' intentions to transact, purchase a product, and return to the website [8]. The trust level is determined by the reputation of the online system [8], as well as the recommender system's ability to formulate good recommendations and provide useful explanation interfaces [4,10,19]. However, as trust is a long-term relationship between a user and an online system, it is sometimes difficult to measure trust after only a short period of interaction with a system. Thus, we recommend observing trust formation over time, as users are incrementally exposed to the same recommender.

3.4 Behavioral Intentions
Behavioral intentions towards a system relate to whether or not the system is able to influence users' decision to use the system and to purchase some of the recommended results.

One of the fundamental goals for an e-commerce website is to maximize user loyalty and lifetime value in order to stimulate users' future visits and purchases. User loyalty evaluates the system's ability to convince users to reuse the system, or to persuade them to introduce the system to their friends in order to increase the number of clients. Accordingly, this dimension consists of the following criteria: user agreement to use the system, user acceptance of the recommended items (resulting in a purchase), user retention, and intention to introduce the system to his/her friends. By using a questionnaire, the user's intention to return can be measured as a satisfactory approximation of actual user retention, because the Theory of Planned Behavior [32] states that behavioral intention can be a strong predictor of actual behavior. Although the website's integrity, reputation and price quality will also likely impact user loyalty, the most important factor for a recommender system is to help users effectively find a satisfying product, i.e. the quality of its recommendations [7].
4. SIMPLIFIED MODEL
In the previous sections, we described the development process of a subjective evaluation framework to measure users' perceived qualities of a recommender as well as users' behavioral intentions, such as their intention to buy or use the items suggested to them, continue to use the system, and tell their friends about the recommender. We described both the constructs and corresponding sample questions (see Appendix A for a summary).

Our overall motivation for this research was to understand the crucial factors that influence user adoption of recommenders. Another motivation was to come up with a subjective evaluation questionnaire that other researchers and practitioners can employ. However, it is unlikely that a 60-item questionnaire can be administered for a quick and easy evaluation. This motivated us to propose a simplified model based on our past research. Between 2005 and 2010, we administered 11 subjective questionnaires to a total of 807 subjects [4,5,6,12,13,14,23,24]. Initial questionnaires covered some of the four categories identified in ResQue. As we conducted more experiments, we became more convinced of the four categories and used all of them in recent studies. On average, between 12 and 15 questions were used. Based on this previous work, we have synthesized and organized a total of 15 questions as a simplified model for the purpose of performing a quick and easy usability and adoption evaluation of a recommender (see the questions marked with a * sign in the appendix).

5. CONCLUSION AND FUTURE WORK
User evaluation of recommender systems is a crucial subject of study that requires a deep understanding, development and testing of the right dimensions (or constructs) and the standardization of the questions used. The framework described in this paper presents the first attempt to develop a complete and balanced evaluation framework that measures users' subjective attitudes based on their experience with a recommender.

ResQue consists of a set of 13 constructs and 60 questions for a high-quality recommender system from the user's point of view and can be used as a standard guideline for a user evaluation. It can also be adapted to a custom-made user evaluation by tailoring it to an individual research context. Researchers and practitioners can use these questionnaires with ease to measure users' general satisfaction with recommenders, their readiness to adopt the technology, and their intention to purchase recommended items and return to the site in the future.

After ResQue was finalized, we asked several expert researchers in the recommender systems community to review the model. Their feedback and comments were then incorporated into the final version of the model. This method, known as the Delphi method, is one of the first validation attempts on the model. Since the work was submitted, we have started conducting a survey to further validate the model's reliability, validity and sensitivity using factor analysis, structural equation modeling (SEM), and other techniques described in [21]. Initial results based on 150 participants indicate how the model can be interpreted and show factors that correspond to the original model. At the same time, the analysis also gives some indications of how to refine the model. More users are expected to participate in the survey and the final outcome will be reported soon.
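Reliability analysis of this kind usually begins with an internal-consistency statistic such as Cronbach's alpha, a standard tool in the psychometric literature [21], computed per construct before factor analysis or SEM. The sketch below shows the textbook formula; the response matrix is invented purely for illustration.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for one construct.

    scores: respondents x items matrix of (already recoded) Likert answers.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).
    """
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # per-item sample variance
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of summed scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Invented example: 5 respondents answering the 3 perceived accuracy items.
responses = np.array([
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
])
print(round(cronbach_alpha(responses), 2))  # -> 0.92; values near or above 0.7 are conventionally deemed acceptable
```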
APPENDIX
A. Constructs and Questions of ResQue
The following contains the questionnaire statements that can be used in a survey. They were developed based on the ResQue model described in this paper. Users should be asked to indicate their answer to each of the questions using a 1-5 Likert scale, where 1 indicates "strongly disagree" and 5 "strongly agree."

A1. Quality of Recommended Items
A.1.1 Accuracy
- The items recommended to me matched my interests.*
- The recommender gave me good suggestions.
- I am not interested in the items recommended to me (reverse scale).

A.1.2 Relative Accuracy
- The recommendation I received better fits my interests than what I may receive from a friend.
- A recommendation from my friends better suits my interests than the recommendation from this system (reverse scale).

A.1.3 Familiarity
- Some of the recommended items are familiar to me.
- I am not familiar with the items that were recommended to me (reverse scale).

A.1.4 Attractiveness
- The items recommended to me are attractive.

A.1.5 Enjoyability
- I enjoyed the items recommended to me.

A.1.6 Novelty
- The items recommended to me are novel and interesting.*
- The recommender system is educational.
- The recommender system helps me discover new products.
- I could not find new items through the recommender (reverse scale).

A.1.7 Diversity
- The items recommended to me are diverse.*
- The items recommended to me are similar to each other (reverse scale).*

A.1.8 Context Compatibility
- I was only provided with general recommendations.
- The items recommended to me took my personal context requirements into consideration.
- The recommendations are timely.

A2. Interaction Adequacy
- The recommender provides an adequate way for me to express my preferences.
- The recommender provides an adequate way for me to revise my preferences.
- The recommender explains why the products are recommended to me.*

A3. Interface Adequacy
- The recommender's interface provides sufficient information.
- The information provided for the recommended items is sufficient for me.
- The labels of the recommender interface are clear and adequate.
- The layout of the recommender interface is attractive and adequate.*
A4. Perceived Ease of Use
A.4.1 Ease of Initial Learning
- I became familiar with the recommender system very quickly.
- I easily found the recommended items.
- Looking for a recommended item required too much effort (reverse scale).

A.4.2 Ease of Preference Elicitation
- I found it easy to tell the system about my preferences.
- It is easy to learn to tell the system what I like.
- It required too much effort to tell the system what I like (reverse scale).

A.4.3 Ease of Preference Revision
- I found it easy to make the system recommend different things to me.
- It is easy to train the system to update my preferences.
- I found it easy to alter the outcome of the recommended items due to my preference changes.
- It is easy for me to inform the system if I dislike/like the recommended item.
- It is easy for me to get a new set of recommendations.

A.4.4 Ease of Decision Making
- Using the recommender to find what I like is easy.
- I was able to take advantage of the recommender very quickly.
- I quickly became productive with the recommender.
- Finding an item to buy with the help of the recommender is easy.*
- Finding an item to buy, even with the help of the recommender, consumes too much time.

A5. Perceived Usefulness
- The recommended items effectively helped me find the ideal product.*
- The recommended items influence my selection of products.
- I feel supported in finding what I like with the help of the recommender.*
- I feel supported in selecting the items to buy with the help of the recommender.

A6. Control/Transparency
- I feel in control of telling the recommender what I want.
- I don't feel in control of telling the system what I want (reverse scale).
- I don't feel in control of specifying and changing my preferences (reverse scale).
- I understood why the items were recommended to me.
- The system helps me understand why the items were recommended to me.
- The system seems to control my decision process rather than me (reverse scale).

A7. Attitudes
- Overall, I am satisfied with the recommender.*
- I am convinced of the products recommended to me.*
- I am confident I will like the items recommended to me.*
- The recommender made me more confident about my selection/decision.
- The recommended items made me confused about my choice (reverse scale).
- The recommender can be trusted.

A8. Behavioral Intentions
A.8.1 Intention to Use the System
- If a recommender such as this exists, I will use it to find products to buy.

A.8.2 Continuance and Frequency
- I will use this recommender again.*
- I will use this type of recommender frequently.
- I prefer to use this type of recommender in the future.

A.8.3 Recommendation to Friends
- I will tell my friends about this recommender.*

A.8.4 Purchase Intention
- I would buy the items recommended, given the opportunity.*

6. REFERENCES
[1] Adomavicius, G. and Tuzhilin, A. 2005. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Trans. Knowl. Data Eng. 17(6), 734-749.
[2] Beenen, G., Ling, K., Wang, X., Chang, K., Frankowski, D., Resnick, P., et al. 2004. Using social psychology to motivate contributions to online communities. In CSCW '04: Proceedings of the ACM Conference on Computer Supported Cooperative Work. New York: ACM Press.
[3] Castagnos, S., Jones, N., and Pu, P. 2009. Recommenders' Influence on Buyers' Decision Process. In Proceedings of the 3rd ACM Conference on Recommender Systems (RecSys 2009), 361-364.
[4] Chen, L. and Pu, P. 2006. Trust Building with Explanation Interfaces. In Proceedings of the International Conference on Intelligent User Interfaces (IUI'06), 93-100.
[5] Chen, L. and Pu, P. 2008. A Cross-Cultural User Evaluation of Product Recommender Interfaces. RecSys 2008, 75-82.
[6] Chen, L. and Pu, P. 2009. Interaction Design Guidelines on Critiquing-based Recommender Systems. User Modeling and User-Adapted Interaction (UMUAI) 19(3), 167-206.
[7] Davis, F.D. 1989. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly 13, 319-339.
[8] Grabner-Kräuter, S. and Kaluscha, E.A. 2003. Empirical research in on-line trust: a review and critical assessment. Int. J. Hum.-Comput. Stud. 58(6), 783-812.
[9] Herlocker, J.L., Konstan, J.A., Borchers, A., and Riedl, J. 1999. An algorithmic framework for performing collaborative filtering. In Proc. of ACM SIGIR 1999, ACM Press, 230-237.
[10] Herlocker, J.L., Konstan, J.A., and Riedl, J. 2000. Explaining collaborative filtering recommendations. CSCW 2000, 241-250.
[11] Herlocker, J.L., Konstan, J.A., Terveen, L.G., and Riedl, J. 2004. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. 22(1), 5-53.
[12] Hu, R. and Pu, P. 2009. Potential Acceptance Issues of Personality-based Recommender Systems. In Proceedings of the ACM Conference on Recommender Systems (RecSys'09), New York City, NY, USA, October 22-25, 2009.
[13] Jones, N. and Pu, P. 2007. User Technology Adoption Issues in Recommender Systems. In Proceedings of the Networking and Electronic Commerce Research Conference (NAEC 2007), 379-394.
[14] Jones, N., Pu, P., and Chen, L. 2009. How Users Perceive and Appraise Personalized Recommendations. In Proceedings of the User Modeling, Adaptation, and Personalization Conference (UMAP 2009), 461-466.
[15] Kirakowski, J. and Corbett, M. 1993. SUMI: the Software Usability Measurement Inventory. British Journal of Educational Technology 24(3), 210-214.
[16] Lewis, J.R. 1993. IBM computer usability satisfaction questionnaires: psychometric evaluation and instructions for use.
[17] McGinty, L. and Smyth, B. 2003. On the role of diversity in conversational recommender systems. In Proceedings of the Fifth International Conference on Case-Based Reasoning (ICCBR'03), 276-290.
[18] McNee, S.M., Albert, I., Cosley, D., Gopalkrishnan, P., Lam, S.K., Rashid, A.M., Konstan, J.A., and Riedl, J. 2002. On the Recommending of Citations for Research Papers. In Proc. of ACM CSCW 2002, ACM Press, 116-125.
[19] McNee, S.M., Lam, S.K., Konstan, J.A., and Riedl, J. 2003. Interfaces for eliciting new user preferences in recommender systems. User Modeling 2003, 178-187.
[20] McNee, S.M., Riedl, J., and Konstan, J.A. 2006. Being accurate is not enough: How accuracy metrics have hurt recommender systems. CHI Extended Abstracts 2006, 1097-1101.
[21] Nunnally, J.C. 1978. Psychometric Theory.
[22] Ozok, A.A., Fan, Q., and Norcio, A.F. 2010. Design guidelines for effective recommender system interfaces based on a usability criteria conceptual model: results from a college student population. Behaviour & Information Technology 29(1), 57-83.
[23] Pu, P., Chen, L., and Kumar, P. 2008. Evaluating Product Search and Recommender Systems for E-Commerce Environments. Electronic Commerce Research 8(1-2), 1-27.
[24] Pu, P., Zhou, M., and Castagnos, S. 2009. Critiquing Recommenders for Public Taste Products. In Proceedings of the 3rd ACM Conference on Recommender Systems (RecSys 2009), 249-252.
[25] Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. 2000. Analysis of recommendation algorithms for e-commerce. ACM Conference on Electronic Commerce, 158-167.
[26] Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. 2001. Item-based collaborative filtering recommendation algorithms. WWW 2001, 285-295.
[27] Simonson, I. 2005. Determinants of customers' responses to customized offers: Conceptual framework and research propositions. Journal of Marketing 69 (January 2005), 32-45.
[28] Sinha, R. and Swearingen, K. 2001. Comparing Recommendations made by Online Systems and Friends. In Proceedings of the DELOS-NSF Workshop on Personalization and Recommender Systems in Digital Libraries.
[29] Stohr, E.A. and Viswanathan, S. 1999. Recommendation systems: Decision support for the information economy. In Emerging Information Technologies, K.E. Kendall, Ed. Thousand Oaks, CA: SAGE, 21-44.
[30] Swearingen, K. and Sinha, R. 2002. Interaction design for recommender systems. In Proceedings of Designing Interactive Systems (DIS 2002).
[31] Tintarev, N. and Masthoff, J. 2007. A survey of explanations in recommender systems. ICDE Workshops 2007, 801-810.
[32] Venkatesh, V., Morris, M.G., Davis, G.B., and Davis, F.D. 2003. User acceptance of information technology: Toward a unified view. MIS Quarterly 27(3), 425-478.
[33] Ziegler, C.N., McNee, S.M., Konstan, J.A., and Lausen, G. 2005. Improving Recommendation Lists through Topic Diversification. In Proc. of WWW 2005, ACM Press, 22-32.