=Paper= {{Paper |id=None |storemode=property |title=Effects of Online Recommendations on Consumers' Willingness to Pay |pdfUrl=https://ceur-ws.org/Vol-893/paper7.pdf |volume=Vol-893 |dblpUrl=https://dblp.org/rec/conf/recsys/AdomaviciusBCZ12 }} ==Effects of Online Recommendations on Consumers' Willingness to Pay== https://ceur-ws.org/Vol-893/paper7.pdf
Effects of Online Recommendations on Consumers’ Willingness to Pay

Gediminas Adomavicius, University of Minnesota, Minneapolis, MN (gedas@umn.edu)
Jesse Bockstedt, University of Arizona, Tucson, AZ (bockstedt@email.arizona.edu)
Shawn Curley, University of Minnesota, Minneapolis, MN (curley@umn.edu)
Jingjing Zhang, Indiana University, Bloomington, IN (jjzhang@indiana.edu)


ABSTRACT

We present the results of two controlled behavioral studies on the effects of online recommendations on consumers’ economic behavior. In the first study, we found strong evidence that participants’ willingness to pay was significantly affected by randomly assigned song recommendations, even when controlling for participants’ preferences and demographics. In the second study, we presented participants with actual system-generated recommendations that were intentionally perturbed (i.e., significant error was introduced) and observed similar effects on willingness to pay. The results have significant implications for the design and application of recommender systems as well as for e-commerce practice.

1. INTRODUCTION

Recommender systems have become commonplace in online purchasing environments. Much research in information systems and computer science has focused on algorithmic design and on improving recommender systems’ performance (see Adomavicius and Tuzhilin 2005 for a review). However, little research has explored the impact of recommender systems on consumer behavior from an economic or decision-making perspective. Considering how important recommender systems have become in helping consumers reduce search costs and make purchase decisions, it is necessary to understand how online recommender systems influence purchases.

In this paper, we investigate the relationship between recommender systems and consumers’ economic behavior. Drawing on theory from behavioral economics, judgment and decision making, and marketing, we hypothesize that online recommendations¹ significantly pull a consumer’s willingness to pay in the direction of the recommendation. We test our hypotheses using two controlled behavioral experiments on the recommendation and sale of digital songs. In the first study, we find strong evidence that randomly generated recommendations (i.e., recommendations not based on user preferences) significantly impact consumers’ willingness to pay, even when we control for user preferences for the song, demographic and consumption-related factors, and individual-level heterogeneity. In the second study, we extend these results and find strong evidence that these effects persist with real recommendations generated by a live, real-time recommender system. The results of the second study demonstrate that errors in recommendations, a common feature of live recommender systems, can significantly impact consumers’ economic behavior toward the recommended products.

2. LITERATURE REVIEW AND HYPOTHESES

Behavioral research has indicated that judgments can be constructed upon request and, consequently, are often influenced by elements of the environment. One such influence arises from the use of an anchoring-and-adjustment heuristic (Tversky and Kahneman 1974; see the review by Chapman and Johnson 2002), the focus of the current study. Using this heuristic, the decision maker begins with an initial value and adjusts it as needed to arrive at the final judgment. A systematic bias has been observed with this process, in that decision makers tend to arrive at a judgment that is skewed toward the initial anchor.

Past studies have largely used tasks in which a verifiable outcome is being judged, so that bias can be measured against an objective performance standard (e.g., see the review by Chapman and Johnson 2002). In the recommendation setting, by contrast, the judgment is a subjective preference and is not verifiable against an objective standard. This aspect of the recommendation setting is one of the task elements illustrated in Figure 1, where accuracy is measured as a comparison between the rating prediction and the consumer’s actual rating, a subjective outcome. Also illustrated in Figure 1 is the feedback loop involved in the use of recommender systems. Predicted ratings (recommendations) are systematically tied to the consumer’s perceptions of products. Therefore, providing consumers with a predicted “system rating” can potentially introduce anchoring biases that significantly influence their subsequent ratings of items.

One of the few papers in the mainstream anchoring literature that has looked directly at anchoring effects in preference construction is that of Schkade and Johnson (1989). However, their work studied preferences between abstract, stylized, simple (two-outcome) lotteries. This preference situation is far removed from the more realistic situation that we address in this work. More similar to our setting, Ariely et al. (2003) observed anchoring in bids provided by students participating in auctions of consumer products (e.g., wine, books, chocolates) in a classroom setting. However, their participants were not allowed to sample the goods, an issue we address in this study.

¹ In this paper, for ease of exposition, we use the term “recommendations” in a broad sense. Any rating that the consumer receives purportedly from a recommendation system, even if negative (e.g., 1 star on a five-star scale), is termed a recommendation of the system.
    Paper presented at the 2012 Decisions@RecSys workshop in conjunction with the
    6th ACM conference on Recommender Systems. Copyright © 2012 for the
    individual papers by the papers' authors. Copying permitted for private and
    academic purposes. This volume is published and copyrighted by its editors.




[Figure 1: a feedback loop between the Recommender System (consumer preference estimation) and the Consumer (preference formation / purchasing behavior / consumption). Predicted ratings, expressing recommendations for unknown items, flow from the system to the consumer; actual ratings, expressing preferences for consumed items, flow back to the system; accuracy is the comparison between the two.]

Figure 1. Ratings as part of a feedback loop in consumer-recommender interactions.
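The anchoring-and-adjustment account behind Figure 1 can be illustrated with a minimal simulation. Everything here (the linear pull toward the anchor and the `anchor_weight` value) is a hypothetical sketch for intuition, not a model estimated in this paper:

```python
def consumer_rating(true_preference, shown_rating, anchor_weight=0.3):
    """Toy anchoring-and-adjustment rule: the reported rating starts from
    the consumer's true preference and is pulled toward the shown
    (predicted) rating.  anchor_weight is a hypothetical parameter."""
    raw = (1 - anchor_weight) * true_preference + anchor_weight * shown_rating
    half_stars = round(raw * 2) / 2           # snap to half-star increments
    return min(5.0, max(1.0, half_stars))     # clamp to the 1-5 star scale

# One pass around the Figure 1 loop: the system's predicted rating anchors
# the consumer's reported rating, which would then feed back into the
# system as training data.
true_pref = 3.0
print(consumer_rating(true_pref, shown_rating=5.0))  # 3.5 (pulled upward)
print(consumer_rating(true_pref, shown_rating=1.0))  # 2.5 (pulled downward)
```

Under this toy rule, the bias identified by Cosley et al. (2003) follows directly: any nonzero anchor weight skews reported ratings toward the system's predictions.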


Very little research has explored how the cues provided by recommender systems influence online consumer behavior. Cosley et al. (2003) dealt with a related but significantly different anchoring phenomenon in the context of recommender systems. They explored the effects of system-generated recommendations on user re-ratings of movies. They found that users showed high test-retest consistency when asked to re-rate a movie with no prediction provided. However, when users were asked to re-rate a movie while being shown a “predicted” rating that was altered upward or downward from their original rating by a single fixed amount of one rating point (providing a high or a low anchor), users tended to give higher or lower ratings, respectively (compared to a control group shown their accurate original ratings). This showed that anchoring could affect consumers’ ratings based on preference recall, for movies seen in the past and now being evaluated.

Adomavicius et al. (2011) looked at a similar effect in an even more controlled setting, in which consumer preference ratings for items were elicited at the time of item consumption. Even without a delay between consumption and elicited preference, anchoring effects were observed: the predicted ratings, when perturbed to be higher or lower, pulled the consumer ratings in the same direction. The effects on consumer ratings are potentially important for a number of reasons, e.g., as identified by Cosley et al. (2003): (1) biases can contaminate the inputs of the recommender system, reducing its effectiveness; (2) biases can artificially improve the resulting accuracy, providing a distorted view of the system’s performance; and (3) biases might allow agents to manipulate the system so that it operates in their favor. The direct effect of recommendations on consumer behavior therefore remains an important and open research question.

However, in addition to the preference formation and consumption issues, there is also the purchasing decision of the consumer, as shown in Figure 1. Aside from the effects on ratings, there is the important question of possible anchoring effects on economic behavior. Hence, the primary focus of this research is to determine how anchoring effects created by online recommendations impact consumers’ economic behavior as measured by their willingness to pay. Based on prior research, we expect effects on economic behavior similar to those observed for consumer ratings and preferences. Specifically, we first hypothesize that recommendations will significantly impact consumers’ economic behavior by pulling their willingness to pay in the direction of the recommendation, regardless of the accuracy of the recommendation.

  Hypothesis 1: Participants exposed to randomly generated, artificially high (low) recommendations for a product will exhibit a higher (lower) willingness to pay for that product.

A common issue for recommender systems is error (often measured by RMSE) in predicted ratings, as evidenced by Netflix’s competition for a better recommendation algorithm with the goal of reducing prediction error by 10% (Bennett and Lanning 2007). If anchoring biases can be generated by recommendations, then the accuracy of recommender systems becomes all the more important. Therefore, we wish to explore the potential anchoring effects introduced when real recommendations (i.e., those based on state-of-the-art recommender systems algorithms) are erroneous. We hypothesize that significant errors in real recommendations can have similar effects on consumers’ behavior as captured by their willingness to pay for products.

  Hypothesis 2: Participants exposed to a recommendation that contains significant error in an upward (downward) direction will exhibit a higher (lower) willingness to pay for the product.

We test these hypotheses with two controlled behavioral studies, discussed next.

3. STUDY 1: RECOMMENDATIONS AND WILLINGNESS-TO-PAY

Study 1 was designed to test Hypothesis 1 and establish whether or not randomly generated recommendations could significantly impact a consumer’s willingness to pay.

3.1. Procedure

Both studies presented in this paper were conducted in the same behavioral research lab at a large public North American university, and participants were recruited from the university’s research participant pool. Participants were paid a $10 fee plus a $5 endowment that was used in the experimental procedure (discussed below). Summary statistics on the participant pool for both Study 1 and Study 2 are presented in Table 1. Seven participants were dropped from Study 1 because of response issues, leaving data on 42 participants for analysis.

The experimental procedure for Study 1 consisted of three main tasks, all of which were conducted in a web-based application using personal computers with headphones and dividers between




participants. In the first task, participants were asked to provide ratings for at least 50 popular music songs on a scale from one to five stars with half-star increments. The songs presented for the initial rating task were randomly selected from a pool of 200 popular songs, which was generated by taking the songs ranked in the bottom half of the year-end Billboard 100 charts from 2006 and 2009.² For each song, the artist name(s), song title, duration, album name, and a 30-second sample were provided. The objective of the song-rating task was to capture participants’ music preferences so that recommendations could later be generated using a recommendation algorithm (in Study 2 and in the post-hoc analysis of Study 1, as discussed later).

Table 1. Participant summary statistics.

                                            Study 1               Study 2
# of Participants (n)                       42                    55
Average Age (years)                         21.5 (1.95)           22.9 (2.44)
Gender                                      28 Female, 14 Male    31 Female, 24 Male
Prior experience with recommender systems   50% (21/42)           47.3% (26/55)
Student Level                               36 undergrad, 6 grad  27 undergrad, 25 grad, 3 other
Buy new music at least once a month         66.7% (28/42)         63.6% (35/55)
Own more than 1000 songs                    50% (21/42)           47.3% (26/55)

In the second task, a different list of songs from the same set of 200 was presented (with the same information for each song as in the first task). For each song, the participant was asked whether or not they owned it. Songs that were owned were excluded from the third task, in which willingness-to-pay judgments were obtained. Once a participant had identified at least 40 songs that they did not own, the third task was initiated.

In the third main task of Study 1, participants completed a within-subjects experiment in which the treatment was the star rating of the song recommendation and the dependent variable was willingness to pay for the songs. Participants were presented with 40 songs that they did not own; for each song, a star-rating recommendation, artist name(s), song title, duration, album name, and a 30-second sample were shown. Ten of the 40 songs were presented with a randomly generated low recommendation between one and two stars (drawn from a uniform distribution; all recommendations were presented with one-decimal-place precision, e.g., 1.3 stars), ten with a randomly generated high recommendation between four and five stars, ten with a randomly generated mid-range recommendation between 2.5 and 3.5 stars, and ten with no recommendation, to act as a control. The 30 songs presented with recommendations were randomly ordered, and the 10 control songs were presented last.

To capture willingness to pay, we employed the incentive-compatible Becker-DeGroot-Marschak (BDM) method commonly used in experimental economics (Becker et al. 1984). For each song presented during the third task of the study, participants were asked to declare a price, between zero and 99 cents, that they were willing to pay. Participants were informed that five songs selected at random at the end of the study would be assigned random prices, drawn from a uniform distribution between one and 99 cents. For each of these five songs, the participant was required to purchase the song, using money from their $5 endowment, at the randomly assigned price if that price was equal to or below their declared willingness to pay. Participants were given a detailed explanation of the BDM method so that they understood that the procedure incentivizes accurate reporting of their prices, and were required to pass a short quiz on the method and endowment distribution before starting the study.

At the conclusion of the study, participants completed a short survey collecting demographic and other individual information for use in the analyses. The participation fee and the endowment, minus the fees paid for the required purchases, were distributed to participants in cash. MP3 versions of the songs purchased by participants were “gifted” to them through Amazon.com within approximately 12 hours after the study concluded.

3.2. Analysis and Results

We start by presenting a plot of the aggregate means of willingness to pay for each of the treatment groups in Figure 2. Note that, although there were three treatment groups, the actual ratings shown to participants were randomly assigned star ratings from within the corresponding treatment group’s range (low: 1.0-2.0 stars, mid: 2.5-3.5 stars, high: 4.0-5.0 stars).

As an initial analysis, we performed a repeated-measures ANOVA, shown in Table 2, demonstrating a statistically significant effect of the shown rating on willingness to pay. Since the overall treatment effect was significant, we followed with pair-wise contrasts using t-tests across treatment levels and against the control group, as shown in Table 3. All three treatment conditions differed significantly from one another, showing a clear, positive effect of the treatment on economic behavior.

To provide additional depth, we used a panel-data regression model to explore the relationship between the shown star rating (a continuous variable) and willingness to pay while controlling for participant-level factors. A Hausman test was conducted, and a random-effects model was deemed appropriate, which also allowed us to account for participant-level covariates in the analysis. The dependent variable, willingness to pay, was measured on an integer scale between 0 and 99 and was skewed toward the lower end of the scale. This is representative of typical count data; therefore, a Poisson regression was used (overdispersion was not an issue). The main independent variable was the shown star rating of the recommendation, which was continuous between one and five stars. Control variables for several demographic and consumer-related factors, captured in the survey at the end of the study, were included. Additionally, we controlled for participants’ preferences by calculating an actual predicted star-rating recommendation for each song (on a 5-star scale with one-decimal precision), post hoc, using the popular and widely used item-based collaborative

² The Billboard 100 provides a list of popular songs released in each year. The top half of each year’s list was not used, to reduce the number of songs in our database that participants would already own.
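The BDM pricing rule described in the procedure above can be sketched in a few lines. The price range and purchase rule follow the description in the text; the surplus calculation is our own illustration of why truthful reporting is the participant's best strategy:

```python
import random

def bdm_purchase(declared_wtp_cents, rng):
    """One BDM round as described in the procedure: draw a uniform random
    price between 1 and 99 cents; the participant buys the song iff the
    price is at or below the declared willingness to pay."""
    price = rng.randint(1, 99)
    return price, price <= declared_wtp_cents

def expected_surplus(declared_cents, true_value_cents):
    """Expected surplus (in cents) over the uniform price draw.  A purchase
    happens exactly when price <= declaration, and the buyer pays the drawn
    price, not the declaration."""
    gains = (true_value_cents - p for p in range(1, 100) if p <= declared_cents)
    return sum(gains) / 99

# Incentive compatibility: declaring your true value maximizes expected
# surplus, which justifies reading declarations as willingness to pay.
truth = 50
best = max(expected_surplus(d, truth) for d in range(100))
print(expected_surplus(truth, truth) == best)  # True

price, bought = bdm_purchase(truth, random.Random(7))
print(price, bought)
```

Over- or under-declaring can only add unprofitable purchases or forgo profitable ones, which is why the quiz-verified understanding of this rule matters for interpreting declared prices as true willingness to pay.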




filtering algorithm (IBCF) (Sarwar et al. 2001). 3 By including                     (interval five point scale) for the music genres country, rock, hip
this predicted rating (which was not shown to the participant                       hop, and pop, the number of songs owned (interval five point
during the study) in the analysis, we are able to determine if the                  scale), frequency of music purchases (interval five point scale),
random recommendations had an impact on willingness to pay                          whether they thought recommendations in the study were accurate
above and beyond the participant’s predicted preferences.                           (interval five point scale), and whether they thought the
                                                                                    recommendations were useful (interval five point scale). The
                                                                                    composite error term (ui + εij) includes the individual participant
                                                                                    effect ui and the standard disturbance term εij.
                                                                                       log(WTPij)= b0 + b1(ShownRatingij)+ b2(PredictedRatingij) +
                                                                                                         b3(Controlsi) + ui + εij
                                                                                    The results of the regression are shown in Table 4. Note that the
                                                                                    control observations were not included, since they had null values
                                                                                    for the main dependent variable ShownRating.
                                                                                    The results of our analysis for Study 1 provide strong support for
                                                                                    Hypothesis 1 and demonstrate clearly that there is a significant
                                                                                    effect of recommendations on consumers’ economic behavior.
                                                                                    Specifically, we have shown that even randomly generated
                                                                                    recommendations with no basis on user preferences can impact
                                                                                    consumers’ perceptions of a product and, thus, their willingness to
                                                                                    pay. The regression analysis goes further and controls for
                                                                                    participant level factors and, most importantly, the participant’s
                 Figure 2. Study 1 treatment means.                                 predicted preferences for the product being recommended. As can
                                                                                    be seen in Table 4, after controlling for all these factors, a one unit
           Table 2. Study 1 repeated measures ANOVA.                                change in the shown rating results in a 0.168 change (in the same
                 Partial                                                            direction) in the log of the expected willingness to pay (in cents).
                               Degrees of        Mean     F        P                As an example, assuming a consumer has a willingness to pay of
                 Sum of
                               Freedom          Square Statistic value              $0.50 for a specific song and is given a recommendation,
                 Squares
                                                                                    increasing the recommendation star rating by one star would
Participant     396744.78           41           9676.70                            increase the consumer’s willingness to pay to $0.59.
Treatment                                                                                          Table 4. Study 1 regression results
                 24469.41            2         12234.70      42.27    <0.000
Level                                                                                        Dependent Variable: log(Willingness to Pay)
Residual          346142.41        1196         289.42
Total             762747.40        1239         615.62

      Table 3. Comparison of aggregate treatment group means
                           with t-tests.

                         Control         Low            Mid
Low (1-2 Star)           4.436***
Mid (2.5-3.5 Star)       0.555         4.075***
High (4-5 Star)          1.138         5.501***       1.723**
* p<0.1, ** p<0.05, *** p<0.01
2-tailed t-test for Control vs. Mid, all else 1-tailed.

The resulting Poisson regression model is shown below, where
WTPij is the reported willingness to pay for participant i on song j,
ShownRatingij is the recommendation star rating shown to
participant i for song j, PredictedRatingij is the predicted
recommendation star rating for participant i on song j, and
Controlsi is a vector of demographic and consumer-related
variables for participant i. The controls included in the model
were gender (binary), age (integer), school level (undergrad
yes/no binary), prior experience with recommender systems
(yes/no binary), preference ratings for country, rock, hip-hop, and
pop music, perceived usefulness and accuracy of
recommendations, music-buying frequency, and number of songs
owned. The results are presented in Table 4.

 log(WTPij) = b0 + b1(ShownRatingij) + b2(PredictedRatingij)
              + b3(Controlsi) + ui + εij

              Table 4. Study 1 regression results.

        Dependent Variable: log(Willingness to Pay)
 Variable                        Coefficient    Std. Error
 ShownRating                       0.168***       0.004
 PredictedRating                   0.323***       0.015
 Controls
 male                             -0.636**        0.289
 undergrad                        -0.142          0.642
 age                              -0.105          0.119
 usedRecSys                       -0.836**        0.319
 country                           0.103          0.108
 rock                              0.125          0.157
 hiphop                            0.152          0.132
 pop                               0.157          0.156
 recomUseful                      -0.374          0.255
 recomAccurate                     0.414*         0.217
 buyingFreq                       -0.180          0.175
 songsOwned                       -0.407*         0.223
 constant                          4.437          3.414
 Number of Obs.                    1240
 Number of Participants              42
 Log-likelihood                -9983.3312
 Wald Chi-Square Statistic       1566.34
  (p-value)                      (0.0000)
                * p<0.1, ** p<0.05, *** p<0.01
3 Several recommendation algorithms were evaluated based on the Study
1 training data, and IBCF was selected for use in both studies because it
had the highest predictive accuracy.

4. STUDY 2: ERRORS IN RECOMMENDATIONS
The goal of Study 2 was to extend the results of Study 1 by testing
Hypothesis 2 and exploring the impact of significant error in true
recommendations on consumers’ willingness to pay. As discussed
below, the design of this study is intended to test for effects similar
to those in Study 1, but in a more realistic setting with
recommender-system-generated, real-time recommendations.

4.1. Procedure
Participants in Study 2 used the same facilities and were recruited
from the same pool as in Study 1; however, there was no overlap
in participants across the two studies. The same participation fee
and endowment used in Study 1 were provided to participants in
Study 2. Fifteen participants were removed from the analysis in
Study 2 because of issues in their responses, leaving data on 55
participants for analysis.

Study 2 was also a within-subjects design, with perturbation of the
recommendation star rating as the treatment and willingness to
pay as the dependent variable. The main tasks for Study 2 were
virtually identical to those in Study 1. The only differences
between the studies were the treatments and the process for
assigning stimuli to the participants in the recommendation task of
the study. In Study 2, all participants completed the initial song-
rating and song-ownership tasks as in Study 1. Next, real song
recommendations were calculated based on the participants’
preferences, which were then perturbed (i.e., excess error was
introduced to each recommendation) to generate the shown
recommendation ratings. In other words, unlike Study 1, in which
random recommendations were presented to participants, in Study
2 participants were presented with perturbed versions of their
actual personalized recommendations. Perturbations of -1.5 stars,
-1 star, -0.5 stars, 0 stars, +0.5 stars, +1 star, and +1.5 stars were
added to the actual recommendations, representing seven
treatment levels. The perturbed recommendation shown to the
participant was constrained to be between one and five stars;
therefore, perturbations were pseudo-randomly assigned to ensure
that the sum of the actual recommendation and the perturbation
would fit within the allowed rating scale. The recommendations
were calculated using the item-based collaborative filtering
(IBCF) algorithm (Sarwar et al. 2001), and the ratings data from
Study 1 were used as training data.

Each participant was presented with 35 perturbed, personalized
song recommendations, five from each of the seven treatment
levels. The song recommendations were presented in a random
order. Participants were asked to provide their willingness to pay
for each song, which was captured using the same BDM
technique as in Study 1. The final survey, payouts, and song
distribution were also conducted in the same manner as in Study 1.

4.2. Analysis and Results
For Study 2, we focus on the regression analysis to determine the
relationship between error in a recommendation and willingness
to pay. We follow an approach similar to that of Study 1 and
model this relationship using a Poisson random effects regression
model. The distribution of the willingness-to-pay data in Study 2
was similar to that of Study 1, overdispersion was not an issue,
and the results of a Hausman test for fixed versus random effects
suggested that a random effects model was appropriate. We
control for the participants’ preferences using the predicted rating
for each song in the study (i.e., the recommendation rating prior
to perturbation), which was calculated using the IBCF algorithm.
Furthermore, the same set of control variables used in Study 1
was included in our regression model for Study 2. The resulting
regression model is presented below, where the main difference
from the model used in Study 1 is the inclusion of Perturbationij
(i.e., the error introduced for the recommendation of song j to
participant i) as the main independent variable. The results are
presented in Table 5.

 log(WTPij) = b0 + b1(Perturbationij) + b2(PredictedRatingij)
              + b3(Controlsi) + ui + εij

The results of Study 2 provide strong support for Hypothesis 2
and extend the results of Study 1 in two important ways. First,
Study 2 provides more realism to the analysis, since it utilizes real
recommendations generated using an actual real-time
recommender system. Second, rather than randomly assigning
recommendations as in Study 1, in Study 2 the recommendations
presented to participants were calculated based on their
preferences and then perturbed to introduce realistic levels of
system error. Thus, considering that all recommender systems
have some level of error in their recommendations, Study 2
contributes by demonstrating the potential impact of these errors.
As seen in Table 5, while controlling for preferences and other
factors, a one-unit perturbation in the actual rating results in a
0.115 change in the log of the participant’s willingness to pay. As
an example, assuming a consumer has a willingness to pay of
$0.50 for a given song, perturbing the system’s recommendation
positively by one star would increase the consumer’s willingness
to pay to $0.56.

              Table 5. Study 2 regression results.

        Dependent Variable: log(Willingness to Pay)
 Variable                        Coefficient    Std. Error
 Perturbation                      0.115***       0.005
 PredictedRating                   0.483***       0.012
 Controls
 male                             -0.045          0.254
 undergrad                        -0.092          0.293
 age                              -0.002          0.053
 usedRecSys                        0.379          0.253
 country                          -0.056          0.129
 rock                             -0.132          0.112
 hiphop                            0.0137         0.108
 pop                              -0.035          0.124
 recomUseful                       0.203*         0.112
 recomAccurate                     0.060          0.161
 buyingFreq                        0.276**        0.128
 songsOwned                       -0.036          0.156
 constant                          0.548          1.623
 Number of Obs.                    1925
 Number of Participants              55
 Log-likelihood                -16630.547
 Wald Chi-Square Statistic       2374.72
  (p-value)                      (0.0000)
                * p<0.1, ** p<0.05, *** p<0.01

5. CONCLUSIONS
Study 1 provided strong evidence, through a randomized trial
design, that willingness to pay can be affected by online
recommendations. Participants presented with random
recommendations were influenced even when controlling for
demographic factors and general preferences. Study 2 extended
these results to demonstrate that the same effects exist for real
recommendations that contain errors, which were calculated using
state-of-the-art recommendation algorithms used in practice.
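The constrained, pseudo-random perturbation assignment described in Study 2's procedure can be sketched as follows. This is a minimal illustration under our own naming, not the authors' implementation, and it does not enforce the study's balanced design of five songs per treatment level:

```python
import random

# The seven treatment levels used in Study 2 (in stars).
TREATMENT_LEVELS = [-1.5, -1.0, -0.5, 0.0, +0.5, +1.0, +1.5]

def feasible_perturbations(actual_rating: float) -> list[float]:
    """Treatment levels whose sum with the actual IBCF-predicted
    rating keeps the shown rating within the 1-5 star scale."""
    return [p for p in TREATMENT_LEVELS if 1.0 <= actual_rating + p <= 5.0]

def assign_perturbation(actual_rating: float, rng: random.Random) -> float:
    """Pseudo-randomly pick a feasible perturbation for one song."""
    return rng.choice(feasible_perturbations(actual_rating))

rng = random.Random(42)
actual = 4.5                    # hypothetical predicted rating for some song
shown = actual + assign_perturbation(actual, rng)
assert 1.0 <= shown <= 5.0      # shown rating always stays on the 1-5 scale
```

Note that ratings near the ends of the scale (e.g., 4.5 stars) admit fewer feasible perturbations, which is why the study assigned treatments pseudo-randomly rather than uniformly at random.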
There are significant implications of the results presented. First,
the results raise new issues on the design of recommender
systems. If recommender systems can generate biases in
consumer decision-making, do the algorithms need to be adjusted
to compensate for such biases? Furthermore, since recommender
systems use a feedback loop based on consumer purchase
decisions, do recommender systems need to be calibrated to
handle biased input? Second, biases in decision-making based on
online recommendations can potentially be used to the advantage
of e-commerce companies, and retailers can potentially become
more strategic in their use of recommender systems as a means of
increasing profit and marketing to consumers. Third, consumers
may need to become more cognizant of the potential decision-
making biases introduced through online recommendations. Just
as savvy consumers understand the impacts of advertising,
discounting, and pricing strategies, they may also need to consider
the potential impact of recommendations on their purchasing
decisions.

6. ACKNOWLEDGMENT
This work is supported in part by the National Science Foundation
grant IIS-0546443.

REFERENCES
[1] Adomavicius, G., Bockstedt, J., Curley, S., and Zhang, J.
    2011. Recommender Systems, Consumer Preferences, and
    Anchoring Effects. Proceedings of the RecSys 2011
    Workshop on Human Decision Making in Recommender
    Systems (Decisions@RecSys 2011), Chicago IL, October 27,
    pp. 35-42.
[2] Adomavicius, G. and Tuzhilin, A. 2005. Towards the Next
    Generation of Recommender Systems: A Survey of the State-
    of-the-Art and Possible Extensions. IEEE Transactions on
    Knowledge and Data Engineering, 17 (6), pp. 734-749.
[3] Ariely, D., Loewenstein, G., and Prelec, D. 2003. “Coherent
    arbitrariness”: Stable demand curves without stable
    preferences. Quarterly Journal of Economics, (118), pp. 73-
    105.
[4] Becker, G.M., DeGroot, M.H., and Marschak, J. 1964.
    Measuring utility by a single-response sequential method.
    Behavioral Science, 9 (3), pp. 226-232.
[5] Bennett, J. and Lanning, S. 2007. The Netflix Prize. KDD
    Cup and Workshop. [www.netflixprize.com].
[6] Chapman, G. and Johnson, E. 2002. Incorporating the
    irrelevant: anchors in judgments of belief and value. In
    Heuristics and Biases: The Psychology of Intuitive Judgment,
    T. Gilovich, D. Griffin, and D. Kahneman (eds.), Cambridge
    University Press, Cambridge, pp. 120-138.
[7] Cosley, D., Lam, S., Albert, I., Konstan, J.A., and Riedl, J.
    2003. Is seeing believing? How recommender interfaces
    affect users’ opinions. CHI 2003 Conference, Fort
    Lauderdale, FL.
[8] Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. 2001.
    Item-based collaborative filtering recommendation
    algorithms. 10th International World Wide Web Conference
    (WWW10), May 1-5, Hong Kong.
[9] Schkade, D.A. and Johnson, E.J. 1989. Cognitive processes
    in preference reversals. Organizational Behavior and
    Human Decision Processes, (44), pp. 203-231.
[10] Tversky, A. and Kahneman, D. 1974. Judgment under
     uncertainty: Heuristics and biases. Science, (185), pp. 1124-
     1131.