CEUR Workshop Proceedings Vol-2903, paper IUI21WS-HUMANIZE-3. PDF: https://ceur-ws.org/Vol-2903/IUI21WS-HUMANIZE-3.pdf. DBLP: https://dblp.org/rec/conf/iui/Lukoff21
Addressing Present Bias in Movie
Recommender Systems and Beyond
Kai Lukoff
University of Washington, Seattle, WA, USA



Abstract

Present bias leads people to choose smaller immediate rewards over larger rewards in the future. Recommender systems often reinforce present bias because they rely predominantly upon what people have done in the past to recommend what they should do in the future. How can recommender systems overcome this present bias to recommend items in ways that match with users' aspirations? Our workshop position paper presents the motivation and design for a user study to address this question in the domain of movies. We plan to ask Netflix users to rate movies that they have watched in the past for the long-term rewards that these movies provided (e.g., memorable or meaningful experiences). We will then evaluate how well long-term rewards can be predicted using existing data (e.g., movie critic ratings). We hope to receive feedback on this study design from other participants at the HUMANIZE workshop and spark conversations about ways to address present bias in recommender systems.

Keywords

present bias, cognitive bias, algorithmic bias, recommender systems, digital wellbeing, movies


Joint Proceedings of the ACM IUI 2021 Workshops, April 13-17, 2021, College Station, USA
kai1@uw.edu (K. Lukoff)
https://kailukoff.com (K. Lukoff)
ORCID: 0000-0001-5069-6817 (K. Lukoff)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073)


1. Introduction

People often select smaller immediate rewards over larger rewards in the future, a phenomenon known as present bias or time discounting. This applies to decisions such as what snack to eat [1, 2], how much to save for retirement [3], or which movies to watch [2]. For example, when people choose a movie to watch this evening, they often choose guilty pleasures like The Fast and The Furious, which are enjoyable in the moment but quickly forgotten. By contrast, when they choose a movie to watch next week, they are more likely to choose films that are challenging but meaningful, such as Schindler's List [2].
   Recommender systems (RS), algorithmic systems that predict the preference a user would give to an item, often reinforce present bias. Today, the dominant paradigm of recommender systems is behaviorism: recommendations are selected based on behavior traces ("what users do") and largely neglect explicit preferences ("what users say") [4]. Since "what users do" reflects a present bias, RS that rely upon such actions to train their recommendations will prioritize items that offer high short-term rewards but low long-term rewards. In this way, recommender systems may reinforce what the current self wants rather than helping people reach their ideal self [5].
   This position paper for the HUMANIZE workshop proposes a study design to address these topics in the domain of movies. In Study 1, a survey of Netflix users, we investigate: How should a RS make recommendations by asking ordinary users about the rather academic concept of "long-term
rewards"? And can long-term rewards be predicted based on existing data (e.g., movie critic ratings)? In Study 2, a participatory design exercise with a movie RS, we ask: How do users want a RS to balance short-term and long-term rewards? And what controls would users like to have over such a RS?
   We expect that our eventual findings will also inform the design of recommender systems that address present bias in other domains such as news, food, and fitness.


2. Related Work

The social psychologist Daniel Kahneman describes people as having both an experiencing self, who prefers short-term rewards like pleasure, and a remembering self, who prefers long-term rewards like meaningful experiences [6]. Lyngs et al. describe three different approaches to the thorny question of how to measure a user's "true preferences" [5]. The first approach aligns with the experiencing self, the second with the remembering self, and the third with the wisdom of the crowd.
   The first approach follows the experiencing self and asserts that what users do is what they really want, which many in Silicon Valley push one step further to what we can get users to do is what they really want [7]. Social media that are financed by advertising are "compelled to find ways to keep users engaged for as long as possible" [8]. To achieve this, social media services often give the experiencing self exactly what it wants, knowing that it will override the preferences of the remembering self and lead the user to stay engaged for longer than they had intended.
   The second approach prioritizes the remembering self, calling for systems to prompt the user to reflect on their ideal self. In this vein, Slovak et al. propose designing to help users reflect upon how they wish to transform their behavior [9]. Lukoff et al. previously explored how experience sampling can be used to measure how meaningful people find their interactions with smartphone apps immediately after use [10]. However, building such reflection into RSs remains a major challenge because it is unclear how and when a system should ask a user about the "long-term rewards" of an experience. It may be that the common approach of asking users to rate items on a "5-star" scale reflects a combination of short-term and long-term rewards, and that a different prompt is required to capture evaluations of long-term rewards more specifically. It is also an open question how well such long-term rewards can be inferred from existing data.
   The third perspective leverages the wisdom of the crowd by using the collective elicited preferences of similar users with more experience to make recommendations. Recommender systems today tend to use the "behavior of the crowd" as input into their models, in the form of behavioral data of similar users, but largely neglect elicited preferences [4].
   Finally, Ekstrand and Willemsen propose participatory design as a general corrective to the behaviorist bias of recommender systems [4]. Harambam et al. explored using participatory methods to evaluate a recommender system for news, suggesting that giving users control might mitigate filter bubbles in news consumption [11]. Participatory design is a promising way to investigate how users want a RS to balance short-term and long-term rewards and the controls they would like to have.


3. Proposed Study Design

In what follows, we propose a study design to better understand how to measure the long-term rewards of items in the context of movie recommendations. We hope to receive feedback on this study design from other participants at the HUMANIZE workshop and prompt conversations about ways to address present bias in recommender systems more broadly.

3.1. Study 1: Eliciting user preferences for long-term rewards

Our first study is a survey of Netflix users that we are currently piloting, which addresses two research questions:

    • RQ 1a: What wording should be used to ask users to rate the "short-term rewards" and "long-term rewards" of a movie? In other words, what wording captures the right construct and makes sense to users?

    • RQ 1b: How well can a recommender system predict the long-term rewards of a movie for an individual using data other than explicit user ratings?

3.1.1. Study 1 Methods

We will ask Netflix users who watch at least one movie per month to download their past viewing history and share it with us. We will ask them to rate 30 movies: 10 watched in the past year, 10 watched 1-2 years ago, and 10 watched 2-3 years ago. Participants will rate each movie for short-term reward, long-term reward, and other constructs that might be correlated with these rewards (e.g., meaningfulness, memorability).
   The current wording of our questions is:

    • For short-term rewards: How rewarding was this movie while you were watching it?

    • For long-term rewards: How rewarding was this movie after you watched it?

   Participants will rate all questions on a 1-5 scale, from "Not at all" to "Very."
   We are also interested in understanding what other constructs are correlated with both short-term and long-term rewards. To this end, we are also asking about related constructs, such as:

    • How enjoyable was this movie while you were watching it?

    • How interesting was this movie while you were watching it?

    • How meaningful was this movie after you watched it?

    • How memorable was this movie after you watched it?

   We are currently piloting all questions using a talk-aloud protocol in which participants explain their thinking to us as they complete the survey. We are checking to make sure that the wording makes sense to participants and to identify constructs that are related to short-term and long-term rewards. The constructs that are most closely related to these rewards in the piloting will be included in the final survey, so that each movie will be rated for a cluster of constructs all related to short-term and long-term rewards. For the final survey, we plan to recruit about 50 Netflix users to generate a total of 1,500 movie ratings.

3.1.2. Study 1 Planned Analysis

To address RQ 1a, we will report the qualitative results of our talk-aloud piloting of the survey wording. We will also report the correlation between how participants rated
our measures of short-term and long-term rewards and related constructs (e.g., meaningfulness, memorability).
   To address RQ 1b, first we will test how well existing data correlates with long-term rewards. The existing data we plan to test includes: user ratings (from others), critic ratings, box office earnings, genre, and the day of the week the movie was watched. One notable limitation of our study design here is that we will not have access to behavioral data about movies, like clicks, views, or time spent.
   Second, we will create machine learning models to test how well we can predict the long-term rewards a movie might provide for an individual, as assessed by metrics like precision and recall. Specifically, we will create:

    • A generalized model that makes the same predictions for each movie for all participants;

    • A personalized model that makes individualized predictions for each movie for each participant.

   Finally, we will also create generalized and personalized models that predict short-term rewards too, and compare their performance against the models that predict long-term rewards. Our suspicion is that existing data may be more predictive of short-term rewards than long-term rewards, because long-term rewards may require a form of reflection that most existing data do not capture.

3.2. Study 2: Addressing present bias via user control mechanisms

Study 2 is currently planned as a participatory design exercise with a movie recommender system, in which we ask:

    • RQ 2a: What preferences do users have for how a movie RS should weight the short-term versus long-term rewards of the movies it recommends? In what contexts would users prefer what weights?

    • RQ 2b: How would users like to control how a movie RS weights the short-term versus long-term rewards of the movies it recommends?

   At the workshop, we hope to elicit feedback on how Study 2 might be revised to best answer our research questions.

3.2.1. Study 2 Methods

Our exercise will begin by showing participants a set of recommended movies that is heavily weighted towards short-term rewards, based on the ratings that we obtained in Study 1. Then we will show them a set of recommendations that is heavily weighted towards long-term rewards. We will ask participants to describe which recommendations they would prefer and why. We will also ask about which contexts (e.g., mood, day of the week) affect which types of rewards they would prefer.
   Next, we will solicit feedback from users on a paper prototype of a RS that offers users control over recommendations at the input, process, and output stages. For instance, at the input stage, users could indicate their general preferences for short-term versus long-term rewards. At the process stage, users could choose from different "algorithmic personas" to filter their recommendations, e.g., "the guilty pleasure watcher" or "the classic movie snob." At the output stage, users might control the order in which recommendations are sorted. This exercise draws from the study design in Harambam et al., in which participants evaluated
would like to have the control mechanisms they described, based on a prototype of a news recommender system, with a focus on addressing the bias of filter bubbles [11].


4. Workshop relevance

Today's recommender systems often prioritize "what users do" and neglect "what users say." As a result, they tend to reinforce the current self rather than foster the ideal self. This study design proposes to study this with regards to movies. But the same problem also applies to other domains such as online groceries, where the current self might want cookies while the ideal self wants blueberries, or digital news, where the current self might want to read stories that agree with their worldview while the ideal self wants to be challenged by different perspectives [12]. The methods we propose in this study design are relevant beyond just movies.
   We expect that all workshop participants will benefit from a lively discussion of how to conceptualize and measure user preferences in ways that go beyond the current behaviorist paradigm of prioritizing what users do over explicit preferences. Our proposal to ask users for their explicit ratings and correlate these with other data is just one possible approach, and we would like to discuss what other methods workshop participants would suggest and how well these apply to other domains such as groceries and news. Addressing present bias also raises philosophical issues: Is it always irrational to pursue short-term rewards over long-term rewards? Are users in a position to judge their own long-term rewards? How far should computing systems go in nudging or shoving users towards long-term rewards?
   Finally, present bias is just one of many cognitive biases. We hope that our submission will also contribute to the growing conversation on how to use psychological theory to address cognitive biases in intelligent user interfaces [13].


Acknowledgments

Thank you to Minkyong Kim, Ulrik Lyngs, David McDonald, Sean Munson, and Alexis Hiniker for feedback on this research agenda and drafts of the position paper.


References

 [1] M. K. Lee, S. Kiesler, J. Forlizzi, Mining behavioral economics to design persuasive technology for healthy choices, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '11, ACM, New York, NY, USA, 2011, pp. 325–334.
 [2] D. Read, B. van Leeuwen, Predicting hunger: The effects of appetite and delay on choice, Organ. Behav. Hum. Decis. Process. 76 (1998) 189–205.
 [3] H. E. Hershfield, D. G. Goldstein, W. F. Sharpe, J. Fox, L. Yeykelis, L. L. Carstensen, J. N. Bailenson, Increasing saving behavior through age-progressed renderings of the future self, J. Mark. Res. 48 (2011) S23–S37.
 [4] M. D. Ekstrand, M. C. Willemsen, Behaviorism is not enough: Better recommendations through listening to users, in: Proceedings of the 10th ACM Conference on Recommender Systems, RecSys '16, ACM, New York, NY, USA, 2016, pp. 221–224.
 [5] U. Lyngs, R. Binns, M. Van Kleek, et al., So, tell me what users want, what they really, really want!, in: Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, ACM, 2018.
 [6] D. Kahneman, Thinking, Fast and Slow, Farrar, Straus and Giroux, 2011.
 [7] N. Seaver, Captivating algorithms: Recommender systems as traps, Journal of Material Culture (2018). URL: https://doi.org/10.1177/1359183518820366.
 [8] T. Dingler, B. Tag, E. Karapanos, K. Kise,
     A. Dengel, Detection and design for
     cognitive biases in people and comput-
     ing systems, http://critical-media.org/
     cobi/background.html, 2020.
 [9] P. Slovák, C. Frauenberger, G. Fitzpatrick, Reflective practicum: A framework of sensitising concepts to design for transformative reflection, in: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, ACM, 2017, pp. 2696–2707.
[10] K. Lukoff, C. Yu, J. Kientz, A. Hiniker,
     What makes smartphone use meaning-
     ful or meaningless?, Proc. ACM Inter-
     act. Mob. Wearable Ubiquitous Technol.
     2 (2018) 22:1–22:26.
[11] J. Harambam, D. Bountouridis, M. Makhortykh, J. Van Hoboken, Designing for the better by taking users into account: a qualitative evaluation of user control mechanisms in (news) recommender systems, in: Proceedings of the 13th ACM Conference on Recommender Systems, ACM, 2019, pp. 69–77.
[12] S. A. Munson, P. Resnick, Presenting diverse political opinions: how and how much, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 2010, pp. 1457–1466.
[13] D. Wang, Q. Yang, A. Abdul, B. Y. Lim, Designing theory-driven user-centric explainable AI, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, 2019. URL: https://dl.acm.org/doi/abs/10.1145/3290605.3300831.