<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Addressing Present Bias in Movie Recommender Systems and Beyond</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kai Lukoff</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Washington</institution>
          ,
          <addr-line>Seattle, WA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Present bias leads people to choose smaller immediate rewards over larger rewards in the future. Recommender systems often reinforce present bias because they rely predominantly upon what people have done in the past to recommend what they should do in the future. How can recommender systems overcome this present bias to recommend items in ways that match with users' aspirations? Our workshop position paper presents the motivation and design for a user study to address this question in the domain of movies. We plan to ask Netflix users to rate movies that they have watched in the past for the long-term rewards that these movies provided (e.g., memorable or meaningful experiences). We will then evaluate how well long-term rewards can be predicted using existing data (e.g., movie critic ratings). We hope to receive feedback on this study design from other participants at the HUMANIZE workshop and spark conversations about ways to address present bias in recommender systems.</p>
      </abstract>
      <kwd-group>
        <kwd>present bias</kwd>
        <kwd>cognitive bias</kwd>
        <kwd>algorithmic bias</kwd>
        <kwd>recommender systems</kwd>
        <kwd>digital wellbeing</kwd>
        <kwd>movies</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>People often select smaller immediate
rewards over larger rewards in the future, a
phenomenon that is known as present bias
or time discounting. This applies to decisions
such as what snack to eat [1, 2], how much
to save for retirement [3], or which movies
to watch [2]. For example, when people
choose a movie to watch this evening, they
often choose guilty pleasures like The Fast
and The Furious, which are enjoyable in the
moment, but then quickly forgotten. By
contrast, when they choose a movie to watch
next week, they are more likely to choose
films that are challenging but meaningful,
such as Schindler’s List [2].</p>
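      <p>To make this concrete, present bias is often modeled with quasi-hyperbolic ("beta-delta") discounting, in which any delayed reward is scaled down by an extra factor beta. The sketch below uses invented payoff numbers and illustrative beta and delta values; it is not drawn from the cited studies, but it reproduces the preference reversal described above.</p>
      <preformat>
```python
def value(reward, delay_days, beta=0.5, delta=0.99):
    # Quasi-hyperbolic (beta-delta) discounting: an immediate reward
    # is taken at face value; any delayed reward is scaled by
    # beta * delta**delay, so delays are penalized most sharply
    # right at the present moment.
    if delay_days == 0:
        return reward
    return beta * (delta ** delay_days) * reward

def pick(watch_delay_days):
    # Hypothetical payoffs: the guilty pleasure pays off at watch
    # time, while the meaningful film's larger payoff is felt about
    # a week after watching.
    options = {
        "The Fast and The Furious": value(6, watch_delay_days),
        "Schindler's List": value(10, watch_delay_days + 7),
    }
    return max(options, key=options.get)

print(pick(0))  # choosing for tonight: The Fast and The Furious
print(pick(7))  # choosing for next week: Schindler's List
```
      </preformat>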
      <p>Recommender systems (RS), algorithmic
systems that predict the preference a user
would give to an item, often reinforce present
bias. Today, the dominant paradigm of
recommender systems is behaviorism:
recommendations are selected based on behavior
traces (“what users do”) and they largely
neglect to capture explicit preferences (“what
users say”) [4]. Since “what users do” reflects
a present bias, RS that rely upon such actions
to train their recommendations will
prioritize items that offer high short-term rewards
but low long-term rewards. In this way,
recommender systems may reinforce what the
current self wants rather than helping people
reach their ideal self [5].</p>
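      <p>A minimal sketch of this behaviorist failure mode, with invented watch counts and elicited ratings rather than real data: the same ranking function surfaces the guilty pleasure when trained on behavior traces, but surfaces the meaningful film when trained on elicited long-term ratings.</p>
      <preformat>
```python
# Invented traces for one user: what they did vs. what they say.
watch_counts = {"Guilty Pleasure": 9, "Meaningful Drama": 2}          # behavior
longterm_ratings = {"Guilty Pleasure": 2.1, "Meaningful Drama": 4.6}  # elicited

def rank(scores):
    # Highest-scoring items first; stands in for a trained model.
    return sorted(scores, key=scores.get, reverse=True)

print(rank(watch_counts))      # behavior-trained ordering
print(rank(longterm_ratings))  # preference-trained ordering
```
      </preformat>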
      <p>This position paper for the HUMANIZE
workshop proposes a study design to address
these topics in the domain of movies. In
Study 1, a survey of Netflix users, we
investigate: How should a RS make
recommendations by asking ordinary users about
the rather academic concept of “long-term
rewards”? And can long-term rewards be
predicted based on existing data (e.g., movie
critic ratings)? In Study 2, a participatory
design exercise with a movie RS, we ask: How
do users want a RS to balance short-term and
long-term rewards? And what controls would
users like to have over such a RS?</p>
      <p>We expect that our eventual findings will
also inform the design of recommender
systems that address present bias in other
domains such as news, food, and fitness.</p>
    </sec>
    <sec id="sec-related">
      <title>2. Related Work</title>
      <p>The social psychologist Daniel Kahneman
describes people as having both an
experiencing self, who prefers short-term rewards
like pleasure, and a remembering self, who
prefers long-term rewards like meaningful
experiences [6]. Lyngs et al. describe three
different approaches to the thorny question
of how to measure a user’s “true preferences”
[5]. The first approach aligns with the
experiencing self, the second with the
remembering self, and the third with the wisdom of
the crowd.</p>
      <p>The first approach follows the experiencing
self and asserts that what users do is what
they really want, which many in Silicon
Valley push one step further to what we can
get users to do is what they really want [7].</p>
      <p>Social media that are financed by advertising
are “compelled to find ways to keep users
engaged for as long as possible” [8]. To
achieve this, social media services often give
the experiencing self exactly what it wants,
knowing that it will override the preferences
of the remembering self and lead the user
to stay engaged for longer than they had
intended.</p>
      <p>The second approach prioritizes the
remembering self, calling for systems to prompt
the user to reflect on their ideal self.</p>
      <p>In this vein, Slovak et al. propose designing
to help users reflect upon how they wish to
transform their behavior [9]. Lukoff et al.
previously explored how experience sampling
can be used to measure how meaningful
people find their interactions with smartphone
apps immediately after use [10]. However,
building such reflection into RSs remains a
major challenge because it is unclear how
and when a system should ask a user about
the “long-term rewards” of an experience. It
may be that the common approach of asking
users to rate items on a “5-star” scale reflects
a combination of short-term and long-term
rewards, and that a different prompt is
required to capture evaluations of long-term
rewards more specifically.</p>
      <p>It is also an open question how well such
long-term rewards can be inferred from
existing data.</p>
      <p>The third perspective leverages the wisdom
of the crowd, by using the collective elicited
preferences of similar users with more
experience to make recommendations.
Recommender systems today tend to use the
“behavior of the crowd” as input into their
models, in the form of behavioral data of
similar users, but largely neglect elicited
preferences [4].</p>
      <p>Finally, Ekstrand and Willemsen propose
participatory design as a general corrective
to the behaviorist bias of recommender
systems [4]. Harambam et al. explored using
participatory methods to evaluate a
recommender system for news, suggesting that
giving users control might mitigate filter
bubbles in news consumption [11].
Participatory design is a promising way to investigate
how users want a RS to balance short-term
and long-term rewards and the controls they
would like to have.</p>
    </sec>
    <sec id="sec-design">
      <title>3. Proposed Study Design</title>
      <p>In what follows, we propose a study design
to better understand how to measure the
long-term rewards of items in the context of
movie recommendations. We hope to receive
feedback on this study design from other
participants at the HUMANIZE workshop and
prompt conversations about ways to address
present bias in recommender systems more
broadly.</p>
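      <p>For instance, one simple way the balance between short-term and long-term rewards could be operationalized is as a weighted blend of the two ratings. This is an illustrative sketch with an assumed user-set weight and 1-5 ratings, not a system we have built:</p>
      <preformat>
```python
def blended_score(short_term, long_term, w=0.7):
    # w is a user-controlled weight on long-term rewards:
    # w=0 reproduces a purely present-biased ranking, while w=1
    # ranks only by long-term rewards. Inputs are 1-5 ratings.
    return (1 - w) * short_term + w * long_term
```
      </preformat>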
      <sec id="sec-1-1">
        <title>3.1. Study 1: Eliciting user preferences for long-term rewards</title>
        <p>Our first study is a survey of Netflix users
that we are currently piloting, which
addresses two research questions:
• RQ 1a: What wording should be used
to ask users to rate the “short-term
rewards” and “long-term rewards” of a
movie? In other words, what wording
captures the right construct and makes
sense to users?
• RQ 1b: How well can a recommender
system predict the long-term rewards
of a movie for an individual using data
other than explicit user ratings?</p>
        <p>The current wording of our questions is:
• For short-term rewards: How
rewarding was this movie while you were
watching it?
• For long-term rewards: How
rewarding was this movie after you watched it?
Participants will rate all questions on a 1-5
scale, from “Not at all” to “Very.”</p>
        <p>We are also interested in understanding
what other constructs are correlated with
both short-term and long-term rewards. To
this end, we are also asking about related
constructs, such as:
• How enjoyable was this movie while
you were watching it?
• How interesting was this movie while
you were watching it?
• How meaningful was this movie after
you watched it?
• How memorable was this movie after
you watched it?</p>
        <sec id="sec-1-1-1">
          <title>3.1.1. Study 1 Methods</title>
          <p>We will ask Netflix users who watch at
least one movie per month to download their
past viewing history and share it with us. We
will ask them to rate 30 movies: 10 watched
in the past year, 10 watched 1-2 years ago,
and 10 watched 2-3 years ago. Participants
will rate each movie for short-term reward,
long-term reward, and other constructs that
might be correlated with these rewards (e.g.,
meaningfulness, memorability).</p>
          <p>We are currently piloting all questions
using a talk-aloud protocol in which
participants explain their thinking to us as they
complete the survey. We are checking to
make sure that the wording makes sense to
participants and to identify constructs that
are related to short-term and long-term
rewards. The constructs that are most closely
related to these rewards in the piloting will
be included in the final survey, so that each
movie will be rated for a cluster of constructs
all related to short-term and long-term
rewards. For the final survey, we plan to
recruit about 50 Netflix users to generate a
total of 1,500 movie ratings.</p>
        </sec>
        <sec id="sec-1-1-2">
          <title>3.1.2. Study 1 Planned Analysis</title>
          <p>To address RQ 1a, we will report the
qualitative results of our talk-aloud piloting
of the survey wording. We will also report
the correlation between how participants
rated our measures of short-term and
long-term rewards and related constructs (e.g.,
meaningfulness, memorability).</p>
          <p>To address RQ 1b, first we will test how
well existing data correlates with long-term
rewards. The existing data we plan to test
includes: user ratings (from others), critic
ratings, box office earnings, genre, and the
day of the week the movie was watched. One
notable limitation of our study design here is
that we will not have access to behavioral
data about movies, like clicks, views, or time
spent.</p>
          <p>Second, we will create machine learning
models to test how well we can predict the
long-term rewards a movie might provide
for an individual, as assessed by metrics like
precision and recall. Specifically, we will
create both generalized models (trained across
users) and personalized models (trained for
each individual user).</p>
          <p>Finally, we will also create generalized and
personalized models that predict short-term
rewards too, and compare their performance
against the models that predict long-term
rewards. Our suspicion is that existing data
may be more predictive of short-term
rewards than long-term rewards, because
long-term rewards may require a form of
reflection that most existing data do not capture.</p>
        </sec>
      </sec>
      <sec id="sec-1-2">
        <title>3.2. Study 2: Addressing present bias via user control mechanisms</title>
        <p>Study 2 is currently planned as a
participatory design exercise with a movie
recommender system, in which we ask:
• RQ 2a: What preferences do users
have for how a movie RS should weight
the short-term versus long-term
rewards of the movies it recommends? In
what contexts would users prefer what
weights?
• RQ 2b: How would users like to
control how a movie RS weighs the
short-term versus long-term rewards of the
movies it recommends?</p>
        <sec id="sec-1-2-1">
          <title>3.2.1. Study 2 Methods</title>
          <p>We plan to follow the approach of
Harambam et al., who elicited the controls users
would like to have based on a prototype of
a news recommender system, with a focus
on addressing the bias of filter bubbles [11].</p>
          <p>At the workshop, we hope to elicit
feedback on how Study 2 might be revised to best
answer our research questions.</p>
        </sec>
      </sec>
    </sec>
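    <p>The planned RQ 1b analysis could be sketched as follows, with invented critic scores, invented elicited long-term ratings, and an assumed decision rule (treat ratings of 4 or above as “high long-term reward”); the real analysis will use trained models rather than a fixed threshold.</p>
    <preformat>
```python
# (movie, critic_score_out_of_10, elicited_longterm_rating_1_to_5)
movies = [
    ("A", 8.1, 5), ("B", 4.0, 2), ("C", 7.5, 4),
    ("D", 6.0, 4), ("E", 3.2, 1), ("F", 9.0, 2),
]

def precision_recall(threshold=7.0):
    # Predict "high long-term reward" whenever the critic score
    # reaches the threshold, then score that prediction against
    # the user's own elicited ratings.
    tp = fp = fn = 0
    for _, critic, longterm in movies:
        predicted = critic >= threshold
        actual = longterm >= 4
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    return tp / (tp + fp), tp / (tp + fn)

print(precision_recall())  # (precision, recall) on this toy data
```
    </preformat>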
    <sec id="sec-3">
      <title>4. Workshop relevance</title>
      <p>Today’s recommender systems often
prioritize “what users do” and neglect “what
users say.” As a result, they tend to reinforce
the current self rather than foster the ideal
self.</p>
      <p>This study design proposes to study this
with regards to movies. But the same
problem also applies to other domains such as
online groceries, where the current self might
want cookies while the ideal self wants
blueberries, or digital news, where the current
self might want to read stories that agree
with their worldview while the ideal self
wants to be challenged by different
perspectives [12]. The methods we propose in this
study design are relevant beyond just movies.</p>
      <p>We expect that all workshop participants
will benefit from a lively discussion of how
to conceptualize and measure user
preferences in ways that go beyond the current
behaviorist paradigm of prioritizing what
users do over explicit preferences. Our
proposal to ask users for their explicit ratings
and correlate these with other data is just
one possible approach, and we would like to
discuss what other methods workshop
participants would suggest and how well these
apply to other domains such as groceries and
news. Addressing present bias also raises
philosophical issues: Is it always irrational to
pursue short-term rewards over long-term
rewards? Are users in a position to judge
their own long-term rewards? How far
should computing systems go in nudging or
shoving users towards long-term rewards?</p>
      <p>Finally, present bias is just one of many
cognitive biases. We hope that our
submission will also contribute to the growing
conversation on how to use psychological
theory to address cognitive biases in intelligent
user interfaces [13].</p>
    </sec>
    <sec id="sec-2">
      <title>Acknowledgments</title>
      <p>Thank you to Minkyong Kim, Ulrik Lyngs,
David McDonald, Sean Munson, and Alexis
Hiniker for feedback on this research agenda
and drafts of the position paper.</p>
    </sec>
    <sec id="sec-ref">
      <title>References</title>
      <p>[1] M. K. Lee, S. Kiesler, J. Forlizzi, Mining
behavioral economics to design persuasive
technology for healthy choices, in:
Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems, CHI ’11, ACM,
New York, NY, USA, 2011, pp. 325–334.</p>
      <p>[2] D. Read, B. van Leeuwen, Predicting
hunger: The effects of appetite and delay on
choice, Organ. Behav. Hum. Decis. Process.
76 (1998) 189–205.</p>
      <p>[3] H. E. Hershfield, D. G. Goldstein, W. F.
Sharpe, J. Fox, L. Yeykelis, L. L. Carstensen,
J. N. Bailenson, Increasing saving behavior
through age-progressed renderings of the
future self, J. Mark. Res. 48 (2011) S23–S37.</p>
      <p>[4] M. D. Ekstrand, M. C. Willemsen,
Behaviorism is not enough: Better
recommendations through listening to users, in:
Proceedings of the 10th ACM Conference on
Recommender Systems, RecSys ’16, ACM,
New York, NY, USA, 2016, pp. 221–224.</p>
      <p>[5] U. Lyngs, R. Binns, M. Van Kleek, et al.,
So, tell me what users want, what they really,
really want!, in: Extended Abstracts of the
2018 CHI Conference on Human Factors in
Computing Systems, 2018.</p>
      <p>[6] D. Kahneman, Thinking, Fast and Slow,
Farrar, Straus and Giroux, 2011.</p>
      <p>[7] N. Seaver, Captivating algorithms:
Recommender systems as traps, Journal of
Material Culture (2018). URL: https://doi.org/10.1177/1359183518820366.</p>
      <p>[8] T. Dingler, B. Tag, E. Karapanos, K. Kise,
A. Dengel, Detection and design for cognitive
biases in people and computing systems,
http://critical-media.org/cobi/background.html, 2020.</p>
      <p>[9] P. Slovák, C. Frauenberger,
G. Fitzpatrick, Reflective practicum: A framework
of sensitising concepts to design for
transformative reflection, in: Proceedings of the
2017 CHI Conference on Human Factors in
Computing Systems, ACM, 2017, pp. 2696–2707.</p>
      <p>[10] K. Lukoff, C. Yu, J. Kientz, A. Hiniker,
What makes smartphone use meaningful or
meaningless?, Proc. ACM Interact. Mob.
Wearable Ubiquitous Technol. 2 (2018) 22:1–22:26.</p>
      <p>[11] J. Harambam, D. Bountouridis,
M. Makhortykh, J. Van Hoboken, Designing
for the better by taking users into account:
a qualitative evaluation of user control
mechanisms in (news) recommender systems, in:
Proceedings of the 13th ACM Conference on
Recommender Systems, ACM, 2019, pp. 69–77.</p>
      <p>[12] S. A. Munson, P. Resnick, Presenting
diverse political opinions: how and how
much, in: Proceedings of the SIGCHI
Conference on Human Factors in Computing
Systems, ACM, 2010, pp. 1457–1466.</p>
      <p>[13] D. Wang, Q. Yang, A. Abdul, B. Y. Lim,
Designing theory-driven user-centric
explainable AI, in: Proceedings of the 2019 CHI
Conference on Human Factors in Computing
Systems, 2019. URL: https://dl.acm.org/doi/abs/10.1145/3290605.3300831.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>