Towards Effective Exploration/Exploitation in Sequential Music Recommendation

Himan Abdollahpouri, DePaul University, USA, habdolla@depaul.edu
Steve Essinger, Pandora Media, Inc., USA, sessinger@pandora.com

RecSys 2017 Poster Proceedings, August 27-31, Como, Italy.

ABSTRACT
Music streaming companies collectively serve billions of songs per day.
Radio-based music services may intersperse audio advertisements among the
songs as a means to generate revenue, much like traditional FM radio.
Regardless of the monetization approach, the recommender system should decide
when to play content that the listener is known to enjoy (exploit) and when to
play content that is novel to the listener (explore). Recommender systems that
rely on this explore/exploit framework have been deployed in a wide variety of
applications such as movies, books, music, shopping and more. In this work, we
investigate the impact of different ad/song sequences on listener behavior. In
particular, we focus on the impact of exploring new song content for the
listener given the previous sequence of ads and songs in the listener’s
session. Our results show that the prior sequence matters when considering
song exploration and that this prior sequence has an impact on the listener’s
tendency to interrupt their current session.

1    INTRODUCTION
Recommender systems (RS) have been deployed in numerous domains including
music, movies, e-commerce and books. In music recommendation, one of the
overarching goals of the RS is to find the best song to play for each
listener, personalized to their specific taste(s) in music. In general,
companies offering music recommendation services provide two different types
of subscriptions: (1) ad-supported membership, where the music is free but the
listener is subject to advertisements, and (2) premium membership, where the
listener pays a monthly membership fee in exchange for ad-free listening. This
paper focuses on the former, ad-supported listening. Unsurprisingly, listeners
prefer hearing songs over ads. However, the business depends on the revenue
that it makes from the ads and cannot operate without serving them. Therefore,
playing ads is crucial to keep the business alive, and ads should be
considered as content served to the listener along with music.
    One of the fundamental concepts in RS is the idea of exploration and
exploitation [9]. This paradigm results in a balance between recommending
content the system has high certainty the user would like (exploitation) and
content for which there is less certainty (exploration). Without exploration,
users would become stuck in a filter bubble and continue to see a narrow set
of products. This is a missed opportunity to experience other products that
could be of interest to them [5, 8]. Another reason for exploration is when
the number of items matching a user’s interest is limited and the system
should not recommend the same item again to the user. For example, in online
dating [7], it is possible that the system has already recommended all the
available people who match the user’s interest, so exploring a wider range of
people is needed in order to generate new recommendations. Therefore,
providing exploratory content to a user is a key component for discovery. We
conducted an experiment on a music recommendation application and our results
show that the previous sequence of events in a listener’s session is important
in deciding whether the RS should provide subsequent exploratory types of
content.

2    SONG/AD SEQUENCE ANALYSIS
We have compiled data from a large-scale music recommendation service for our
analysis. To find the effect of different sequences of songs and ads on the
probability of a user switching the station after listening to an exploratory
song, we looked at one million sessions on mobile devices where the ad
placement had been made completely at random. Note that the randomness of ad
placement is important in order to make sure our analysis is not biased toward
any particular ad placement algorithm. We compare the impact of explore songs
versus exploit songs in the context of the previous three events. For example,
given the prior three events Ad, Song, Song, where each song is an exploit,
what is the probability of the listener changing the station if the next song
spun for them is an explore song versus the probability of station change
given an exploit song? Station change is used as a proxy for discontent with
the current stream of music.

[Figure 1: bar chart titled "Percent Increase of Station Change, Explore
versus Exploit Song, following Sequence". x-axis: Prior Event Sequence;
y-axis: Percent Increase (0 to 600). Bar values by prior sequence: SSS 133,
SSA 208, SAS 99, SAA 64, ASS 138, ASA 531, AAS 104, AAA 196.]

Figure 1: Percent increase of the probability of station change for an explore
song vs. an exploit song, following different sequences of exploit songs and
ads.

    We calculated the probabilities of users changing the station when they
are exposed to different sequences of ads and songs as
follows: there are a total of 8 possible event combinations for a set of three
items, as shown in Figure 1. We denote an explore song by S' and an exploit
song by S. A station change is represented by C.

    \text{Percentage Difference} = \frac{P(C \mid S') - P(C \mid S)}{P(C \mid S)} \times 100    (1)

P(C | S') is the probability of a user changing the station given that the
last played content is an explore song. P(C | S) is the probability of a user
changing the station when the last played content is an exploit song. The
lower and upper confidence bounds for the computed percentage increases, shown
in Figure 1 as vertical blue lines on top of the bars, are computed as
follows:

    \left( \frac{P(C \mid S') - P(C \mid S)}{P(C \mid S)} \pm 1.96 \times SE \right) \times 100    (2)

where SE (i.e. the standard error) is calculated using equation 3,

    SE = \sqrt{ \frac{P(C \mid S') \, (1 - P(C \mid S'))}{N_{S'}} + \frac{P(C \mid S) \, (1 - P(C \mid S))}{N_{S}} }    (3)

where N_{S'} is the total number of times an explore song has been played. The
total number of times an exploit song has been played is denoted by N_{S}.
Figure 1 shows the percent increase of station changes after playing an
explore versus an exploit song when a user has observed the respective prior
sequence of exploit songs and ads. Due to the company’s data privacy policy,
we have not included the individual probabilities of switching the station for
explore and exploit songs, but have provided the percentage difference between
them.
    In the prior sequences, an exploit song is denoted by S and an ad by A.
Whatever the previous sequence of songs and ads, the probability of a user
switching the station when we play an explore song is higher than the
probability when we play an exploit song. This is true for all 8 combinations
of songs and ads. Moreover, some sequences are riskier than others for placing
an explore song. For example, the ASA sequence (which means playing an ad,
then a song and then another ad) has the highest probability increase
(+531.13%) of a user switching the station when given an explore song after
that sequence. Clearly, this is not the best opportunity to explore new
content. On the other hand, the SAA sequence has the lowest probability
increase (+64.42%), although the increase is still positive. While playing an
explore song is riskier than playing an exploit song in all cases, it is
better to explore after particular sequences than after others. Different
sequences of songs and ads have different effects on station switching
behavior, and a recommender system should try to take these sequences into
account when doing exploration and exploitation, as in our sequential music
recommendation system. Overall, instead of a blind explore-exploit platform,
we advise taking an intelligent approach that takes a listener’s state of
listening (whether they are happy with the past couple of songs/ads or not)
into account when deciding whether to exploit or to explore.

3    RELATED WORK
The idea of explore-exploit has been studied in recommender systems by some
researchers [9]. In particular, for single item recommendation, approaches
like Multi-Armed Bandits have been used to strike a balance between
exploration and exploitation [10]. Moreover, authors have previously proposed
an approach for an effective balance between recommending popular and
long-tail items [2]. An idea more similar to our work appears in [6], where
the authors investigated proper timing for delivering a recommendation.
However, in our work, we are not looking for a perfect timing for the
recommendation in general, as the user should always receive content (a song
or an ad) as a recommendation. Our work is also novel in that we look at the
previous sequence of recommendations as an indication of whether it is a good
time for exploration or not.

4    CONCLUSION AND FUTURE WORK
In this work, we investigated the impact of different ad/song sequences on
listener behavior. In particular, we focused on the impact of exploring new
song content for the listener given the previous set of ads and songs in the
listener’s session. Our experimental results show that the previous sequence
of ads/songs matters in deciding what the right time is for exploration versus
exploitation. For our future work, we will launch an A/B experiment
controlling for the placement of explore songs and see how different users
behave when they observe different sequences of songs and ads. We will also
investigate more sophisticated offline models, such as HMMs and RNNs in a
reinforcement learning setting, that could learn superior personalized
playlist sequencing. This work is a starting point for a larger project in
which we aim to optimize the stream of recommendations of mixed types of
content (i.e. content from different stakeholders) [1, 3, 4].

ACKNOWLEDGMENTS
We would like to thank Pandora Media, Inc. for access to their vastly rich
dataset.

REFERENCES
 [1] Himan Abdollahpouri, Robin Burke, and Bamshad Mobasher. 2017. Recommender
     systems as multi-stakeholder environments. In Proceedings of the 25th
     Conference on User Modeling, Adaptation and Personalization (UMAP2017).
     ACM.
 [2] Himan Abdollahpouri, Robin Burke, and Bamshad Mobasher. 2017. Controlling
     Popularity Bias in Learning to Rank Recommendation. In Proceedings of the
     11th ACM conference on Recommender systems. ACM, To appear.
 [3] Himan Abdollahpouri and Steve Essinger. 2017. Multiple stakeholders in a
     music recommender system. In 1st International Workshop on Value-Aware
     and Multistakeholder Recommendation at RecSys 2017.
 [4] Robin Burke and Himan Abdollahpouri. 2017. Patterns of Multistakeholder
     Recommendation. In 1st International Workshop on Value-Aware and
     Multistakeholder Recommendation at RecSys 2017.
 [5] Oscar Celma. 2016. The Exploit-Explore Dilemma in Music Recommendation.
     In Proceedings of the 10th ACM Conference on Recommender Systems. ACM,
     377–377.
 [6] Nofar Dali Betzalel, Bracha Shapira, and Lior Rokach. 2015. Please, not
     now!: A model for timing recommendations. In Proceedings of the 9th ACM
     Conference on Recommender Systems. ACM, 297–300.
 [7] Luiz Pizzato, Tomek Rej, Thomas Chung, Irena Koprinska, and Judy Kay.
     2010. RECON: a reciprocal recommender for online dating. In Proceedings
     of the fourth ACM conference on Recommender systems. ACM, 207–214.
 [8] Paul Resnick, R Kelly Garrett, Travis Kriplean, Sean A Munson, and
     Natalie Jomini Stroud. 2013. Bursting your (filter) bubble: strategies
     for promoting diverse exposure. In Proceedings of the 2013 conference on
     Computer supported cooperative work companion. ACM, 95–100.
 [9] Hastagiri P Vanchinathan, Isidor Nikolic, Fabio De Bona, and Andreas
     Krause. 2014. Explore-exploit in top-n recommender systems via gaussian
     processes. In Proceedings of the 8th ACM Conference on Recommender
     systems. ACM, 225–232.
[10] Xinxi Wang, Yi Wang, David Hsu, and Ye Wang. 2014. Exploration in
     interactive personalized music recommendation: a reinforcement learning
     approach. ACM Transactions on Multimedia Computing, Communications, and
     Applications (TOMM) 11, 1 (2014), 7.
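
As a supplementary illustration, the percent-increase and confidence-bound
computation described in Section 2 (equations 1 to 3) might be sketched in
Python as below. This is a minimal sketch under stated assumptions, not the
authors' implementation: the function names, the record layout
(prior_sequence, is_explore, changed_station) and the toy counts are all
hypothetical, and the paper does not publish the underlying probabilities.

```python
import math
from collections import defaultdict

Z_95 = 1.96  # multiplier used in equation 2


def percent_increase_with_ci(changes_explore, plays_explore,
                             changes_exploit, plays_exploit):
    """Return (percent increase, lower bound, upper bound) per equations 1-3.

    All four arguments are raw counts for one prior sequence (hypothetical
    inputs): station changes and plays for explore and exploit songs.
    """
    p_explore = changes_explore / plays_explore   # P(C | S')
    p_exploit = changes_exploit / plays_exploit   # P(C | S)
    ratio = (p_explore - p_exploit) / p_exploit   # equation 1 before * 100
    # Equation 3: standard error built from the two binomial variances.
    se = math.sqrt(p_explore * (1 - p_explore) / plays_explore
                   + p_exploit * (1 - p_exploit) / plays_exploit)
    # Equation 2: bounds around the ratio, then scaled to percent.
    return (ratio * 100,
            (ratio - Z_95 * se) * 100,
            (ratio + Z_95 * se) * 100)


def tally_by_prior_sequence(events):
    """Count station changes and plays per (prior sequence, song type).

    `events` is an iterable of hypothetical records
    (prior_sequence, is_explore, changed_station), e.g. ("ASA", True, False).
    Returns a dict mapping (prior_sequence, is_explore) to [changes, plays].
    """
    counts = defaultdict(lambda: [0, 0])
    for prior_sequence, is_explore, changed_station in events:
        entry = counts[(prior_sequence, is_explore)]
        entry[0] += int(changed_station)
        entry[1] += 1
    return counts


if __name__ == "__main__":
    # Toy counts, invented purely for illustration.
    pct, lower, upper = percent_increase_with_ci(30, 1000, 10, 1000)
    print(f"percent increase: {pct:.2f} (bounds {lower:.2f} to {upper:.2f})")
```

In use, the tallies from tally_by_prior_sequence for a given prior sequence
(for example "ASA") would supply the four counts passed to
percent_increase_with_ci, giving one bar and one error range per sequence.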