Categories and Subject Descriptors

ACM RecSys

Analyzing the Characteristics of Shared Playlists for Music Recommendation

Dietmar Jannach

dietmar.jannach@tu-dortmund.de

Iman Kamehkhosh

iman.kamehkhosh@tu-dortmund.de

Geoffray Bonnin

geoffray.bonnin@tu-dortmund.de

TU Dortmund, Germany

2014

The automated generation of music playlists, as supported by modern music services like last.fm or Spotify, represents a special form of music recommendation. When designing a "playlisting" algorithm, the question arises which kind of quality criteria the generated playlists should fulfill and if there are certain characteristics like homogeneity, diversity or freshness that make the playlists generally more enjoyable for the listeners. In our work, we aim to obtain a better understanding of such desired playlist characteristics in order to be able to design better algorithms in the future. The research approach chosen in this work is to analyze several thousand playlists that were created and shared by users on music platforms based on musical and meta-data features. Our first results for example reveal that factors like popularity, freshness and diversity play a certain role for users when they create playlists manually. Comparing such user-generated playlists with automatically created ones moreover shows that today's online playlisting services sometimes generate playlists which are quite different from user-created ones. Finally, we compare the user-created playlists with playlists generated with a nearest-neighbor technique from the research literature and observe even stronger differences. This last observation can be seen as another indication that the accuracy-based quality measures from the literature are probably not sufficient to assess the effectiveness of playlisting algorithms.

Categories and Subject Descriptors

H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.5.5 [Information Interfaces and Presentation]: Sound and Music Computing

Playlist generation, Music recommendation

Music, playlist, analysis, algorithm, evaluation

1. INTRODUCTION

The automated creation of playlists or personalized radio stations is a typical feature of today's online music platforms and music streaming services. In principle, standard recommendation algorithms based on collaborative filtering or content-based techniques can be applied to generate a ranked list of musical tracks given some user preferences or past listening history. For several reasons, the generation of playlists however represents a very specific music recommendation problem. Personal playlists are, for example, often created with a certain goal or usage context (e.g., sports, relaxation, driving) in mind. Furthermore, in contrast to relevance-ranked recommendation lists used in other domains, playlists typically obey some homogeneity and coherence criteria, i.e., there are quality characteristics that are related to the transitions between the tracks or to the playlist as a whole.

In the research literature, a number of approaches for the automation of the playlist generation process have been proposed, see, e.g., [2, 6, 8, 10, 11] or the recent survey in [3]. Some of them for example take a seed song or artist as an input and look for similar tracks; others try to find track co-occurrence patterns in existing playlists. In some approaches, playlist generation is considered as an optimization problem. Independent of the chosen technique, a common problem when designing new playlisting algorithms is to assess whether or not the generated playlists will be positively perceived by the listeners. User studies and online experiments are unfortunately particularly costly in the music domain. Researchers therefore often use offline experimental designs and for example use existing playlists shared by users on music platforms as a basis for their evaluations. The assumption is that these "hand-crafted" playlists are of good quality; typical measures used in the literature include the Recall [8] or the Average Log-Likelihood (ALL) [11]. Unfortunately, both measures have their limitations, see also [2]. The Recall measure for example tells us how good an algorithm is at predicting the tracks selected by the users, but does not explicitly capture specific aspects such as the homogeneity or the smoothness of track transitions.

To design better and more comprehensive quality measures, we however first have to answer the question of what users consider to be desirable characteristics of playlists or what the driving principles are when users create playlists. In the literature, a few works have studied this aspect using different approaches, e.g., user studies [1, 7] or analyzing forum posts [5]. The work presented in this paper continues these lines of research. Our research approach is however different from previous works as we aim to identify patterns in a larger set of manually created playlists that were shared by users of three different online music platforms. To be able to take a variety of potential driving factors into account in our analysis, we have furthermore collected various types of meta-data and musical features of the playlist tracks from public music databases.

Overall, with our analyses we hope to obtain insights on the principles which an automated playlist generation system should observe to end up with better-received or more "natural" playlists. To test if current music services and a nearest-neighbor algorithm from the literature generate playlists that observe the identified patterns and make similar choices as real users, we conducted an experiment in which we analyzed commonalities and differences between automatically generated and user-provided playlists.

Before reporting the details of our first analyses, we discuss previous works in the next section.

2. PREVIOUS WORKS

In [14], Slaney and White addressed the question of whether users have a tendency to create very homogeneous or rather diverse playlists. As a basis for determining the diversity they relied on an objective measure based on genre information about the tracks. Each track was considered as a point in the genre space and the diversity was then determined by calculating the volume of an ellipsoid enclosing the tracks of the playlist. An analysis of 887 user-created playlists indicated that diversity can be considered to be a driving factor as users typically create playlists covering several genres.

Sarro and Casey more recently [13] focused on track transitions in album playlists and made an analysis to determine if there are certain musical characteristics that are particularly important. One of the results of their investigation was that fade durations and the mean timbre of the beginnings and endings of consecutive tracks seem to have a strong influence on the ordering of the tracks.

Generally, our work is similar to [14] and [13] in that we rely on user-created ("hand-crafted") playlists and look at meta-data and musical features of the tracks to identify potentially important patterns. The aspects we cover in this paper were however not covered in their work and our analysis is based on larger datasets.

Cunningham et al. [5], in contrast, relied on another form of track-related information and looked at the user posts in the forum of the Art of the Mix web site. According to their analysis, the typical principles for setting up the playlists mentioned by the creators were related to the artist, genre, style, event or activity but also the intended purpose, context or mood. Some users also talked about the smoothness of track transitions and how many tracks of one single artist should be included in playlists. Placing the most "important" track at the end of a playlist was another strategy mentioned by some of the playlist creators.

A different form of identifying playlist creation principles is to conduct laboratory studies with users. The study reported in [7] for example involved 52 subjects and indicated that the first and the last tracks can play an important role for the quality of a playlist. In another study, Andric and Haus [1] concluded that the ordering of tracks is not important when the playlist mainly contains tracks which the users like in general.

Reynolds et al. [12] conducted an online survey that revealed that the context and environment, like the location, activity or the weather, can have an influence both on the listeners' mood and on the track selection behavior of playlist creators. Finally, the study presented in [9] again confirmed the importance of artists, genres and mood in the playlist creation process.

In this discussion, we have focused on previous attempts to understand how users create playlists and what their characteristics are. Playlist generation algorithms however do not necessarily have to rely on such knowledge. Instead, one can follow a statistical approach and only look at co-occurrences and transitions of tracks in existing playlists and use these patterns when creating new playlists, see e.g., [2] or [4]. This way, the quality factors respected by human playlist creators are implicitly taken into account. Such approaches, however, cannot be directly applied for many types of playlist generation settings, e.g., for creating "thematic" playlists (e.g., Christmas Songs) or for creating playlists that only contain tracks that have certain musical features. Pure statistical methods are not aware of these characteristics and the danger exists that tracks are included that do not match the purpose of the list and thus lead to a limited overall quality.

3. CHARACTERISTICS OF PLAYLISTS

The ultimate goal of our research is to analyze the structure and characteristics of playlists in order to better understand the principles used by the users to create them. This section is a first step toward this goal.

3.1 Data sources

As a basis for the first analyses that we report in this paper, we used two types of playlist data.

3.1.1 Hand-crafted playlists

We used samples of hand-crafted playlists from three different sources. One set of playlists was retrieved via the public API of last.fm (footnote 1), one was taken from the Art of the Mix (AotM) website (footnote 2), and a third one was provided to us by 8tracks (footnote 3). To enhance the data quality, we corrected artist misspellings using the API of last.fm.

Overall, we analyzed over 10,000 playlists containing about 108,000 different tracks of about 40,000 different artists. As a first attempt toward our goal, we retrieved the features listed in Table 1 using the public API of last.fm and The Echo Nest (tEN), and the MusicBrainz database.

Some dataset characteristics are shown in Table 2. The "usage count" statistics express how often tracks and artists appeared overall in the playlists. When selecting the playlists, we made sure that they do not simply contain album listings. The datasets are partially quite different, e.g., with respect to the average playlist lengths. The 8tracks dataset furthermore has the particularity that users are not allowed to include more than two tracks of one artist, in case they want to share their playlist with others.

Figure 1 shows the distributions of playlist lengths. As can be seen, the distributions are quite different across the datasets. On 8tracks, a playlist generally has to comprise at least 8 tracks. The lengths of the last.fm playlists seem to follow a normal distribution with a maximum frequency value at around 20 tracks. Finally, the sizes of the AotM playlists are much more equally distributed.

Footnote 1: http://www.last.fm
Footnote 2: http://www.artofthemix.org
Footnote 3: http://8tracks.com

3.1.2 Generated playlists

To assess if the playlists generated by today's online services are similar to those created by users, we used the public API of The Echo Nest. We chose this service because it uses a very large database and allows the generation of playlists from several seed tracks, as opposed to, for instance, iTunes Genius or last.fm radios. We split the existing hand-crafted playlists in half, provided the first half of the list as seed tracks to the music service and then analyzed the characteristics of the playlist returned by The Echo Nest and compared them to the patterns that we found in hand-crafted playlists. Instead of observing whether a playlister generates playlists that are generally similar to playlists created by hand, our goal here is to break down their different characteristics and observe on what specific dimensions they differ. Notice that using the second half as seed would not be appropriate as the order of the tracks may be important.

We also turn our attention to the ability of algorithms from the literature to reproduce the characteristics of hand-crafted playlists. According to some recent research, one of the most competitive approaches in terms of recall is the simple k-nearest-neighbors (kNN) method [2, 8]. More precisely, given some seed tracks, the algorithm extracts the k most similar playlists based on the number of shared items and recommends the tracks of these playlists. This algorithm does not require a training step and scans the entire set of available playlists for each recommendation.
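As a rough illustration of the kNN scheme described above, the following Python sketch ranks candidate tracks from the k playlists that share the most tracks with the seeds. The function name `knn_recommend`, the overlap-weighted voting and the alphabetical tie-breaking are assumptions made for this example; the text only states that neighbors are chosen by the number of shared items and that their tracks are recommended.

```python
from collections import Counter


def knn_recommend(seed_tracks, playlists, k=10, n=100):
    """Recommend up to n tracks taken from the k playlists most similar to the seeds."""
    seeds = set(seed_tracks)
    # Similarity = number of tracks a playlist shares with the seeds.
    scored = [(len(seeds & set(p)), p) for p in playlists]
    # No training step: the whole collection is scanned on every call.
    neighbours = sorted((s for s in scored if s[0] > 0),
                        key=lambda s: s[0], reverse=True)[:k]
    votes = Counter()
    for overlap, playlist in neighbours:
        for track in set(playlist) - seeds:
            votes[track] += overlap  # assumption: weight each vote by the overlap
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ranked[:n]]
```

When no playlist shares a track with the seeds, the function returns an empty list, which mirrors the situation in which a playlister cannot produce a continuation for a given first half.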

3.2 Detailed observations

In the following sections, we will look at general distributions of different track characteristics.

3.2.1 Popularity of tracks

The goal of the first analysis here is to determine if users tend to position tracks in playlists depending on their popularity. In our analysis, we measure the popularity in terms of play counts. Play counts were taken from last.fm, because this is one of the most popular services and the corresponding values can be considered indicative for a larger user group.

For the measurement, we split the playlists into two parts of equal size and then determined the average play counts on last.fm for the tracks for each half. To measure to which extent the user community favors certain tracks in the playlists, we calculated the Gini index, a standard measure of inequality (footnote 4). Table 3 shows the results. In the last column, we report the statistics for the tracks returned by The Echo Nest (tEN) and kNN playlisters (footnote 5). We provided the first half of the hand-crafted playlists as seed tracks and the playlisters had to select the same number of tracks as the number of remaining tracks.
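The measurement can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, `play_counts` stands for a mapping from track identifiers to last.fm play counts, and the Gini coefficient is computed directly on the given values rather than on the 100 bins used in our analysis.

```python
def split_halves(playlist):
    """Split a playlist into first and second half (an odd-length list puts the extra track second)."""
    mid = len(playlist) // 2
    return playlist[:mid], playlist[mid:]


def mean_play_count(tracks, play_counts):
    """Average play count of the given tracks; play_counts maps track id -> count."""
    return sum(play_counts[t] for t in tracks) / len(tracks)


def gini(values):
    """Gini coefficient of non-negative values: 0 = perfectly equal, near 1 = highly concentrated."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formulation on the ascending-sorted values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

For example, four equally popular tracks give a coefficient of 0, while a list in which a single track receives all plays approaches 1 as the list grows.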

The results show that users actually tend to place more popular items in the first part of the list in all datasets, when play counts are considered. The Echo Nest playlister does not seem to take that form of popularity into account and recommends on average less popular tracks. These differences are statistically significant according to a Student's t-test (p < 10^-5 for The Echo Nest playlister and p < 10^-7 for the kNN playlister). This behavior indicates also that The Echo Nest is successfully replicating the fact that the second halves of playlists are supposed to be less popular than the first half.

Footnote 4: We organized the average play counts in 100 bins.
Footnote 5: We determined 10 as the best neighborhood size for our data sets based on the recall value, see Section 4.

Table 2: Dataset characteristics

                            last.fm
Playlists                   1,172
Tracks                      24,754
Artists                     9,925
Avg. tracks/playlist        26.0
Avg. artists/playlist       16.8
Avg. genres/playlist        2.7
Avg. tags/playlist          473.4
Avg. track usage count      1.2
Avg. artist usage count     3.0

The Gini index reveals that there is a slightly stronger concentration on some tracks in the first half for two of three datasets and the diversity slightly increases in the second part. The absolute numbers cannot be directly compared across datasets, but for the AotM dataset the concentration is generally much higher, which is also indicated by the higher "track reuse" in Table 2. Interestingly, The Echo Nest playlister quite nicely reproduces the behavior of real users with respect to the diversity of popularity.

In the lower part of Table 3, we show the results for the kNN method. Note that these statistics are based on a different sample of the playlists than the previous measurement. The reason is that both The Echo Nest and the kNN playlisters cannot produce playlists for all of the first halves provided as seed tracks. We therefore considered only playlists for which the corresponding algorithm could produce a playlist.

Unlike the playlister of The Echo Nest, the kNN method has a strong trend to recommend mostly very popular items. This can be caused by the fact that the kNN method by design recommends tracks that are often found in similar playlists. Moreover, based on the lower half of Table 3, the popularity correlates strongly with the seed track popularity. As a result, the kNN shows a potentially undesirable trend to reinforce already popular items to everyone. At the same time, it concentrates the track selection on a comparably small number of tracks as indicated by the very high value for the Gini coefficient.

3.2.2 The role of freshness

Next, we analyzed if there is a tendency of users to create playlists that mainly contain recently released tracks. As a measure, we compared the creation year of each playlist with the average release year of its tracks. We limit our analysis to the last.fm and 8tracks datasets because we could only acquire creation dates for these two.

Footnote 6: On 8tracks, artist repetitions are limited due to license constraints.

Figure 2 shows the statistics for both datasets. We organized the data points in bins (x-axis), where each bin represents an average-freshness level, and then counted how many playlists fall into these levels. The relative frequencies are shown on the y-axis. The results are very similar for both datasets, with a slight tendency to include older tracks for last.fm. On both datasets, more than half of the playlists contain tracks that were released on average in the last 5 years, the most frequent average age being between 4 and 5 years for last.fm and between 3 and 4 years for 8tracks. Similarly, on both datasets, more than 75% of the playlists contain tracks that were released on average in the last 8 years.

We also analyzed the standard deviation of the resulting freshness values and observed that more than half of the playlists have a standard deviation of less than 4 (years), while more than 75% have a standard deviation of less than 7 (years) on both datasets. Overall, this suggests that playlists made by users are often homogeneous with regard to the release date.
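The freshness computation can be illustrated with a few lines of Python. The function name `playlist_freshness` is hypothetical, and the choice of the population standard deviation over the track ages of one playlist is an assumption of this sketch, since the exact variant is not specified here.

```python
from statistics import mean, pstdev


def playlist_freshness(creation_year, release_years):
    """Return (average track age, standard deviation of track ages) in years."""
    # Age of a track = playlist creation year minus the track's release year.
    ages = [creation_year - year for year in release_years]
    return mean(ages), pstdev(ages)
```

A playlist created in 2013 with tracks released in 2010, 2012 and 2008 has track ages of 3, 1 and 5 years, and therefore an average freshness of 3 years, which would place it in the bin between 3 and 4 years.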

Computing the freshness for the generated playlists would require configuring the playlisters in such a way that they select only tracks that were not released after the playlists' creation years. Unfortunately, The Echo Nest does not allow such a configuration. Moreover, for the kNN approach, the playlists that are more recent would have to be ignored, which would lead to too small a sample size and results that are no longer reliable.

3.2.3 Homogeneity and diversity

Homogeneity and diversity can be determined in a variety of ways. In the following, we will use simple measures based on artist and genre counts. The genres correspond to the genres of the artists of the tracks retrieved from The Echo Nest. Basic figures for artist and genre diversity are already given in Table 2. On AotM, for example, having several tracks of an artist in a playlist is not very common (footnote 6). On last.fm, we in contrast very often see two or more tracks of one artist in a playlist. A similar, very rough estimate can be made for the genre diversity. If we ordered the tracks of a playlist by genre, we would encounter a different genre on last.fm only after having listened to about 10 tracks. On AotM and 8tracks, in contrast, playlists on average cover more genres.

Table 4 shows the diversities of the first and second halves of the hand-crafted playlists, and for the automatic selections using the first halves as seeds. As a measure of diversity, we simply counted the number of artists and genres and divided by the corresponding number of tracks. The values in Table 4 correspond to the averages of these diversity measures.
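The count-based diversity measure is simple enough to state in code. The sketch below assumes one artist label per track and a set of genre labels per track (taken from the track's artist); the function names are chosen for illustration only.

```python
def artist_diversity(track_artists):
    """Distinct artists divided by the number of tracks (1.0 = no artist repeated)."""
    if not track_artists:
        raise ValueError("empty playlist")
    return len(set(track_artists)) / len(track_artists)


def genre_diversity(track_genres):
    """Distinct genres divided by the number of tracks; each track carries a set of genres."""
    if not track_genres:
        raise ValueError("empty playlist")
    distinct = set().union(*track_genres)
    return len(distinct) / len(track_genres)
```

Applied to the first and second half of a playlist separately, and averaged over all playlists of a dataset, these functions yield values of the kind reported in Table 4. Note that the genre value can exceed 1 when tracks carry several genres each.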

Regarding the diversity of the hand-crafted playlists, the tables show that users tend to keep the same level of artist and genre diversity throughout the playlists. We can also notice that the playlists of last.fm are much more homogeneous. The diversity values of the automatic selections reveal several things. First, The Echo Nest playlister tends to always maximize the artist diversity independently of the diversity of the seeds; on the contrary, the kNN playlister lowered the initial artist diversities, except on the last.fm dataset, where it increased them, though less than The Echo Nest playlister. Regarding the genre diversity, we can observe an opposite tendency for both playlisters: The Echo Nest playlister tends to reduce the genre diversity while the kNN playlister tends to increase it. Again, these differences are statistically significant (p < 0.03 for The Echo Nest playlister and p < 0.006 for the kNN playlister). Overall, the resulting diversities of both approaches tend to be rather dissimilar to those of the hand-crafted playlists.

3.2.4 Musical features (The Echo Nest)

To understand if people tend to place tracks with specific feature values into their playlists, we then computed the distribution of the average feature values of each playlist. Figure 4 shows the results of this measurement for the energy and "hotttnesss" features. For all the other features (danceability, loudness and tempo), the distributions were similar to those of Figure 3, which could mean that they are generally not particularly important for the users.
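A distribution of per-playlist averages of this kind can be obtained with a simple histogram. The following sketch assumes that the feature values lie in the interval [0, 1], which is an assumption of the example and would not hold for features such as tempo or loudness without rescaling; the function name and the default number of bins are likewise illustrative.

```python
def relative_frequencies(playlists_features, bins=10):
    """Histogram of per-playlist average feature values, as relative frequencies."""
    counts = [0] * bins
    for values in playlists_features:
        average = sum(values) / len(values)
        # Map the average to a bin; an average of exactly 1.0 goes into the last bin.
        index = min(int(average * bins), bins - 1)
        counts[index] += 1
    total = len(playlists_features)
    return [count / total for count in counts]
```

Comparing the resulting histogram with the one computed over all individual tracks shows whether playlist creators concentrate on a narrower range of values than the track collection as a whole offers.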

When looking at the energy feature, we see that users tend to include tracks from a comparably narrow energy spectrum with a low average energy level, even though there exist more high-energy tracks in general as shown in Figure 3. A similar phenomenon of concentration on a certain range of values can be observed for the "hotttnesss" feature. As a side aspect, we can observe that the tracks shared on AotM are on average slightly less "hottt" than those of both other platforms (footnote 7).

We finally turn our attention to the feature distributions of the generated playlists. Figure 5 as an example shows the distributions of the energy and "hotttnesss" factors for the first halves and second halves of the playlists of all three datasets, together with the distributions of the tracks selected by The Echo Nest and kNN playlisters.

Footnote 7: The results for the "hotttnesss" we report here correspond to the values at the time when we retrieved the data using the API of The Echo Nest, and not to those at the time when the playlists were created. This is not important as we do not look at the distributions independently, but compare them to the distributions in Figure 3.

[Figure 5: two panels; y-axis: relative frequency; series: 1st half, 2nd half, tEN, kNN10]

The figure shows that The Echo Nest playlister tends to produce a distribution that is quite similar to the distribution of the seed tracks. The kNN playlister, in contrast, tends to concentrate the distributions toward the maximum values of the distributions of the seeds. We could observe this phenomenon of concentration for all the features on all three datasets, except for the danceability on the AotM dataset.

3.2.5 Transitions and Coherence

We now focus on the importance of transitions between the tracks, and define the coherence of a playlist as the average similarity between its consecutive tracks. Such similarities can be computed according to various criteria. We used the binary cosine similarity of the genres and artists (footnote 8), and the Euclidean linear similarity for the numerical track features of The Echo Nest. Table 5 shows the corresponding results for the first and second halves of the hand-crafted playlists, and for the automatic selections using the first halves as seeds.
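The coherence definition can be made concrete with a short Python sketch. It covers only the binary cosine similarity on label sets (genres, or a single artist per track); the similarity for the numerical Echo Nest features is not reproduced here. The function names are chosen for this example.

```python
from math import sqrt


def binary_cosine(a, b):
    """Cosine similarity of two label sets viewed as binary vectors."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def coherence(items, similarity=binary_cosine):
    """Average similarity between consecutive items of a playlist."""
    pairs = list(zip(items, items[1:]))
    if not pairs:
        return 0.0  # a playlist with fewer than two tracks has no transitions
    return sum(similarity(x, y) for x, y in pairs) / len(pairs)
```

With one-element artist sets, the similarity is 1 for two consecutive tracks of the same artist and 0 otherwise, so the coherence equals the proportion of transitions that stay with the same artist.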

We can first see that for all datasets and for all criteria, the second halves of the playlists have a lower coherence than the first halves. If we assume that the coherence is representative of the effort of the users to create good playlists, then the tracks of the second halves seem to be slightly less carefully selected than those of the first halves.

Footnote 8: In the case of artists, this means that the similarity equals 1 if both tracks have the same artist, and 0 else. The metric thus measures the proportion of cases when the users consecutively selected tracks from the same artist.

Another interesting phenomenon is the high artist coherence values on the last.fm dataset. These values indicate that last.fm users have a surprisingly strong tendency to group tracks from the same artist together, which was not successfully reproduced by the two playlisters. Both playlisters actually seem to have a tendency to always produce the same coherence values, independently of the coherence values of the seed. A last interesting result is the high coherence of artist genres on the AotM and 8tracks datasets; the high genre coherence values on last.fm can be explained by the high artist coherence values.

4. STANDARD ACCURACY METRICS

Our analysis so far has revealed some particular characteristics of user-created playlists. Furthermore, we observed that the nearest-neighbor playlisting scheme can produce playlists that are quite different from those generated by the commercial Echo Nest service, e.g., in terms of average track popularity (Table 3).

In the research literature, "hit rates" (recall) and the average log-likelihood (ALL) are often used to compare the quality of playlists generated by different algorithms [2, 8, 11]. The goal of our next experiment was to find out how The Echo Nest playlister performs on these measures. As it is not possible to acquire probability values for the tracks selected by The Echo Nest playlister, the ALL cannot be used (footnote 9). In the following we thus only focus on the precision and recall.

[Figure 6: recall (upper part) and precision (lower part) of The Echo Nest and kNN10 playlisters for tracks, artists, genres and tags on the last.fm, AotM and 8tracks datasets]

The upper part of Figure 6 shows the recall values at list length 100 for the different datasets (footnote 10). Again, we split the playlists and used the first half as seed tracks. Recall was then computed by comparing the computed playlists with the "hidden" tracks of the original playlist. We measured recall for tracks, artists, genres and tags. The results show that the kNN method quite clearly outperforms the playlister of The Echo Nest on the recall measures across all datasets except for the artist recall for the last.fm dataset. The differences are statistically significant (p < 10^-6) according to a Student's t-test for all the experiments except for the track and artist recall on last.fm. As expected, the kNN method leads to higher absolute values for larger datasets as more neighbors can be found.
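The evaluation protocol can be summarized in a few lines of Python. The sketch below treats the hidden second half as a set of relevant items and compares it with the top of the generated list; the function name and the handling of empty inputs are assumptions of this example. The same function applies to artists, genres or tags when the track identifiers are replaced by the corresponding labels.

```python
def precision_recall(recommended, hidden, list_length=100):
    """Precision and recall of the first list_length recommendations against the hidden items."""
    top = recommended[:list_length]
    relevant = set(hidden)
    hits = len(set(top) & relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

With a list length of 100 and hidden halves that contain only a handful of tracks, even a recall of 1.0 corresponds to a precision of a few percent at most, which is in line with the very low track precision values discussed in the text.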

The lower part of Figure 6 presents the precision results. The precision values for tracks are as expected very low and close to zero, which is caused by the huge set of possible tracks and the list length of 100. We can however observe a higher precision for the kNN method on the AotM dataset (p < 10^-11), which is the largest dataset. Regarding artist, genre and tag prediction, The Echo Nest playlister led to a higher precision (p < 10^-3) than the kNN playlister on all datasets.

Footnote 9: Another possible measure is the Mean Reciprocal Rank (MRR). Applied to playlist generation, one limitation of this metric is that it corresponds to the assumption that the rank of the test track or artist to predict should be as high as possible in the recommendation list, although many other tracks or artists may be more relevant and should be ranked before.
Footnote 10: We could not measure longer list lengths as 100 is the maximum playlist length returned by The Echo Nest.

With respect to the evaluation protocol, note that we only measured precision and recall when the playlister was able to return a playlist continuation given the seed tracks. This was however not always the case for both techniques. In Table 6, we therefore report the detailed coverage figures, which show that the kNN method was more often able to produce a playlist. If recall is measured for all seed playlists, the differences between the algorithms are even larger. When measuring precision for all playlists, the differences between the playlisters become very small.

Table 6: Coverage figures

Dataset     tEN
last.fm     28.33
AotM        42.75
8tracks     35.3

Overall, measuring precision and recall when comparing generated playlists with those provided by users in our view represents only one particular form of assessing the quality of a playlist generator and should be complemented with additional measures. Precision and recall as measured in our experiments for example do not consider track transitions. There is also no "punishment" if a generated playlist contains individual non-fitting tracks that would hurt the listener's overall enjoyment.

5. PUBLIC AND PRIVATE PLAYLISTS

Some music platforms and in particular 8tracks let their users create "private" playlists which are not visible to others and public ones that for example are shared and used for social interaction like parties, motivation for team sports or romantic evenings. The question arises if public playlists have different characteristics than those that were created for personal use only, e.g., because sharing playlists to some extent can also serve the purpose of creating a public image of oneself.

We made an initial analysis on the 8tracks dataset. Table 7 shows the average popularity of the tracks in the 8tracks playlists depending on whether they were in "public" or "private" playlists (the first category contains 2679 playlists and the second 451). As can be seen, the tracks of the private playlists are much more popular on average than the tracks in the public playlists. Moreover, as indicated by the corresponding Gini coefficients, the popular tracks are almost equally distributed across the playlists. Furthermore, Figure 7 shows the corresponding freshness values. We can see that the private playlists generally contained more recent tracks than public playlists.

Table 7: Popularity of tracks in public and private 8tracks playlists

               Public playlists    Private playlists
Play counts    870k                935k
Gini index     0.20                0.06

These results can be interpreted at least in two different ways. First, users might create some playlists for their personal use to be able to repeatedly listen to the latest popular tracks. They probably do not share these playlists because sharing a list of current top hits might be of limited value for other platform members who might be generally more interested in discovering not so popular artists and tracks. Second, users might deliberately share playlists with less popular or known artists and tracks to create a social image on the platform.

[Figure 7: average freshness of 8tracks playlists (years), public and private playlists]

Given these first observations, we believe that our approach has some potential to help us better understand some elements of user behavior on social platforms in general, i.e., that people might not necessarily only share tracks that match their actual taste.

6. SUMMARY AND OUTLOOK

The goal of our work is to gain a better understanding of how users create playlists in order to be able to design future playlisting algorithms that take these "natural" characteristics into account. The first results reported in this paper indicate, for example, that features like track freshness, popularity aspects, or homogeneity of the tracks are relevant for users, but not yet fully taken into account by current algorithms that are considered to create high-quality playlists in the literature. Overall, the observations also indicate that additional metrics might be required to assess the quality of computer-generated playlists in experimental settings that are based on historical data such as existing playlists or listening logs.

Given the richness of the available data, many more analyses are possible. Currently, we are exploring "semantic" characteristics to automatically identify the underlying theme or topic of the playlists. Another aspect not considered so far in our research is the popularity of the playlists. For some music platforms, listening counts and "like" statements for playlists are available. This additional information can be used to further differentiate between "good" and "bad" playlists and help us obtain more fine-grained differences with respect to the corresponding playlist characteristics. Last, we plan to extend our experiments and analysis by considering other music services, in particular last.fm radios, and other playlisting algorithms, in particular algorithms that exploit content information.

7. REFERENCES

[1] Andric and Haus. Estimating Quality of Playlists by Sight. In Proc. AXMEDIS, pages 68-74, 2005.

[2] Bonnin and Jannach. Evaluating the Quality of Playlists Based on Hand-Crafted Samples. In Proc. ISMIR, pages 263-268, 2013.

[3] Bonnin and Jannach. Automated generation of music playlists: Survey and experiments. ACM Computing Surveys, 47(2), 2014.

[4] Chen, J. L. Moore, Turnbull, and Joachims. Playlist Prediction via Metric Embedding. In Proc. KDD, pages 714-722, 2012.

[5] Cunningham, Bainbridge, and Falconer. 'More of an Art than a Science': Supporting the Creation of Playlists and Mixes. In Proc. ISMIR, pages 240-245, 2006.

[6] Flexer, Schnitzer, Gasser, and Widmer. Playlist Generation Using Start and End Songs. In Proc. ISMIR, pages 173-178, 2008.

[7] D. L. Hansen and Golbeck. Mixing It Up: Recommending Collections of Items. In Proc. CHI, pages 1217-1226, 2009.

[8] Hariri, Mobasher, and Burke. Context-Aware Music Recommendation Based on Latent Topic Sequential Patterns. In Proc. RecSys, pages 131-138, 2012.

[9] Kamalzadeh, Baur, and Möller. A Survey on Music Listening and Management Behaviours. In Proc. ISMIR, pages 373-378, 2012.

[10] Lehtiniemi and Seppänen. Evaluation of Automatic Mobile Playlist Generator. In Proc. MC, pages 452-459, 2007.

[11] McFee and G. R. Lanckriet. The Natural Language of Playlists. In Proc. ISMIR, pages 537-542, 2011.

[12] Reynolds, Barry, Burke, and Coyle. Interacting With Large Music Collections: Towards the Use of Environmental Metadata. In Proc. ICME, pages 989-992, 2008.

[13] A. M. Sarro and Casey. Modeling and Predicting Song Adjacencies In Commercial Albums. In Proc. SMC, 2012.

[14] Slaney and White. Measuring Playlist Diversity for Recommendation Systems. In Proc. AMCMM, pages 77-82, 2006.