MedleyAssistant – A system for personalized music medley creation

Zhengshan Shi
CCRMA, Stanford University
Stanford, USA

Gautham J. Mysore
Adobe Research
San Francisco, USA

ABSTRACT
In this paper, we present MedleyAssistant, a system to assist in the creation of music medleys from segments of existing music. Our goal is to make medley creation more accessible to novices, while still allowing them to express their own creative style. Our system addresses two key challenges in medley creation: determining which segments of music sound natural when transitioning to which other segments of music, and determining specific transition points between two given segments of music. This constrains the problem so that medleys created with our system tend to sound natural while allowing the user to be creative with music selection. We also provide a music visualization that helps users understand the musical principles of medley creation.

ACM Classification Keywords
H.5.5. Sound and Music Computing: Methodologies and Techniques; I.5.5. Implementation: Interactive systems

Author Keywords
music medley; creative MIR; personalized music creation.

INTRODUCTION AND MOTIVATION
A music medley is a piece of music that is composed or arranged from a series of songs or musical segments. In a high quality medley, each segment tends to naturally flow into the next segment, and the transitions typically sound seamless.

Medleys provide a way to create new variations of music starting from existing music. They can be used for standalone music playback or as backing tracks for media such as videos and video games. They provide a way to customize a piece of music so that different sections of the media correspond to different segments of music.

3. Determine the exact transition points from a given segment to the following segment, crop the segments accordingly, and use a crossfade to stitch the segments together. This step is crucial for a seamless transition between segments.

4. Adjust tempos or keys when necessary by using traditional audio editing tools.

Steps 2 and 4 above require a keen musical ear or a background in music theory. Step 3 requires a certain amount of skill in audio editing. All three of these steps can be quite tedious. Step 1, however, requires less of a prior background in music and audio editing and can simply be based on music preference.

We present MedleyAssistant, a system to help people easily create personalized music medleys with little or no background in music and audio editing. Our system assists in step 2 and automates step 3. It allows users to be creative with song selection in step 1. Moreover, it visualizes certain musical features to help guide users with these steps and can help them better understand the underlying musical principles. We believe that this can make medley creation a more accessible process for novices, and allow experts to speed up their workflow.

RELATED WORK
Recent advances in Music Information Retrieval (MIR) techniques have given rise to intelligent musical interfaces [3, 9], making certain aspects of music creation more accessible to non-experts. This includes applications such as an automatic DJ [4], a song mixing tool [2], an automatic mashup system [1], and a loop creation system [11]. All of these applications help reuse parts of existing music to create new music.

To the best of our knowledge, the work that is most closely related to our proposed medley creation system is Music Cut and Paste [5, 6], a personalized music-cut-and-paste system, which is also used to create medleys. The key difference is that this system only allows users to specify the sequence of segments in terms of vocal and instrumental sections, whereas our system provides the user with significantly more flexibility in terms of choosing and adjusting segments.

Copying permitted for private and academic purposes.
MILC ’18, March 11, 2018, Tokyo, Japan

SYSTEM OVERVIEW
In this section, we describe the workflow and interface of MedleyAssistant. A user creates a medley using our system as follows:

1. The user provides a number of candidate songs based on their preference in music.

2. Our system provides a visualization of these songs as shown in Figure 1. Specifically, it visualizes the chord structure and tempo. This visualization could help guide the user in the subsequent steps.

3. The user chooses the first segment of the medley based on the music that they would like to be the introduction. This forms the foundation of the medley because it in turn dictates subsequent segments. The user can choose this first segment based on listening as well as using the visualization as a guide. The beginning and end of the selection snap to the nearest beat.

4. Based on the first segment, the system assists in choosing the second segment. It estimates multiple potential candidates for the second segment (over all songs) and highlights them, as shown in Figure 2. The opacity of each gray box indicates a confidence level of how good the segment will sound. The system attempts to choose candidate segments that will musically flow well from the first segment based on chord structure, tempo, and timbre. The chosen segments will be 16 beats or fewer. We use a relatively small size so that the user can adjust the length to whatever they desire in the next step. The user can choose to use one of these segments, and the system will then perform the low level edits to concatenate the two segments. However, the user can alternatively choose any part of any song (even if it is not highlighted) and use that as the second segment.

5. After choosing a segment, the user can drag the segment ending boundary to adjust the length of the segment.

6. The user chooses the next segment as outlined in the previous two steps and continues this process until the medley is complete.

The interface in our system is realized as a web application using wavesurfer, an API built on top of the Web Audio API and HTML5 Canvas.

Figure 1. MedleyAssistant Interface. For each song, we visualize chords (as colored blocks on the waveform) and tempo (as a color density bar below the waveform).

Visualization
As shown in Figure 1, our system visualizes both the chord structure and the tempo for each song. This visualization provides a tool for users to better understand the segments highlighted by the system, helps users choose new segments that are not highlighted, and can serve as an educational tool to better understand the music-theoretic principles of medley creation.

We visualize the chord structure of each song by overlaying the waveform on colored blocks where each color corresponds to a different chord. We use only Major and Minor chords, so we have a total of 24 chords. The color scheme we use in visualizing each chord is inspired by Alexander Scriabin’s “Clavier à lumières” (“keyboard with lights”) [10], a keyboard instrument with colors assigned to different keys. For example, we use intense red for C and orange for G. We utilize this color scheme because chords with similar colors form a desirable chord progression; for example, the chords C and G are close in RGB color value. The color mapping of the chords is shown in a chord visualization colormap in Figure 1.

We visualize tempo below the waveform, with a bar representing rhythmic density of the music as a function of time. For the rhythmic feature, we first extract the dynamic tempos of the song, and then group the dynamic tempos into different sections. We use color interpolation to indicate the tempo density (i.e., a deeper color for a faster tempo).

Our system demonstrates the visualization of two features (chord structure and tempo), but this can be extended to various other features. We think that different kinds of musical features could be used to assist in different styles of medley creation.

ALGORITHMS
In this section, we describe the algorithms that we use for visualization and automatic segment selection (as described in steps 2 and 4 of the workflow in the previous section).

We start with a pre-processing step in which we extract features at each beat of each song. These features are used both in the visualization and computation of the cost function described below.

Figure 2. When a query segment (the gray section in the first song) is selected, a menu with a segment selection option appears at the bottom of the song such that the user can add the segment into the medley editor (bottom row). The system then calculates and suggests potential next segments to the user (the gray boxes in the second and third songs).

Given a specific segment of the medley, which we refer to as the query segment, the goal of our algorithm is to estimate potential candidates for the next segment. We select the next candidate segment based on specific criteria as follows:

1. Acoustically smooth and seamless transitions between consecutive segments. We compute this smoothness based on a cost function between the query segment and every other segment in every song based on a sliding window.

2. Consecutive segments should conform to a harmonic progression specified by music theory rules. Based on the acoustically smooth candidates (as determined by the previous step), we select a subset of candidate segments based on a harmonic progression factor.

Feature Extraction
We first estimate beat locations in each song by applying beat tracking on the onset envelope of the audio signal. At [...] Host. These estimated chords are used in the visualization under the waveform.

Acoustic Smoothness
Given the query segment $S_1$ of length $N$ and the potential candidate segment $S_2$ of length $M = 16$, our goal is to compute the optimal transition point from $S_1$ to $S_2$. We do this by computing a cost function between four-beat windows sliding over both $S_1$ and $S_2$. We define the optimal transition point as the location of the sliding windows with the lowest cost, which we refer to as the highest acoustic smoothness.

We consider four-beat sliding windows starting from beat $\frac{1}{4}N$ to beat $N - 3$ of $S_1$, and from beat 1 to beat $\frac{3}{4}M$ of $S_2$. We do not consider the beginning of $S_1$ and the ending of $S_2$ in order to ensure that the resulting combination of $S_1$ and $S_2$ retains at least a part of each segment. Our cost function over a four-beat sliding window of $S_1$ to $S_2$ is:

\[
\begin{aligned}
C(S_{1i}, S_{2j}) ={}& \alpha \sum_{k=0}^{3} D_c\left(C_{S_1}[i+k],\, C_{S_2}[j+k]\right) \\
&+ \beta\, \frac{\sum_{k=0}^{3} D_m\left(M_{S_1}[i+k],\, M_{S_2}[j+k]\right)}{\sigma_m} \\
&+ \gamma\, \frac{D_r\left(R_{S_1}[i-4 : i-1],\, R_{S_2}[j : j+3]\right)}{\sigma_r} \\
&+ \delta\, \frac{D_t(S_1, S_2)}{\sigma_t}
\end{aligned}
\tag{1}
\]

where $D_c$ denotes the cosine distance of the chroma features, $D_m$ denotes the Euclidean distance of the timbre features, $D_r$ denotes the difference of root mean square energy, and $D_t$ denotes the tempo difference. $\alpha$, $\beta$, $\gamma$, and $\delta$ are tuning factors, and $\sigma_m$, $\sigma_r$, and $\sigma_t$ represent the standard deviations. Figure 3 illustrates the computation of the cost function.

We compute our tuning factors a priori as follows. The goal is to define the tuning factors that help indicate acoustic smoothness. By definition, the transitions between consecutive segments of a given song are maximally smooth. Therefore, we compute the cost function between all consecutive segments of a number of songs using a number of different combinations of tuning factors (each tuning factor can vary from 0 to 1.0). We choose the combination of tuning factors that on average yields the lowest cost.

When we compute the cost function for segments $S_1$ and $S_2$, we choose the optimal transition point between $S_1$ and $S_2$ based on the $i$ and $j$ that yield the lowest cost.
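The cost function in Equation 1 and the search for the lowest-cost pair (i, j) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in MedleyAssistant. The per-beat feature layout (a dictionary with the hypothetical keys `chroma`, `timbre`, `rms`, and `tempo`), the function names, and the 0-based beat indexing are all choices made for this example. The paper does not spell out how the RMS and tempo differences are computed, so the sketch assumes the absolute difference of the mean RMS over each four-beat window and the absolute difference of the two segment tempos. The tuning factors and standard deviations are passed in by the caller.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Assumption: an all-zero chroma vector is treated as maximally distant.
        return 1.0
    return 1.0 - float(np.dot(a, b) / denom)


def window_cost(s1, s2, i, j, weights, sigmas):
    """Cost of Equation 1 for four-beat windows starting at beat i of s1
    and beat j of s2 (0-based indices, an assumption of this sketch)."""
    alpha, beta, gamma, delta = weights
    sigma_m, sigma_r, sigma_t = sigmas
    # Chroma term: summed cosine distance over the four aligned beats.
    d_c = sum(cosine_distance(s1["chroma"][i + k], s2["chroma"][j + k])
              for k in range(4))
    # Timbre term: summed Euclidean distance over the four aligned beats.
    d_m = sum(np.linalg.norm(s1["timbre"][i + k] - s2["timbre"][j + k])
              for k in range(4))
    # RMS term: beats i-4..i-1 of s1 against beats j..j+3 of s2.
    # Assumption: "difference" means |mean RMS of s1 window - mean RMS of s2 window|.
    d_r = abs(s1["rms"][i - 4:i].mean() - s2["rms"][j:j + 4].mean())
    # Tempo term. Assumption: one tempo value per segment.
    d_t = abs(s1["tempo"] - s2["tempo"])
    return (alpha * d_c + beta * d_m / sigma_m
            + gamma * d_r / sigma_r + delta * d_t / sigma_t)


def best_transition(s1, s2, weights, sigmas):
    """Return (cost, i, j) for the lowest-cost window pair, or None if the
    segments are too short to contain a valid pair of windows."""
    n, m = len(s1["rms"]), len(s2["rms"])
    best = None
    # Skip the first quarter of s1; also keep i >= 4 so the RMS window exists.
    for i in range(max(4, n // 4), n - 3):
        # Skip the last quarter of s2 while keeping the window inside s2.
        for j in range(0, min(3 * m // 4, m - 3)):
            cost = window_cost(s1, s2, i, j, weights, sigmas)
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best
```

Under these assumptions, calling `best_transition(s1, s2, (alpha, beta, gamma, delta), (sigma_m, sigma_r, sigma_t))` returns the cost together with the beat in the query segment and the beat in the candidate segment at which the two would be joined. Exhaustive search is cheap here because the candidate segment is at most 16 beats long.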