MedleyAssistant – A system for personalized music medley creation

Zhengshan Shi (CCRMA, Stanford University, Stanford, USA) kittyshi@ccrma.stanford.edu
Gautham J. Mysore (Adobe Research, San Francisco, USA) gmysore@adobe.com

ABSTRACT
In this paper, we present MedleyAssistant, a system to assist in the creation of music medleys from segments of existing music. Our goal is to make medley creation more accessible to novices, while still allowing them to express their own creative style. Our system addresses two key challenges in medley creation: determining which segments of music sound natural when transitioning to which other segments of music, and determining specific transition points between two given segments of music. This constrains the problem so that medleys created with our system tend to sound natural while allowing the user to be creative with music selection. We also provide a music visualization that helps users understand the musical principles of medley creation.

ACM Classification Keywords
H.5.5. Sound and Music Computing: Methodologies and Techniques; I.5.5. Implementation: Interactive systems

Author Keywords
music medley; creative MIR; personalized music creation.

INTRODUCTION AND MOTIVATION
A music medley is a piece of music that is composed or arranged from a series of songs or musical segments. In a high quality medley, each segment tends to flow naturally into the next, and the transitions typically sound seamless. Medleys provide a way to create new variations of music starting from existing music. They can be used for music playback by itself or as backing tracks for media such as videos and video games. They provide a way to customize a piece of music so that different sections of the media correspond to different segments of music.

Manual creation of high quality medleys can be a challenging task and typically requires a background in music and audio editing. Medleys are often created by musicians and DJs. The typical sequence of steps to create a medley is as follows:

1. Select a number of candidate musical segments from various pieces of music.
2. Determine a musically natural sequence of segments for the proposed medley from the above candidate set of segments.
3. Determine the exact transition points from a given segment to the following segment, crop the segments accordingly, and use a crossfade to stitch the segments together. This step is crucial for a seamless transition between segments.
4. Adjust tempos or keys when necessary by using traditional audio editing tools.

Steps 2 and 4 above require a keen musical ear or a background in music theory. Step 3 requires a certain amount of skill in audio editing. All three of these steps can be quite tedious. Step 1, however, requires less of a prior background in music and audio editing and can simply be based on music preference.

We present MedleyAssistant, a system to help people easily create personalized music medleys with little or no background in music and audio editing. Our system assists in step 2 and automates step 3. It allows users to be creative with song selection in step 1. Moreover, it visualizes certain musical features to help guide users through these steps and to help them better understand the underlying musical principles. We believe that this can make medley creation a more accessible process for novices, and allow experts to speed up their workflow.

RELATED WORK
Recent advances in Music Information Retrieval (MIR) techniques have given rise to intelligent musical interfaces [3, 9], making certain aspects of music creation more accessible to non-experts. This includes applications such as an automatic DJ [4], a song mixing tool [2], an automatic mashup system [1], and a loop creation system [11]. All of these applications help reuse parts of existing music to create new music.

To the best of our knowledge, the work that is most closely related to our proposed medley creation system is Music Cut and Paste [5, 6], a personalized music-cut-and-paste system, which is also used to create medleys. The key difference is that this system only allows users to specify the sequence of segments in terms of vocal and instrumental sections, whereas our system provides the user with significantly more flexibility in terms of choosing and adjusting segments.

©2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. MILC ’18, March 11, 2018, Tokyo, Japan

SYSTEM OVERVIEW
In this section, we describe the workflow and interface of MedleyAssistant. A user creates a medley using our system as follows:

1. The user provides a number of candidate songs based on their preference in music.
2. Our system provides a visualization of these songs, as shown in Figure 1. Specifically, it visualizes the chord structure and tempo. This visualization can help guide the user in the subsequent steps.
3. The user chooses the first segment of the medley based on the music that they would like to be the introduction. This forms the foundation of the medley because it in turn dictates subsequent segments. The user can choose this first segment by listening as well as by using the visualization as a guide. The beginning and end of the selection snap to the nearest beat.
4. Based on the first segment, the system assists in choosing the second segment. It estimates multiple potential candidates for the second segment (over all songs) and highlights them, as shown in Figure 2. The opacity of each gray box indicates a confidence level of how good the segment will sound. The system attempts to choose candidate segments that will musically flow well from the first segment based on chord structure, tempo, and timbre. The chosen segments will be 16 beats or shorter. We use a relatively small size so that the user can adjust the length to whatever they desire in the next step. The user can choose to use one of these segments, and the system will then perform the low-level edits to concatenate the two segments. However, the user can alternatively choose any part of any song (even if it is not highlighted) and use that as the second segment.
5. After choosing a segment, the user can drag the segment's ending boundary to adjust the length of the segment.
6. The user chooses the next segment as outlined in the previous two steps and continues this process until the medley is complete.

Figure 1. MedleyAssistant Interface. For each song, we visualize chords (as colored blocks on the waveform) and tempo (as a color density bar below the waveform).

The interface in our system is realized as a web application using wavesurfer1, an API built on top of Web Audio API2 and HTML5 Canvas.

Visualization
We visualize the chord structure of each song by overlaying the waveform on colored blocks, where each color corresponds to a different chord. We use only major and minor chords, so we have a total of 24 chords. The color scheme we use in visualizing each chord is inspired by Alexander Scriabin’s “Clavier à lumières” (“keyboard with lights”) [10], a keyboard instrument with colors assigned to different keys. For example, we use intense red for C and orange for G. We utilize this color scheme because related chords receive similar colors, so a desirable chord progression, such as C to G, is reflected in the proximity of the corresponding RGB values. The color mapping of the chords is shown in a chord visualization colormap in Figure 1.

We visualize tempo below the waveform, with a bar representing the rhythmic density of the music as a function of time. For the rhythmic feature, we first extract the dynamic tempos of the song, and then group the dynamic tempos into different sections. We use color interpolation to indicate the tempo density (i.e., a deeper color for a faster tempo).
1 https://wavesurfer-js.org/
2 https://www.w3.org/TR/webaudio/

As shown in Figure 1, our system visualizes both the chord structure and the tempo of each song. This visualization provides a tool for users to better understand the segments highlighted by the system, helps users choose new segments that are not highlighted, and can serve as an educational tool to better understand the music-theoretical principles of medley creation.

Our system demonstrates the visualization of two features (chord structure and tempo), but this can be extended to various other features. We think that different kinds of musical features could be used to assist in different styles of medley creation.

Figure 2. When a query segment (the gray section in the first song) is selected, a menu with a segment selection option appears at the bottom of the song so that the user can add the segment into the medley editor (bottom row). The system then calculates and suggests potential next segments to the user (the gray boxes in the second and the third song).

ALGORITHMS
In this section, we describe the algorithms that we use for visualization and automatic segment selection (as described in steps 2 and 4 of the workflow in the previous section).

We start with a pre-processing step in which we extract features at each beat of each song. These features are used both in the visualization and in the computation of the cost function described below.

Given a specific segment of the medley, which we refer to as the query segment, the goal of our algorithm is to estimate potential candidates for the next segment. We select the next candidate segment based on the following criteria:

1. Acoustically smooth and seamless transitions between consecutive segments. We compute this smoothness based on a cost function between the query segment and every other segment in every song, using a sliding window.
2. Consecutive segments should conform to a harmonic progression specified by music theory rules. From the acoustically smooth candidates (as determined by the previous step), we select a subset of candidate segments based on a harmonic progression factor.

Feature Extraction
We first estimate beat locations in each song by applying beat tracking on the onset envelope of the audio signal. At every beat of every song, we compute timbral features (Mel-frequency cepstral coefficients), signal energy level (root mean square), and harmonic features (chroma vectors). We also compute tempo over a window around each beat. We compute these features using librosa [8]. Additionally, we estimate the chords in each song using the chordino plugin based on NNLS Chroma [7], which we run in the VamPy host. These estimated chords are used in the visualization under the waveform.

Acoustic Smoothness
Given the query segment S1 of length N and a potential candidate segment S2 of length M = 16, our goal is to compute the optimal transition point from S1 to S2. We do this by computing a cost function between four-beat windows sliding over both S1 and S2. We define the optimal transition point as the location of the sliding windows with the lowest cost, which we refer to as the highest acoustic smoothness. We consider four-beat sliding windows starting from beat N/4 to beat N − 3 of S1, and from beat 1 to beat 3M/4 of S2. We do not consider the beginning of S1 and the ending of S2, in order to ensure that the resulting combination of S1 and S2 retains at least a part of each segment. Our cost function over a four-beat sliding window of S1 and S2 is:

C(S1_i, S2_j) = α ∑_{k=0..3} Dc(C_S1[i+k], C_S2[j+k])
              + (β/σm) ∑_{k=0..3} Dm(M_S1[i+k], M_S2[j+k])
              + (γ/σr) Dr(R_S1[i−4 : i−1], R_S2[j : j+3])
              + (δ/σt) Dt(S1, S2)                                  (1)

where Dc denotes the cosine distance of the chroma features, Dm denotes the Euclidean distance of the timbre features, Dr denotes the difference of root mean square energy, and Dt denotes the tempo difference. α, β, γ, and δ are tuning factors, and σm, σr, and σt are the corresponding standard deviations. Figure 3 illustrates the computation of the cost function.

We compute our tuning factors a priori, as follows. The goal is to find tuning factors that are indicative of acoustic smoothness. By definition, the transitions between consecutive segments of a given song are maximally smooth. Therefore, we compute the cost function between all consecutive segments of a number of songs using a number of different combinations of tuning factors (each tuning factor can vary from 0 to 1.0). We choose the combination of tuning factors that on average yields the lowest cost.

When we compute the cost function for segments S1 and S2, we choose the optimal transition point between S1 and S2 based on the i and j that yield the lowest cost. Given the optimal i and j, the optimal transition between the segments is to go from beat i − 1 of S1 to beat j of S2.

The query segment S1 has a cost (associated with the optimal transition point) with respect to each segment S2 of each song. We choose all of the segments S2 that have a cost under a threshold as candidate segments for the next step.
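The acoustic smoothness search described above can be sketched as follows. This is a simplified reading of Eq. (1): the feature arrays are assumed to be beat-synchronous (one row or value per beat), indices are adapted to 0-based Python, and the tuning factors and standard deviations default to 1. The helper names (`transition_cost`, `best_transition`) are ours, not part of the system.

```python
import numpy as np

def cosine_dist(a, b):
    """Cosine distance between two feature vectors (Dc in Eq. 1)."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def transition_cost(f1, f2, i, j,
                    alpha=1.0, beta=1.0, gamma=1.0, delta=1.0,
                    sig_m=1.0, sig_r=1.0, sig_t=1.0):
    """Cost of going from beat i of S1 to beat j of S2 over a four-beat
    window. f1 and f2 are (chroma, mfcc, rms, tempo) tuples."""
    chroma1, mfcc1, rms1, tempo1 = f1
    chroma2, mfcc2, rms2, tempo2 = f2
    c = sum(cosine_dist(chroma1[i + k], chroma2[j + k]) for k in range(4))
    m = sum(np.linalg.norm(mfcc1[i + k] - mfcc2[j + k]) for k in range(4))
    # RMS difference: the four beats leading into the transition in S1
    # versus the four beats leading out of it in S2.
    r = abs(np.sum(rms1[i - 4:i]) - np.sum(rms2[j:j + 4]))
    t = abs(tempo1 - tempo2)
    return alpha * c + beta * m / sig_m + gamma * r / sig_r + delta * t / sig_t

def best_transition(f1, f2):
    """Scan the window positions allowed by the paper (roughly beat N/4
    to N-3 of S1, beat 1 to 3M/4 of S2) for the lowest-cost (i, j)."""
    N, M = len(f1[0]), len(f2[0])
    best = (None, None, float("inf"))
    for i in range(max(4, N // 4), N - 3):
        for j in range(0, min(3 * M // 4, M - 3)):
            cost = transition_cost(f1, f2, i, j)
            if cost < best[2]:
                best = (i, j, cost)
    return best
```

In the full system the tuning factors and per-feature standard deviations would be the a-priori values described above rather than the defaults of 1 used in this sketch.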
Figure 3. Illustration of the computation of the cost function over a four-beat sliding window (dashed rectangle) between beat i in segment S1 and beat j in segment S2.

Harmonic Progression Factor
After obtaining a set of candidate segments with acoustically smooth transition points from S1, we determine which of these candidates yield a music-theoretically valid harmonic progression when transitioning from S1.

Given candidate beat b1 in S1 that connects to beat b2 in S2, we analyze the four-beat harmonic progression from S1[b1 − 1 : b1] to S2[b2 : b2 + 1], namely the last two beats of the transition point in S1 and the first two beats of the transition point in S2, as well as the two-beat harmonic progression from S1[b1] to S2[b2]. We assign a score Qb1,b2 as:

Q_{b1,b2} = P5th(S1[b1], S2[b2]) + Ppop(S1[b1−1 : b1], S2[b2 : b2+1])   (2)

where P5th is the chord transition probability from the last beat of the transition point in S1 to the first beat of the transition point in S2, based on the circle of fifths, and Ppop is the four-beat harmonic transition probability from the last two beats of the transition point in S1 to the first two beats of the transition point in S2, trained on the SALAMI dataset [12].

We choose all segments with a score Qb1,b2 above a threshold as candidate segments, and these are displayed as gray boxes in the interface, as shown in Figure 2. The confidence value of a given segment is based on its score Qb1,b2 and is mapped to the opacity of the corresponding gray box in the interface.

CONCLUSION
We present MedleyAssistant, an interactive music medley creation system that enables users to create personalized music medleys. Our informal pilot study showed that our interface makes medley creation significantly easier for novices. We believe that it could be a useful tool for experts as well, as it could help them create medleys more quickly.

REFERENCES
1. Matthew EP Davies, Philippe Hamel, Kazuyoshi Yoshii, and Masataka Goto. 2014. AutoMashUpper: Automatic creation of multi-song music mashups. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22, 12 (2014), 1726–1737.
2. Tatsunori Hirai, Hironori Doi, and Shigeo Morishima. 2015. MusicMixer: Computer-aided DJ system based on an automatic song mixing. In Proceedings of the 12th International Conference on Advances in Computer Entertainment Technology. ACM, 41.
3. Eric J Humphrey, Douglas Turnbull, and Tom Collins. 2013. A brief review of creative MIR. In Proceedings of the International Conference on Music Information Retrieval, Late Breaking Demo.
4. Hiromi Ishizaki, Keiichiro Hoashi, and Yasuhiro Takishima. 2009. Full-Automatic DJ mixing system with optimal tempo adjustment based on measurement function of user discomfort. In ISMIR. 135–140.
5. Yin-Tzu Lin, I-Ting Liu, Jyh-Shing Roger Jang, and Ja-Ling Wu. 2015. Audio musical dice game: A user-preference-aware medley generating system. TOMCCAP 11 (2015), 52:1–52:24.
6. I-Ting Liu, Yin-Tzu Lin, and Ja-Ling Wu. 2013. Music Cut and Paste: A personalized musical medley generating system. In ISMIR. 463–468.
7. Matthias Mauch and Simon Dixon. 2010. Approximate Note Transcription for the Improved Identification of Difficult Chords. In ISMIR. 135–140.
8. Brian McFee, Colin Raffel, Dawen Liang, Daniel PW Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. 2015. librosa: Audio and music signal analysis in Python. In Proceedings of the 14th Python in Science Conference. 18–25.
9. Markus Schedl, Emilia Gómez, Julián Urbano, and others. 2014. Music information retrieval: Recent developments and applications. Foundations and Trends® in Information Retrieval 8, 2-3 (2014), 127–261.
10. Aleksandr Scriabin and Leonid Sabaneev. 1913. Prométhée, le poème du feu pour grand orchestre et piano avec orgue, choeurs et clavier à lumières. Op. 60. Transcription pour 2 pianos à quatre mains par L. Sabaneiew. (1913).
11. Zhengshan Shi and Gautham J Mysore. 2018. LoopMaker: Automatic creation of music loops for pre-recorded music. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM.
12. Jordan Bennett Louis Smith, John Ashley Burgoyne, Ichiro Fujinaga, David De Roure, and J Stephen Downie. 2011. Design and creation of a large-scale database of structural annotations. In ISMIR, Vol. 11. 555–560.
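The harmonic progression score of Eq. (2) can be sketched as follows. The probability models here are toy stand-ins: the paper derives P5th from the circle of fifths and trains Ppop on the SALAMI dataset, so the linear circle-of-fifths falloff, the dictionary lookup, and the helper names below are illustrative assumptions only.

```python
# Chord roots in circle-of-fifths order (chords reduced to their roots
# for this sketch).
CIRCLE_OF_FIFTHS = ["C", "G", "D", "A", "E", "B", "F#", "C#",
                    "G#", "D#", "A#", "F"]

def p5th(chord_a, chord_b):
    """Toy chord-transition probability: the closer two roots lie on the
    circle of fifths, the more probable the transition."""
    ia = CIRCLE_OF_FIFTHS.index(chord_a)
    ib = CIRCLE_OF_FIFTHS.index(chord_b)
    steps = min((ia - ib) % 12, (ib - ia) % 12)  # circular distance, 0..6
    return 1.0 - steps / 6.0

def q_score(s1_chords, s2_chords, p_pop):
    """Q = P5th(last beat of S1, first beat of S2)
         + Ppop(last two beats of S1, first two beats of S2),
    with Ppop given as a lookup table of learned probabilities."""
    return (p5th(s1_chords[-1], s2_chords[0])
            + p_pop.get((tuple(s1_chords[-2:]), tuple(s2_chords[:2])), 0.0))

def select_candidates(query_chords, candidates, p_pop, threshold=1.0):
    """Keep candidates whose score clears the threshold; in the interface
    the score is then mapped to the opacity of the gray box."""
    scored = [(q_score(query_chords, c, p_pop), c) for c in candidates]
    return [(q, c) for q, c in scored if q > threshold]
```

For example, a C-to-C continuation scores the maximum P5th of 1.0, while a C-to-F# jump (a tritone, the far side of the circle) scores 0.0 under this toy model.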