<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Performing with a Generative Electronic Music Controller</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Charles Patrick Martin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The Australian National University</institution>
          ,
          <addr-line>Canberra</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Generative electronic music is, by and large, old news; however, despite ever more convincing composition systems, less progress has been made in systems for live performance with a generative model. One limitation has been the focus on symbolic music, an imperfect representation for musical gesture; another has been the lack of interactive explorations of co-creative musical systems with modern machine learning techniques. In this work, these limitations are addressed through the study of a co-creative interactive music system that applies generative AI to gestures on an electronic music controller, rather than to traditional musical notes. The controller features eight rotational controls with visual feedback and is typical of interfaces used for electronic music performance and production. The sound and interaction design of the system suggests new techniques for adopting co-creation in generative music systems, and a discussion of live performance experiences puts these techniques into practical context.</p>
      </abstract>
      <kwd-group>
        <kwd>interactive music system</kwd>
        <kwd>mixture density recurrent neural network</kwd>
        <kwd>performance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>… or Voyager [8] that generate MIDI notes. While gestural predictions have been studied in a minimal musical instrument [9], this work involves a more complete musical interface capable of driving a complete performance.</p>
      <p>Throughout performance with this system, the neural network can take control of the interface, continuing the performer’s actions, transforming them into a “predicted reality”, or overriding the performer in real time. The performer can see these actions represented visually on the controller interface and must tune their inputs to guide the neural network towards musically acceptable behaviours. The goal is to set up a feedback loop between human and generative neural network model where the process of co-creation leads to transformed interactive experiences [10].</p>
      <p>This work is part of an ongoing process of artistic research studying how a ML model might evolve over time as part of a computer music practice. Over the development of this work, the ML model has been re-trained as more training data has been collected. The affordances of the neural network change (sometimes dramatically) when it is re-trained with more or different data. This changes the possible interaction between performer and instrument and demands negotiation and improvisation from the performer in each performance to learn and exploit new behaviours. The instrument itself is an experiment in co-creation. Through it, this work highlights the tension between the machine learning algorithm’s role as a component within a musical instrument, and as a distinct agent that shares musical control with a human performer.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Generative AI System</title>
      <p>This system uses a mixture density recurrent neural network (MDRNN) within the context of a live computer music performance. This algorithm is a variant of the deep neural networks often used to compose text or symbolic music but allows learning and creative generation of continuous data such as synthesiser control signals and absolute time values.</p>
      <p>The generative aspects of this system use the Interactive Musical Prediction System (IMPS) [11] which implements the MDRNN in Python. In this context, the MDRNN is configured with two 32-unit LSTM layers and an MDN layer that outputs the parameters of a 9-dimensional Gaussian mixture model: one dimension for each knob on the controller and one dimension for the number of seconds in the future that this interaction should occur. The input to the MDRNN is, similarly, a 9D vector of the location of each knob and the time since the previous interaction. Although this is a tiny model by comparison with other deep learning models, it is appropriate given the size of the dataset involved and the strict time requirements for an interactive application. This ML model learns to reproduce how a human plays a musical instrument in terms of physical movements rather than what notes should come next. As a result, this musical ML configuration could be termed embodied musical prediction. This style of musical ML is ideal for application in a live electronic performance system, where embodied musical gestures with a new interface are often more important than traditional musical notation.</p>
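      <p>To make this configuration concrete, the sketch below assembles a comparable network in Python with Keras: two 32-unit LSTM layers followed by a dense layer sized to emit the means, scales, and mixture weights of a 9-dimensional Gaussian mixture. The number of mixture components (N_MIX) and the use of tf.keras are assumptions made for illustration rather than details of the IMPS implementation.</p>
      <preformat>import tensorflow as tf

DIMS = 9    # eight knob positions plus one time-to-next-interaction value
N_MIX = 5   # number of mixture components (assumed for this sketch)

# Two 32-unit LSTM layers followed by a dense layer that emits Gaussian
# mixture parameters: a mean and a scale for each of the 9 dimensions in
# each component, plus one mixture weight per component.
inputs = tf.keras.Input(shape=(None, DIMS))
x = tf.keras.layers.LSTM(32, return_sequences=True)(inputs)
x = tf.keras.layers.LSTM(32)(x)
mixture_params = tf.keras.layers.Dense(N_MIX * (2 * DIMS + 1))(x)
model = tf.keras.Model(inputs, mixture_params)
model.summary()</preformat>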
    </sec>
    <sec id="sec-3">
      <title>3. Sound and Interaction Design</title>
      <p>The synthesised sounds are created by eight sound generators, each operated by one knob of the controller. Two sound options are available, a sine-tone oscillator and a looped sample player (granular synthesiser); these can be switched by clicking a knob. Turning each knob changes the main parameter of its sound generator: the oscillator pitch or the looped sample section, depending on which sound option is selected.</p>
      <p>Each sound generator has its volume set to zero (silence) by default, but changing the main parameter triggers a short volume envelope (a note). The buttons below each knob allow additional control over each generator’s volume: the top button triggers the short envelope without changing the main knob and the bottom button turns the sound on continuously.</p>
      <p>The eight sound generators are mixed together and sent through distortion and reverb effects which can be controlled through the computer interface. The large slider controls the main volume, allowing the performer to start and end the performance. The sound design and MIDI interfacing with the XTouch Mini is implemented in Pure Data, which runs on the performer’s laptop.</p>
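      <p>As a sketch of this mapping, the fragment below models one generator’s behaviour in Python: clicking a knob switches between the sine-tone and sample-player modes, turning it updates the main parameter and triggers a short note, and the generator is otherwise silent. All concrete values (the 0–127 knob range, the pitch range, and the envelope handling) are assumptions for illustration; the real sound design is implemented in Pure Data.</p>
      <preformat># Illustrative sketch of one sound generator's control logic; concrete
# values are assumptions, the actual mapping lives in the Pure Data patch.
SINE, SAMPLER = "sine", "sampler"

class SoundGenerator:
    def __init__(self):
        self.mode = SINE      # clicking the knob toggles between the two options
        self.volume = 0.0     # silent by default

    def click_knob(self):
        self.mode = SAMPLER if self.mode == SINE else SINE

    def turn_knob(self, value):
        """Set the main parameter from a 0-127 knob value and trigger a note."""
        if self.mode == SINE:
            self.pitch_hz = 110.0 * 2 ** (4 * value / 127.0)  # assumed pitch range
        else:
            self.sample_position = value / 127.0              # loop position, 0..1
        self.trigger_envelope()

    def trigger_envelope(self, duration=0.5):
        # A real implementation would ramp the volume back down to 0.0.
        self.volume = 1.0</preformat>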
      <p>The knobs controlling the synthesis tuning parameters are the main focus of the performance and it is this part of the system that is controlled by both the performer and the generative AI system. The LED indicators on each knob show the latest update to the parameter, either from the performer turning the knob or from the generative system adjusting it in software.</p>
      <p>The IMPS system is set to function in a call-and-response manner. When the performer is adjusting the knobs, their changes are driven through the MDRNN to update its internal state, but predictions are discarded. When the performer stops for two seconds, the IMPS system takes control of parameter changes, generating predictions for the parameters continually from where the performer left off and updating the eight synthesis parameters in real time. The generative system’s changes are displayed on the LED rings on the control interface as well as on the computer screen. The performer has control of the diversity controls (prediction temperature), allowing a degree of influence over generated material.</p>
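      <p>A minimal sketch of this hand-over logic is given below, written in Python under stated assumptions: poll_performer() is a hypothetical non-blocking input source, and the model and controller methods (update, predict_next, set_knobs, diversity) are placeholders rather than the actual IMPS interfaces.</p>
      <preformat>import time

HANDOVER_SECONDS = 2.0  # performer pause before the generative system takes over

def performance_loop(model, controller, poll_performer):
    """Alternate control between performer and MDRNN (illustrative sketch).

    poll_performer() returns the eight knob values when the performer moves
    a knob, or None when idle.
    """
    last_human_input = time.time()
    while True:
        knobs = poll_performer()
        if knobs is not None:
            # Performer is active: update the MDRNN's internal state with the
            # gesture but discard the model's prediction.
            model.update(knobs)
            last_human_input = time.time()
        elif time.time() - last_human_input > HANDOVER_SECONDS:
            # Performer has paused: sample the next gesture from the mixture
            # model and apply it, echoing the change on the LED rings.
            predicted = model.predict_next(temperature=controller.diversity)
            controller.set_knobs(predicted)
        time.sleep(0.01)</preformat>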
      <p>While “call-and-response” might suggest that the performer can do nothing while the generative system is operating, in fact, this setup allows the performer to adjust other aspects of the performance; for instance, the buttons changing the envelope state, the sound generator type, as well as the computer-based controls for effects. In this type of performance, it is advantageous to allow a degree of generative change to one part of the musical system to continue while focusing on other parts.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Performance Experiences and Conclusions</title>
      <fig id="fig-3">
        <label>Figure 3</label>
        <caption>
          <p>The computer screen view during performance showing the state of each control knob from the performer and generative AI system. The RNN system runs in a terminal window on the right. This screen is shown to the audience during performance.</p>
        </caption>
      </fig>
      <p>This system has been deployed in live performances since 2019. These experiences demonstrate that the generative system works and makes a practical contribution to the performances in terms of creating plausible adjustments to the synthesis parameters. A deeper question then is whether the MDRNN generative system offers a level of co-creative engagement above what could be offered, for instance, by a simpler random-walk generator. From the experience of these live performances, it does seem that the generator can be influenced simply through the style of adjustments that the performer is making (e.g., it tends to continue adjusting the knobs that the performer previously was using). Different behaviours across the eight knobs, e.g., adjusting just one, changing multiple, pausing in between adjustments or making continual changes, appear in the generator’s changes. These behaviours appear “for free” with the MDRNN, that is, they are learned from the dataset, whereas they would need to be encoded into a rule-based generator manually.</p>
      <p>Whenever the system is used, either in rehearsal or performance, the performer’s interactions are captured to continue building a set of gestural control data for the XTouch Mini controller. As the system is retrained with new data, it “learns” more behaviours, just as the performer adjusts their style in between performances. In this way, this system could be said to be co-adaptive [12], although this is yet to be studied in a rigorous way. From the experience of working with this system, it can be reported that features such as the buttons controlling synthesiser envelopes were added in order to give the performer control over the sound while allowing the generative system to operate. Even though there is the potential for direct interplay between the performer and generative system, it seems to be important to have some different roles to play, and to allow the performer to listen and interact without interrupting the generative model.</p>
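      <p>The captured data follows the 9D representation described in Section 2: each interaction is stored as the time since the previous event together with the eight knob positions. A minimal logging sketch is shown below; the file name and CSV layout are assumptions for illustration, not the exact IMPS log format.</p>
      <preformat>import csv
import time

class GestureLogger:
    """Append gestures as rows of (seconds since last event, eight knob values)."""

    def __init__(self, path="xtouch-gestures.csv"):
        self.file = open(path, "a", newline="")
        self.writer = csv.writer(self.file)
        self.last_time = time.time()

    def log(self, knob_values):
        assert len(knob_values) == 8
        now = time.time()
        dt = now - self.last_time
        self.last_time = now
        self.writer.writerow([round(dt, 4)] + list(knob_values))
        self.file.flush()</preformat>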
      <p>From a practical perspective, this system has been
successful in allowing complete performances in co-creation
with a generative AI music system. The generative
system acts as a predictive model for control gestures and is
clever enough to enable interaction and steering from the
performer using only their own performance gestures.</p>
      <p>Higher level behaviours, such as long-term structure of
the performance, are not learned by the model but need
to be controlled manually by the performer. While this
could be said to be limiting, when compared to similar
non-generative systems, the performer in this case can
switch to handling high-level changes while control over
the synthesis parameters is seamlessly continued by the
generative system.</p>
      <p>This research has described a generative electronic
music controller for co-creative performance. This
system fits within the idiom of improvised electronic music
performance and shows how a machine learning model
for control gesture prediction can be applied in a
typical electronic music controller, allowing a very different
style of music generation to symbolic music generation
systems. Many other electronic music designs would be
possible within this style of interaction, and we see this
work as part of developing an orchestra of co-creative
musical instruments that interrogate how modern music
generation and music interaction can be applied together.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The Titan V GPU used in this work was provided by NVIDIA Corporation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref-1"><label>[1]</label><mixed-citation>C. Ames, Automated composition in retrospect: 1956-1986, Leonardo 20 (1987) 169–185. doi:10.2307/1578334.</mixed-citation></ref>
      <ref id="ref-2"><label>[2]</label><mixed-citation>C.-Z. A. Huang, A. Vaswani, J. Uszkoreit, N. Shazeer, I. Simon, C. Hawthorne, A. M. Dai, M. D. Hoffman, M. Dinculescu, D. Eck, Music transformer: Generating music with long-term structure, in: Proc. of ICLR ’19, 2019. arXiv:1809.04281.</mixed-citation></ref>
      <ref id="ref-3"><label>[3]</label><mixed-citation>C. J. Carr, Z. Zukowski, Generating albums with SampleRNN to imitate metal, rock, and punk bands, arXiv preprint, 2018. arXiv:1811.06633.</mixed-citation></ref>
      <ref id="ref-4"><label>[4]</label><mixed-citation>Y.-S. Huang, Y.-H. Yang, Pop music transformer: Beat-based modeling and generation of expressive pop piano compositions, in: Proceedings of the 28th ACM International Conference on Multimedia, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1180–1188. doi:10.1145/3394171.3413671.</mixed-citation></ref>
      <ref id="ref-5"><label>[5]</label><mixed-citation>A. Roberts, J. Engel, Y. Mann, J. Gillick, C. Kayacik, S. Nørly, M. Dinculescu, C. Radebaugh, C. Hawthorne, D. Eck, Magenta Studio: Augmenting creativity with deep learning in Ableton Live, in: Proceedings of the International Workshop on Musical Metacreation (MUME), 2019. URL: http://musicalmetacreation.org/buddydrive/file/mume_2019_paper_2/.</mixed-citation></ref>
      <ref id="ref-6"><label>[6]</label><mixed-citation>T. R. Naess, C. P. Martin, A physical intelligent instrument using recurrent neural networks, in: M. Queiroz, A. X. Sedó (Eds.), Proceedings of the International Conference on New Interfaces for Musical Expression, NIME ’19, UFRGS, Porto Alegre, Brazil, 2019, pp. 79–82. doi:10.5281/zenodo.3672874.</mixed-citation></ref>
      <ref id="ref-7"><label>[7]</label><mixed-citation>F. Pachet, The continuator: Musical interaction with style, Journal of New Music Research 32 (2003) 333–341. doi:10.1076/jnmr.32.3.333.16861.</mixed-citation></ref>
      <ref id="ref-8"><label>[8]</label><mixed-citation>G. E. Lewis, Too many notes: Computers, complexity and culture in “Voyager”, Leonardo Music Journal 10 (2000) 33–39. doi:10.1162/096112100570585.</mixed-citation></ref>
      <ref id="ref-9"><label>[9]</label><mixed-citation>C. P. Martin, K. Glette, T. F. Nygaard, J. Torresen, Understanding musical predictions with an embodied interface for musical machine learning, Frontiers in Artificial Intelligence 3 (2020) 6. doi:10.3389/frai.2020.00006.</mixed-citation></ref>
      <ref id="ref-10"><label>[10]</label><mixed-citation>S. Jones, Cybernetics in society and art, in: Proceedings of the 19th International Symposium of Electronic Art, ISEA2013, ISEA International; Australian Network for Art &amp; Technology; University of Sydney, 2013. URL: http://hdl.handle.net/2123/9863.</mixed-citation></ref>
      <ref id="ref-11"><label>[11]</label><mixed-citation>C. P. Martin, J. Torresen, An interactive musical prediction system with mixture density recurrent neural networks, in: M. Queiroz, A. X. Sedó (Eds.), Proceedings of the International Conference on New Interfaces for Musical Expression, NIME ’19, UFRGS, Porto Alegre, Brazil, 2019, pp. 260–265. doi:10.5281/zenodo.3672952.</mixed-citation></ref>
      <ref id="ref-12"><label>[12]</label><mixed-citation>W. Mackay, Responding to cognitive overload: Co-adaptation between users and technology, Intellectica 30 (2000) 177–193.</mixed-citation></ref>
    </ref-list>
  </back>
</article>