=Paper=
{{Paper
|id=Vol-3124/paper10
|storemode=property
|title=Performing with a Generative Electronic Music Controller
|pdfUrl=https://ceur-ws.org/Vol-3124/paper10.pdf
|volume=Vol-3124
|authors=Charles Patrick Martin
|dblpUrl=https://dblp.org/rec/conf/iui/Martin22
}}
==Performing with a Generative Electronic Music Controller==
Charles Patrick Martin, The Australian National University, Canberra, Australia

'''Abstract:''' Generative electronic music is, by and large, old news; however, despite ever more convincing composition systems, less progress has been made in systems for live performance with a generative model. One limitation has been the focus on symbolic music, an imperfect representation for musical gesture; another has been the lack of interactive explorations of co-creative musical systems with modern machine learning techniques. In this work these limitations are addressed through the study of a co-creative interactive music system that applies generative AI to gestures on an electronic music controller, not to creating traditional musical notes. The controller features eight rotational controls with visual feedback and is typical of interfaces used for electronic music performance and production. The sound and interaction design of the system suggest new techniques for adopting co-creation in generative music systems, and a discussion of live performance experiences puts these techniques into practical context.

'''Keywords:''' interactive music system, mixture density recurrent neural network, performance

Joint Proceedings of the ACM IUI Workshops 2022, March 2022, Helsinki, Finland. charles.martin@anu.edu.au (C. P. Martin). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).

===1. Introduction===

Generative music is well-established as a component of contemporary composition, with proponents in the experimental music scenes of the mid-20th Century among other earlier examples [1]. Current explorations of deep neural networks for generating music [e.g., 2, 3, 4] are enjoying success in terms of convincing output, but, perhaps, not in terms of application, where much simpler rule-based music generators are more common. Two issues facing musical performance with generative AI are that such systems tend to generate symbolic music, ignoring the gestural and non-note-based aspects of present electronic music performance, and that co-creative interactions for musical AI systems have not been explored to the same extent as the generative models. To address these issues, different types of musical models must be explored, and the interactions between performers and generative models must be considered as a first-class problem.

In this work, a somewhat different kind of musical AI system is presented: a physical electronic music controller with eight knobs allowing direct control over a synthesiser program is backed by an AI system that attempts to continue interactive gestures from the human performer using the predictions of an artificial neural network. This system explores an approach to embodied co-creation, where interactive gestures, rather than musical notes, are generated, and the collaboration of performer and generative model is expressed through live performance (a video of this system in performance can be found at https://youtu.be/upHSIpiGYVg).

Figure 1: Performing with the generative electronic music controller. The Behringer XTouch Mini (lower centre) is the main musical interface for this system while the laptop screen shows the synthesiser and generative system state. In the performance, both the performer and generative system have control over the eight knobs of the controller. A performance video is available at: https://youtu.be/upHSIpiGYVg

Rather than a note-driven aesthetic, the musical context is improvised electronic sound with gestural control over synthesis parameters. The neural network has been trained on gestural performance data, collected from the controller during rehearsals and performances, to predict the next interaction, both in terms of the quantity of controller movement and the amount of time before this movement should occur. This sets this work apart from other interactive generative music systems related to music production [5] and MIDI-note performance [6], as well as non-neural-network systems such as Continuator [7] or Voyager [8] that generate MIDI notes. While gestural predictions have been studied in a minimal musical instrument [9], this work involves a more complete musical interface capable of driving a complete performance.

Throughout performance with this system, the neural network can take control of the interface, continuing the performer's actions, transforming them into a "predicted reality", or overriding the performer in real-time. The performer can see these actions represented visually on the controller interface and must tune their inputs to guide the neural network towards musically acceptable behaviours. The goal is to set up a feedback loop between human and generative neural network model where the process of co-creation leads to transformed interactive experiences [10].

This work is part of an ongoing process of artistic research studying how a machine learning (ML) model might evolve over time as part of a computer music practice. Over the development of this work, the ML model has been re-trained as more training data has been collected. The affordances of the neural network change (sometimes dramatically) when it is re-trained with more or different data. This changes the possible interaction between performer and instrument and demands negotiation and improvisation from the performer in each performance to learn and exploit new behaviours. The instrument itself is an experiment in co-creation. Through it, this work highlights the tension between the machine learning algorithm's role as a component within a musical instrument and its role as a distinct agent that shares musical control with a human performer.
===2. Generative AI System===

This system uses a mixture density recurrent neural network (MDRNN) within the context of a live computer music performance. This algorithm is a variant of the deep neural networks often used to compose text or symbolic music, but it allows learning and creative generation of continuous data such as synthesiser control signals and absolute time values.

The generative aspects of this system use the Interactive Musical Prediction System (IMPS) [11], which implements the MDRNN in Python. In this context, the MDRNN is configured with two 32-unit LSTM layers and an MDN layer that outputs the parameters of a 9-dimensional Gaussian mixture model: one dimension for each knob on the controller and one dimension for the number of seconds in the future that this interaction should occur. The input to the MDRNN is, similarly, a 9D vector of the location of each knob and the time since the previous interaction. Although this is a tiny model by comparison with other deep learning models, it is appropriate given the size of the dataset involved and the strict time requirements for an interactive application.

This ML model learns to reproduce how a human plays a musical instrument in terms of physical movements rather than what notes should come next. As a result, this musical ML configuration could be termed embodied musical prediction. This style of musical ML is ideal for application in a live electronic performance system, where embodied musical gestures with a new interface are often more important than traditional musical notation.
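To give a concrete sense of the scale of this network, the following is a minimal sketch (not the IMPS source code) of a comparable model in Keras, assuming the keras-mdn-layer package for the mixture density output; the mixture count and training sequence length are illustrative assumptions, since the paper only specifies the two 32-unit LSTM layers and the 9-dimensional input and output.

<pre>
# Minimal sketch of an MDRNN of the scale described in the text (not IMPS itself).
# Assumes TensorFlow/Keras and the keras-mdn-layer package ("mdn");
# the mixture count and sequence length are assumptions.
import tensorflow as tf
import mdn  # pip install keras-mdn-layer

DIMENSION = 9   # 8 knob positions + time until the next interaction
N_MIXTURES = 5  # assumption: not specified in the paper
SEQ_LEN = 30    # assumption: training sequence length

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, return_sequences=True,
                         input_shape=(SEQ_LEN, DIMENSION)),  # two 32-unit LSTM layers
    tf.keras.layers.LSTM(32),
    mdn.MDN(DIMENSION, N_MIXTURES),  # parameters of a 9D Gaussian mixture
])
model.compile(loss=mdn.get_mixture_loss_func(DIMENSION, N_MIXTURES),
              optimizer="adam")
model.summary()
</pre>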
===3. Sound and Interaction Design===

The synthesised sounds are created by eight sound generators, each operated by one knob of the controller. Two sound options are available, a sine-tone oscillator and a looped sample player (granular synthesiser); these can be switched by clicking a knob. Turning each knob changes the main parameter of its sound generator: the oscillator pitch or the looped sample section, depending on which sound option is selected.

Figure 2: The XTouch Mini MIDI interface used in this performance system. Each column of controls is mapped to a separate sound generator. Both the performer and generative system can adjust the parameter knobs. The performer has access to other controls to steer the performance.

Each sound generator has its volume set to zero (silence) by default, but changing the main parameter triggers a short volume envelope (a note). The buttons below each knob allow additional control over each generator's volume: the top button triggers the short envelope without changing the main knob and the bottom button turns the sound on continuously.

The eight sound generators are mixed together and sent through distortion and reverb effects which can be controlled through the computer interface. The large slider controls the main volume, allowing the performer to start and end the performance. The sound design and MIDI interfacing with the XTouch Mini are implemented in Pure Data, which runs on the performer's laptop.

The knobs controlling the synthesis tuning parameters are the main focus of the performance, and it is this part of the system that is controlled by both the performer and the generative AI system. The LED indicators on each knob show the latest update to the parameter, either from the performer turning the knob or the generative system adjusting it in software.
The IMPS system is set to function in a call-and-response manner. When the performer is adjusting the knobs, their changes are driven through the MDRNN to update its internal state, but its predictions are discarded. When the performer stops for two seconds, the IMPS system takes control of parameter changes, generating predictions for the parameters continually from where the performer left off and updating the eight synthesis parameters in real time. The generative system's changes are displayed on the LED rings on the control interface as well as on the computer screen. The performer has control of the diversity controls (prediction temperature), allowing a degree of influence over the generated material.

While "call-and-response" might suggest that the performer can do nothing while the generative system is operating, in fact this setup allows the performer to adjust other aspects of the performance: for instance, the buttons changing the envelope state, the sound generator type, as well as the computer-based controls for effects. In this type of performance, it is advantageous to allow a degree of generative change to one part of the musical system to continue while focusing on other parts.
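A rough sketch of this call-and-response logic is given below as an illustration rather than the IMPS implementation. It assumes a stateful, single-step copy of the MDRNN sketched earlier (named `decoder` here), a queue fed by a MIDI thread, and stub functions in place of the Pure Data connection; the two-second idle threshold and the temperature control follow the description above.

<pre>
# Sketch of the call-and-response behaviour described above (not the IMPS source).
# `decoder` is assumed to be a stateful, single-step (batch 1, length 1) copy of
# the MDRNN sketched earlier; the queue and output stubs stand in for the MIDI
# thread and the Pure Data synthesiser.
import queue
import time
import numpy as np
import mdn  # keras-mdn-layer, for sampling from the mixture parameters

DIMENSION, N_MIXTURES = 9, 5
IDLE_SECONDS = 2.0               # performer pause before the system takes over
temperature = 1.0                # the "diversity" (prediction temperature) control

performer_queue = queue.Queue()  # (eight knob values, time delta) from the MIDI thread

def send_to_synth(knobs):        # stub: would update the Pure Data parameters
    print("synth parameters:", np.round(knobs, 2))

def update_leds(knobs):          # stub: would echo the change to the LED rings
    pass

last_input = np.zeros((1, 1, DIMENSION))
last_performer_time = time.time()

while True:
    try:
        knobs, dt = performer_queue.get(timeout=0.05)
        # Performer is playing: condition the RNN state, discard its prediction.
        last_input = np.array(list(knobs) + [dt]).reshape(1, 1, DIMENSION)
        decoder.predict(last_input)
        last_performer_time = time.time()
    except queue.Empty:
        if time.time() - last_performer_time < IDLE_SECONDS:
            continue
        # Performer has paused: continue the gesture from where they left off.
        params = decoder.predict(last_input)[0]
        sample = mdn.sample_from_output(params, DIMENSION, N_MIXTURES,
                                        temp=temperature)[0]
        knobs, dt = sample[:8], max(float(sample[8]), 0.0)
        time.sleep(dt)           # wait the predicted time until this interaction
        send_to_synth(knobs)
        update_leds(knobs)
        last_input = sample.reshape(1, 1, DIMENSION)
</pre>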
Figure 3: The computer screen view during performance showing the state of each control knob from the performer and generative AI system. The RNN system runs in a terminal window on the right. This screen is shown to the audience during performance.

===4. Performance Experiences and Conclusions===

This system has been deployed in live performances since 2019. These experiences demonstrate that the generative system works and makes a practical contribution to the performances in terms of creating plausible adjustments to the synthesis parameters. A deeper question is whether the MDRNN generative system offers a level of co-creative engagement above what could be offered, for instance, by a simpler random-walk generator. From the experience of these live performances, it does seem that the generator can be influenced simply through the style of adjustments that the performer is making (e.g., it tends to continue adjusting the knobs that the performer was previously using). Different behaviours across the eight knobs, e.g., adjusting just one, changing multiple, pausing between adjustments or making continual changes, appear in the generator's output. These behaviours appear "for free" with the MDRNN, that is, they are learned from the dataset, whereas they would need to be encoded into a rule-based generator manually.

Whenever the system is used, either in rehearsal or performance, the performer's interactions are captured to continue building a set of gestural control data for the XTouch Mini controller. As the system is retrained with new data, it "learns" more behaviours, just as the performer adjusts their style between performances. In this way, this system could be said to be co-adaptive [12], although this is yet to be studied in a rigorous way. From the experience of working with this system, it can be reported that features such as the buttons controlling synthesiser envelopes were added in order to give the performer control over the sound while allowing the generative system to operate. Even though there is the potential for direct interplay between the performer and generative system, it seems to be important to have some different roles to play, and to allow the performer to listen and interact without interrupting the generative model.

From a practical perspective, this system has been successful in allowing complete performances in co-creation with a generative AI music system. The generative system acts as a predictive model for control gestures and is clever enough to enable interaction and steering from the performer using only their own performance gestures. Higher-level behaviours, such as the long-term structure of the performance, are not learned by the model but need to be controlled manually by the performer. While this could be said to be limiting when compared to similar non-generative systems, the performer in this case can switch to handling high-level changes while control over the synthesis parameters is seamlessly continued by the generative system.

This research has described a generative electronic music controller for co-creative performance. The system fits within the idiom of improvised electronic music performance and shows how a machine learning model for control gesture prediction can be applied in a typical electronic music controller, allowing a very different style of music generation to symbolic music generation systems. Many other electronic music designs would be possible within this style of interaction, and we see this work as part of developing an orchestra of co-creative musical instruments that interrogate how modern music generation and music interaction can be applied together.

===Acknowledgments===

The Titan V GPU used in this work was provided by NVIDIA Corporation.

===References===

[1] C. Ames, Automated composition in retrospect: 1956–1986, Leonardo 20 (1987) 169–185. doi:10.2307/1578334.

[2] C.-Z. A. Huang, A. Vaswani, J. Uszkoreit, N. Shazeer, I. Simon, C. Hawthorne, A. M. Dai, M. D. Hoffman, M. Dinculescu, D. Eck, Music transformer: Generating music with long-term structure, in: Proc. of ICLR '19, 2019. arXiv:1809.04281.

[3] C. J. Carr, Z. Zukowski, Generating albums with SampleRNN to imitate metal, rock, and punk bands, arXiv preprint, 2018. arXiv:1811.06633.

[4] Y.-S. Huang, Y.-H. Yang, Pop music transformer: Beat-based modeling and generation of expressive pop piano compositions, in: Proceedings of the 28th ACM International Conference on Multimedia, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1180–1188. doi:10.1145/3394171.3413671.

[5] A. Roberts, J. Engel, Y. Mann, J. Gillick, C. Kayacik, S. Nørly, M. Dinculescu, C. Radebaugh, C. Hawthorne, D. Eck, Magenta Studio: Augmenting creativity with deep learning in Ableton Live, in: Proceedings of the International Workshop on Musical Metacreation (MUME), 2019. URL: http://musicalmetacreation.org/buddydrive/file/mume_2019_paper_2/.

[6] T. R. Næss, C. P. Martin, A physical intelligent instrument using recurrent neural networks, in: M. Queiroz, A. X. Sedó (Eds.), Proceedings of the International Conference on New Interfaces for Musical Expression, NIME '19, UFRGS, Porto Alegre, Brazil, 2019, pp. 79–82. doi:10.5281/zenodo.3672874.

[7] F. Pachet, The Continuator: Musical interaction with style, Journal of New Music Research 32 (2003) 333–341. doi:10.1076/jnmr.32.3.333.16861.

[8] G. E. Lewis, Too many notes: Computers, complexity and culture in "Voyager", Leonardo Music Journal 10 (2000) 33–39. doi:10.1162/096112100570585.

[9] C. P. Martin, K. Glette, T. F. Nygaard, J. Torresen, Understanding musical predictions with an embodied interface for musical machine learning, Frontiers in Artificial Intelligence 3 (2020) 6. doi:10.3389/frai.2020.00006.

[10] S. Jones, Cybernetics in society and art, in: Proceedings of the 19th International Symposium of Electronic Art, ISEA2013, ISEA International; Australian Network for Art & Technology; University of Sydney, 2013. URL: http://hdl.handle.net/2123/9863.

[11] C. P. Martin, J. Torresen, An interactive musical prediction system with mixture density recurrent neural networks, in: M. Queiroz, A. X. Sedó (Eds.), Proceedings of the International Conference on New Interfaces for Musical Expression, NIME '19, UFRGS, Porto Alegre, Brazil, 2019, pp. 260–265. doi:10.5281/zenodo.3672952.

[12] W. Mackay, Responding to cognitive overload: Co-adaptation between users and technology, Intellectica 30 (2000) 177–193.