Performing with a Generative Electronic Music Controller
Charles Patrick Martin
The Australian National University, Canberra, Australia


Abstract
Generative electronic music is, by and large, old news; however, despite ever more convincing composition systems, less progress has been made on systems for live performance with a generative model. One limitation has been the focus on symbolic music, an imperfect representation for musical gesture; another has been the lack of interactive explorations of co-creative musical systems using modern machine learning techniques. In this work, these limitations are addressed through the study of a co-creative interactive music system that applies generative AI to gestures on an electronic music controller rather than to the creation of traditional musical notes. The controller features eight rotational controls with visual feedback and is typical of interfaces used for electronic music performance and production. The sound and interaction design of the system suggest new techniques for adopting co-creation in generative music systems, and a discussion of live performance experiences puts these techniques into practical context.

Keywords
interactive music system, mixture density recurrent neural network, performance



Joint Proceedings of the ACM IUI Workshops 2022, March 2022, Helsinki, Finland
charles.martin@anu.edu.au (C. P. Martin) · https://charlesmartin.com.au (C. P. Martin) · ORCID: 0000-0001-5683-7529
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Figure 1: Performing with the generative electronic music controller. The Behringer XTouch Mini (lower centre) is the main musical interface for this system while the laptop screen shows the synthesiser and generative system state. In the performance, both the performer and generative system have control over the eight knobs of the controller. A performance video is available at: https://youtu.be/upHSIpiGYVg


1. Introduction

Generative music is well-established as a component of contemporary composition, with proponents in the experimental music scenes of the mid-20th Century among other earlier examples [1]. Current explorations of deep neural networks for generating music [e.g., 2, 3, 4] are enjoying success in terms of convincing output but, perhaps, not in terms of application, where much simpler rule-based music generators are more common. Two issues facing musical performance with generative AI are that such systems tend to generate symbolic music, ignoring the gestural and non-note-based aspects of present electronic music performance, and that co-creative interactions for musical AI systems have not been explored to the same extent as the generative models. To address these issues, different types of musical models must be explored, and the interactions between performers and generative models must be considered as a first-class problem.

In this work, a somewhat different kind of musical AI system is presented: a physical electronic music controller with eight knobs allowing direct control over a synthesiser program is backed by an AI system that attempts to continue interactive gestures from the human performer using the predictions of an artificial neural network. This system explores an approach to embodied co-creation, where interactive gestures, rather than musical notes, are generated, and the collaboration of performer and generative model is expressed through live performance. A video of this system in performance can be found at https://youtu.be/upHSIpiGYVg.

Rather than a note-driven aesthetic, the musical context is improvised electronic sound with gestural control over synthesis parameters. The neural network has been trained on this gestural performance data, collected from the controller during rehearsals and performances, to predict the next interaction, both in terms of the quantity of controller movement and the amount of time before this movement should occur. This sets this work apart from other interactive generative music systems related to music production [5] and MIDI-note performance [6], as well as non-neural-network systems such as Continuator [7],
or Voyager [8], that generate MIDI notes. While gestural predictions have been studied in a minimal musical instrument [9], this work involves a more complete musical interface capable of driving a complete performance.

Throughout performance with this system, the neural network can take control of the interface, continuing the performer's actions, transforming them into a "predicted reality", or overriding the performer in real-time. The performer can see these actions represented visually on the controller interface and must tune their inputs to guide the neural network towards musically acceptable behaviours. The goal is to set up a feedback loop between human and generative neural network model where the process of co-creation leads to transformed interactive experiences [10].

This work is part of an ongoing process of artistic research studying how an ML model might evolve over time as part of a computer music practice. Over the development of this work, the ML model has been re-trained as more training data has been collected. The affordances of the neural network change (sometimes dramatically) when it is re-trained with more or different data. This changes the possible interaction between performer and instrument and demands negotiation and improvisation from the performer in each performance to learn and exploit new behaviours. The instrument itself is an experiment in co-creation. Through it, this work highlights the tension between the machine learning algorithm's role as a component within a musical instrument and its role as a distinct agent that shares musical control with a human performer.


2. Generative AI System

This system uses a mixture density recurrent neural network (MDRNN) within the context of a live computer music performance. This algorithm is a variant of the deep neural networks often used to compose text or symbolic music, but it allows learning and creative generation of continuous data such as synthesiser control signals and absolute time values.

The generative aspects of this system use the Interactive Musical Prediction System (IMPS) [11], which implements the MDRNN in Python. In this context, the MDRNN is configured with two 32-unit LSTM layers and an MDN layer that outputs the parameters of a 9-dimensional Gaussian mixture model: one dimension for each knob on the controller and one dimension for the number of seconds in the future that this interaction should occur. The input to the MDRNN is, similarly, a 9D vector of the location of each knob and the time since the previous interaction. Although this is a tiny model by comparison with other deep learning models, it is appropriate given the size of the dataset involved and the strict time requirements for an interactive application.

This ML model learns to reproduce how a human plays a musical instrument in terms of physical movements rather than what notes should come next. As a result, this musical ML configuration could be termed embodied musical prediction. This style of musical ML is ideal for application in a live electronic performance system, where embodied musical gestures with a new interface are often more important than traditional musical notation.
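To make the model's shape concrete, the sketch below assembles an MDRNN of roughly this form in Python with TensorFlow/Keras. It is not the IMPS implementation: the number of mixture components, the use of diagonal covariances and the sampling helper are assumptions made for illustration; only the sizes stated above (two 32-unit LSTM layers and a 9-dimensional input and output, one dimension of which is time) come from the description of the system.

```python
# Minimal sketch of an MDRNN of the shape described above (not the IMPS code).
# Assumptions: K mixture components, diagonal covariances, TensorFlow/Keras API.
import numpy as np
import tensorflow as tf

DIM = 9   # 8 knob positions + 1 time delta until the next interaction
K = 5     # number of mixture components (assumed; not stated in the paper)

inputs = tf.keras.Input(shape=(None, DIM))                   # sequence of past interactions
x = tf.keras.layers.LSTM(32, return_sequences=True)(inputs)
x = tf.keras.layers.LSTM(32)(x)
pi_logits = tf.keras.layers.Dense(K)(x)                      # mixture weights (pre-softmax)
mus = tf.keras.layers.Dense(K * DIM)(x)                      # component means
sigmas = tf.keras.layers.Dense(K * DIM, activation="softplus")(x)  # diagonal std devs
mdrnn = tf.keras.Model(inputs, [pi_logits, mus, sigmas])

def sample_next(pi_logits, mus, sigmas, temperature=1.0):
    """Draw one 9D prediction: eight knob values plus seconds until they should occur."""
    pis = tf.nn.softmax(pi_logits[0] / temperature).numpy().astype(np.float64)
    pis /= pis.sum()                                          # guard against rounding error
    k = np.random.choice(K, p=pis)                            # pick a mixture component
    mu = mus.numpy()[0].reshape(K, DIM)[k]
    sigma = sigmas.numpy()[0].reshape(K, DIM)[k] * temperature
    return np.random.normal(mu, sigma)
```

In this style, each prediction step conditions on the most recent interaction and the recurrent state carried over from previous steps; the temperature argument corresponds loosely to the prediction-temperature (diversity) control described in the next section.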
3. Sound and Interaction Design

Figure 2: The XTouch Mini MIDI interface used in this performance system. Each column of controls is mapped to a separate sound generator. Both the performer and generative system can adjust the parameter knobs. The performer has access to other controls to steer the performance.

The synthesised sounds are created by eight sound generators, each operated by one knob of the controller. Two sound options are available, a sine-tone oscillator and a looped sample player (granular synthesiser); these can be switched by clicking a knob. Turning each knob changes the main parameter of its sound generator: the oscillator pitch or the looped sample section, depending on which sound option is selected.

Each sound generator has its volume set to zero (silence) by default, but changing the main parameter triggers a short volume envelope (a note). The buttons below each knob allow additional control over each generator's volume: the top button triggers the short envelope without changing the main knob, and the bottom button turns the sound on continuously.

The eight sound generators are mixed together and sent through distortion and reverb effects which can be controlled through the computer interface. The large slider controls the main volume, allowing the performer to start and end the performance. The sound design and MIDI interfacing with the XTouch Mini are implemented in Pure Data, which runs on the performer's laptop.

The knobs controlling the synthesis tuning parameter are the main focus of the performance, and it is this part of the system that is controlled by both the performer and the generative AI system. The LED indicators on each knob show the latest update to the parameter, either from the performer turning the knob or from the generative system adjusting it in software.
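In the actual system this mapping is implemented in Pure Data; purely as an illustration of the mapping logic just described, a minimal Python sketch using the mido library might look as follows. The MIDI port name, CC numbers and note numbers are assumptions, not the configuration used in the performances.

```python
# Illustrative sketch only: the paper's MIDI interfacing lives in Pure Data.
# Port name, CC numbers and note numbers below are assumptions.
import mido

KNOB_CCS = list(range(1, 9))        # assumed CC numbers for the eight knobs
TOP_BUTTON_NOTES = list(range(8))   # assumed note numbers for the top row of buttons

def handle(msg):
    if msg.type == "control_change" and msg.control in KNOB_CCS:
        generator = KNOB_CCS.index(msg.control)
        # set the generator's main parameter (pitch or sample position)
        # and trigger its short volume envelope
        print(f"generator {generator}: parameter -> {msg.value / 127.0:.2f}")
    elif msg.type == "note_on" and msg.note in TOP_BUTTON_NOTES:
        # top button: re-trigger the envelope without moving the knob;
        # knob clicks and the bottom buttons would be handled similarly
        print(f"generator {msg.note}: trigger envelope")

with mido.open_input("X-TOUCH MINI") as port:   # assumed port name
    for msg in port:
        handle(msg)
```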
The IMPS system is set to function in a call-and-response manner. When the performer is adjusting the knobs, their changes are driven through the MDRNN to update its internal state, but its predictions are discarded. When the performer stops for two seconds, the IMPS system takes control of parameter changes, generating predictions for the parameters continually from where the performer left off and updating the eight synthesis parameters in real time. The generative system's changes are displayed on the LED rings on the control interface as well as on the computer screen. The performer has control of the diversity controls (prediction temperature), allowing a degree of influence over generated material.

Figure 3: The computer screen view during performance showing the state of each control knob from the performer and generative AI system. The RNN system runs in a terminal window on the right. This screen is shown to the audience during performance.

While "call-and-response" might suggest that the performer can do nothing while the generative system is operating, in fact this setup allows the performer to adjust other aspects of the performance; for instance, the buttons changing the envelope state, the sound generator type, as well as the computer-based controls for effects. In this type of performance, it is advantageous to allow a degree of generative change to one part of the musical system to continue while focusing on other parts.
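A minimal sketch of this hand-over logic is given below, assuming the two-second inactivity threshold described above. The functions update_model_state, predict_next and apply_to_synth are hypothetical stand-ins (reduced here to stubs) for conditioning the MDRNN, sampling a prediction and updating the synthesiser and LED rings; the real system's timing and threading will differ.

```python
# Sketch of the call-and-response hand-over, assuming a two-second threshold.
# update_model_state, predict_next and apply_to_synth are hypothetical stubs.
import threading
import time

PAUSE_THRESHOLD = 2.0                  # seconds of performer inactivity before hand-over
last_human_input = time.monotonic()

def update_model_state(knob, value):   # stand-in: condition the MDRNN on human input
    pass

def predict_next():                    # stand-in: sample 8 knob values and a time delta
    return [0.5] * 8, 0.25

def apply_to_synth(values):            # stand-in: update synth parameters and LED rings
    print("generated:", values)

def on_performer_knob(knob, value):
    """Called for every knob movement made by the performer."""
    global last_human_input
    last_human_input = time.monotonic()
    update_model_state(knob, value)    # predictions made during this call are discarded

def generation_loop():
    """Generate parameter changes only while the performer is pausing."""
    while True:
        if time.monotonic() - last_human_input < PAUSE_THRESHOLD:
            time.sleep(0.05)           # performer is active: stay silent
            continue
        values, dt = predict_next()
        time.sleep(max(dt, 0.0))       # wait the predicted time before acting
        if time.monotonic() - last_human_input >= PAUSE_THRESHOLD:
            apply_to_synth(values)     # yield immediately if the human has returned

threading.Thread(target=generation_loop, daemon=True).start()
```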
4. Performance Experiences and Conclusions

This system has been deployed in live performances since 2019. These experiences demonstrate that the generative system works and makes a practical contribution to the performances in terms of creating plausible adjustments to the synthesis parameters. A deeper question is whether the MDRNN generative system offers a level of co-creative engagement above what could be offered, for instance, by a simpler random-walk generator. From the experience of these live performances, it does seem that the generator can be influenced simply through the style of adjustments that the performer is making (e.g., it tends to continue adjusting the knobs that the performer was previously using). Different behaviours across the eight knobs, e.g., adjusting just one, changing multiple, pausing in between adjustments or making continual changes, appear in the generator's changes. These behaviours appear "for free" with the MDRNN, that is, they are learned from the dataset, whereas they would need to be encoded into a rule-based generator manually.

Whenever the system is used, either in rehearsal or performance, the performer's interactions are captured to continue building a set of gestural control data for the XTouch Mini controller. As the system is retrained with new data, it "learns" more behaviours, just as the performer adjusts their style in between performances. In this way, this system could be said to be co-adaptive [12], although this is yet to be studied in a rigorous way. From the experience of working with this system, it can be reported that features such as the buttons controlling synthesiser envelopes were added in order to give the performer control over the sound while allowing the generative system to operate. Even though there is the potential for direct interplay between the performer and generative system, it seems to be important to have some different roles to play, and to allow the performer to listen and interact without interrupting the generative model.
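As a rough illustration of this data-collection step, one possible log is a timestamped row of the eight knob values per interaction, which could later be converted into the 9D training vectors described in Section 2. The CSV layout and file name below are assumptions; the paper does not specify the format used.

```python
# Hedged sketch of logging gestural control data for later retraining.
# The actual log format is not specified in the paper; this CSV layout
# (wall-clock timestamp plus eight knob values) is an assumption.
import csv
import time

LOG_PATH = "performance_log.csv"   # hypothetical output file

def log_interaction(knob_values, path=LOG_PATH):
    """Append one row: timestamp followed by the eight knob positions (0-1)."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([time.time()] + list(knob_values))

# Example: record a single interaction where knob 3 has been moved.
log_interaction([0.0, 0.0, 0.0, 0.62, 0.0, 0.0, 0.0, 0.0])
```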
From a practical perspective, this system has been successful in allowing complete performances in co-creation with a generative AI music system. The generative system acts as a predictive model for control gestures and is clever enough to enable interaction and steering from the performer using only their own performance gestures. Higher-level behaviours, such as the long-term structure of the performance, are not learned by the model but need to be controlled manually by the performer. While this could be said to be limiting when compared to similar non-generative systems, the performer in this case can switch to handling high-level changes while control over the synthesis parameters is seamlessly continued by the generative system.

This research has described a generative electronic music controller for co-creative performance. This system fits within the idiom of improvised electronic music performance and shows how a machine learning model for control gesture prediction can be applied in a typical electronic music controller, allowing a very different style of music generation to symbolic music generation systems. Many other electronic music designs would be possible within this style of interaction, and we see this work as part of developing an orchestra of co-creative musical instruments that interrogate how modern music generation and music interaction can be applied together.


Acknowledgments

The Titan V GPU used in this work was provided by NVIDIA Corporation.


References

[1] C. Ames, Automated composition in retrospect: 1956–1986, Leonardo 20 (1987) 169–185. doi:10.2307/1578334.
[2] C.-Z. A. Huang, A. Vaswani, J. Uszkoreit, N. Shazeer, I. Simon, C. Hawthorne, A. M. Dai, M. D. Hoffman, M. Dinculescu, D. Eck, Music transformer: Generating music with long-term structure, in: Proc. of ICLR '19, 2019. arXiv:1809.04281.
[3] C. J. Carr, Z. Zukowski, Generating albums with SampleRNN to imitate metal, rock, and punk bands, arXiv preprint, 2018. arXiv:1811.06633.
[4] Y.-S. Huang, Y.-H. Yang, Pop music transformer: Beat-based modeling and generation of expressive pop piano compositions, in: Proceedings of the 28th ACM International Conference on Multimedia, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1180–1188. doi:10.1145/3394171.3413671.
[5] A. Roberts, J. Engel, Y. Mann, J. Gillick, C. Kayacik, S. Nørly, M. Dinculescu, C. Radebaugh, C. Hawthorne, D. Eck, Magenta Studio: Augmenting creativity with deep learning in Ableton Live, in: Proceedings of the International Workshop on Musical Metacreation (MUME), 2019. URL: http://musicalmetacreation.org/buddydrive/file/mume_2019_paper_2/.
[6] T. R. Næss, C. P. Martin, A physical intelligent instrument using recurrent neural networks, in: M. Queiroz, A. X. Sedó (Eds.), Proceedings of the International Conference on New Interfaces for Musical Expression, NIME '19, UFRGS, Porto Alegre, Brazil, 2019, pp. 79–82. doi:10.5281/zenodo.3672874.
[7] F. Pachet, The Continuator: Musical interaction with style, Journal of New Music Research 32 (2003) 333–341. doi:10.1076/jnmr.32.3.333.16861.
[8] G. E. Lewis, Too many notes: Computers, complexity and culture in "Voyager", Leonardo Music Journal 10 (2000) 33–39. doi:10.1162/096112100570585.
[9] C. P. Martin, K. Glette, T. F. Nygaard, J. Torresen, Understanding musical predictions with an embodied interface for musical machine learning, Frontiers in Artificial Intelligence 3 (2020) 6. doi:10.3389/frai.2020.00006.
[10] S. Jones, Cybernetics in society and art, in: Proceedings of the 19th International Symposium of Electronic Art, ISEA2013, ISEA International; Australian Network for Art & Technology; University of Sydney, 2013. URL: http://hdl.handle.net/2123/9863.
[11] C. P. Martin, J. Torresen, An interactive musical prediction system with mixture density recurrent neural networks, in: M. Queiroz, A. X. Sedó (Eds.), Proceedings of the International Conference on New Interfaces for Musical Expression, NIME '19, UFRGS, Porto Alegre, Brazil, 2019, pp. 260–265. doi:10.5281/zenodo.3672952.
[12] W. Mackay, Responding to cognitive overload: Co-adaptation between users and technology, Intellectica 30 (2000) 177–193.