<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Performing with a Generative Electronic Music Controller</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Charles Patrick Martin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The Australian National University</institution>
          ,
          <addr-line>Canberra</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Generative electronic music is, by and large, old news; however, despite ever more convincing composition systems, less progress has been made in systems for live performance with a generative model. One limitation has been the focus on symbolic music, an imperfect representation for musical gesture; another has been the lack of interactive explorations of co-creative musical systems with modern machine learning techniques. In this work, these limitations are addressed through the study of a co-creative interactive music system that applies generative AI to gestures on an electronic music controller, rather than to traditional musical notes. The controller features eight rotational controls with visual feedback and is typical of interfaces used for electronic music performance and production. The sound and interaction design of the system suggests new techniques for adopting co-creation in generative music systems, and a discussion of live performance experiences puts these techniques into practical context.</p>
      </abstract>
      <kwd-group>
        <kwd>interactive music system</kwd>
        <kwd>mixture density recurrent neural network</kwd>
        <kwd>performance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>… or Voyager [8] that generate MIDI notes. While gestural predictions have been studied in a minimal musical instrument [9], this work involves a more complete musical interface capable of driving a complete performance.</p>
      <p>Throughout performance with this system, the neural network can take control of the interface, continuing the performer’s actions, transforming them into a “predicted reality”, or overriding the performer in real time. The performer can see these actions represented visually on the controller interface and must tune their inputs to guide the neural network towards musically acceptable behaviours. The goal is to set up a feedback loop between human and generative neural network model where the process of co-creation leads to transformed interactive experiences [10].</p>
      <p>This work is part of an ongoing process of artistic research studying how a ML model might evolve over time as part of a computer music practice. Over the development of this work, the ML model has been re-trained as more training data has been collected. The affordances of the neural network change (sometimes dramatically) when it is re-trained with more or different data. This changes the possible interaction between performer and instrument and demands negotiation and improvisation from the performer in each performance to learn and exploit new behaviours. The instrument itself is an experiment in co-creation. Through it, this work highlights the tension between the machine learning algorithm’s role as a component within a musical instrument, and as a distinct agent that shares musical control with a human performer.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Generative AI System</title>
      <p>This system uses a mixture density recurrent neural network (MDRNN) within the context of a live computer music performance. This algorithm is a variant of the deep neural networks often used to compose text or symbolic music but allows learning and creative generation of continuous data such as synthesiser control signals and absolute time values.</p>
      <p>The generative aspects of this system use the Interactive Musical Prediction System (IMPS) [11] which implements the MDRNN in Python. In this context, the MDRNN is configured with two 32-unit LSTM layers and an MDN layer that outputs the parameters of a 9-dimensional Gaussian mixture model: one dimension for each knob on the controller and one dimension for the number of seconds in the future that this interaction should occur. The input to the MDRNN is, similarly, a 9D vector of the location of each knob and the time since the previous interaction. Although this is a tiny model by comparison with other deep learning models, it is appropriate given the size of the dataset involved and the strict time requirements for an interactive application. This ML model learns to reproduce how a human plays a musical instrument in terms of physical movements rather than what notes should come next. As a result, this musical ML configuration could be termed embodied musical prediction. This style of musical ML is ideal for application in a live electronic performance system, where embodied musical gestures with a new interface are often more important than traditional musical notation.</p>
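      <p>To make this configuration concrete, the sketch below assembles a comparable network in Python with Keras: two 32-unit LSTM layers followed by a dense layer sized to emit the means, scales, and mixture weights of a 9-dimensional Gaussian mixture. The number of mixture components (N_MIX) and the use of tf.keras are assumptions made for illustration rather than details of the IMPS implementation.</p>
      <preformat>import tensorflow as tf

DIMS = 9    # eight knob positions plus one time-to-next-interaction value
N_MIX = 5   # number of mixture components (assumed for this sketch)

# Two 32-unit LSTM layers followed by a dense layer that emits Gaussian
# mixture parameters: a mean and a scale for each of the 9 dimensions in
# each component, plus one mixture weight per component.
inputs = tf.keras.Input(shape=(None, DIMS))
x = tf.keras.layers.LSTM(32, return_sequences=True)(inputs)
x = tf.keras.layers.LSTM(32)(x)
mixture_params = tf.keras.layers.Dense(N_MIX * (2 * DIMS + 1))(x)
model = tf.keras.Model(inputs, mixture_params)
model.summary()</preformat>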
    </sec>
    <sec id="sec-3">
      <title>3. Sound and Interaction Design</title>
      <p>The synthesised sounds are created by eight sound generators, each operated by one knob of the controller. Two sound options are available, a sine-tone oscillator and a looped sample player (granular synthesiser); these can be switched by clicking a knob. Turning each knob changes the main parameter of its sound generator: the oscillator pitch or the looped sample section, depending on which sound option is selected.</p>
      <p>Each sound generator has its volume set to zero (silence) by default, but changing the main parameter triggers a short volume envelope (a note). The buttons below each knob allow additional control over each generator’s volume: the top button triggers the short envelope without changing the main knob and the bottom button turns the sound on continuously.</p>
      <p>The eight sound generators are mixed together and sent through distortion and reverb effects which can be controlled through the computer interface. The large slider controls the main volume, allowing the performer to start and end the performance. The sound design and MIDI interfacing with the XTouch Mini is implemented in Pure Data, which runs on the performer’s laptop.</p>
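      <p>As a sketch of this mapping, the fragment below models one generator’s behaviour in Python: clicking a knob switches between the sine-tone and sample-player modes, turning it updates the main parameter and triggers a short note, and the generator is otherwise silent. All concrete values (the 0–127 knob range, the pitch range, and the envelope handling) are assumptions for illustration; the real sound design is implemented in Pure Data.</p>
      <preformat># Illustrative sketch of one sound generator's control logic; concrete
# values are assumptions, the actual mapping lives in the Pure Data patch.
SINE, SAMPLER = "sine", "sampler"

class SoundGenerator:
    def __init__(self):
        self.mode = SINE      # clicking the knob toggles between the two options
        self.volume = 0.0     # silent by default

    def click_knob(self):
        self.mode = SAMPLER if self.mode == SINE else SINE

    def turn_knob(self, value):
        """Set the main parameter from a 0-127 knob value and trigger a note."""
        if self.mode == SINE:
            self.pitch_hz = 110.0 * 2 ** (4 * value / 127.0)  # assumed pitch range
        else:
            self.sample_position = value / 127.0              # loop position, 0..1
        self.trigger_envelope()

    def trigger_envelope(self, duration=0.5):
        # A real implementation would ramp the volume back down to 0.0.
        self.volume = 1.0</preformat>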
      <p>The knobs controlling the synthesis tuning parameters are the main focus of the performance and it is this part of the system that is controlled by both the performer and the generative AI system. The LED indicators on each knob show the latest update to the parameter, either from the performer turning the knob or from the generative system adjusting it in software.</p>
      <p>The IMPS system is set to function in a call-and-response manner. When the performer is adjusting the knobs, their changes are driven through the MDRNN to update its internal state, but predictions are discarded. When the performer stops for two seconds, the IMPS system takes control of parameter changes, generating predictions for the parameters continually from where the performer left off and updating the eight synthesis parameters in real time. The generative system’s changes are displayed on the LED rings on the control interface as well as on the computer screen. The performer has control of the diversity controls (prediction temperature), allowing a degree of influence over generated material.</p>
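      <p>A minimal sketch of this hand-over logic is given below, written in Python under stated assumptions: poll_performer() is a hypothetical non-blocking input source, and the model and controller methods (update, predict_next, set_knobs, diversity) are placeholders rather than the actual IMPS interfaces.</p>
      <preformat>import time

HANDOVER_SECONDS = 2.0  # performer pause before the generative system takes over

def performance_loop(model, controller, poll_performer):
    """Alternate control between performer and MDRNN (illustrative sketch).

    poll_performer() returns the eight knob values when the performer moves
    a knob, or None when idle.
    """
    last_human_input = time.time()
    while True:
        knobs = poll_performer()
        if knobs is not None:
            # Performer is active: update the MDRNN's internal state with the
            # gesture but discard the model's prediction.
            model.update(knobs)
            last_human_input = time.time()
        elif time.time() - last_human_input > HANDOVER_SECONDS:
            # Performer has paused: sample the next gesture from the mixture
            # model and apply it, echoing the change on the LED rings.
            predicted = model.predict_next(temperature=controller.diversity)
            controller.set_knobs(predicted)
        time.sleep(0.01)</preformat>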
      <p>While “call-and-response” might suggest that the performer can do nothing while the generative system is operating, in fact, this setup allows the performer to adjust other aspects of the performance; for instance, the buttons changing the envelope state, the sound generator type, as well as the computer-based controls for effects. In this type of performance, it is advantageous to allow a degree of generative change to one part of the musical system to continue while focusing on other parts.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Performance Experiences and Conclusions</title>
      <fig id="fig-3">
        <label>Figure 3</label>
        <caption>
          <p>The computer screen view during performance showing the state of each control knob from the performer and generative AI system. The RNN system runs in a terminal window on the right. This screen is shown to the audience during performance.</p>
        </caption>
      </fig>
      <p>This system has been deployed in live performances since 2019. These experiences demonstrate that the generative system works and makes a practical contribution to the performances in terms of creating plausible adjustments to the synthesis parameters. A deeper question then is whether the MDRNN generative system offers a level of co-creative engagement above what could be offered, for instance, by a simpler random-walk generator. From the experience of these live performances, it does seem that the generator can be influenced simply through the style of adjustments that the performer is making (e.g., it tends to continue adjusting the knobs that the performer previously was using). Different behaviours across the eight knobs, e.g., adjusting just one, changing multiple, pausing in between adjustments or making continual changes, appear in the generator’s changes. These behaviours appear “for free” with the MDRNN, that is, they are learned from the dataset, whereas they would need to be encoded into a rule-based generator manually.</p>
      <p>Whenever the system is used, either in rehearsal or performance, the performer’s interactions are captured to continue building a set of gestural control data for the XTouch Mini controller. As the system is retrained with new data, it “learns” more behaviours, just as the performer adjusts their style in between performances. In this way, this system could be said to be co-adaptive [12], although this is yet to be studied in a rigorous way. From the experience of working with this system, it can be reported that features such as the buttons controlling synthesiser envelopes were added in order to give the performer control over the sound while allowing the generative system to operate. Even though there is the potential for direct interplay between the performer and generative system, it seems to be important to have some different roles to play, and to allow the performer to listen and interact without interrupting the generative model.</p>
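      <p>The captured data follows the 9D representation described in Section 2: each interaction is stored as the time since the previous event together with the eight knob positions. A minimal logging sketch is shown below; the file name and CSV layout are assumptions for illustration, not the exact IMPS log format.</p>
      <preformat>import csv
import time

class GestureLogger:
    """Append gestures as rows of (seconds since last event, eight knob values)."""

    def __init__(self, path="xtouch-gestures.csv"):
        self.file = open(path, "a", newline="")
        self.writer = csv.writer(self.file)
        self.last_time = time.time()

    def log(self, knob_values):
        assert len(knob_values) == 8
        now = time.time()
        dt = now - self.last_time
        self.last_time = now
        self.writer.writerow([round(dt, 4)] + list(knob_values))
        self.file.flush()</preformat>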
      <p>From a practical perspective, this system has been
successful in allowing complete performances in co-creation
with a generative AI music system. The generative
system acts as a predictive model for control gestures and is
clever enough to enable interaction and steering from the
performer using only their own performance gestures.</p>
      <p>Higher level behaviours, such as long-term structure of
the performance, are not learned by the model but need
to be controlled manually by the performer. While this
could be said to be limiting, when compared to similar
non-generative systems, the performer in this case can
switch to handling high-level changes while control over
the synthesis parameters is seamlessly continued by the
generative system.</p>
      <p>This research has described a generative electronic
music controller for co-creative performance. This
system fits within the idiom of improvised electronic music
performance and shows how a machine learning model
for control gesture prediction can be applied in a
typical electronic music controller, allowing a very different
style of music generation to symbolic music generation
systems. Many other electronic music designs would be
possible within this style of interaction, and we see this
work as part of developing an orchestra of co-creative
musical instruments that interrogate how modern music
generation and music interaction can be applied together.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The Titan V GPU used in this work was provided by NVIDIA Corporation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref-1"><label>[1]</label><mixed-citation>C. Ames, Automated composition in retrospect: 1956-1986, Leonardo 20 (1987) 169–185. doi:10.2307/1578334.</mixed-citation></ref>
      <ref id="ref-2"><label>[2]</label><mixed-citation>C.-Z. A. Huang, A. Vaswani, J. Uszkoreit, N. Shazeer, I. Simon, C. Hawthorne, A. M. Dai, M. D. Hoffman, M. Dinculescu, D. Eck, Music transformer: Generating music with long-term structure, in: Proc. of ICLR ’19, 2019. arXiv:1809.04281.</mixed-citation></ref>
      <ref id="ref-3"><label>[3]</label><mixed-citation>C. J. Carr, Z. Zukowski, Generating albums with SampleRNN to imitate metal, rock, and punk bands, arXiv preprint, 2018. arXiv:1811.06633.</mixed-citation></ref>
      <ref id="ref-4"><label>[4]</label><mixed-citation>Y.-S. Huang, Y.-H. Yang, Pop music transformer: Beat-based modeling and generation of expressive pop piano compositions, in: Proceedings of the 28th ACM International Conference on Multimedia, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1180–1188. doi:10.1145/3394171.3413671.</mixed-citation></ref>
      <ref id="ref-5"><label>[5]</label><mixed-citation>A. Roberts, J. Engel, Y. Mann, J. Gillick, C. Kayacik, S. Nørly, M. Dinculescu, C. Radebaugh, C. Hawthorne, D. Eck, Magenta Studio: Augmenting creativity with deep learning in Ableton Live, in: Proceedings of the International Workshop on Musical Metacreation (MUME), 2019. URL: http://musicalmetacreation.org/buddydrive/file/mume_2019_paper_2/.</mixed-citation></ref>
      <ref id="ref-6"><label>[6]</label><mixed-citation>T. R. Naess, C. P. Martin, A physical intelligent instrument using recurrent neural networks, in: M. Queiroz, A. X. Sedó (Eds.), Proceedings of the International Conference on New Interfaces for Musical Expression, NIME ’19, UFRGS, Porto Alegre, Brazil, 2019, pp. 79–82. doi:10.5281/zenodo.3672874.</mixed-citation></ref>
      <ref id="ref-7"><label>[7]</label><mixed-citation>F. Pachet, The continuator: Musical interaction with style, Journal of New Music Research 32 (2003) 333–341. doi:10.1076/jnmr.32.3.333.16861.</mixed-citation></ref>
      <ref id="ref-8"><label>[8]</label><mixed-citation>G. E. Lewis, Too many notes: Computers, complexity and culture in “Voyager”, Leonardo Music Journal 10 (2000) 33–39. doi:10.1162/096112100570585.</mixed-citation></ref>
      <ref id="ref-9"><label>[9]</label><mixed-citation>C. P. Martin, K. Glette, T. F. Nygaard, J. Torresen, Understanding musical predictions with an embodied interface for musical machine learning, Frontiers in Artificial Intelligence 3 (2020) 6. doi:10.3389/frai.2020.00006.</mixed-citation></ref>
      <ref id="ref-10"><label>[10]</label><mixed-citation>S. Jones, Cybernetics in society and art, in: Proceedings of the 19th International Symposium of Electronic Art, ISEA2013, ISEA International; Australian Network for Art &amp; Technology; University of Sydney, 2013. URL: http://hdl.handle.net/2123/9863.</mixed-citation></ref>
      <ref id="ref-11"><label>[11]</label><mixed-citation>C. P. Martin, J. Torresen, An interactive musical prediction system with mixture density recurrent neural networks, in: M. Queiroz, A. X. Sedó (Eds.), Proceedings of the International Conference on New Interfaces for Musical Expression, NIME ’19, UFRGS, Porto Alegre, Brazil, 2019, pp. 260–265. doi:10.5281/zenodo.3672952.</mixed-citation></ref>
      <ref id="ref-12"><label>[12]</label><mixed-citation>W. Mackay, Responding to cognitive overload: Co-adaptation between users and technology, Intellectica 30 (2000) 177–193.</mixed-citation></ref>
    </ref-list>
  </back>
</article>