On VAE Latent Space Vectors Distributed Evolution Driven Music Generation

Gleb Rogozinsky1[0000−0001−5698−2347] and Aleksei Shchekochikhin2[0000−0003−2603−9133]

1 The Bonch-Bruevich Saint-Petersburg State University of Telecommunications gleb.rogozinsky@gmail.com
2 ashekochikhin@gmail.com

Abstract. Nowadays, the deep learning techniques used in algorithms of computer music generation are rapidly developing. One of the common problems of deep learning-based methods is the successful reproduction of music score sequences on short as well as long time intervals. To solve it, one may use so-called hierarchical models, which typically lead to a high computational load. Earlier systems based on evolutionary computation methods, especially genetic algorithms, had already proven their ability to model variations of musical sequences on long time scales. The purpose of the presented study is to demonstrate how to incorporate deep learning models, evolutionary computations, and rule-based computer music generation techniques so as to exploit the most effective features of each. The paper presents a combined method of music sequence generation based on a distributed genetic algorithm with Variational Autoencoder (VAE) genotype-to-phenotype mapping. The research includes an experimental investigation of the evolution-driven dynamics of latent space vectors, to figure out how one can achieve a controllable evolution process. The paper also presents a set of rule-based approaches for modifying system-generated music sequences to obtain predictable, highly perceptible differences at low computational cost. The outcomes of genetic algorithm runs with various numbers of agents and different mutation settings are given, together with several examples of music scores generated within the research.
Keywords: Algorithmic Music Generation · Computer Music Techniques · Genetic Algorithm · VAE

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Algorithmic music generation, also known as algorithmic music, has a long history of development, originating in the days of Mozart and undergoing rapid progress due to co-evolution with information and communication technologies. Compared to the days of the Illiac Suite, composed on the ILLIAC I (Illinois Automatic Computer) mainframe programmed by Lejaren Hiller in 1957, state-of-the-art software allows users to have their own pocket mobile 'Illiac' running on iOS/Android. Meanwhile, computational advances are still not able to reach the singularity frontier for computer music systems, mostly because of the simplicity-complexity dualism of human-generated music. Human music can be simple to its extreme, but at some particular moment one little pause or an unexpected modulation can turn several measures into a masterpiece. The present study continues the authors' previous research in the field of distributed computer music generation systems, which uses a set of competing agents to imitate the human process of selection among several creative ideas.

2 Music generation techniques comparison

Computer music generation remains one of the problems of machine creativity. Since the 1960s, the applied techniques have evolved from simple rule models and Markov chains [7] to complex ones based on evolutionary computations [2], [9], and deep learning models [18], [19], [20]. Still, there is no universal way to compare different computer-aided generative techniques. One method for evaluating generated music pieces is the Music Turing Test, one of whose first usages was described in [6].
The authors conducted a study to compare the existing computer music generation techniques and figure out the most successful ones. Fifty respondents were asked to listen to 12 different short music pieces, 10 of which were generated by computer algorithms, and to guess which ones were composed by a human being. Table 1 gives a summary of the comparison of music composition techniques. There are some other approaches to evaluating the quality of computer-generated music pieces. In [8] peers were asked to compare generated samples by ranking them in a range from 1 to 10, where a grade of 1 corresponded to "completely random noise" and a grade of 10 corresponded to "composed by a novice composer". In [17] the authors indicated the main drawbacks of such quality evaluation techniques, which are high cost and low reproducibility, and showed how objective quantities, such as model entropy and mutual information as a function of the distance between two notes, may be used to evaluate the outcomes of computer music generation techniques.

Table 1. Turing test for music composition techniques

Music composition technique | Mean Turing test pass, %
Composed by human           | 69
Deep learning               | 60
Complex techniques          | 56
Genetic algorithm           | 52
Markov chains               | 46
Cellular automata           | 30

Nevertheless, while techniques based on deep learning models appear to be the most advanced, there are some known limitations to their usage:
– deep learning methods do not actually compose a new piece of music, but rather produce a sequence of notes which is statistically consistent with the given dataset; thus a new genre cannot appear in such a generation model;
– complex hierarchical models have to be applied to achieve good results when generating pieces over long time ranges [12];
– the process of learning, as well as generation, still needs a lot of computation, although the effectiveness of recent models is increasing [4].
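The entropy-based objective evaluation mentioned in [17] can be illustrated with a short sketch. This is our own minimal illustration, not the cited authors' code; the function name and the example sequences are assumptions. It computes the empirical entropy of a sequence's pitch distribution, where a repetitive motif scores lower than a near-random chromatic run.

```python
from collections import Counter
from math import log2

def note_entropy(notes):
    """Empirical entropy (in bits) of a note sequence's pitch distribution.
    Low values suggest repetitive material; values near log2(alphabet size)
    suggest near-random sequences."""
    counts = Counter(notes)
    total = len(notes)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A repetitive C-D-E-D motif (1.5 bits) scores far below
# a chromatic run over 12 distinct pitches (log2(12) ~= 3.58 bits).
motif = [60, 62, 64, 62, 60, 62, 64, 62]
chromatic = list(range(60, 72))
```

Such a quantity can be compared between a training corpus and generated output without human raters, which is the reproducibility advantage the cited work points out.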
On the other hand, evolutionary computation-based methods have proven to provide a way to model music melody variations on long time scales [1].

3 Latent space vectors evolution generative model

In [13], we proposed a method of music generation with a distributed genetic algorithm. One of the main difficulties in the design of a genetic algorithm music generation system is to define the genotype-to-phenotype mapping as well as the fitness function. In [11], one may find a rich survey of different approaches to the design of evolutionary computer music generation systems. It has been shown that VAEs and GANs can be used to perform the genotype-to-phenotype mapping, enabling interactive evolutionary computation for image generation [3] as well as for 2-bar music sequences [15]. We propose the following technique for infinite personalized music generation, based on the genetic evolution of latent vectors (see figure 1):
– the distributed genetic algorithm is used as an attractor for the music generation; each agent in the system is associated with one user; all agents together form the population for the genetic algorithm;
– the VAE is used for the genotype-to-phenotype mapping;
– each agent has a target state, the distance to which is used as the fitness function;
– the agent target states are set by users, mapped from natural language descriptions;
– at each step of the system run loop, agents obtain the states of the others to perform the next iteration of the genetic algorithm.

The current version of the system model is available online3. The MusicVAE autoencoder [12] from Magenta was used to perform the genotype-to-phenotype mapping, decoding genetic algorithm run results into note sequences. The MusicVAE encoder is a two-layer bidirectional LSTM network. The MusicVAE decoder is a hierarchical RNN.
The key assumption behind implementing the genotype-to-phenotype mapping with the VAE model is perceptual continuity: we assume that small distances in the latent space correspond to perceptually close decoded music sequences.

3 https://github.com/ashekochikhin/aGASim

Fig. 1. Distributed Genetic Algorithm Music Generation with VAE Genotype-to-Phenotype Mapping

In the experiments, the following genetic algorithm realization was implemented:
– the agent gets the current states of all other agents in the system;
– the agent whose state is closest, by Hamming distance, to the current agent's target state is determined;
– single-point crossover runs on the current and closest agents, providing two new child candidates;
– a system-specified number of random mutations runs;
– the candidate closest to the target state replaces the current agent's state.

The model system has the following variable parameters:
– N - number of agents;
– l_ch - chromosome length;
– p_m - number of mutations after crossover;
– l_Σ - gene alphabet length.

Figure 2 shows the influence of the number of mutations on the median distance of the population's current agent states to the target ones. We denote the constant asymptote of the distance to the target state as the stable distance. With the growth of the population size, the influence of random mutation decreases, see figure 3. For a given population size, the stable distance to the target state D_st is:

D_st ∼ l_ch (1 − e^(−c_1 (p_m / l_ch + c_2))),   (1)

where c_1 and c_2 are constants defined by the system configuration. Figures 4 and 5 show the influence of the number of mutations and the number of agents on the median distance of the current agent states to the target ones for different chromosome lengths. With the growth of the chromosome length, the number of agents required to achieve a relatively close stable distance to the target increases. Figure 6 shows the decay of the distance to the target state with the agents' population size.
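The per-agent iteration described above can be sketched in a few lines. This is our own minimal illustration for the binary-alphabet case, with names of our choosing; the actual system decodes each chromosome through MusicVAE rather than treating it as raw bits.

```python
import random

def hamming(a, b):
    """Number of positions at which two chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))

def agent_step(current, others, target, pm, rng):
    """One iteration of the distributed GA for a single agent:
    mate selection, single-point crossover, pm mutations, replacement."""
    # the other agent whose state is closest to this agent's target
    mate = min(others, key=lambda s: hamming(s, target))
    # single-point crossover produces two child candidates
    point = rng.randrange(1, len(current))
    children = [current[:point] + mate[point:],
                mate[:point] + current[point:]]
    # a system-specified number of random bit-flip mutations (alphabet length 2)
    for child in children:
        for _ in range(pm):
            i = rng.randrange(len(child))
            child[i] = 1 - child[i]
    # the candidate closest to the target replaces the agent's state
    return min(children, key=lambda c: hamming(c, target))

# One step for one agent in a toy population of 8 agents, l_ch = 32, p_m = 2.
rng = random.Random(42)
current, target = [0] * 32, [1] * 32
others = [[rng.randrange(2) for _ in range(32)] for _ in range(7)]
new_state = agent_step(current, others, target, pm=2, rng=rng)
```

In the distributed setting, each agent runs this step concurrently against the latest states it has received from its peers, which is what makes the population act as an attractor rather than a centralized optimizer.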
Fig. 2. Examples of modeling a distributed GA run. Chromosome length is equal to 32. Alphabet length is equal to 2. With the growth of the population (more agents), the current solutions become closer to the target vectors.

Fig. 3. The dependency of the stable distance to the target on the number of mutations. Chromosome length is equal to 32. Alphabet length is equal to 2.

Fig. 4. Examples of modeling a distributed GA run. Chromosome length is equal to 32. Alphabet length is equal to 2. The influence of mutations is higher on smaller populations.

Fig. 5. Examples of modeling a distributed GA run. Chromosome length is equal to 128. Alphabet length is equal to 2. The influence of mutations is higher on smaller populations.

For a given chromosome length, the stable distance to the target state D_st may be approximated with an inhibitory dose-response curve:

D_st ∼ l_ch / (1 + (N / c_2)^c_1),   (2)

where c_1 and c_2 are constants defined by the system configuration. Figure 7 shows the growth of the stable distance to the target state with the alphabet length. For a given population size, the stable distance to the target state D_st is:

D_st ∼ l_ch (1 − e^(−c_1 l_Σ)),   (3)

where c_1 is a constant defined by the system configuration.

The provided analysis of how the distributed genetic algorithm's behavior depends on the system's variable parameters is to be used to keep the run correct. As the genetic algorithm is used not for optimization but for generation, it should always be close enough to the extremum but never get stuck on it. Random mutations may be used to prevent continuous repeats of a solution. If there are not enough agents to get close enough to the target, the system may generate virtual ones.

4 Discussion

Figure 8 gives an example of a generated sequence of 16-bar length. Note that the implemented VAE was trained to produce 2-bar sequences. One can notice that the presented sequence is in the key of B-flat minor.
The melody generally remains within the selected tonality. We must admit that with the described approach, on long timescales (32 to 64 bars), the system 'loses' memory of the initial sequence and its attributes.

Fig. 6. Example of modeling a distributed GA run. The chromosome length is equal to 128. The alphabet length is equal to 2. The dependency of the stable distance to the target states on the population size.

Fig. 7. Example of modeling a distributed GA run. The chromosome length is equal to 32. The population size is equal to 16 agents. The dependence of the stable distance to the target states on the gene alphabet length.

Fig. 8. Example of a generated 16-bar note sequence

To get better scale stability, one may apply some algorithmic constraints to the generated sequences. In [5] it was shown how constraints in the latent space might be used to reach a better rate of successful generation. We assume that restrictions in the phenotype space are also practical. For example, one may apply a major scale constraint to the mentioned sequence and get the one presented in figure 9. To apply a scale constraint, one should first determine the sequence's key. We provide a simplified algorithm to identify the key of the sequence by giving a score to each note for each music key:
– +2, if a note from the sequence is the I, IV, or V scale degree;
– +1, if a note from the sequence is any other scale degree;
– −1, if a note from the sequence is out of the scale;
– if the total scores of the sequence are equal for two different scales, the one with fewer key signature symbols is to be chosen.

After the original scale is determined, the sequence may be quantized to the desirable one by replacing out-of-scale notes with the closest ones from the scale. There are also more complex algorithmic techniques to determine the original sequence's key [16], as well as deep learning-based approaches for key extraction from raw audio samples [10].
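The scoring and quantization steps above can be sketched as follows. This is our own minimal illustration restricted to major keys, whereas the algorithm scores every music key; the function names and the accidentals tie-breaking table are our assumptions.

```python
# Semitone offsets of the major scale degrees relative to the tonic;
# degrees I, IV, V correspond to offsets 0, 5, 7.
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
PRIMARY = (0, 5, 7)

# Accidentals in each major key signature (tonic pitch class -> count),
# used for the "fewer symbols" tie-breaker.
ACCIDENTALS = {0: 0, 7: 1, 2: 2, 9: 3, 4: 4, 11: 5, 6: 6,
               5: 1, 10: 2, 3: 3, 8: 4, 1: 5}

def key_score(notes, tonic):
    """Score a MIDI note sequence against the major key on `tonic`."""
    score = 0
    for n in notes:
        pc = (n - tonic) % 12
        if pc in PRIMARY:
            score += 2          # I, IV or V scale degree
        elif pc in MAJOR_STEPS:
            score += 1          # any other in-scale degree
        else:
            score -= 1          # out of scale
    return score

def find_key(notes):
    """Tonic pitch class with the highest score; ties -> fewer accidentals."""
    return max(range(12), key=lambda t: (key_score(notes, t), -ACCIDENTALS[t]))

def quantize_to_scale(notes, tonic):
    """Replace out-of-scale notes with an adjacent in-scale pitch.
    In a major scale both chromatic neighbours of an out-of-scale note
    are in scale; we snap downward for simplicity."""
    return [n if (n - tonic) % 12 in MAJOR_STEPS else n - 1 for n in notes]
```

A fuller implementation would also score minor keys and pick the replacement neighbour by musical context rather than always snapping down.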
Another constraint application is rhythmic modification. Although the original VAE was trained to decode latent space vectors into sequences in four-four time signature, the generated sequence can also be easily quantized to three-four time, see figure 10, or six-eight time, see figure 11. Rhythmic transformations may also be performed according to statistically motivated rules. Depending on the music genre, different rhythmic sequences - rudiments - are more or less common [14]. Applying rudimental rhythmic filtering will result not only in statistically right output but also in the generation of sequences which may be comfortably performed by a human.

Fig. 9. The generated sequence with the applied major scale constraint

Fig. 10. The generated sequence in three-four time

Fig. 11. The generated sequence in six-eight time

Therefore, one of the approaches to improving today's most advanced deep learning-based music sequence generation techniques is to combine them with well-known algorithmic solutions such as genetic algorithms and rule-based constraints. This approach leads to consistent results at a low computational load.

5 Conclusion

The approach discussed in the paper leads us to further study of the proposed music generation model based on a distributed genetic algorithm with VAE genotype-to-phenotype mapping. Combining techniques of algorithmic music has proved to play a key role in effective composition. Different approaches possess their peculiar strong and weak sides, thus suitably combining them will produce a compositionally attractive outcome. Generally speaking, an agent-based approach with competing 'ideas' can be treated as a model of world music evolution, i.e., starting from basic simple sequences, it evolved into very complex, polytonal, and even atonal structures. Some popular applications and online services already exist.
We can assume that their effectiveness, both in the sense of speed and of musical 'ideas', is also due to a proper combined approach. The application area of such services allows us to estimate the future of such systems: various recreational zones, both private and public, with the ability to be adjusted, e.g., to the weather conditions, daytime, team progress, wearable device data, etc.; in addition, a set of assistive technologies for human composers, virtual reality randomization effects, etc. One of the greatest features of such systems is their ability to be individualized according to the client data.

References

1. Arutyunov, V., Averkin, A.: Genetic algorithms for music variation on genom platform. Procedia Computer Science 120, 317-324 (2017). https://doi.org/10.1016/j.procs.2017.11.245, http://www.sciencedirect.com/science/article/pii/S1877050917324572. 9th International Conference on Theory and Application of Soft Computing, Computing with Words and Perception, ICSCCW 2017, 22-23 August 2017, Budapest, Hungary
2. Biles, J.: GenJam: A genetic algorithm for generating jazz solos (Jul 1994)
3. Bontrager, P., Lin, W., Togelius, J., Risi, S.: Deep Interactive Evolution. arXiv e-prints arXiv:1801.08230 (Jan 2018)
4. Engel, J., Agrawal, K.K., Chen, S., Gulrajani, I., Donahue, C., Roberts, A.: GANSynth: Adversarial neural audio synthesis. In: International Conference on Learning Representations (2019), https://openreview.net/forum?id=H1xQVn09FX
5. Engel, J., Hoffman, M., Roberts, A.: Latent constraints: Learning to generate conditionally from unconditional generative models. In: International Conference on Learning Representations (ICLR) (2018), https://openreview.net/pdf?id=Sy8XvGb0-
6. Hadjeres, G., Pachet, F., Nielsen, F.: DeepBach: a steerable model for Bach chorales generation. In: ICML (2016)
7. Hiller, L.A., Isaacson, L.M.: Experimental Music; Composition with an Electronic Computer.
Greenwood Publishing Group Inc., Westport, CT, USA (1979)
8. Huang, A., Wu, R.: Deep learning for music. CoRR abs/1606.04930 (2016), http://arxiv.org/abs/1606.04930
9. Waschka II, R.: Composing with Genetic Algorithms: GenDash, pp. 117-136 (Jan 2007)
10. Korzeniowski, F., Widmer, G.: End-to-end musical key estimation using a convolutional neural network (Jun 2017)
11. Miranda, E.R., Biles, J.A.: Evolutionary Computer Music. Springer-Verlag, Berlin, Heidelberg (2007)
12. Roberts, A., Engel, J., Raffel, C., Hawthorne, C., Eck, D.: A hierarchical latent vector model for learning long-term structure in music. In: International Conference on Machine Learning (ICML) (2018), http://proceedings.mlr.press/v80/roberts18a.html
13. Rogozinsky, G., Shchekochikhin, A.: The distributed system model for evolutionary generation of audio content. Information Technologies and Telecommunications 2(2), 20-26 (2014)
14. Sethares, W.: The geometry of musical rhythm: what makes a "good" rhythm good? Journal of Mathematics and the Arts 8, 135-137 (Dec 2014). https://doi.org/10.1080/17513472.2014.906116
15. Shafkat, I.: Music by means of human selection (Mar 2019), https://towardsdatascience.com/music-by-means-of-natural-selection-11934d7e89a3
16. Shmulevich, I., Yli-Harja, O.: Localized key-finding: Algorithms and applications. Music Perception: An Interdisciplinary Journal 17 (Jul 2000). https://doi.org/10.2307/40285832
17. Tikhonov, A., Yamshchikov, I.P.: Music generation with variational recurrent autoencoder supported by history. CoRR abs/1705.05458 (2017), http://arxiv.org/abs/1705.05458
18. van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., Kavukcuoglu, K.: WaveNet: A Generative Model for Raw Audio. arXiv e-prints arXiv:1609.03499 (Sep 2016)
19. Vasquez, S., Lewis, M.: MelNet: A Generative Model for Audio in the Frequency Domain. arXiv e-prints arXiv:1906.01083 (Jun 2019)
20.
Yang, L.C., Chou, S.Y., Yang, Y.H.: MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation. arXiv e-prints arXiv:1703.10847 (Mar 2017)