<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Expert-MusiComb: Injective Domain Knowledge in a Neuro-Symbolic Approach for Music Generation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lorenzo Tribuiani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Giuliani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Allegra De Filippo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Borghesi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering (DISI), University of Bologna</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>The significant expansion of data-driven technologies in the past decade has highlighted the crucial role of structured data, given the more relevant and meaningful informative content that they can provide to artificial intelligence (AI) applications. Nonetheless, there are domains based on inherently unstructured data, such as the audio domain. In those cases, an automated system capable of extracting structured features from raw data could serve as a pivotal element in enhancing and strengthening the capabilities of an AI system. In this work, we propose an automated feature extractor which leverages machine and deep learning methodologies to retrieve two higher-level musical attributes from short MIDI samples, namely the harmonic content of the sample - through its chord progression - and the role that such a sample could have within a multi-track composition - i.e., melody, bass, or accompaniment. We perform our tests on a dataset containing ground truth information to assess quantitative results, and later integrate our models within MusiComb, the state-of-the-art framework for combinatorial music generation, to check for harmonic and melodic consonance on the downstream generative task.</p>
      </abstract>
      <kwd-group>
        <kwd>Music Generation Systems</kwd>
        <kwd>Generative AI</kwd>
        <kwd>Chord Prediction</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Constraint Programming</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Computer-aided music generation combines computer science, machine learning, and music theory to
compose, produce, or assist in creating music. This interdisciplinary field poses a significant challenge
due to the complex mix of creativity, emotion, and technical requirements involved, making it one of
the most demanding tasks for AI to undertake. MusiComb, originally conceived as an implementation
of the work theorized by Hyun et al. in the paper [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], emerges as a framework for combinatorial music
generation. It employs Constraint Programming to generate the final piece, while utilizing deep learning
and machine learning techniques for data preparation and generation. Through the fusion of short MIDI
samples, this system excels at crafting well-structured compositions and empowers users by allowing
them to shape the creative process through the customization of various music-related parameters.
As well summarized in the related paper [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]: it represents a novel music generation approach aimed
at overcoming generative model limitations, by properly combining a set of samples under user-defined
constraints.
      </p>
      <p>
        MusiComb, alongside ComMU, the MIDI sample dataset introduced in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and primarily utilized
during the initial development of the framework, has established a standardized set of significant
features of harmonic and structural attributes of each sample within the dataset. These features serve
as fundamental components utilized by the framework for sample combination and music generation.
While ComMU serves as an exemplary dataset for the tasks, the imperative for new datasets has become
evident. This necessity arises not only to incorporate fresh samples but also to furnish MusiComb with
a broader array of potential features, such as music genre, enabling users to explore a wider spectrum of
potential outcomes. This, coupled with the challenge of locating MIDI datasets labeled consistently
with ComMU, underscored the necessity for an automated feature extractor. Our primary focus lies in
estimating a subset of features that are not readily accessible or discernible through the MIDI protocol.
These features, such as track roles and chord progressions, pose greater complexity in estimation due to
their indirect nature. They hold significant relationships with the harmonic and structural attributes
of the samples. Establishing correlations between known properties or elements of the sample and
these desired features necessitates the use of machine learning and deep learning systems. Moreover,
evaluating the results, particularly in the chord progression domain, presents challenges as existing
systems often rely on human intervention.
      </p>
      <p>The concluding phase will focus primarily on the modifications and additions implemented within
the MusiComb framework, aiming to address various aspects and limitations inherent in the system
itself. A key addition involves an automatic sample extraction algorithm designed to extract small,
repetitive sequences (samples) from complex and lengthy MIDI files, facilitating the integration of new
datasets. Additionally, to enhance the flexibility of sample selection mechanisms and mitigate their
inherent rigidity based on user-selected parameters, minor adjustments have been incorporated into
the pipeline. The introduction of new MIDI datasets, coupled with these modifications, is intended to
render the framework more adaptable and operate more flexibly.</p>
      <p>This approach enables a broader range of possible combinations, thereby reducing the deterministic
nature of the overall process relative to the initial set of user parameters. Further elaboration on the
rationale for those approaches will be provided in subsequent discussions, specifically: Section 2 will
offer a comprehensive overview of relevant existing works and the main rationale behind this work;
Section 3 will delve deeply into the challenges and complexities associated with feature extraction and
the main problematic aspects and difficulties encountered in addressing these challenges; Section 4
will introduce the primary integrations and modifications made to the MusiComb framework, offering
a detailed explanation of the sample extraction algorithm and the key advantages resulting from the
adjustments to the sample selection pipeline, particularly in relaxing selection criteria. Some of the
compositions generated by Expert-MusiComb can be heard at the following link: https://soundcloud.
com/lorenzo-tribuiani/sets/musicomb.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Motivation</title>
      <p>
        The field of computer-aided music generation has recently experienced a significant advancement
in the use of end-to-end neural systems. Transformer-based models such as Music Transformer [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
paved the way to more successful projects like MusicLM [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Jukebox [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Noise2Music [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and even
professional tools like Suno. However, the adoption of these models introduces several drawbacks.
Among them, these systems are still subject to abrupt timbre changes and noisy outputs, which restrict
their professional use; they also offer very limited user control and impose high computational
demands, preventing real-time use [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. On top of that, the inherently opaque nature
of neural architectures has been proven to lead to unintended plagiarism, with a consequent lack of
recognition of the human artists whose compositions have been used to train the models.
      </p>
      <p>
        Driven by the goal of addressing these challenges, there is renewed interest in symbolic-based
generative models within the research community. Traditionally, Probabilistic and Hidden Markov
Models (HMMs) have been widely used for both chord and melody generation. For example, [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] uses
an HMM for Bach-inspired chorale harmonies, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] involves pattern recognition and recombination
techniques to create compositions that replicate the style of various classical composers, while [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
and [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] apply complex graphical models to melody harmonization. More recently, systems such as
Morpheus [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and GEDMAS [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] employed explicit rules and probabilistic methods to generate melodies
or entire tracks according to certain constraints, while Pachet et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] combined symbolic models with
neural architectures to exploit the power of both frameworks. Our former work, MusiComb [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], aligns
with this research area as well, as it employs a combinatorial approach to music generation, although
it focuses on the arrangement of predefined segments of notes (samples) rather than generating music
on a note-by-note basis.
      </p>
      <p>
        The main strength of sample-based music composition comes from its higher compatibility with
contemporary pop music compositional and production standards since, over the past thirty years, the
introduction of samplers and Digital Audio Workstations (DAWs) has significantly shifted the music
industry’s workflow towards extensive use of sample libraries [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. However, given that the arranged
samples must meet specific properties and constraints to harmonically integrate into longer sequences,
the challenge of correctly locating them rapidly became problematic and task-intensive, especially as
the size of the databases started to expand [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Current sample libraries often provide some metadata
such as key signature and tempo, along with additional high-level labelling that can be used to create
filters, but information about chord progressions, instrument type, and track role is most of the time
lacking, hence preventing a fully automated procedure. Similarly, research in sample-based music
generation is restricted by technical limitations, as major datasets for synthetic music generation such
as the Lakh MIDI Dataset [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] or MUSDB18 [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] lack this critical information. For this reason, our aim
is to build an automated pipeline that could work as well with larger datasets by employing machine
learning models to extract this kind of metadata whenever it is missing.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. MIDI Feature Estimation</title>
      <p>The core challenge of this work revolved around MIDI feature estimation. As previously mentioned,
MusiComb established a standardized set of features for each sample, essential for the proper functioning
of the framework. Specifically, eight distinct features were identified: Beats Per Minute (BPM), number
of measures, key signature, genre, track role, chord progression, time signature, and rhythm. These eight
features have been categorized into two main groups for clarity:
Direct features including BPM, number of measures, key signature, genre, time signature, and rhythm,
refer to those characteristics whose values are either explicitly written or easily extractable from
the MIDI data itself. Key signature can indeed present a minor obstacle in estimation, as it may not
always be explicitly encoded within the MIDI data. However, there are existing algorithms capable
of estimating the key signature with a reasonable level of confidence, such as the
Krumhansl-Schmuckler algorithm utilized in this study. Also, it is commonly assumed that genre can be
implicitly inferred from the properties of the dataset itself.</p>
      <p>Indirect features, such as track role and chord progression, are attributes that are unlikely to be
explicitly encoded within the MIDI protocol of the sample.</p>
      <p>The primary challenge in this study is to handle indirect features. While track-role estimation can
be addressed using classical classification techniques (e.g., SVM), chord progression estimation differs
fundamentally. This calls for alternative methods, such as using GRU layers typically employed for
Natural Language Processing tasks, and different evaluation metrics to effectively tackle these tasks.</p>
      <sec id="sec-3-1">
        <title>3.1. Track Role Estimation</title>
        <p>The track role (i.e., the function of a sample within a music piece) is challenging to estimate due to its
contextual nature. Defining track roles often relies on human interpretation and overall musical context,
posing challenges in establishing clear class boundaries without nuanced distinctions. Another challenge
arises from the similarity between track roles, with MusiComb standardizing six distinct classes: main
melody, sub melody, riff, accompaniment, pad, and bass. Some roles, like main and sub-melody, share
similar concepts and structures, making the classification process harder. The six track role classes
can be grouped into three primary macro-groups: Melody, Accompaniment, and Bass, as shown in
Table 1. This grouping highlights structural similarities within each macro-group, complicating their
differentiation. Additionally, the riff class exhibits similarities with both melody and accompaniment,
as illustrated in Fig. 1.</p>
        <p>In our study, significant effort was devoted to identifying a feature set suitable for training a Support
Vector Machine (chosen after a preliminary empirical evaluation of different ML algorithms) for
classifying the six distinct MusiComb track roles. We focus on structural musical elements that vary across
classes while maintaining consistency or similarity within each class.</p>
        <sec id="sec-3-1-1">
          <title>Accompaniment</title>
        </sec>
        <sec id="sec-3-1-2">
          <title>Bass</title>
        </sec>
        <sec id="sec-3-1-3">
          <title>Main melody</title>
        </sec>
        <sec id="sec-3-1-4">
          <title>Sub melody Rif</title>
        </sec>
        <sec id="sec-3-1-5">
          <title>Accompaniment Pad</title>
        </sec>
        <sec id="sec-3-1-6">
          <title>Bass</title>
        <p>The feature set for classification was selected from those features directly accessible from the MIDI protocol, or modified versions thereof, ensuring
their availability in external datasets. Since track role definition is independent of harmonic properties,
only structural features were used for classification. In a data-informed approach, using personal
domain knowledge, and based on the primary characteristics of each group, a set of eleven independent
features was chosen for classification.
1-2. Mean chords number &amp; Mean notes number: The number of chords and individual notes is
crucial for the track role. We normalize them to mitigate the impact of sample length differences,
ensuring a consistent representation of chord and note densities across samples.</p>
          <p>mean_chords = n_chords / n_measures, mean_notes = n_notes / n_measures, where n_measures is the number of measures in the sample.</p>
          <p>
3-4. Chords duration &amp; notes duration: Longer durations indicate greater importance in the score,
potentially influencing classification. To maintain consistency across samples of different lengths,
durations are normalized based on the number of measures.
5. Chords note distance: Not all simultaneous note sets adhere to conventional chord definitions (some
instances, like those with fewer than three notes, serve to reinforce a melody or enrich harmony).</p>
          <p>Hence, we consider the distances between notes within these groups, leveraging the consistent
ratios between note distances in chord modes well known in music theory. We use a modulo
12 representation for note distances, disregarding octave information, to maintain consistent
distance measurements across octaves.
6. Notes distance: In accompaniments, chords can be played individually or as arpeggios, complicating
classification. To address this, we use the mean distance between individual notes. This metric,
expressed modulo 12, helps differentiate chord arpeggios based on their mean distances.
7. Number of chord notes: A single chord provides insights into the sample’s role. While traditionally
three notes define a chord, fewer may indicate bichords (two-note chords), and three or more suggest various structural
elements. This value is represented as the mean number of notes per chord.
8-9-10. Minimum octave, maximum octave and mean octave: Minimum and maximum octaves
set pitch variability boundaries within the track. The mean octave offers a central reference,
emphasizing the most relevant octave in the piece. Recognizing these boundaries aids in excluding
certain track roles based on expected octave characteristics.
11. Instrument: Instrument features are included to account for their association with specific track
roles or octave ranges.</p>
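          <p>For illustration, the sketch below shows how a few of the structural descriptors above could be computed from a parsed MIDI sample; it is a simplified reading of the feature set, not the framework's exact implementation, and the Note structure and the grouping of simultaneous notes are our own assumptions.</p>
          <preformat>from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Note:
    pitch: int       # MIDI pitch number
    start: float     # onset time, in beats
    duration: float  # length, in beats

def chordify(notes, tol=1e-3):
    """Group notes with (nearly) identical onsets into simultaneous sets."""
    groups = defaultdict(list)
    for n in notes:
        groups[round(n.start / tol)].append(n)
    return list(groups.values())

def structural_features(notes, n_measures):
    """Sketch of a subset of the eleven track-role features, normalized by sample length."""
    groups = chordify(notes)
    chords = [g for g in groups if len(g) &gt; 1]       # simultaneous note sets
    singles = [g[0] for g in groups if len(g) == 1]   # individual notes
    octaves = [n.pitch // 12 for n in notes]
    # pairwise pitch distances (modulo 12) inside simultaneous sets
    dists = [abs(a.pitch - b.pitch) % 12
             for g in chords for a in g for b in g if a is not b]
    return {
        "mean_chords": len(chords) / n_measures,
        "mean_notes": len(singles) / n_measures,
        "chords_note_distance": sum(dists) / len(dists) if dists else 0.0,
        "notes_per_chord": sum(len(g) for g in chords) / len(chords) if chords else 0.0,
        "min_octave": min(octaves),
        "max_octave": max(octaves),
        "mean_octave": sum(octaves) / len(octaves),
    }</preformat>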
          <p>3.1.1. Preprocessing</p>
          <p>The ComMU dataset, the only dataset including all the metadata needed for the functioning of MusiComb,
was used for training. Preprocessing involved extracting the 11 relevant features and making adjustments
for robustness against minor variations. All data vectors were scaled using the z-score equation2 to
standardize the range, preventing larger-ranged data points from disproportionately influencing the
SVM model.</p>
          <p>3.1.2. Training and results</p>
          <p>The SVM model was trained on the ComMU dataset and fine-tuned using Grid Search with
cross-validation to identify the best hyperparameters. The grid search covered three kernel types
(RBF, polynomial, and linear), five C values (0.5, 1, 5, 10, 100), and two tolerance values (0.01 and
0.001). The dataset was split into five subsets for cross-validation. Table 2 shows the top 5 SVM models,
including their parameters and performance on each dataset subdivision, together with a K-Nearest
Neighbours baseline.</p>
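          <p>The grid search described above maps naturally onto a standard scikit-learn pipeline; the following sketch mirrors the reported kernel, C, and tolerance grids with five-fold cross-validation, while the feature matrix X, the label vector y, and the F1-based scoring choice are assumptions.</p>
          <preformat>from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# z-score scaling followed by an SVM, tuned over the grids reported above
pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {
    "svc__kernel": ["rbf", "poly", "linear"],
    "svc__C": [0.5, 1, 5, 10, 100],
    "svc__tol": [0.01, 0.001],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1_macro")
# X: (n_samples, 11) feature matrix, y: track-role labels (assumed available)
# search.fit(X, y)
# print(search.best_params_, search.best_score_)</preformat>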
        <sec id="sec-3-1-7">
          <title>Model Kernel</title>
        </sec>
        <sec id="sec-3-1-8">
          <title>Tolerance</title>
          <p>TAS1* TAS2
TAS3
TAS4
TAS5</p>
        </sec>
        <sec id="sec-3-1-9">
          <title>Mean test score Mean F1 score 0.001 0.01</title>
          <p>2 = −  , where  is the current data point,  is the mean and  the standard deviation.</p>
          <p>it from the main melody and accompaniment. The sub-melody class is frequently misclassified as the
main melody, while the main melody is sometimes incorrectly classified as sub-melody. This unexpected
behaviour could be influenced by the class distribution in the training set. Figure 2 (panels: (a) train,
(b) test, (c) test predictions) illustrates that the main melody is more prevalent in the training dataset (2a),
and the model has replicated this distribution in its predictions (2c) on the test set3.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Chord Progression Estimation</title>
        <p>Chord progression estimation involves finding the ordered sequence of chords that accompany a melody.
It relies heavily on harmonic elements such as notes and pitches, leading to variations based on the
sample type.</p>
        <p>
          1. Chord progressions can be non-unique, influenced by the track role of a sample. Melody-like
samples present challenges as a single melody can harmonically match multiple chord
progressions, each producing a unique sound. In contrast, accompaniment and bass-like samples are
typically based on specific chord progressions, requiring a stricter classification approach. For
chord progression estimation, the macro groups from Table 1 will be used to define the three
primary classification domains.
2. The classification task has an extensive solution space defined by the unique possible chord names,
resulting in thousands of potential chord combinations based on harmonic rules. Balancing the
retention of harmonic information by simplifying the space is essential for ease of classification.
3. A harmonic metric tailored for chord progression estimation in melody samples is missing from
the literature. While classification accuracy is important, ensuring harmonic soundness in the
final results is crucial. A metric emphasizing harmonic aspects over mere accuracy is required.
Point 1 suggests training a unique model for each of the three macro groups (Melody, Accompaniment,
and Bass) using a Shared Model Split Weights approach. Although there are significant differences
between track roles, the estimation tasks are similar. Respectively, points 2 and 3 lead to a simplification
of the labels based on their names described in Section 3.2.1 and the adoption of a metric emphasizing
harmonic soundness, inspired by the work on Mathematical Harmony Analysis [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Moreover,
understanding the temporal or positional context of the sample representation is crucial. Both samples and
chord progressions can be viewed as time series, so a time-aware representation is needed. Building
on the work of Hyungui et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], recurrent neural networks based on GRU layers combined with an
autoregressive pipeline have been selected to address this task.
3.2.1. Preprocessing
We adopted a modified version of the Pitch Class Vector (PCV) representation from [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] for feature
preprocessing in our study. The PCV consists of 12 bins representing the 12 possible notes; we scale
these values using normalized Velocity4 to reintroduce accent information by emphasizing notes played
with greater force. Fig. 3 illustrates the modified PCV for a MIDI sample.
3: The dataset split is the same adopted for ComMU and indicated in the dataset’s metadata.
4: Velocity is a MIDI parameter that indicates the force applied when pressing a key on a MIDI keyboard. Normalized velocity is a scaled representation in the [0, 1] range.
        </p>
        <p>[Table 3: chord-name simplification. The chord qualities found in ComMU (major, minor, diminished, augmented, and alterations such as 7ths) are mapped onto the simplified major/minor categories; due to their dissonant nature, diminished chords have been excluded from the dataset.]</p>
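        <p>As an illustration, a velocity-scaled PCV for one measure could be assembled as in the sketch below; the (pitch, velocity) note format and the choice of keeping the loudest occurrence per pitch class are simplifying assumptions.</p>
        <preformat>import numpy as np

def pitch_class_vector(notes):
    """notes: iterable of (midi_pitch, velocity) pairs for one measure.
    Returns a 12-bin vector in which each pitch class is weighted by the
    normalized velocity (MIDI velocity / 127) of its loudest occurrence."""
    pcv = np.zeros(12)
    for pitch, velocity in notes:
        pc = pitch % 12
        pcv[pc] = max(pcv[pc], velocity / 127.0)
    return pcv

# example: a C major triad with an accented root
print(pitch_class_vector([(60, 100), (64, 80), (67, 80)]))</preformat>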
        <p>Conversely, a classical approach using one-hot encoding has been employed for representing chord
names. As illustrated in Table 3, all chord names present in the ComMU dataset have been mapped to just
two categories: major and minor chords. This reduction narrows down the solution space to 24 possible
elements. Finally, data augmentation was applied to the dataset by transposing samples and their
corresponding chord progressions across various note intervals, resulting in 11 additional entries for
each sample. This technique has been fundamental in developing the chord progression estimator. This
is primarily because transpositions of the same sample should result in corresponding transpositions of
the chord progression output, enhancing the system’s robustness. Injecting this knowledge is neither
direct nor easy; it is something the model must learn during training. This approach offers a dual
benefit: it increases the dataset size by augmenting the existing samples, a technique known to improve
generalization, and it incorporates transposition invariance into the model during training.</p>
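        <p>The transposition-based augmentation can be pictured with the sketch below, which rolls a measure's PCV by k semitones and shifts the chord root accordingly; the (root, is_minor) chord encoding over 24 classes is an illustrative assumption.</p>
        <preformat>import numpy as np

def transpose_example(pcv, chord, k):
    """pcv: 12-bin pitch class vector; chord: (root, is_minor) with root in 0..11;
    k: semitones to transpose by. Returns the transposed (pcv, chord) pair."""
    root, is_minor = chord
    return np.roll(pcv, k), ((root + k) % 12, is_minor)

def augment(pcv, chord):
    """Generate the 11 non-trivial transpositions of a (sample, chord) pair."""
    return [transpose_example(pcv, chord, k) for k in range(1, 12)]</preformat>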
        <p>3.2.2. Model Architecture</p>
        <p>The model architecture uses an autoregressive approach with GRU layers to capture temporal patterns.
For a sample with N measures, each measure is analyzed independently to reconstruct temporal
information focusing on chords. The input structure is as follows: the initial conditions, representing the
encoding of the three previous chords at time steps n − 1, n − 2, and n − 3, where all values are set to
zero for the first measure (START token); measure n, the Pitch Class Vector (PCV) of the considered
measure; and measure n + 1, the features of the next measure, to provide information on the harmonic
direction. Individual learned embeddings are used for each feature, and the final model input is a tensor
obtained by concatenating these embeddings, resulting in a shape of (5, 64).</p>
        <p>The main model architecture features a modular design, with each module consisting of a BiGRU
layer with 128 cells and hyperbolic tangent activation, followed by Layer Normalization and Dropout
with a probability of 0.3. In this specific application, two modules were utilized, with a final classification
head included in the model. Figure 4b illustrates the model’s autoregressive pipeline. The one-hot
encoding of each chord is added to the initial conditions as the window shifts forward, allowing the
model to use current and past chord information.</p>
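        <p>A minimal PyTorch sketch of such a module (BiGRU with 128 cells and the default hyperbolic tangent activation, followed by Layer Normalization and Dropout with probability 0.3) is given below; the tensor shapes, the two stacked blocks, and the placement of the 24-way classification head are assumptions based on the description above.</p>
        <preformat>import torch.nn as nn

class BiGRUModule(nn.Module):
    """One block of the chord-progression model: BiGRU + LayerNorm + Dropout."""
    def __init__(self, input_size, hidden_size=128, dropout=0.3):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True, bidirectional=True)
        self.norm = nn.LayerNorm(2 * hidden_size)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):            # x: (batch, seq_len, input_size)
        out, _ = self.gru(x)         # (batch, seq_len, 2 * hidden_size)
        return self.drop(self.norm(out))

class ChordEstimator(nn.Module):
    """Two stacked modules followed by a 24-way classification head."""
    def __init__(self, embed_dim=64, n_chords=24):
        super().__init__()
        self.block1 = BiGRUModule(embed_dim)
        self.block2 = BiGRUModule(2 * 128)
        self.head = nn.Linear(2 * 128, n_chords)

    def forward(self, x):            # x: (batch, 5, 64) concatenated embeddings
        h = self.block2(self.block1(x))
        return self.head(h[:, -1])   # chord logits for the current measure</preformat>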
        <p>3.2.3. Training &amp; Results</p>
        <p>The model predicts chords for each 4-sized window of samples. The ComMU dataset was augmented
and split into three datasets based on track-role macro groups: melody, accompaniment, and bass. The
model was trained thrice on each dataset with consistent hyperparameters5. Table 4 presents the results
of the best model weights evaluated on the corresponding test set for the specific task.</p>
        <p>[Table 4: test accuracy and test F1 score of the chord progression models for the accompaniment, melody, and bass macro-groups.]</p>
          <p>
            The accompaniment and bass models performed better than the melody model, which still achieved
a solid 63% accuracy and 61% F1 score. The confusion matrix highlighted frequent misclassifications,
especially between minor and specific major chord sequences, known as relative minors. This emphasizes
the importance of a metric focusing on harmonic aspects. In [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ], a general rule describing the
pleasantness of a note interval is introduced: two notes played together sound harmonious (consonant)
when their frequency ratio uses small whole numbers. Building upon this concept, we developed a more
robust metric by examining the distribution of frequency ratio values across the entire dataset for both
labels and predictions. Particularly, being  the number of measures in a sample, () the set of
frequencies of notes in that measure and ℎ() the set of frequencies of notes of the chord of that
measure, we define the set of frequency ratios for a measure as:
() = {︁
          </p>
          <p>∼  | ∀  ∈ (), ∀  ∈ ℎ()}︁    ∈ [0, . . . , ]
These two distributions, represented as matrices of numerator-denominator values, were then evaluated
in terms of Cosine Similarity.</p>
          <p>Figure 5 displays the Frequency Ratio Distribution (FRD) matrices for both labels and predictions,
while Table 5 presents the cosine similarities between these distributions for all individual tasks. As
observed, the model has adjusted its weights to closely replicate the frequency ratio distribution of the
original labels. Given the assumption that the chord progressions labelled in the dataset sound good,
mimicking this distribution indicates that the quality of the model’s predictions is, thus, as good as
that of the labels.
5: Initial LR: 10^-3, minimum LR: 10^-7, LR scheduler: OnPlateau, Optimizer: AdamW, Epochs: 100, Batch size: 1024.</p>
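          <p>One possible way to compute such an FRD comparison is sketched below: each note/chord frequency ratio is reduced to a small integer fraction, accumulated into a numerator-denominator matrix, and the flattened matrices for labels and predictions are compared with cosine similarity. The bin limit and the use of Python's Fraction are our own simplifying choices.</p>
          <preformat>from fractions import Fraction
import numpy as np

def frd_matrix(ratio_sets, max_term=32):
    """ratio_sets: iterable of sets R(m), one per measure.
    Builds a (max_term x max_term) histogram indexed by (numerator, denominator)."""
    mat = np.zeros((max_term, max_term))
    for ratios in ratio_sets:
        for r in ratios:
            frac = Fraction(r).limit_denominator(max_term - 1)
            n, d = frac.numerator, frac.denominator
            if n &lt; max_term and d &lt; max_term:
                mat[n, d] += 1
    return mat

def frd_similarity(label_ratios, pred_ratios):
    """Cosine similarity between the two flattened FRD matrices."""
    a = frd_matrix(label_ratios).ravel()
    b = frd_matrix(pred_ratios).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))</preformat>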
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Expert-MusiComb</title>
      <p>All the technologies discussed in Section 3 were applied to extend MusiComb, especially for dataset
expansion. Additionally, new features were incorporated for automatic sample extraction and flexible
data querying, enhancing the framework’s original functionalities in the newly established
Expert-MusiComb.</p>
      <sec id="sec-4-1">
        <title>4.1. Automatic Sample Extraction</title>
        <p>The automated sample extraction algorithm employs a maximization strategy to identify the longest
uninterrupted sequences of silence in the composition after replacing all detected samples with empty
elements. Given the function:</p>
        <preformat>Function __inner__(sequence, min_ws, max_ws, initial_index):
    if initial_index &gt; ceil(len(sequence) / 2) then
        return [], 0
    else
        subseq ← []
        for ws in min_ws, ..., max_ws do
            i ← initial_index
            while i &lt; len(sequence) - ws do
                if sequence[i : i + ws] == sequence[i + ws : i + 2*ws] then
                    append sequence[i : i + ws] to subseq
                    sequence[i : i + 2*ws] ← ∅      /* replace the detected repetition with silence */
                    i ← i + 2*ws
                else
                    i ← i + 1
                end
            end
        end
        silence_length ← length of the longest uninterrupted run of silence in sequence
        return subseq, silence_length
    end
end</preformat>
        <p>where min_ws and max_ws represent the size range of potential samples, and initial_index denotes
the starting position for sliding windows within the sequence. By varying initial_index, different sample
collections are generated based on their initial positions. The goal is to identify the collection of samples
specified by initial_index such that the following statement is satisfied:</p>
        <preformat>for i in 0, ..., max_ws do
    subseq, silence_length ← __inner__(samples, min_ws, max_ws, i)
    return subseq if silence_length == len(sequence) or silence_length is MAX
end</preformat>
        <p>Indeed, if the minimum and maximum sample sizes remain constant, as is typically the case since
excessively long samples are avoided, the algorithm operates in linear time.</p>
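        <p>A runnable Python sketch of the procedure above is given below, treating the composition as a list of per-beat note sets with None marking silence; it is a reconstruction of the pseudocode, not the framework's exact implementation.</p>
        <preformat>from math import ceil

def extract_samples(sequence, min_ws, max_ws, initial_index):
    """Detect repeated windows of size min_ws..max_ws starting from initial_index,
    blank them out, and report the longest resulting run of silence."""
    if initial_index &gt; ceil(len(sequence) / 2):
        return [], 0
    seq = list(sequence)
    samples = []
    for ws in range(min_ws, max_ws + 1):
        i = initial_index
        while i &lt; len(seq) - ws:
            window = seq[i:i + ws]
            if None not in window and window == seq[i + ws:i + 2 * ws]:
                samples.append(window)
                seq[i:i + 2 * ws] = [None] * (2 * ws)  # replace the repetition with silence
                i += 2 * ws
            else:
                i += 1
    # longest uninterrupted run of silence after the replacements
    longest = run = 0
    for x in seq:
        run = run + 1 if x is None else 0
        longest = max(longest, run)
    return samples, longest

def best_sample_collection(sequence, min_ws, max_ws):
    """Try every starting offset and keep the collection that maximizes silence."""
    best = ([], -1)
    for i in range(max_ws + 1):
        samples, silence = extract_samples(sequence, min_ws, max_ws, i)
        if silence == len(sequence):
            return samples
        if silence &gt; best[1]:
            best = (samples, silence)
    return best[0]</preformat>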
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Dataset Enlargement</title>
        <p>Expanding the dataset is crucial for advancing MusiComb. Integrating additional datasets into the
framework enhances its capacity by augmenting the sample pool. This, in turn, broadens the spectrum
of potential outcomes during generation. Moreover, it introduces fresh values for user-selectable
parameters, such as a wider array of genres and an extended range of chord progressions. The dataset
expansion employed two primary approaches:
• Incorporating additional MIDI datasets and extracting the required features using the methods
previously described in Section 3.
• Creating multiple chord progressions for each melody sample by adjusting the initial conditions
of the chord estimation model.</p>
        <p>Introducing multiple chord progressions for the same melody sample offers a significant advantage. This
approach expands the dataset without the need for additional samples. Simultaneously, it empowers
the framework to merge the same melody sample with various chord progressions, fostering a higher
level of music generation. Because of the recurrent nature of the chord progression estimation model,
a slight alteration in the initial condition beyond the START token results in different outputs for the
same input sample. To implement this effectively in the final approach, a set of varied initial conditions
that guarantee distinctiveness and harmonic consistency in predictions needs to be identified. To
ensure harmonic consistency, the initial conditions are chosen from a subset of possible conditions
encountered by the model during training. Leveraging the FRD similarity discussed in Section 3, we can
trust that the model has familiarity with these initial conditions. To reduce the likelihood of repetitions,
a small subset, chosen as the nine most common initial conditions present in the training set, is selected.</p>
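        <p>The mechanism can be sketched as follows: the same melody sample is decoded once per stored initial condition, each condition being the encoding of three seed chords. The build_input and model.predict calls are placeholders for the autoregressive pipeline of Section 3.2.2 and are purely illustrative.</p>
        <preformat>import numpy as np

def decode_progressions(model, measures, initial_conditions):
    """For each initial condition (three seed chord indices), autoregressively
    decode one chord per measure of the same melody sample."""
    progressions = []
    for seed in initial_conditions:
        history = list(seed)                       # chords at t-3, t-2, t-1
        chords = []
        for m in range(len(measures)):
            x = build_input(history, measures, m)  # assumed: concatenated embeddings, shape (5, 64)
            probs = model.predict(x)               # assumed: distribution over the 24 chord classes
            chord = int(np.argmax(probs))
            chords.append(chord)
            history = history[1:] + [chord]        # slide the autoregressive window forward
        progressions.append(chords)
    return progressions</preformat>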
        <p>Table 6. Multiple chord progression estimations for the sample commu00001.mid, one per initial condition* (initial condition: estimated chord progression):
[]: F-C-Dm-A#-F-C-Em-D;
vi-IV-V: Dm-Am-Dm-A#-Gm-C-Am-D;
V-vi-IV: Am-F-C-F-Am-C-Em-D;
I-IV-V: Am-C-Dm-F-A#-C-Dm-D;
V-IV-III: Am-G-Dm-F-A#-C-Dm-D;
vi-iii-IV: F-Am-Em-F-F-Am-Em-D;
ii-V-I: Am-Am-G-F-F-C-Em-D;
I-VIIb-I: Dm-Am-A#-A#-Dm-Am-A#-D;
IV-I-V: Am-F-C-F-Am-C-Em-D.
* Initial conditions are represented as Roman numerals.</p>
          <p>Table 6 presents an example of multiple chord progression estimations for the same MIDI sample
together with the set of chosen initial conditions. This method enables the framework to effectively
capture the relationship between a melody and potential chord progressions. Consequently, each
melody sample can be paired with various chord progressions, enriching the overall
music generation process without limiting each sample to a single chord progression.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Non-strict Data Query</title>
        <p>In each generation cycle, once the user has specified their desired parameters, the framework seeks
out samples that match the specified parameters and then employs a subset of these samples for music
generation. Following dataset enlargement, the subsequent step to enhance the framework’s capabilities
involves loosening the criteria used for sample selection. The goal is to increase the number of selected
samples for a given set of user-defined parameters (BPM, genre, time signature, chord progression and
key) while maintaining the harmonic and structural consistency ensured by strict selection rules. Due
to the nature of the parameters, not all selection rules can be made less strict, but certain modifications
are possible; details are provided in the following subsections.</p>
        <p>4.3.1. BPMs</p>
        <p>The BPM parameter allows for a more flexible selection approach. By staying close to the desired BPM
value, rather than fixing a specific one, adjustments can be made without significantly altering the
sample’s structure while enlarging the set of eligible samples. For a desired BPM represented by b, the
sample’s BPM by bpm(s), and a selection indicator match(s), the selection rule becomes less strict to
accommodate minor BPM variations by changing from (1) to (2).</p>
        <p>match(s) ⇐⇒ bpm(s) = b   (1)</p>
        <p>match(s) ⇐⇒ b − α·d ≤ bpm(s) ≤ b + α·d   (2)</p>
        <p>where d represents half of the maximum interval for the neighbourhood, and α is a user-selectable
parameter ranging from 0 to 1. Table 7 demonstrates that neighbourhood selection rules enable a
broader range of samples to be included for the same BPM value.</p>
        <p>[Table 7: mean samples per BPM value under rigid selection vs. neighbourhood selection; neighbourhood selection parameters: d = 20 and α = 0.5.]</p>
        <p>To prevent the neighbourhood interval
from becoming excessively large and causing matching issues with the samples, the parameter d
remains constant. Additionally, with the neighbourhood selection rule, BPM values that are not present
in the dataset can be chosen as long as they fall within the interval of an existing BPM value. This
marks another advancement compared to MusiComb, where only BPM values existing in the dataset
were available for selection.</p>
        <p>4.3.2. Harmonic Key</p>
        <p>While the harmonic key is typically considered a less flexible parameter, since samples with different
keys may not sound harmonious together, it is still possible to transpose a music piece to the desired
harmonic key. We maintain consistency in the dataset by transposing all samples to the same harmonic
key (A minor/C major) during the feature extraction process. The user can then freely choose the desired
harmonic key. This ensures that most samples in the dataset can be selected, contingent on whether
the user’s chosen key is minor or major. Subsequently, a straightforward transposition to the desired
key is performed after generating the final music piece.</p>
        <p>[Table 8: selectable samples per harmonic key under rigid selection (ComMU) vs. transposition (ComMU), for A minor, C major, other minor keys, and other major keys; the original ComMU dataset only contains A minor and C major keys.]</p>
        <p>4.3.3. Chord Progression &amp; Time Signature</p>
        <p>We also study how chord progression and time signature can be treated less strictly. The incorporation of
multiple chord progressions for each melody sample, as described in Section 4.2, relaxes the selection rule
for chord progressions. This expands the range of possible combinations, enabling a single melody to
harmonize with various chord progressions. Moreover, while two samples with different time signatures
cannot be seamlessly layered together, users can be allowed to choose between multiple time signature
selections; this leads to music samples with time signature changes.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In our efforts to expand and enhance the existing MusiComb framework, we introduced a series of
methodologies and techniques that are crucial for its future development. Specifically, we presented
two models for Track Role and Chord Progression estimation, along with a set of new rules for the sample
selection process, which significantly bolster the framework’s capabilities, and an automatic sample
extraction algorithm. Our work underscores the importance of data quality in modern Generative AI
systems, demonstrating that it plays a pivotal role alongside the capabilities of the framework itself.
Combinatorial systems, such as MusiComb, heavily depend on the quality of the information available
about the involved elements. It is evident that even a state-of-the-art framework, when operating under
incorrect preconditions (such as inaccurate sample information), may produce low-quality outputs.
Conversely, datasets annotated by humans may be limited in size and require substantial time to expand.
Striking a balance between data quality and the time needed to acquire it is therefore crucial for a
continually growing and evolving framework like MusiComb. Our systems have demonstrated robust
consistency with the data labels, serving not merely as tools for feature extraction but also introducing
new degrees of freedom. Even by solely utilizing the original ComMU dataset, we can explore new and
diverse generations while maintaining a high level of reliability in the quality of output.</p>
      <p>
        Furthermore, there is ample opportunity for new introductions and future advancements. As
previously mentioned, MusiComb is an ever-expanding framework capable of further expression beyond its
current capabilities. Among the potential areas for research, significant developments may involve: (1)
modifying the combinatorial backbone to enable the framework to integrate various chord progressions
and time signatures within the same musical piece; and (2) utilizing the Transformer XL, as presented in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
to generate missing samples or roles for specific generations. This approach allows the framework to
seamlessly incorporate both dataset and generated samples within the same piece.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work has been supported by the project TAILOR (funded by European Union’s Horizon 2020
research and innovation programme, GA No. 952215).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hyun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hwang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Park</surname>
          </string-name>
          , S. Han,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Commu: Dataset for combinatorial music generation</article-title>
          ,
          <source>ArXiv abs/2211.09385</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Giuliani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ballerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Filippo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Borghesi</surname>
          </string-name>
          ,
          <article-title>Musicomb: a sample-based approach to music generation through constraints</article-title>
          ,
          <source>in: 2023 IEEE 35th International Conference on Tools with Artificial Intelligence (ICTAI)</source>
          ,
          <source>IEEE Computer Society</source>
          , Los Alamitos, CA, USA,
          <year>2023</year>
          , pp.
          <fpage>194</fpage>
          -
          <lpage>198</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>C.-Z. A. Huang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Shazeer</surname>
            , I. Simon,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Hawthorne</surname>
            ,
            <given-names>A. M.</given-names>
          </string-name>
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>M. D.</given-names>
          </string-name>
          <string-name>
            <surname>Hoffman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Dinculescu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Eck</surname>
          </string-name>
          , Music transformer,
          <year>2018</year>
          . arXiv:1809.04281.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Agostinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. I.</given-names>
            <surname>Denk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Borsos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Engel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Verzetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Caillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tagliasacchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sharifi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Zeghidour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Frank</surname>
          </string-name>
          , Musiclm: Generating music from text,
          <year>2023</year>
          . arXiv:2301.11325.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Payne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Sutskever</surname>
          </string-name>
          ,
          <article-title>Jukebox: A generative model for music</article-title>
          , ArXiv abs/2005.00341 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. I.</given-names>
            <surname>Denk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Engel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chan</surname>
          </string-name>
          , W. Han,
          <article-title>Noise2music: Text-conditioned music generation with diffusion models</article-title>
          ,
          <source>ArXiv abs/2302.03917</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dadman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Bremdal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dalmo</surname>
          </string-name>
          ,
          <article-title>Toward interactive music generation: A position paper</article-title>
          ,
          <source>IEEE Access 10</source>
          (
          <year>2022</year>
          )
          <fpage>125679</fpage>
          -
          <lpage>125695</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Allan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>Harmonising chorales by probabilistic inference</article-title>
          , in: L.
          <string-name>
            <surname>Saul</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Weiss</surname>
          </string-name>
          , L. Bottou (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>17</volume>
          , MIT Press,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cope</surname>
          </string-name>
          ,
          <article-title>Computer modeling of musical intelligence in emi</article-title>
          ,
          <source>Computer Music Journal</source>
          <volume>16</volume>
          (
          <year>1992</year>
          )
          <fpage>69</fpage>
          -
          <lpage>83</lpage>
          . URL: http://www.jstor.org/stable/3680717.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Raczyński</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fukayama</surname>
          </string-name>
          , E. Vincent,
          <article-title>Melody harmonization with interpolated probabilistic models</article-title>
          ,
          <source>Journal of New Music Research</source>
          <volume>42</volume>
          (
          <year>2013</year>
          )
          <fpage>223</fpage>
          -
          <lpage>235</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>J.-F. Paiement</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Eck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Probabilistic melodic harmonization</article-title>
          , in: L.
          <string-name>
            <surname>Lamontagne</surname>
          </string-name>
          , M. Marchand (Eds.),
          <source>Advances in Artificial Intelligence</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2006</year>
          , pp.
          <fpage>218</fpage>
          -
          <lpage>229</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Herremans</surname>
          </string-name>
          , E. Chew, Morpheus:
          <article-title>Generating structured music with constrained patterns and tension</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          <volume>10</volume>
          (
          <year>2019</year>
          )
          <fpage>510</fpage>
          -
          <lpage>523</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>C. Anderson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Eigenfeldt</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Pasquier</surname>
          </string-name>
          ,
          <article-title>The generative electronic dance music algorithmic system (GEDMAS)</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>5</fpage>
          -
          <lpage>8</lpage>
          . URL: https://doi.org/10.1609/aiide.v9i5.12649. doi:10.1609/aiide.v9i5.12649.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pachet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <article-title>Sampling variations of sequences for structured music generation</article-title>
          ,
          <source>in: International Society for Music Information Retrieval Conference</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Nardi</surname>
          </string-name>
          ,
          <article-title>Library music: technology, copyright and authorship</article-title>
          , Current Issues in Music Research: Copyright, Power and Transnational Musical Processes. Lisboa: Edições Colibri (
          <year>2012</year>
          )
          <fpage>73</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zils</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pachet</surname>
          </string-name>
          ,
          <article-title>Musical mosaicing</article-title>
          ,
          <source>in: Digital Audio Effects (DAFx)</source>
          , volume
          <volume>2</volume>
          ,
          <year>2001</year>
          , p.
          <fpage>135</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <article-title>Learning-based methods for comparing sequences, with applications to audio-to-midi alignment and matching</article-title>
          , Columbia University,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Rafii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Liutkus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.-R.</given-names>
            <surname>Stöter</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. I. Mimilakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bittner</surname>
          </string-name>
          ,
          <article-title>The MUSDB18 corpus for music separation</article-title>
          ,
          <year>2017</year>
          . URL: https://doi.org/10.5281/zenodo.1117372. doi:10.5281/zenodo.1117372.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>R.</given-names>
            <surname>David</surname>
          </string-name>
          ,
          <article-title>Mathematical harmony analysis (</article-title>
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hyungui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Seungyeon</surname>
          </string-name>
          , L. Kyogu,
          <article-title>Chord generation from symbolic melody using blstm networks</article-title>
          ,
          <source>18th International Society for Music Information Retrieval Conference (ISMIR)</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>