<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TunesFormer: Forming Irish Tunes with Control Codes by Bar Patching</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shangda Wu</string-name>
          <email>shangda@mail.ccom.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaobing Li</string-name>
          <email>lxiaobing@ccom.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Feng Yu</string-name>
          <email>yufeng@ccom.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maosong Sun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Technology, Tsinghua University</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Music AI and Information Technology, Central Conservatory of Music</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>HCMIR23: 2nd Workshop on Human-Centric Music Information Research</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper introduces TunesFormer, an efficient Transformer-based dual-decoder model designed to generate melodies that adhere to user-defined musical forms. Trained on 214,122 Irish tunes, TunesFormer uses two techniques: bar patching and control codes. Bar patching reduces sequence length and generation time, while control codes guide TunesFormer to produce melodies that conform to the desired musical forms. Our evaluation demonstrates TunesFormer's superior efficiency: it is 3.22 times faster than GPT-2 and 1.79 times faster than a model of equal scale with linear complexity, while offering comparable performance in controllability and other metrics. TunesFormer provides a new tool for musicians, composers, and music enthusiasts to explore the vast landscape of Irish music. Our model and code are available on GitHub.</p>
      </abstract>
      <kwd-group>
        <kwd>Irish music</kwd>
        <kwd>melody generation</kwd>
        <kwd>control codes</kwd>
        <kwd>bar patching</kwd>
        <kwd>dual-decoder architecture</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>[Figure 1: Architecture of TunesFormer. Labeled components: Bar Patches (example patches START, L:1/8, M:4/4, K:Emin, |: F |, and G2 FG BGFE |]), Linear Projection of Flattened Bar Patches, Patch Embeds with Position embeddings, Patch-level Transformer Decoder, Patch Features, Character-level Transformer Decoder, and Shifted Outputs.]</p>
      <p>The key contributions of this paper are as follows:
• As a dual-decoder model based on bar patching, TunesFormer significantly accelerates
generation while maintaining the quality of the generated music.
• Through control codes, TunesFormer enables users to generate melodies with diverse musical
forms, providing flexibility and alignment with their artistic vision.
• To support future research, we release the Irish Massive ABC Notation (IrishMAN)
dataset, an open-source collection of 216,284 Irish tunes in the ABC notation format.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>2.1. TunesFormer</p>
      <p>
        TunesFormer uses bar patching [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] for melody generation and represents tunes in the ABC notation format<sup>1</sup>, a text-based format well suited to Irish music. Bar patching divides a score into segments such as bars, which shortens the sequence and improves efficiency without sacrificing musical integrity.
      </p>
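      <p>The bar patching step can be illustrated with a minimal Python sketch. It is not the authors’ implementation. The following details are all assumptions made for illustration: the bar-line delimiter set, the rule that each header line (such as M:4/4) becomes its own patch, the hypothetical padding character PAD, and the splitting of over-long bars into several patches. The patch size of 32 is the value of S reported in the implementation details.</p>
      <preformat preformat-type="code">
```python
import re

# Assumed bar-line tokens, longest alternatives first so "|]" is not read as "|".
BAR_LINE = re.compile(r"(\|:|::|:\||\[\||\|\||\|\]|\|)")
PATCH_SIZE = 32  # S, the patch size used in the paper's implementation
PAD = "\0"       # hypothetical padding character


def split_bars(body: str) -> list[str]:
    """Split an ABC tune body into bars, each ending with its bar line."""
    parts = [p for p in BAR_LINE.split(body) if p]
    # An opening bar line such as "|:" starts the first bar, so attach it to the notes after it.
    if len(parts) > 1 and BAR_LINE.fullmatch(parts[0]):
        parts = [parts[0] + parts[1]] + parts[2:]
    bars, current = [], ""
    for part in parts:
        current += part
        if BAR_LINE.fullmatch(part):
            bars.append(current)
            current = ""
    if current.strip():
        bars.append(current)  # trailing notes without a closing bar line
    return bars


def bar_patches(abc: str, patch_size: int = PATCH_SIZE) -> list[str]:
    """Turn an ABC tune into fixed-width patches: one per header line, one per bar."""
    header, body = [], []
    for line in abc.strip().splitlines():
        # Field lines such as "K:Emin" before the first music line form the header.
        if not body and re.match(r"[A-Za-z]:", line):
            header.append(line + "\n")
        else:
            body.append(line + "\n")
    patches = header + split_bars("".join(body))
    fixed = []
    for patch in patches:
        # Over-long bars are cut into several patches; every patch is right-padded.
        for start in range(0, len(patch), patch_size):
            fixed.append(patch[start:start + patch_size].ljust(patch_size, PAD))
    return fixed


for patch in bar_patches("L:1/8\nM:4/4\nK:Emin\n|: F | G2 FG BGFE |]\n"):
    print(repr(patch.rstrip(PAD)))
```
      </preformat>
      <p>Applied to the excerpt shown in Fig. 1, the sketch yields five 32-character patches: three header lines and two bars. The START patch shown in the figure is not modeled here.</p>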
      <p>Fig. 1 shows TunesFormer’s dual-decoder design. Bar patches are converted into
embeddings and fed to the patch-level decoder, which produces patch features. The
character-level decoder then decodes these patch features into ABC notation sequences.</p>
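      <p>Given <inline-formula><tex-math>$N$</tex-math></inline-formula> as the sequence length and <inline-formula><tex-math>$S$</tex-math></inline-formula> as the patch size, bar patching reduces the complexity of the patch-level decoder from <inline-formula><tex-math>$O(N^2)$</tex-math></inline-formula> to <inline-formula><tex-math>$O(N^2/S^2)$</tex-math></inline-formula>, since it attends over <inline-formula><tex-math>$N/S$</tex-math></inline-formula> patches instead of <inline-formula><tex-math>$N$</tex-math></inline-formula> characters. Meanwhile, the complexity of the character-level decoder becomes <inline-formula><tex-math>$O(NS)$</tex-math></inline-formula>, as it attends within each of the <inline-formula><tex-math>$N/S$</tex-math></inline-formula> patches of length <inline-formula><tex-math>$S$</tex-math></inline-formula>. With <inline-formula><tex-math>$P_{\mathrm{patch}}$</tex-math></inline-formula> and <inline-formula><tex-math>$P_{\mathrm{char}}$</tex-math></inline-formula> denoting the parameter sizes of the patch-level and character-level decoders, respectively, the computational cost shifts from <inline-formula><tex-math>$(P_{\mathrm{patch}} + P_{\mathrm{char}}) \cdot N^2$</tex-math></inline-formula> to <inline-formula><tex-math>$P_{\mathrm{patch}} \cdot \frac{N^2}{S^2} + P_{\mathrm{char}} \cdot NS$</tex-math></inline-formula>. This is particularly advantageous for long sequences, high <inline-formula><tex-math>$P_{\mathrm{patch}}$</tex-math></inline-formula>-to-<inline-formula><tex-math>$P_{\mathrm{char}}$</tex-math></inline-formula> ratios, and a well-chosen patch size <inline-formula><tex-math>$S$</tex-math></inline-formula>.</p>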
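      <p>To make the cost model concrete, the sketch below evaluates both expressions for N = 4096 and S = 32, the values used in the implementation described next. As a rough, hypothetical proxy for the parameter sizes, it uses the decoders’ layer counts (9 and 3), since both decoders share a hidden size of 768. The model counts only the quadratic attention terms. Its ratio is therefore far larger than the measured speedups (for example, 3.22 times over GPT-2) and should not be read as a wall-clock prediction.</p>
      <preformat preformat-type="code">
```python
import math


def attention_cost(n: int, s: int, patch_params: float, char_params: float) -> tuple[float, float]:
    """Return (baseline, bar_patching) costs under the simplified model in the text.

    n: sequence length N in characters; s: patch size S;
    patch_params, char_params: parameter sizes of the patch- and character-level decoders.
    """
    num_patches = math.ceil(n / s)
    # Without patching, both decoders attend over all N characters.
    baseline = (patch_params + char_params) * n ** 2
    # With patching, the patch-level decoder attends over N/S patches ...
    patch_level = patch_params * num_patches ** 2
    # ... and the character-level decoder attends within each S-character patch: (N/S) * S^2 = N*S.
    char_level = char_params * num_patches * s ** 2
    return baseline, patch_level + char_level


# N and S from the implementation details; layer counts (9 and 3) stand in for the parameter sizes.
baseline, patched = attention_cost(4096, 32, 9, 3)
print(f"baseline={baseline:,}  bar patching={patched:,}  ratio={baseline / patched:.1f}")
```
      </preformat>
      <p>With these inputs, the sketch prints baseline=201,326,592 and bar patching=540,672, a ratio of about 372.4. The patch-level term shrinks by a factor of S² = 1,024, while the character-level term grows only linearly in N.</p>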
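      <p>
        In our implementation, <inline-formula><tex-math>$N = 4096$</tex-math></inline-formula> and <inline-formula><tex-math>$S = 32$</tex-math></inline-formula>, giving a patch-level sequence length of <inline-formula><tex-math>$N/S = 128$</tex-math></inline-formula>. The patch-level decoder has 9 layers and the character-level decoder has 3, both with a hidden size of 768.
      </p>
      <p>2.2. Control Codes</p>
      <p>
        Inspired by CTRL [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], TunesFormer uses control codes to specify musical forms. These codes precede the ABC notation and let users dictate the structure of a tune. The following codes are introduced:
• S:number of sections - Sets the number of sections in the melody, ranging from 1 to 8 (e.g., S:1 for a single-section melody and S:8 for a melody with eight sections). Section boundaries are identified by the symbols [|, ||, |], |:, ::, and :|.
• B:number of bars - Sets the number of bars within a section, counted from the bar symbol |. The range is 1 to 32 (e.g., B:1 for a one-bar section and B:32 for a section with 32 bars).
• E:edit distance similarity - Controls the similarity between the current section <inline-formula><tex-math>$c$</tex-math></inline-formula> and a previous section <inline-formula><tex-math>$p$</tex-math></inline-formula>. It is derived from the Levenshtein distance [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] <inline-formula><tex-math>$\mathrm{lev}(c, p)$</tex-math></inline-formula>, which measures how much the two sections differ:
        <disp-formula id="eq1"><label>(1)</label><tex-math>$$\mathrm{sim}(c, p) = 1 - \frac{\mathrm{lev}(c, p)}{\max(|c|, |p|)}$$</tex-math></disp-formula>
where <inline-formula><tex-math>$|c|$</tex-math></inline-formula> and <inline-formula><tex-math>$|p|$</tex-math></inline-formula> are the string lengths of the two sections. The similarity is discretized into 11 levels from 0 to 10 (e.g., E:0 for no similarity and E:10 for an exact match). For the <inline-formula><tex-math>$i$</tex-math></inline-formula>-th section, there are <inline-formula><tex-math>$i - 1$</tex-math></inline-formula> previous sections to compare with.
      </p>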
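      <p>The three control codes can be derived from a tune body roughly as shown below. This Python sketch is illustrative only, and all function names are hypothetical. It makes three assumptions. First, a section closes at any of the boundary symbols listed for S. Second, B is taken as the number of bar-line-terminated segments in a section. Third, the similarity in Eq. (1) is mapped to the 11 E levels by rounding 10 × sim half up, since the text does not specify the discretization rule.</p>
      <preformat preformat-type="code">
```python
import re

# Assumed bar-line tokens (longest first). All except the plain bar "|" are the
# section-boundary symbols listed in the text.
BAR_LINE = re.compile(r"(\|:|::|:\||\[\||\|\||\|\]|\|)")
SECTION_BOUNDARIES = {"[|", "||", "|]", "|:", "::", ":|"}


def split_sections(body: str) -> list[list[str]]:
    """Group the bars of an ABC body into sections (lists of bars)."""
    parts = [p for p in BAR_LINE.split(body) if p.strip()]
    if len(parts) > 1 and BAR_LINE.fullmatch(parts[0]):
        parts = [parts[0] + parts[1]] + parts[2:]  # an opening "|:" belongs to bar 1
    sections, bars, current = [], [], ""
    for part in parts:
        current += part
        if BAR_LINE.fullmatch(part):
            bars.append(current.strip())
            current = ""
            if part in SECTION_BOUNDARIES:  # a boundary symbol closes the section
                sections.append(bars)
                bars = []
    if current.strip():
        bars.append(current.strip())
    if bars:
        sections.append(bars)
    return sections


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def edit_similarity(c: str, p: str) -> float:
    """Eq. (1): 1 - lev(c, p) / max(|c|, |p|)."""
    longest = max(len(c), len(p))
    return 1.0 if longest == 0 else 1.0 - levenshtein(c, p) / longest


def e_level(similarity: float) -> int:
    """Discretize a similarity in [0, 1] into levels 0..10 (round half up: an assumption)."""
    return int(similarity * 10 + 0.5)


def control_codes(body: str) -> dict:
    """Compute S, per-section B, and per-section E values for an ABC tune body."""
    sections = split_sections(body)
    texts = ["".join(bars) for bars in sections]
    return {
        "S": len(sections),
        "B": [len(bars) for bars in sections],
        # The i-th section is compared with each of its i - 1 predecessors.
        "E": [[e_level(edit_similarity(texts[i], texts[j])) for j in range(i)]
              for i in range(len(texts))],
    }


print(control_codes("A B | c d || A B | c d ||"))
```
      </preformat>
      <p>For example, the body A B | c d || A B | c d || yields two sections of two bars each. Because the second section repeats the first exactly, its E value against section 1 is 10.</p>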
      <p>
        While earlier methods relied on hand-crafted rules or limited training data [
        <xref ref-type="bibr" rid="ref17 ref8">8, 17</xref>
        ], our control codes are extracted directly from the ABC notation, which encodes precise musical form information. This lets TunesFormer learn musical structure from large datasets.
      </p>
      <p>2.3. Dataset</p>
      <p>
        The IrishMAN dataset<sup>2</sup> contains 216,284 Irish tunes in ABC notation, sourced from thesession.org and abcnotation.com. Of these, 99% (214,122) are used for training and 1% (2,162) for validation. To keep the notation uniform, all tunes were converted to XML and back to ABC using conversion scripts<sup>3</sup>, and natural language fields were removed.
      </p>
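      <p>The reported split sizes are consistent with holding out 1% of the tunes and truncating the count: 216,284 × 0.01 = 2,162.84, which gives 2,162 validation tunes and 214,122 training tunes. The sketch below reproduces these counts. The random shuffle, the seed, and the function name split_train_val are assumptions, since the text does not describe how tunes were assigned to each split.</p>
      <preformat preformat-type="code">
```python
import random


def split_train_val(tunes: list, val_fraction: float = 0.01, seed: int = 0) -> tuple[list, list]:
    """Shuffle the tunes and hold out a fraction for validation (hypothetical procedure)."""
    shuffled = list(tunes)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)  # int() truncates: 2,162.84 -> 2,162
    return shuffled[n_val:], shuffled[:n_val]


train, val = split_train_val(list(range(216_284)))
print(len(train), len(val))
```
      </preformat>
      <p>Running it prints 214122 2162.</p>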
      <p>
        Each tune is annotated with control codes derived from its ABC symbols (Section 2.2), indicating its musical form. A subset filtered with music21 [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] contains 34,211 human-annotated lead sheets; this subset enables TunesFormer to generate harmonized melodies. In addition, all tunes are in the public domain, ensuring ethical and legal use for research and creative projects.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <p>
        We compared TunesFormer with three baselines: an LSTM [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], as used for generating ABC notation; GPT-2 [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], as used for music generation [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]; and RWKV [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], a model with linear complexity that rivals Transformers in performance. All models were trained on the same IrishMAN dataset split with character-level ABC tokenization and decoded with random sampling. The evaluation involved two objective metrics, each computed on 1,000 tunes generated from scratch per model.
      </p>
      <p><sup>2</sup> https://huggingface.co/datasets/sander-wood/irishman</p>
      <p><sup>3</sup> https://wim.vree.org/svgParse/</p>
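      <p>Random sampling here means drawing each next character from the model’s full predicted distribution rather than taking the most likely one. The minimal sketch below assumes a temperature of 1 and no top-k or nucleus filtering, neither of which the text specifies. sample_next is a hypothetical helper, not part of any model’s API.</p>
      <preformat preformat-type="code">
```python
import math
import random


def sample_next(logits: list[float], rng: random.Random) -> int:
    """Pure random sampling: draw a token index from softmax(logits), with no top-k/top-p filtering."""
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    # random.choices normalizes the weights, so this samples from softmax(logits).
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
draws = [sample_next([2.0, 1.0, 0.0], rng) for _ in range(10_000)]
print([round(draws.count(i) / len(draws), 2) for i in range(3)])  # approaches softmax([2, 1, 0])
```
      </preformat>
      <p>Over many draws, the empirical frequencies approach softmax([2, 1, 0]), roughly 0.67, 0.24, and 0.09.</p>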
      <p>We used comparative evaluation because human judgments of absolute quality are inconsistent. Thirteen Irish
musicians compared pairs of melodies: a human-composed tune with chord symbols from thesession.org, and a
model-generated continuation of its first two bars. Tune selection and presentation order were randomized to
avoid bias. For each of the following criteria, participants selected the melody that better matched the description:
• Engagement: Captivating to the ear, evokes emotional resonance, and maintains the
listener’s interest.
• Authenticity: Represents the distinctive characteristics of Irish traditional music.
• Harmoniousness: Creates a natural flow that unifies melody and harmony into a
cohesive and pleasing musical experience.
• Playability: Well suited for performance and offers a wide range of playing techniques.</p>
      <p>For each melody pair, participants chose one of three options: 0 (prefer the human-composed melody), 1
(prefer the model-generated melody), or 0.5 (no preference), so scores range from 0 to 1. Participants were
instructed to skip melodies they already knew, to avoid bias.</p>
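      <p>Under this scheme, 0.5 indicates parity between human-composed and model-generated melodies, and a score above 0.5 means raters leaned toward the model on average. A minimal aggregation sketch follows. It assumes each criterion’s score is the plain mean of the non-skipped choices, which the text implies but does not state; preference_score is a hypothetical name.</p>
      <preformat preformat-type="code">
```python
from statistics import mean
from typing import Optional

# Per-pair choices: 0 = prefers the human-composed melody, 1 = prefers the
# model-generated melody, 0.5 = no preference; None marks a pair the participant skipped.
ALLOWED = {0, 0.5, 1}


def preference_score(choices: list[Optional[float]]) -> float:
    """Mean preference for one model and one criterion, ignoring skipped pairs."""
    answered = [c for c in choices if c is not None]
    if not answered:
        raise ValueError("no answered pairs")
    if any(c not in ALLOWED for c in answered):
        raise ValueError("choices must be 0, 0.5, or 1")
    return mean(answered)


print(preference_score([1, 0.5, None, 0, 1]))
```
      </preformat>
      <p>The example prints 0.625: the skipped pair is ignored, and the remaining four choices average to 2.5 / 4.</p>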
      <p>Table 1 shows the evaluation results for the music generation models. TunesFormer, a
Transformer-based model with 88,425,984 parameters, is 3.22 times faster than GPT-2 and 1.79 times
faster than RWKV. Its dual-decoder architecture confines character-by-character generation to the
lightweight character-level decoder, which explains its efficiency despite its large size. Importantly,
this efficiency does not come at the expense of performance: TunesFormer shows strong controllability
and matches the highest scores in authenticity and playability. We attribute this performance to the
interaction between the two decoders: the patch-level decoder contextualizes bar features, enabling the
character-level decoder to produce coherent compositions. In short, TunesFormer’s dual-decoder design
improves the efficiency of melody generation without sacrificing quality, giving it a clear advantage
over the baseline models.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>This paper presents TunesFormer, a model that generates melodies using control codes and
bar patching. The use of control codes enhances user interaction, enabling personalized and
customizable music generation. The dual-decoder architecture employed by TunesFormer,
combined with its bar patching mechanism, yields significant improvements in generation
speed without compromising the quality of the generated music. Future directions include
incorporating more musical features and applying TunesFormer to various cultural traditions.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The authors gratefully acknowledge the financial support from the Special Program of National
Natural Science Foundation of China (Grant No. T2341003), the Advanced Discipline
Construction Project of Beijing Universities, the Major Program of National Social Science Fund of China
(Grant No. 21ZD19), and the Nation Culture and Tourism Technological Innovation Engineering
Project (Research and Application of 3D Music).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dubnov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <article-title>The effect of explicit structure encoding of deep neural networks for symbolic music generation</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2018</year>
          ). arXiv:1811.08380.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Makris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Herremans</surname>
          </string-name>
          ,
          <article-title>Hierarchical recurrent neural networks for conditional melody generation with long-term structure</article-title>
          ,
          <source>in: International Joint Conference on Neural Networks, IJCNN 2021, Shenzhen, China, July 18-22, 2021</source>
          , IEEE,
          <year>2021</year>
          . doi:10.1109/IJCNN52387.2021.9533493.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Naruse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Takahata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mukuta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Harada</surname>
          </string-name>
          ,
          <article-title>Pop music generation with controllable phrase lengths</article-title>
          ,
          <source>in: Proc. of the 23rd Int. Society for Music Information Retrieval Conf.</source>
          , Bengaluru, India,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Structure-enhanced pop music generation via harmony-aware learning</article-title>
          ,
          <source>in: MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10 - 14, 2022</source>
          , ACM,
          <year>2022</year>
          . doi:10.1145/3503161.3548084.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>PopMNet: Generating structured pop music melodies using neural networks</article-title>
          ,
          <source>Artif. Intell.</source>
          (
          <year>2020</year>
          ). doi:10.1016/j.artint.2020.103303.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Melons: generating melody with long-term structure using transformers and structure graph</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2110.05020. doi:10.48550/ARXIV.2110.05020.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gomes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Dannenberg</surname>
          </string-name>
          ,
          <article-title>Controllable deep melody generation via hierarchical music structure representation</article-title>
          ,
          <source>in: Proceedings of the 22nd International Society for Music Information Retrieval Conference, ISMIR 2021, Online, November 7-12, 2021</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>MeloForm: Generating melody with musical form based on expert systems and neural networks</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2022</year>
          ). arXiv:2208.14345.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B. L.</given-names>
            <surname>Sturm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ben-Tal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Korshunova</surname>
          </string-name>
          ,
          <article-title>Music transcription modelling and composition using deep learning</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2016</year>
          ). arXiv:1604.08723.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Geerlings</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Merono-Penuela</surname>
          </string-name>
          ,
          <article-title>Interacting with GPT-2 to generate controlled and believable musical sequences in ABC notation</article-title>
          ,
          <source>in: Proceedings of the 1st Workshop on NLP for Music and Audio (NLP4MusA)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Simon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hawthorne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dinculescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eck</surname>
          </string-name>
          ,
          <article-title>Music transformer: Generating music with long-term structure</article-title>
          ,
          <source>in: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019</source>
          , OpenReview.net,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Exploring the efficacy of pre-trained checkpoints in text-to-music generation task</article-title>
          ,
          <source>in: The AAAI-23 Workshop on Creative AI Across Modalities</source>
          ,
          <year>2023</year>
          . URL: https://openreview.net/forum?id=QmWXskBhesn.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>CLaMP: Contrastive language-music pre-training for cross-modal symbolic music information retrieval</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2023</year>
          ). URL: https://doi.org/10.48550/arXiv.2304.11029. doi:10.48550/arXiv.2304.11029. arXiv:2304.11029.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Keskar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>McCann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Varshney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <article-title>CTRL: A conditional transformer language model for controllable generation</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1909.05858. arXiv:1909.05858.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>V. I.</given-names>
            <surname>Levenshtein</surname>
          </string-name>
          , et al.,
          <article-title>Binary codes capable of correcting deletions, insertions, and reversals</article-title>
          , in:
          <source>Soviet Physics Doklady</source>
          , Soviet Union,
          <year>1966</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <article-title>Learning hierarchical metrical structure beyond measures</article-title>
          ,
          <source>in: Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022, Bengaluru, India, December 4-8, 2022</source>
          ,
          <year>2022</year>
          . URL: https://archives.ismir.net/ismir2022/paper/000023.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Cuthbert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ariza</surname>
          </string-name>
          ,
          <article-title>Music21: A toolkit for computer-aided musicology and symbolic music data</article-title>
          ,
          <source>International Society for Music Information Retrieval</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <source>OpenAI blog</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>B.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Alcaide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Anthony</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Albalak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Arcadinho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>GV</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kazienko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kocon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Koptyra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. S. I.</given-names>
            <surname>Mantri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Wind</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wozniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>RWKV: Reinventing RNNs for the Transformer era</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2023</year>
          ). URL: https://doi.org/10.48550/arXiv.2305.13048. doi:10.48550/arXiv.2305.13048. arXiv:2305.13048.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>