<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dissonant Ballerinas and Crafty Carrots: A Comparative Multi-modal Analysis of Italian Brain Rot</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anca Dinu</string-name>
          <email>anca.dinu@lls.unibuc.ro</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andra-Maria Florescu</string-name>
          <email>andra-maria.florescu@s.unibuc.ro</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marius Micluța-Câmpeanu</string-name>
          <email>marius.micluta-campeanu@s.unibuc.ro</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefana-Arina Tăbus</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudiu Creangă</string-name>
          <email>claudiu.creanga@s.unibuc.ro</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreiana Mihail</string-name>
          <email>andreiana.mihail@s.unibuc.ro</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Foreign Languages and Literatures</institution>
          ,
          <addr-line>5-7 Edgar Quinet St, Bucharest, 010017</addr-line>
          ,
          <country country="RO">Romania</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Mathematics and Computer Science</institution>
          ,
          <addr-line>14 Academiei St, Bucharest, 010014</addr-line>
          ,
          <country country="RO">Romania</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Interdisciplinary School of Doctoral Studies</institution>
          ,
          <addr-line>36-46 Mihail Kogălniceanu, Bucharest, 050107</addr-line>
          ,
          <country country="RO">Romania</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Bucharest</institution>
          ,
          <addr-line>90 Panduri Road, Bucharest, 050107</addr-line>
          ,
          <country country="RO">Romania</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper presents a comparative multi-modal analysis of Italian and Romanian brain rot memes, investigating the factors that contribute to their appeal and the linguistic and cultural distinctions between the two versions. To conduct this analysis, we introduce a multi-modal brain rot dataset named CRIB (Collection of Romanian and Italian Brain rot), a manually curated collection of 240 TikTok videos stratified by language (Italian, Romanian) and popularity, on which we examine textual, acoustic, and visual features. Our findings indicate that popularity is not significantly correlated with textual elements like sentiment, absurdity, or rhyme, or with acoustic elements such as vocal features or sentiment of the sound. Instead, in Romanian, video-level dynamics, specifically faster cutting speeds and a more rapid overall pace, are strong predictors of a video's success. The cross-linguistic analysis reveals significant differences. Italian brain rot is textually more negative, exhibits higher perplexity, and uses more rhyme, while its sound is characterized by higher melodic range and loudness. Romanian audio is spectrally brighter with more erratic pitch variations.</p>
      </abstract>
      <kwd-group>
        <kwd>data set</kwd>
        <kwd>brain rot</kwd>
        <kwd>multi-modal</kwd>
        <kwd>Italian</kwd>
        <kwd>Romanian</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The first recorded use of ‘brain rot’ is in Henry David Thoreau’s book Walden, published in 1854, which criticizes society’s tendency to devalue complex ideas in favor of simple ones, indicating a general decline in mental and intellectual effort: “While England endeavors to cure the potato rot, will not any endeavor to cure the brain-rot – which prevails so much more widely and fatally?” [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Brain rot was the Oxford word of the year 2024.<sup>1</sup> Its primary sense is the supposed deterioration of a person’s mental or intellectual state as the result of overconsumption of low-quality, trivial, or non-challenging online content. Its secondary sense, acquired over the last year, is the online content itself likely to lead to such deterioration. In particular, the term came to be used in the last months to refer to a certain type of multi-modal short content, intentionally created to be absurd, nonsensical, dissonant and funny, by content developers using generative AI. One of the earliest examples is Nothing, Forever in December 2022<sup>2</sup> (shortly after the launch of ChatGPT), while Italian brain rot is a more recent trend that gained popularity in early 2025. As a side note, short animations like the Skibidi Toilet series or the Only in Ohio meme series are not usually created using generative AI, though they are also considered a form of brain rot.</p>
      <p>The creator of some of the first Italian brain rots, like Ballerina Cappuccina, supposedly a Romanian,<sup>3,4</sup> describes them as a satiric artistic experiment that both mocks and celebrates pop culture and kitsch. The characters in these creations are childlike, weird, and often grotesque blends of humans, animals, plants and various objects, named with Italian-sounding names, like Ballerina Cappuccina in Figure 1. The Italian brain rot phenomenon gained traction especially among Gen Z and Gen Alpha communities by means of social media platforms like TikTok and Instagram and has quickly spread to other languages, including Romanian, with characters such as Morcoveață in Figure 2, a human-like carrot character adapted from Romanian nursery rhymes.</p>
      <p>Figure 1: Example of Italian brain rot character Ballerina Cappuccina</p>
      <p>Figure 2: Example of Romanian brain rot character Morcoveață</p>
      <p>A common trait of brain rot is uncertain or absent authorship. As Roland Barthes argued in The Death of the Author [2], removing the author takes away a traditional source of authority and shifts the focus to the reader and their interpretation. In a similar way, brain rot content is mostly anonymous or pseudonymous, remixed and layered, reflecting an anarchic and collective form of creation.</p>
      <p>Moreover, brain rot has many traits in common with the Dadaist movement, which was on the same path of anti-art and anti-meaning: it mocked the rules and traditions of art, and it embraced nonsense and absurdity. It celebrated chaos and irrationality, at the same time being psychedelic in the sense of artistic production from the 70s and 80s when using vivid colors, fractals, surreal images, and neon gradients. Both forms of art were born in an age of post-truth and information overload.</p>
      <p>Finally, the brain rots’ original purpose seems to be primarily amusement, but some of them were also used for manipulation, commercial purposes or even political propaganda.</p>
      <p>This study aims to explore whether brain rot manifests differently across cultures and languages. We chose to begin with Italian brain rot, as it represents one of the most visible and influential starting points for this phenomenon. The decision to compare it with Romanian brain rot is based on both similar and contrasting historical, linguistic, and cultural characteristics. The main common grounds between the two languages and cultures are that they both belong to the Romance language family, and that they have experienced authoritarian regimes in the 20th century that shaped their collective imagination, creative approaches and imagery. The differences between the two lie in their geopolitical contexts, religious traditions (Catholic and Greek Orthodox), and features of cultural production, which influence the tone, style, and content of their digital aesthetics. At the same time, both Italian and Romanian online cultures remain underrepresented in cultural and computational studies, which represents an opportunity to examine how the above similarities and differences are reflected in the popular manifestations of brain rot.</p>
      <p>The main research questions are: (1) What makes a brain rot go viral (besides algorithm recommendation/manipulation or pure chance)? and (2) Are there any cultural or language differences between the form or content of brain rots in these two languages?</p>
    </sec>
    <sec id="sec-related-work">
      <title>2. Related Work</title>
      <p>
        Numerous studies are currently interested in understanding the effects of digital content overconsumption. Most of them focus on psychological, neuro-biological, or meta-analytical perspectives [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>This type of digital content certainly breaks conventional norms of art, narrative, and symbolism, aligning well with both surrealism and absurdism, while also embracing the chaos and disjuncture of postmodernity. The chimera-like mixing of humans, animals, plants, and objects is not a new phenomenon, since anthropomorphism can be traced back to Egyptian, Greek, and Roman antiquity, going through Medieval art, all the way to the surrealist movement at the beginning of the 20th century (Max Ernst, Salvador Dali, or René Magritte).</p>
      <p>
        To the best of our knowledge, no computational research has been conducted specifically on Italian brain rot. However, internet culture, including memes and viral content, has received a significant amount of attention. The detection of meme toxicity was investigated in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Multimodal sentiment analysis was conducted by integrating text, image, and audio for improving sentiment detection from vlogs, spoken reviews, and human-machine interactions in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. A comprehensive survey which categorizes advances in multimodal sentiment analysis can be found in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] introduced a new benchmark for detecting hate speech from multimodal memes. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] combined LLM-generated debates and fine-tuned judge models to detect harmful memes with improved interpretability and performance. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] proposed a template-based approach for meme clustering by employing multi-dimensional similarity features.
      </p>
    </sec>
    <sec id="sec-dataset">
      <p>The data set was constructed manually from TikTok videos by searching for candidate examples through various methods: direct search queries, tags, the discover feature, trending pages, compilation and analysis of video clips, related videos (You may like), and the For You recommendations. This process cannot be reliably automated due to misleading tags being frequently used to influence the recommendation algorithm.</p>
      <p>The extracted samples are stratified across two dimensions: language (Italian and Romanian) and popularity (popular and unpopular). The dataset is well balanced: 120 brain rots per language, with 60 viral examples and 60 less viewed examples, the typical threshold between them being 100k views or at least 10k likes.</p>
      <p>Given that we are interested in all aspects of communication, especially creative language use, we filtered out posts with extremely low lexical diversity, as well as re-uploads, translations, and repetitive songs.</p>
      <p>Moreover, some of the tools used to generate these videos can be traced by watermarks, although we note that some users specifically crop the content or blur such indicators. In alphabetical order, clips have been created or adjusted with: CapCut, ChatGPT, Hailuo AI, Kling AI (version 1.6), PixVerse.ai, Runway, VEED. TikTok also offers its own tools for video creation.</p>
      <p>We collected four subcategories of brain rots for each language. The general category in each language includes the notorious Italian brain rot characters with local adaptations (with 60 examples per language). For Italian, the skeleton category consists of 20 videos with a poignant tone. The Matteo category comprises 20 memes that exhibit more positive attitudes, and the Politicians and Celebrities class consists of 20 political satire videos. For Romanian, we followed the same count and structure. We have selected schelet, the corresponding category of Italian scheletro, Regina brain rot (brain rot queen), featuring longer stories, and conversely Morcoveață, consisting of very short clips. Throughout this process, we have extracted a total of 240 videos with subtitles and metadata.<sup>5</sup> We name the dataset CRIB (Collection of Romanian and Italian Brain rot).</p>
      <p><sup>5</sup> Using yt-dlp version 2025.04.30.</p>
    </sec>
    <sec id="sec-2">
      <title>4. Text</title>
      <sec id="sec-2-3">
        <title>4.1. Sentiment Analysis of Text</title>
        <p>
          To obtain sentiment analysis scores for the textual transcripts of the brain rots, we employed the cardiffnlp/twitter-xlm-roberta-base-sentiment pre-trained model [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], which returns a percentage of negative, neutral, and positive sentiments associated with each text in both languages, for optimal comparison. We used ChatGPT-4o for coding assistance and Python in Google Colab to obtain the graphical illustrations and to perform statistical tests.
        </p>
        <p>Overall, for all brain rots, the negative sentiment was predominant. There were minimal variations in sentiments in popular versus unpopular brain rots in both languages, as we can see in Figures 3a and 3b for Romanian and in Figures 3c and 3d for Italian. However, the Italian ones contained more negative sentiments than their Romanian counterparts.</p>
        <p>We tested for statistical significance of these findings and the results are listed in Table 1 from the Appendix. For Romanian differences of positive, neutral, and negative sentiments between popular and unpopular texts, we performed a multivariate analysis of variance (MANOVA). The results indicate no statistically significant multivariate effect. To further investigate potential differences at the level of individual sentiments, given potential concerns about normality and homogeneity of variance, Mann-Whitney U tests were performed for each emotional score. The results confirmed that there are no statistically significant differences.</p>
        <p>The same tests on the Italian brain rots obtained the same results: no statistical differences between the sentiments of popular and unpopular texts.</p>
        <p>We further performed, for each language, the same statistical tests only on the general category of brain rots, which varies less than the combination of all four brain rot categories (general, regina brain rot, morcoveață, and schelet for Romanian, and general, matteo, politicians and celebrities, scheletro for Italian) to test the differences between popular and unpopular categories. The results showed again no significant differences.</p>
        <p>Finally, a multivariate analysis of variance (MANOVA) was conducted to examine whether the distribution of emotional sentiment scores (positive, neutral, and negative) differed between Romanian and Italian brain rots. This time, results were statistically significant, indicating a robust multivariate effect of language on sentiment composition. To further explore individual sentiment differences between languages, Mann-Whitney U tests were applied separately for each sentiment category. Results revealed that positive sentiment was significantly higher in Romanian brain rots (p &lt; 0.0001; mean: 0.377 vs. 0.159), that neutral sentiment was also significantly higher in Romanian memes (p &lt; 0.0001; mean: 0.385 vs. 0.171), and that negative sentiment was significantly higher in Italian ones (p &lt; 0.0001; mean: 0.670 vs. 0.238).</p>
      </sec>
      <sec id="sec-2-4">
        <title>4.2. Semantic Similarity and Perplexity</title>
        <p>
          To measure the degree of “absurdity” or “unpredictability” of the brain rots, we employed two complementary measures: semantic similarity and perplexity. Semantic similarity was computed for a given text of a brain rot as the average pairwise cosine similarity between all word embeddings obtained with the sentence transformer paraphrase-multilingual-MiniLM-L12-v2 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. This reflects how semantically cohesive the vocabulary is: higher values indicate that the words tend to belong to similar semantic fields or contexts, suggesting internal consistency or coherence, while lower values indicate some incoherence or inconsistencies. Perplexity quantifies how unpredictable a text is from the perspective of a pre-trained language model (GPT-2 [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]). Lower perplexity values indicate that the model finds the sequence more predictable and fluent, while higher values suggest syntactic or lexical irregularities, or content that deviates from typical language patterns.
        </p>
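        <p>The two measures described above can be illustrated with a minimal Python sketch. It assumes the word embeddings and the per-token natural-log probabilities have already been produced by the respective models; the toy vectors and probabilities, as well as the function names, are hypothetical and are not taken from CRIB or from the authors' code.</p>

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def perplexity(token_log_probs):
    """Exponential of the mean negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Toy inputs (hypothetical): three 2-d "word embeddings" and four tokens
# that a language model each assigned probability 0.25.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_log_probs = [math.log(0.25)] * 4

print(mean_pairwise_similarity(toy_embeddings))
print(perplexity(toy_log_probs))
```

        <p>In this toy case a model that assigns every token probability 0.25 yields a perplexity of about 4, which matches the intuition of choosing uniformly among four options at each step.</p>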
        <p>For both semantic similarity and perplexity scores, we tested the differences in means between popular and unpopular brain rots per language, and we also tested the difference in means between the two languages. We used one-way ANOVA and we also conducted non-parametric Mann-Whitney U tests. The results of the statistical tests are summarized in Table 2 from the Appendix.</p>
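        <p>As an illustration of the non-parametric comparison, the following Python sketch computes the Mann-Whitney U statistic from average ranks, with a two-sided p-value from the normal approximation. It is a simplified sketch under stated assumptions: no tie or continuity correction is applied, the scores are hypothetical, and in practice a library routine such as scipy.stats.mannwhitneyu would typically be used. The source does not state which implementation the authors called.</p>

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the block
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value from a normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, p


# Hypothetical perplexity scores for two small groups of texts.
popular = [170.2, 181.5, 160.3, 175.8, 190.1]
unpopular = [150.4, 165.0, 158.7, 149.9, 172.3]
u_stat, p_value = mann_whitney_u(popular, unpopular)
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")
```

        <p>For samples as small as the toy groups above, an exact test is preferable to the normal approximation; the approximation is more appropriate for group sizes like the 60 versus 60 split used in the dataset.</p>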
        <p>For Romanian brain rots, we evaluated whether the two metrics, semantic similarity and perplexity, differ significantly between brain rot texts categorized as popular versus unpopular. The results of the ANOVA test revealed no statistically significant differences.</p>
        <p>The Mann-Whitney test results also show that there is no significant difference between popular and unpopular brain rot texts for either semantic similarity or perplexity scores. However, the p-value for perplexity was close to the significance threshold (0.068), indicating a slight preference for more irregular/nonstandard language in popular brain rot texts (M = 168.59 for popular, M = 159.22 for unpopular).</p>
        <p>We also performed statistical tests on the Romanian general subcategory of brain rots, which varies less than the whole Romanian dataset. The ANOVA results confirmed the ones obtained with the same statistical tests on all the data, with similar p-values that did not fall below the 0.05 threshold. This time, the Mann-Whitney test revealed a statistically significant difference in semantic similarity (p &lt; 0.05), suggesting that popular brain rots are slightly more semantically coherent (with a difference in mean of 0.002) than unpopular brain rots.</p>
        <p>For Italian, the statistical test results suggest that neither semantic similarity nor perplexity differs significantly across the popular and unpopular brain rot texts.</p>
        <p>As in the case of Romanian, we also tested the statistical significance of the differences between popular and unpopular texts w.r.t. semantic similarity and perplexity scores, for the Italian general category of brain rots, but the results were not statistically significant.</p>
        <p>Figure 3: Sentiment scores (Negative, Neutral, Positive) for each language and popularity group of the brain rot texts: (a) Romanian popular, (b) Romanian unpopular, (c) Italian popular, (d) Italian unpopular.</p>
        <p>Cross-linguistically, we tested whether semantic
simi</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Sound</title>
      <p>
        larity and perplexity difer significantly between
Romanian and Italian brain rot texts. The diference in mean
semantic similarity score between Italian (M = 0.59) and 5.1. Audio Processing and Separation
Romanian (M = 0.58) was very small. The perplexity
scores that reflect language uncertainties or language in- We developed a multi-stage audio pipeline to isolate and
consistencies were substantially higher for Italian brain characterize the vocal component of Italian and
Romarot texts (M = 218.74) than for Romanian ones (M = nian brain rot, with the goal of quantifying attributes that
163.91). While one-way ANOVA test revealed no sig- may influence their popularity, as well as perform
comnificant diference in semantic similarity between Italian parisons between the two languages. First, each video file
and Romanian, for perplexity it yielded a statistically sig- was converted to a high-resolution WAV format and
pronificant efect of the language category with a p-value cessed with a state-of-the-art source-separation model
&lt; 0.001, suggesting that the Italian brain rot texts are in- (Demucs) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] to obtain separate vocal and music stems.
deed more language-unpredictable than their Romanian Vocal tracks were then normalized for consistent
loudcounterparts. The Mann-Whitney tests confirmed the ness, ensuring that subsequent analyses would not be
ANOVA results showing no significance for semantic afected by variations in recording level.
similarity (p = 0.06), and a highly significant diference
between the two groups for perplexity (p &lt; 0.0001). 5.2. Speech Features
      </p>
      <sec id="sec-3-1">
        <title>After isolating the vocal stems, we computed a com</title>
        <p>
          4.3. Rhyme prehensive set of acoustic descriptors. Pitch contours
We estimated the rhyme density by the following method- were extracted via librosa’s [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] pyin algorithm,
yieldology. We employed a computational method based on ing mean F0, F0 variance, 95th–5th percentile range, and
sub-string similarity at word endings in two and three slope-entropy to quantify melodic movement. On 25
letters. Since the speech to text automatic transcription ms/10 ms-hop frames, we computed Mel-frequency
cepdid not identify correctly the verses with end-line, we stral coeficients (MFCCs) 1–13 (means and variances),
checked for all rhyme pairs locally, within a distance of spectral centroid, bandwidth, roll-of, zero-crossing rate,
two verses. The rhyme coeficient was calculated as the and spectral-flux (means and variances) to capture
brightratio between the total number of distinct, non-adjacent ness, noisiness, and timbral dynamics. Rhythmic patterns
word pairs that share the same sufix of 2 or 3 letters and were quantified by onset detection—calculating
syllablethe total number of words in the text. rate and pause-duration statistics—and by RMS (root
        </p>
        <p>We tested the diferences in rhyme scores of popular mean square) energy envelopes (temporal centroid, mean,
versus unpopular brain rot texts for both languages, and and variance). Each clip’s features formed one row in a
also the diferences in rhyme scores between the two master data matrix.
languages with Mann-Whitney tests. The results are We then compared these matrices in two ways: within
shown in table 3 from the Appendix. each language (popular vs. unpopular memes) and across</p>
        <p>
          The diference of the means between Romanian popu- languages (Italian vs. Romanian). All continuous features
lar (M = 0.19) and unpopular (M = 0.16) brain rots sug- were tested with Mann–Whitney U and corrected for
gests a slight preference towards more rhymed ones in multiple comparisons using the Benjamini–Hochberg
the popular group. However, for our sample of 60 (30 procedure at q &lt; 0.05 [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
popular and 30 unpopular brain rots), this diference is After correction, for the analyses of the popular vs.
not statistically significant (p = 0.135 ≥ 0.05). unpopular groups within each language, no features
re
        </p>
        <p>For Italian, there is also no statistical diference be- mained significant, likely due to modest sample sizes and
tween the rhyme coeficient of the popular and unpop- the number of comparisons. Reporting raw p &lt; 0.05
findular groups, since their means are very close (0.215 vs. ings (visible in Table 5 from the Appendix) highlights,
0.228) and the p-value is 0.6880 (&gt;0.05). though, that in the Italian dataset, popular brain rots</p>
        <p>
          We also compared Romanian and Italian rhyme coefi- exhibited significantly greater fundamental-frequency
cients (120 each). The mean for Italian rhyme coeficient variance (var_f0) and a wider pitch range (range_f0) than
is 0.221, higher than the mean for Romanian which is their unpopular counterparts. These results suggest that
0.177. This time, the Mann-Whitney test returned the a more expressive, melodic content, characterized by
p-value of 0.0001 (&lt;0.05), which means that Italian brain larger and more varied pitch intervals, is associated with
rots do rhyme more than their Romanian counterparts. higher popularity in Italian clips. In the Romanian
corpus, popular audios difered from unpopular ones in
three respects: they featured a marginally faster onset
rate (syllable_rate), a more stable mid-high spectral
texture (mfcc_9_var), and a darker high-frequency timbre
(mfcc_11_mean). These patterns imply that a slightly 5.3. Sentiment Analysis of Sound
quicker rhythm, smoother structure in the mid-high band,
and less pronounced very high frequencies tend to co- Full-audio (voice + music) sentiment was assessed
occur with popularity. None of these comparisons sur- using the pretrained speech-emotion classifier
vived false discovery rate (FDR) correction at q &lt; 0.05, superb/wav2vec2-base-superb-er [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. It categorizes
reflecting exploratory sample size considerations. short audio clips into four emotion labels (angry, happy,
        </p>
        <p>In addition to our broad comparisons, we also exam- neutral, sad). Each brain rot’s waveform was resampled
ined popular versus unpopular clips within each the- to 16 kHz and fed into the model, after which we
matic category—namely general, scheletro, matteo, and captured the full probability distribution over three
politicians and celebrities in Italian and general, schelet, coarse categories, Negative (-1), Neutral (0), and Positive
morcoveat, ă, and regina brain rot in Romanian. After (+1). These labels were mapped from the original model
applying Benjamini–Hochberg correction, only the Ro- outputs as such: Negative (-1) for any emotion other
manian general subset yielded features that remained than happy or neutral (i.e. angry and sad), Neutral (0)
significant. In those 60 clips, one clear acoustic hallmark for labels predicted as neutral, and Positive (+1) for
of popularity is a smoother timbre in the mid-to-high fre- labels predicted as happy. This mapping yields a better
quencies. Specifically, popular memes have a noticeably comparison point for our study, in alignment with the
lower mfcc_9_var (181 vs. 232 in unpopular clips), mean- sentiment analysis performed for the brain rots’ texts.
ing the shape of the spectrum around 3–4 kHz stays more Separate sentiment files were generated for four groups:
consistent rather than flitting up and down. This steadi- Italian-Popular, Italian-Unpopular, Romanian-Popular,
ness makes the audio feel more even and less “choppy”. and Romanian-Unpopular. The distributions for each
There is also a trend toward lower mfcc_11_var (170 audio in the corresponding groups is shown in Figure 4.
vs. 193), which means fewer abrupt jumps in the very When we treat the three sentiment probabilities
(Neghigh frequencies (around 5–6 kHz), so sibilance and hiss ative, Neutral, and Positive) as a multivariate outcome,
are kept under tighter control. These findings suggest neither Italian nor Romanian shows a significant
diferthat Romanian general brain rots become more engaging ence between popular and unpopular brain rot audios.
when their soundtrack maintains a steady, coherent tone. A MANOVA on the Italian data yields Wilks’  = 0.992</p>
        <p>When we examine the full set of 120 Italian and 120 (F(3,116) = 0.30, p = 0.826), and the same test on Romanian
Romanian brain rots side by side (as can be seen in Table 4 from the Appendix), a clear pattern of prosodic contrast emerges.
Italian vocals move through a broader and more dynamic pitch range, peaking and dipping over a span that is markedly larger than in the Romanian samples, which makes them feel more melodically engaging.
Romanian clips, on the other hand, swing their pitch contours in a less predictable fashion, with higher entropy of slope suggesting sudden twists in intonation that can come across as more spontaneous or volatile.
In the spectral domain, Romanian brain rot tracks push energy into the upper frequencies: their average spectral centroids, bandwidths, and roll-off points all sit higher than in Italian.
Yet these high-frequency bands in Romanian audio are less stable over time, fluctuating more from moment to moment and lending a grainier, more restless timbral texture.
Italian clips display a darker, more subdued high end, but they deliver their muted tones with greater consistency and, at the same time, add sharp bursts of change, especially in those same upper bands, so that the audio doesn’t feel monotone.
They also sound uniformly louder, amplifying their broad, dramatic pitch gestures and deep spectral shifts, whereas Romanian tracks embrace a leaner, quieter tone.
These findings point to two distinct audio “signatures” that likely reflect both the underlying voice synthesis models and the cultural styles of meme production in each language.</p>
        <p>[Figure panels: (a) Italian Popular Brain Rot Audios; (b) Italian Unpopular Brain Rot Audios; (c) Romanian Popular Brain Rot Audios; (d) Romanian Unpopular Brain Rot Audios]</p>
        <p>returns Wilks’ Λ = 0.966 (F(3,116) = 1.37, p = 0.257).
However, comparing Italian versus Romanian clips reveals a modest but statistically significant language effect on the combined sentiment vector (Wilks’ Λ = 0.965, F(3,236) = 2.87, p = 0.037).
In other words, the full-audio sentiment differs more by language than by popularity within a language.</p>
        <p>We also used Mann–Whitney tests to look at each sentiment dimension in isolation and to account for potential concerns about normality and homogeneity of variance.
Comparing Italian against Romanian across all 240 clips, however, the sentiment score distributions show no robust separation: Negative (U=6,536; p=0.217), Neutral (U=8,088; p=0.099), and Positive (U=6,878; p=0.550) all lie above the conventional 0.05 threshold.
The Neutral score comes closest (p ≈ 0.10), hinting at a possible tendency for Italian memes to register slightly higher neutral-sentiment probabilities than Romanian ones, but this trend remains statistically inconclusive.</p>
        <p>6. Video</p>
        <p>To analyze the visual content of the 240 videos, we employed the Gemini Flash 2.5 multimodal model.
Each video was individually processed by the model with a prompt instructing it to perform a visual-only analysis and extract a series of predefined attributes.
The model returned its analysis for each video as a structured JSON object, which allowed for the systematic collection of data for our study.
In an initial test, the model demonstrated a notable capability in discerning the language of origin from visual cues alone, achieving an accuracy of 71.67% (Romanian vs. Italian) by identifying culturally specific items (e.g., the Carpathian Mountains).
However, it struggled significantly to predict a video’s popularity, achieving an accuracy of only 47.92%, which is no better than random chance.</p>
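        <p>The two accuracy figures above can be compared against chance with an exact binomial test. The following is a minimal sketch, not the procedure used in the study: it assumes that both accuracies were computed over all 240 videos (which reproduces 71.67% and 47.92% as 172 and 115 correct predictions) and that the chance level is 0.5 for each binary task. The function name binomial_two_sided_p is hypothetical.</p>
        <preformat>
```python
from math import comb


def binomial_two_sided_p(successes, trials, chance=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely
    than the observed one under the chance hypothesis.
    """
    def pmf(k):
        return comb(trials, k) * chance**k * (1 - chance) ** (trials - k)

    observed = pmf(successes)
    # The small relative tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(trials + 1)
                if observed * (1 + 1e-9) >= pmf(k))
    return min(1.0, total)


# Assumption: both accuracies use all 240 videos as the denominator.
N_VIDEOS = 240
language_correct = round(0.7167 * N_VIDEOS)
popularity_correct = round(0.4792 * N_VIDEOS)

p_language = binomial_two_sided_p(language_correct, N_VIDEOS)
p_popularity = binomial_two_sided_p(popularity_correct, N_VIDEOS)
print("language:", language_correct, "correct, p =", p_language)
print("popularity:", popularity_correct, "correct, p =", p_popularity)
```
        </preformat>
        <p>Under these assumptions, the language prediction lies far from chance, while the popularity prediction is compatible with guessing.</p>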
        <p>This initial finding suggests that a video’s success is not
determined by straightforward, immediately classifiable
visual markers of appeal. A deeper statistical analysis
was therefore performed to uncover more subtle visual
attributes that may correlate with popularity (Table 6 in
the Appendix).</p>
        <p>A key finding emerged from the analysis of the videos’
dynamic properties, specifically the rate of shot
transitions. The data indicate a statistically
significant tendency for popular videos to feature a
faster cutting speed. Videos categorized with Very Fast or
Fast transitions were more prevalent among
popular content, as can be seen in Figure 5. This
observation is supported by a positive but modest Pearson
correlation of 0.1712 between cutting speed and popularity, which
was found to be statistically significant (p=0.0231). This
suggests that as the frequency of cuts increases, so does
the likelihood of a video being popular, pointing to a
dynamic, high-energy visual style as a component
of audience engagement.</p>
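        <p>A correlation of this kind can be computed by coding the ordinal cutting-speed labels as integers and popularity as a 0/1 flag. The sketch below is illustrative only: the numeric mapping SPEED_CODES is an assumption (the text names only the Fast and Very Fast labels), and the records are invented toy data rather than the study's dataset.</p>
        <preformat>
```python
from math import sqrt

# Hypothetical ordinal coding; the numeric mapping used in the study is not stated.
SPEED_CODES = {"Very Slow": 0, "Slow": 1, "Moderate": 2, "Fast": 3, "Very Fast": 4}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# Toy records (cutting-speed label, popular flag), invented for illustration.
records = [
    ("Very Fast", 1), ("Fast", 1), ("Fast", 0), ("Moderate", 1),
    ("Moderate", 0), ("Slow", 0), ("Very Slow", 0), ("Fast", 1),
]
speeds = [SPEED_CODES[label] for label, _ in records]
popular = [flag for _, flag in records]

r = pearson_r(speeds, popular)
print("r =", round(r, 4), "t =", round(t_statistic(r, len(records)), 4))
```
        </preformat>
        <p>With a binary popularity flag, this Pearson coefficient is the point-biserial correlation, so its sign shows whether faster cutting is more common among popular videos.</p>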
        <p>Further reinforcing the importance of dynamism, the
overall pacing of the videos also proved to be a significant
differentiator. A statistically significant difference was
found in the distribution of pacing levels between popular
and unpopular videos, as confirmed by a Mann-Whitney
U test (p=0.0492). Popular videos were most frequently
described as having a Fast overall pace. Moreover, when
the correlation between pacing and popularity was
analyzed by language, a significant difference was observed.</p>
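        <p>For readers unfamiliar with the test, the following is a minimal standard-library sketch of a Mann-Whitney U test on ordinal pacing codes. It uses the tie-corrected normal approximation without continuity correction, so statistical packages may report slightly different p-values. The pacing codes and both samples are hypothetical, and this is not necessarily how the reported p-value was obtained.</p>
        <preformat>
```python
from itertools import groupby
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    position = 0  # number of items already ranked
    for _, group in groupby(order, key=lambda i: values[i]):
        indices = list(group)
        mean_rank = position + (len(indices) + 1) / 2
        for i in indices:
            ranks[i] = mean_rank
        position += len(indices)
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two-sided p) via the normal approximation."""
    pooled = list(sample_a) + list(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    ranks = average_ranks(pooled)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    # Tie correction: sum of t^3 - t over groups of tied values.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(c**3 - c for c in counts.values())
    # Note: the variance is zero if every pooled value is identical.
    variance = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - n_a * n_b / 2) / sqrt(variance)
    return u_a, erfc(abs(z) / sqrt(2))


# Hypothetical pacing codes (0 = slowest, 3 = Fast); toy data only.
popular_pacing = [3, 3, 2, 3, 2, 1]
unpopular_pacing = [1, 2, 1, 2, 0, 1]
u_stat, p_value = mann_whitney_u(popular_pacing, unpopular_pacing)
print("U =", u_stat, "p =", p_value)
```
        </preformat>
        <p>Because the test operates on ranks, it suits ordinal labels such as pacing levels, where the distance between adjacent categories is not well defined.</p>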
        <p>For the Romanian videos, we found a positive Pearson correlation of 0.2791, indicating that as overall pacing increases, videos in Romanian tend to be more popular.
In stark contrast, the Italian videos showed a negligible correlation of 0.0047, revealing virtually no linear relationship between pacing and popularity.
This language-specific finding suggests that the overall positive trend observed in the combined dataset is almost exclusively driven by the Romanian-language content.
While the perceived tempo is an important factor, its influence on popularity appears to be culturally moderated.</p>
        <p>In contrast to the clear influence of dynamism, thematic elements such as absurdity and narrative structure showed no significant correlation with video popularity.
Although it was hypothesized that surreal, chaotic, or illogical content might be a good indicator of popularity, the analysis did not bear this out.
Perceived absurdity levels yielded very low Pearson correlation coefficients and statistically insignificant p-values in the Mann-Whitney tests.
For instance, the overall absurdity level, despite being High in the majority of videos, had no discernible statistical relationship with popularity (p=0.2937).</p>
        <p>In conclusion, the image-level analysis indicates that while the kinetic and rhythmic aspects of a video’s construction are influential, their effect on popularity is not universal.
The visual dynamism, characterized by rapid cutting and a fast overall tempo, appears to be a powerful element in capturing and retaining audience attention, but this trend is highly dependent on the cultural context of the video.
Our data show that this relationship holds for Romanian-language content but is absent for Italian-language content, suggesting that the preference for such a style is culturally specific rather than a general driver of engagement.</p>
        <p>7. Manual Analysis</p>
        <p>Some Romanian brain rot content includes political propaganda related to the presidential elections, often carrying extremist nationalist undertones.
This suggests that such content goes beyond seemingly harmless absurdist humor and may serve as manipulative material.
The Romanian characters are inspired by Romanian folklore, such as the Balaur (a dragon-like creature) or Morcoveață (a carrot-shaped boy inspired by Jules Renard’s Poil de Carotte), by Romanian historical figures such as rulers or poets (Mihai Viteazul, Ion Creangă, Mihai Eminescu), or by global pop culture such as Hatsune Miku, Sonic the Hedgehog, or Disney characters, portrayed in explicit real-life situations like relationships with siblings or modern dating.</p>
        <p>The topics in Italian brain rot also revolve around daily routine and politics, with celebrity characters such as Volodymyr Zelensky or Emmanuel Macron portrayed in funny and ironic ways, frequently with misspelled names in order to undermine their authority and to deglamorize them.
There are also very popular characters like Ballerina Cappuccina, a tragic but very graceful figure sometimes featuring grotesque ballet poses, and the ironic and dreamy Skeletro, with emo fragility, self-deprecating depression, and outdated romanticism.</p>
        <p>The music is mostly depressive in both popular and unpopular Italian and Romanian brain rot, written in a minor key, which gives the brain rots a serious, sad, and dark sound.
Using the Shazam tool, we identified a variety of musical pieces, from classical music by Frederic Chopin or Max Richter, through very popular tunes by Ennio Morricone or Bobby McFerrin, all the way to some trap and rap pieces.
Most of the titles of these pieces include the words spooky, scary, or depressive.
We noticed that for the popular category the music is slightly less dark than for the unpopular one.
Italian popular brain rots showcase a wide array of soundtracks from various genres, while unpopular ones typically feature the same three tracks most used by viral brain rots.
At the same time, the Romanian unpopular samples contain diverse soundtracks from obscure sources and underground artists.
Conversely, a single track is present in more than half of all Romanian popular brain rots.
This suggests that more effort was required at the beginning of this trend in Italy, but after gaining notoriety, the key to viral clips was to exploit the same soundtracks established by the initial wave, making them more recognizable.</p>
        <p>The visuals are overloaded, kitschy, anti-narrative,
absurd, over-sized, ironic, cynical, and, in a way,
self-destructive. The popular ones are more animated and present
more complex and colorful imagery, and their cutting is
noticeably more dynamic and full of narrative.</p>
        <p>One of the most striking characteristics of brain rot revealed
by the manual scrutiny of all the samples is
its core element: the dissonance between the tone,
the music, and the content. The tone is neutral (an
AI-generated voice), but the text is often deviant,
triggering extreme emotions. There is no harmony or
coherence whatsoever between text, speech, music, and
image. Another notable trait is the unsettling blend of
baby talk and nursery rhymes with content (topics) and
language (slang, pejorative jargon, NSFW words) that are
inappropriate for children.</p>
        <p>
          Overall, the brain rots seem to be a good example of
E(xtended)-creativity rather than of F(ixed)-creativity,
since they tend to break the rules rather than only create
new content using existing ones [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>
        <p>The manual analysis of the textual content revealed the
prevalent topics and characters used in both languages.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>8. Conclusion</title>
      <p>This study conducted a multi-modal analysis of Italian and Romanian brain rot, seeking to identify the factors driving popularity and to map the cultural and linguistic differences between the two languages.
Our findings show that the popularity of these memes is not primarily determined by their textual content.
Features like sentiment, narrative absurdity, or rhyme density showed no significant link to a video’s success.
The same can be said about standard audio features and speech sentiment.
Instead, the analysis revealed that popularity is significantly correlated with visual dynamics, specifically a faster cutting speed and, for Romanian-language content, a faster overall pace.</p>
      <p>The research also uncovered clear distinctions between the Italian and Romanian versions of brain rot.
Italian brain rots were textually more negative, more unpredictable, and used rhyme more frequently.
Acoustically, their vocals were characterized by greater melodic range and loudness.
Conversely, Romanian content was more neutral, while its vocals were spectrally brighter and showed more erratic pitch changes.
These differences extend to thematic content, with each language favoring culturally specific characters and references.</p>
      <p>As this is a global phenomenon, we plan to extend this study to other Romance languages in future work.</p>
      <p>Regarding the small sample size, we intend to include samples that would not fit in either the popular or the unpopular category, thus reframing popularity as a continuous problem rather than as a binary classification.</p>
      <p>In essence, the success of brain rot appears to depend on a combination of universal and culturally specific elements.
While fast-paced, dynamic visuals serve as a universal driver for engagement, the content itself is distinctly shaped by the linguistic, acoustic, and thematic norms of its target culture.
What makes the genre unique is the strange mix of message, image, and sound.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research is supported by the project “Romanian Hub for Artificial Intelligence - HRIA”, Smart Growth, Digitization and Financial Instruments Program, 2021-2027, MySMIS no. 334906 and by CNCS/CCCDI UEFISCDI, SiRoLa project, PN-IV-P1-PCE-2023-1701, within PNCDI IV, Romania.</p>
    </sec>
    <sec id="sec-6">
      <title>9. Limitations</title>
      <p>One limitation of this study is that the dataset, while
carefully curated, is relatively small and manually collected,
focusing only on Italian and Romanian content. This
may limit the generalizability of our findings
to brain rot in other languages or on a larger scale.
Secondly, our analysis identifies statistically significant correlations, such
as the link between cutting speed and popularity, but
does not establish causation. Other confounding factors
not examined here may be at play. Finally, brain rot is a
rapidly evolving digital phenomenon, and the features
that define it today may change over time, potentially
dating our specific observations.
</p>
    </sec>
    <sec id="sec-7">
      <title>Appendix</title>
    </sec>
    <sec id="sec-8">
      <title>Supplementary Statistical Tests Tables</title>
      <p>F(1, 118) = 2.35, p = 0.128; F(1, 118) = 2.38, p = 0.126</p>
      <p>Mann-Whitney</p>
      <p>Table caption: Comparison of key audio features between Italian and Romanian brain rot that differ at q&lt;0.05 after FDR correction.</p>
      <p>Table caption: Raw p &lt; 0.05 comparisons of audio features between all popular and unpopular brain rots within the Italian and Romanian corpora.</p>
      <p>Table caption: Image attributes that might have been a good indicator of video popularity.</p>
      <p>Feature names: zcr_mean, flux_var, mfcc_1_var</p>
      <p>Declaration on Generative AI</p>
      <p>During the preparation</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Thoreau</surname>
          </string-name>
          , Walden, Mercer University Press,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Barthes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Heath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dove</surname>
          </string-name>
          ,
          <source>The Death of the Author</source>
          ,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. M. F.</given-names>
            <surname>Yousef</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alshamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tlili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H. S.</given-names>
            <surname>Metwally</surname>
          </string-name>
          ,
          <article-title>Demystifying the new dilemma of brain rot in the digital era: A review</article-title>
          ,
          <source>Brain Sciences</source>
          <volume>15</volume>
          (
          <year>2025</year>
          ). URL: https://www.mdpi.com/2076-3425/15/3/283. doi:10.3390/brainsci15030283.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Satici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. G.</given-names>
            <surname>Tekin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Deniz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Satici</surname>
          </string-name>
          ,
          <article-title>Doomscrolling scale: its association with personality traits, psychological distress, social media use, and wellbeing</article-title>
          ,
          <source>Applied Research in Quality of Life</source>
          <volume>18</volume>
          (
          <year>2023</year>
          )
          <fpage>833</fpage>
          -
          <lpage>847</lpage>
          . URL: https://doi.org/10.1007/s11482-022-10110-7. doi:10.1007/s11482-022-10110-7.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D. S. M.</given-names>
            <surname>Pandiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. T. K.</given-names>
            <surname>Sang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ceolin</surname>
          </string-name>
          ,
          <article-title>Toxic memes: A survey of computational perspectives on the detection and explanation of meme toxicities</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2406.07353. arXiv:2406.07353.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Soleymani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schuller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-F.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pantic</surname>
          </string-name>
          ,
          <article-title>A survey of multimodal sentiment analysis</article-title>
          ,
          <source>Image and Vision Computing</source>
          <volume>65</volume>
          (
          <year>2017</year>
          )
          <fpage>3</fpage>
          -
          <lpage>14</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0262885617301191. doi:10.1016/j.imavis.2017.08.003. Multimodal Sentiment Analysis and Mining in the Wild Image and Vision Computing.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gandhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Adhvaryu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Poria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cambria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hussain</surname>
          </string-name>
          ,
          <article-title>Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions</article-title>
          ,
          <source>Information Fusion</source>
          <volume>91</volume>
          (
          <year>2023</year>
          )
          <fpage>424</fpage>
          -
          <lpage>444</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S1566253522001634. doi:10.1016/j.inffus.2022.09.025.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Firooz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Goswami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ringshia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Testuggine</surname>
          </string-name>
          ,
          <article-title>The hateful memes challenge: Detecting hate speech in multimodal memes</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2005.04790. arXiv:2005.04790.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Towards explainable harmful meme detection through multimodal debate between large language models</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2401.13298. arXiv:2401.13298.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bloem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ilievski</surname>
          </string-name>
          ,
          <article-title>Clustering internet memes through template matching and multi-dimensional similarity</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2505.00056. arXiv:2505.00056.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F.</given-names>
            <surname>Barbieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Espinosa Anke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Camacho-Collados</surname>
          </string-name>
          ,
          <article-title>XLM-T: Multilingual language models in Twitter for sentiment analysis and beyond</article-title>
          ,
          <source>in: Proceedings of the Thirteenth Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>258</fpage>
          -
          <lpage>266</lpage>
          . URL: https://aclanthology.org/2022.lrec-1.27.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          , in: K. Inui, J. Jiang, V. Ng, X. Wan (Eds.),
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          , Association for Computational Linguistics, Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>3982</fpage>
          -
          <lpage>3992</lpage>
          . URL: https://aclanthology.org/D19-1410/. doi:10.18653/v1/D19-1410.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <source>OpenAI blog</source>
          (
          <year>2019</year>
          ). URL: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rouard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Défossez</surname>
          </string-name>
          ,
          <article-title>Hybrid transformers for music source separation</article-title>
          ,
          <source>in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . doi:10.1109/ICASSP49357.2023.10096956.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.</given-names>
            <surname>McFee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>McVicar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Faronbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Roman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gover</surname>
          </string-name>
          , et al.,
          <source>Librosa</source>
          ,
          <year>2025</year>
          . URL: https://doi.org/10.5281/zenodo.15006942. doi:10.5281/zenodo.15006942.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Benjamini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hochberg</surname>
          </string-name>
          ,
          <article-title>Controlling the false discovery rate: A practical and powerful approach to multiple testing</article-title>
          ,
          <source>Journal of the Royal Statistical Society: Series B (Methodological)</source>
          <volume>57</volume>
          (
          <year>1995</year>
          )
          <fpage>289</fpage>
          -
          <lpage>300</lpage>
          . URL: https://doi.org/10.1111/j.2517-6161.1995.tb02031.x. doi:10.1111/j.2517-6161.1995.tb02031.x.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.-w.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-H.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-S.</given-names>
            <surname>Chuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-I. J.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lakhotia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.-T.</given-names>
            <surname>Lin</surname>
          </string-name>
          , et al.,
          <source>SUPERB: Speech processing Universal PERformance Benchmark</source>
          ,
          <year>2021</year>
          . arXiv:2105.01051.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.-M. Florescu</surname>
          </string-name>
          ,
          <article-title>Testing language creativity of large language models and humans</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Hämäläinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Öhman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bizzoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Miyagawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Alnajjar</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities</source>
          , Association for Computational Linguistics, Albuquerque, USA,
          <year>2025</year>
          , pp.
          <fpage>426</fpage>
          -
          <lpage>436</lpage>
          . URL: https://aclanthology.org/2025.nlp4dh-1.37/. doi:10.18653/v1/2025.nlp4dh-1.37.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>