
Dissonant Ballerinas and Crafty Carrots: A Comparative Multi-modal Analysis of Italian Brain Rot

Anca Dinu

anca.dinu@lls.unibuc.ro 0 3

Andra-Maria Florescu

andra-maria.florescu@s.unibuc.ro 2 3

Marius Micluța-Câmpeanu

marius.micluta-campeanu@.s.unibuc.ro 1 2 3

Stefana-Arina Tăbus

Claudiu Creangă

claudiu.creanga@s.unibuc.ro 1 2 3

Andreiana Mihail

andreiana.mihail@s.unibuc.ro 1 3

0 Faculty of Foreign Languages and Literatures, 5-7 Edgar Quinet St, Bucharest, 010017, Romania

1 Faculty of Mathematics and Computer Science, 14 Academiei St, Bucharest, 010014, Romania

2 Interdisciplinary School of Doctoral Studies, 36-46 Mihail Kogălniceanu, Bucharest, 050107, Romania

3 University of Bucharest, 90 Panduri Road, Bucharest, 050107, Romania

2025

This paper presents a comparative multi-modal analysis of Italian and Romanian brain rot memes, investigating the factors that contribute to their appeal and the linguistic and cultural distinctions between the two versions. To conduct this analysis, we introduce a multi-modal brain rot dataset named CRIB (Collection of Romanian and Italian Brain rot), a manually curated collection of 240 TikTok videos stratified by language (Italian, Romanian) and popularity, on which we examine textual, acoustic, and visual features. Our findings indicate that popularity is not significantly correlated with textual elements such as sentiment, absurdity, or rhyme, or with acoustic elements such as vocal features or the sentiment of the sound. Instead, for Romanian, video-level dynamics, specifically faster cutting speeds and a more rapid overall pace, are strong predictors of a video's success. The cross-linguistic analysis reveals significant differences. Italian brain rot is textually more negative, exhibits higher perplexity, and uses more rhyme, while its sound is characterized by higher melodic range and loudness. Romanian audio is spectrally brighter, with more erratic pitch variations.

Keywords: data set, brain rot, multi-modal, Italian, Romanian

1. Introduction

The first recorded use of ‘brain rot’ is in Henry David Thoreau’s book Walden, published in 1854, which criticizes society’s tendency to devalue complex ideas in favor of simple ones, indicating a general decline in mental and intellectual effort: “While England endeavors to cure the potato rot, will not any endeavor to cure the brain-rot – which prevails so much more widely and fatally?” [1].

Brain rot was the Oxford word of the year 2024.¹ Its primary sense is the supposed deterioration of a person’s mental or intellectual state as the result of overconsumption of low-quality, trivial, or non-challenging online content. Its secondary sense, acquired over the last year, is the online content itself likely to lead to such deterioration. In particular, the term came to be used in the last months to refer to a certain type of multi-modal short content, intentionally created to be absurd, nonsensical, dissonant and funny, by content developers using generative AI. One of the earliest examples is Nothing, Forever in December 2022² (shortly after the launch of ChatGPT), while Italian brain rot is a more recent trend that gained popularity in early 2025. As a side note, short animations like the Skibidi Toilet series or the Only in Ohio meme series are not usually created using generative AI, though they are also considered a form of brain rot.

The creator of some of the first Italian brain rots, like Ballerina Cappuccina, supposedly a Romanian,³,⁴ describes them as a satiric artistic experiment that both mocks and celebrates pop culture and kitsch. The characters in these creations are childlike, weird, and often grotesque blends of humans, animals, plants and various objects, named with Italian-sounding names, like Ballerina Cappuccina in figure 1. The Italian brain rot phenomenon gained traction especially among Gen Z and Gen Alpha communities by means of social media platforms like TikTok and Instagram and has quickly spread to other languages, including Romanian, with characters such as Morcoveață in figure 2, a human-like carrot character adapted from Romanian nursery rhymes.

A common trait of brain rot is uncertain or absent authorship. As Roland Barthes argued in The Death of the Author [2], removing the author takes away a traditional source of authority and shifts the focus to the reader and their interpretation. In a similar way, brain rot content is mostly anonymous or pseudonymous, remixed and layered, reflecting an anarchic and collective form of creation.

Moreover, brain rot shares many traits with the Dadaism movement, which followed the same path of anti-art and anti-meaning: it mocked the rules and traditions of art and embraced nonsense and absurdity. It celebrated chaos and irrationality, at the same time being psychedelic in the sense of the artistic production of the 70s and 80s, which used vivid colors, fractals, surreal images, and neon gradients. Both forms of art were born in an age of post-truth and information overload.

Figure 1: Example of Italian brain rot character Ballerina Cappuccina

Finally, the brain rots’ original purpose seems to be primarily amusement, but some of them were also used for manipulation, commercial purposes or even political propaganda.

This study aims to explore whether brain rot manifests differently across cultures and languages. We chose to begin with Italian brain rot, as it represents one of the most visible and influential starting points for this phenomenon. The decision to compare it with Romanian brain rot is based on both similar and contrasting historical, linguistic, and cultural characteristics. The main common grounds between the two languages and cultures are that they both belong to the Romance language family, and that they have experienced authoritarian regimes in the 20th century that shaped their collective imagination, creative approaches and imagery. The differences between the two lie in their geopolitical contexts, religious traditions (Catholic and Greek Orthodox), and features of cultural production, which influence the tone, style, and content of their digital aesthetics. At the same time, both Italian and Romanian online cultures remain underrepresented in cultural and computational studies, which represents an opportunity to examine how the above similarities and differences are reflected in popular brain rot manifestations.

Figure 2: Example of Romanian brain rot character Morcoveață

The main research questions are: (1) What makes a brain rot go viral (besides algorithm recommendation/manipulation or pure chance)? and (2) Are there any cultural or language differences between the form or content of brain rots in these two languages?

2. Related Work

Numerous studies are currently interested in understanding the effects of digital content overconsumption. Most of them focus on psychological, neuro-biological, or meta-analytical perspectives [3, 4].

This type of digital content certainly breaks conventional norms of art, narrative, and symbolism, aligning well with both surrealism and absurdism, while also embracing the chaos and disjuncture of postmodernity. The chimera-like mixing of humans, animals, plants, and objects is not a new phenomenon, since anthropomorphism can be traced back to Egyptian, Greek, and Roman antiquity, going through Medieval art, all the way to the surrealist movement at the beginning of the 20th century (Max Ernst, Salvador Dalí, or René Magritte).

To the best of our knowledge, no computational research has been conducted specifically on Italian brain rot. However, internet culture, including memes and viral content, has received a significant amount of attention. The detection of meme toxicity was investigated in [5]. Multimodal sentiment analysis was conducted by integrating text, image, and audio for improving sentiment detection from vlogs, spoken reviews, and human-machine interactions in [6]. A comprehensive survey which categorizes advances in multimodal sentiment analysis can be found in [7]. [8] introduced a new benchmark for detecting hate speech from multimodal memes. [9] combined LLM-generated debates and fine-tuned judge models to detect harmful memes with improved interpretability and performance. [10] proposed a template-based approach for meme clustering by employing multi-dimensional similarity features.

3. Dataset

The data set was constructed manually from TikTok videos by searching for candidate examples through various methods: direct search queries, tags, the discover feature, trending pages, compilation and analysis of video clips, related videos (You may like), and the For You recommendations. This process cannot be reliably automated due to misleading tags being frequently used to influence the recommendation algorithm.

The extracted samples are stratified across two dimensions: language (Italian and Romanian) and popularity (popular and unpopular). The dataset is well balanced: 120 brain rots per language, with 60 viral examples and 60 less viewed examples, the typical threshold between them being 100k views or at least 10k likes.

Given that we are interested in all aspects of communication, especially creative language use, we filtered out posts with extremely low lexical diversity, as well as re-uploads, translations, and repetitive songs.

Moreover, some of the tools used to generate these videos can be traced by watermarks, although we note that some users specifically crop the content or blur such indicators. In alphabetical order, clips have been created or adjusted with: CapCut, ChatGPT, Hailuo AI, Kling AI (version 1.6), PixVerse.ai, Runway, VEED. TikTok also offers its own tools for video creation.

We collected four subcategories of brain rots for each language. The general category in each language includes the notorious Italian brain rot characters with local adaptations (with 60 examples per language). For Italian, the skeleton category consists of 20 videos with a poignant tone. The Matteo category comprises 20 memes that exhibit more positive attitudes, and the Politicians and Celebrities class consists of 20 political satire videos. For Romanian, we followed the same count and structure. We have selected schelet, the category corresponding to the Italian scheletro, Regina brain rot (brain rot queen), featuring longer stories, and conversely Morcoveață, consisting of very short clips. Throughout this process, we have extracted a total of 240 videos with subtitles and metadata.⁵ We name the dataset CRIB (Collection of Romanian and Italian Brain rot).

⁵ Using yt-dlp version 2025.04.30.

4. Text

4.1. Sentiment Analysis of Text

To obtain sentiment analysis scores for the textual transcripts of the brain rots, we employed the cardiffnlp/twitter-xlm-roberta-base-sentiment pre-trained model [11], which returns a percentage of negative, neutral, and positive sentiments associated with each text in both languages, for optimal comparison. We used ChatGPT4o for coding assistance and Python in Google Colab to obtain the graphical illustrations and to perform statistical tests.

Overall, for all brain rots, the negative sentiment was predominant. There were minimal variations in sentiments in popular versus unpopular brain rots in both languages, as we can see in figures 3a and 3b for Romanian and in figures 3c and 3d for Italian. However, the Italian ones contained more negative sentiments than their Romanian counterparts.

We tested for statistical significance of these findings and the results are listed in table 1 from the Appendix.

For the Romanian differences in positive, neutral, and negative sentiments between popular and unpopular texts, we performed a multivariate analysis of variance (MANOVA). The results indicate no statistically significant multivariate effect. To further investigate potential differences at the level of individual sentiments, given potential concerns about normality and homogeneity of variance, Mann-Whitney U tests were performed for each emotional score. The results confirmed that there are no statistically significant differences.

The same tests on the Italian brain rots obtained the same results: no statistical differences between the sentiments of popular and unpopular texts.

We further performed, for each language, the same statistical tests only on the general category of brain rots, which varies less than the combination of all four brain rot categories (general, regina brain rot, morcoveață, and schelet for Romanian, and general, matteo, politicians and celebrities, scheletro for Italian) to test the differences between popular and unpopular categories. The results showed again no significant differences.

Finally, a multivariate analysis of variance (MANOVA) was conducted to examine whether the distribution of emotional sentiment scores (positive, neutral, and negative) differed between Romanian and Italian brain rots. This time, results were statistically significant, indicating a robust multivariate effect of language on sentiment composition. To further explore individual sentiment differences between languages, Mann-Whitney U tests were applied separately for each sentiment category. Results revealed that positive sentiment was significantly higher in Romanian brain rots (p < 0.0001; mean: 0.377 vs. 0.159), that neutral sentiment was also significantly higher in Romanian memes (p < 0.0001; mean: 0.385 vs. 0.171), and that negative sentiment was significantly higher in Italian ones (p < 0.0001; mean: 0.670 vs. 0.238).

4.2. Semantic Similarity and Perplexity

To measure the degree of “absurdity” or “unpredictability” of the brain rots, we employed two complementary measures: semantic similarity and perplexity. Semantic similarity was computed for a given text of a brain rot as the average pairwise cosine similarity between all word embeddings obtained with the sentence transformer paraphrase-multilingual-MiniLM-L12-v2 [12]. This reflects how semantically cohesive the vocabulary is: higher values indicate that the words tend to belong to similar semantic fields or contexts, suggesting internal consistency or coherence, while lower values indicate some incoherence or inconsistencies. Perplexity quantifies how unpredictable a text is from the perspective of a pre-trained language model (GPT-2 [13]). Lower perplexity values indicate that the model finds the sequence more predictable and fluent, while higher values suggest syntactic or lexical irregularities, or content that deviates from typical language patterns.
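As a minimal sketch of the two measures, assuming the word embeddings and the per-token log-probabilities have already been obtained from the respective models, the computations reduce to a mean over pairwise cosine similarities and to the exponential of the negative mean log-probability. The function names and the toy inputs below are hypothetical and only illustrate the arithmetic; they are not the code used in the study.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def perplexity(token_log_probs):
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Toy inputs (assumptions): three 2-d "embeddings" and four tokens,
# each assigned probability 0.25 by a hypothetical language model.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_log_probs = [math.log(0.25)] * 4

toy_similarity = mean_pairwise_cosine(toy_embeddings)
toy_perplexity = perplexity(toy_log_probs)
print(toy_similarity, toy_perplexity)
```

In the toy case every token has probability 0.25, so the perplexity is 4 up to floating-point error, which matches the intuition of a model choosing uniformly among four options.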

For both semantic similarity and perplexity scores, we tested the differences in means between popular and unpopular brain rots per language and we also tested the difference in mean between the two languages. We used one-way ANOVA and we also conducted non-parametric Mann-Whitney U tests. The results of the statistical tests are summarized in table 2 from the Appendix.
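To make the non-parametric comparison concrete, the following is a minimal standard-library sketch of a two-sided Mann-Whitney U test. It assumes the normal approximation with tie and continuity corrections, which is one common variant; the study's own computation may have used a library implementation with different defaults. The function name and sample values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((value, group) for group, sample in enumerate((x, y))
                    for value in sample)
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    # Continuity correction; clamp at zero when |U - mean| < 0.5.
    z = max((abs(u1 - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    p_value = min(math.erfc(z / math.sqrt(2)), 1.0)
    return u1, p_value


# Hypothetical perplexity scores for two small groups of texts.
popular_scores = [170.2, 181.5, 160.9, 175.0, 168.3]
unpopular_scores = [158.1, 149.7, 163.4, 155.0, 161.2]
u_stat, p_val = mann_whitney_u(popular_scores, unpopular_scores)
print(u_stat, p_val)
```

The normal approximation is only rough for very small groups such as the toy ones above; for samples of the size used in this study it is a reasonable default.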

For Romanian brain rots, we evaluated whether the two metrics, semantic similarity and perplexity, differ significantly between brain rot texts categorized as popular versus unpopular. The results of the ANOVA test revealed no statistically significant differences.

The Mann-Whitney test results also show that there is no significant difference between popular and unpopular brain rot texts for either semantic similarity or perplexity scores. However, the p-value for semantic similarity was very close to the significance threshold (0.068), indicating a slight preference for more irregular/nonstandard language in popular brain rot texts (M = 168.59 for popular, M = 159.22 for unpopular).

We also performed statistical tests on the Romanian general subcategory of brain rots, which varies less than the whole Romanian dataset. The ANOVA results confirmed the ones obtained with the same statistical tests on all the data, with similar p-values that did not fall below the 0.05 threshold. This time, the Mann-Whitney test revealed a statistically significant difference in semantic similarity (p < 0.05), suggesting that popular brain rots are slightly more semantically coherent (with a difference in mean of 0.002) than unpopular brain rots.

For Italian, the statistical test results suggest that neither semantic similarity nor perplexity differs significantly across the popular and unpopular brain rot texts.

As in the case of Romanian, we also tested the statistical significance of the differences between popular and unpopular texts w.r.t. semantic similarity and perplexity scores, for the Italian general category of brain rots, but the results were not statistically significant.

Figure 3: Sentiment scores (Negative, Neutral, Positive) for each language and popularity group of the brain rot texts: (a) Romanian popular brain rots, (b) Romanian unpopular brain rots, (c) Italian popular brain rots, (d) Italian unpopular brain rots

Cross-linguistically, we tested whether semantic similarity and perplexity differ significantly between Romanian and Italian brain rot texts. The difference in mean semantic similarity score between Italian (M = 0.59) and Romanian (M = 0.58) was very small. The perplexity scores that reflect language uncertainties or language inconsistencies were substantially higher for Italian brain rot texts (M = 218.74) than for Romanian ones (M = 163.91). While the one-way ANOVA test revealed no significant difference in semantic similarity between Italian and Romanian, for perplexity it yielded a statistically significant effect of the language category with a p-value < 0.001, suggesting that the Italian brain rot texts are indeed more language-unpredictable than their Romanian counterparts. The Mann-Whitney tests confirmed the ANOVA results, showing no significance for semantic similarity (p = 0.06), and a highly significant difference between the two groups for perplexity (p < 0.0001).

4.3. Rhyme

We estimated the rhyme density by the following methodology. We employed a computational method based on sub-string similarity at word endings in two and three letters. Since the speech to text automatic transcription did not identify correctly the verses with end-line, we checked for all rhyme pairs locally, within a distance of two verses. The rhyme coefficient was calculated as the ratio between the total number of distinct, non-adjacent word pairs that share the same suffix of 2 or 3 letters and the total number of words in the text.

We tested the differences in rhyme scores of popular versus unpopular brain rot texts for both languages, and also the differences in rhyme scores between the two languages with Mann-Whitney tests. The results are shown in table 3 from the Appendix.

The difference of the means between Romanian popular (M = 0.19) and unpopular (M = 0.16) brain rots suggests a slight preference towards more rhymed ones in the popular group. However, for our sample of 60 (30 popular and 30 unpopular brain rots), this difference is not statistically significant (p = 0.135 ≥ 0.05).

For Italian, there is also no statistical difference between the rhyme coefficient of the popular and unpopular groups, since their means are very close (0.215 vs. 0.228) and the p-value is 0.6880 (>0.05).

We also compared Romanian and Italian rhyme coefficients (120 each). The mean for Italian rhyme coefficient is 0.221, higher than the mean for Romanian which is 0.177. This time, the Mann-Whitney test returned the p-value of 0.0001 (<0.05), which means that Italian brain rots do rhyme more than their Romanian counterparts.

5. Sound

5.1. Audio Processing and Separation

We developed a multi-stage audio pipeline to isolate and characterize the vocal component of Italian and Romanian brain rot, with the goal of quantifying attributes that may influence their popularity, as well as perform comparisons between the two languages. First, each video file was converted to a high-resolution WAV format and processed with a state-of-the-art source-separation model (Demucs) [14] to obtain separate vocal and music stems. Vocal tracks were then normalized for consistent loudness, ensuring that subsequent analyses would not be affected by variations in recording level.

5.2. Speech Features

After isolating the vocal stems, we computed a comprehensive set of acoustic descriptors. Pitch contours were extracted via librosa’s [15] pyin algorithm, yielding mean F0, F0 variance, 95th–5th percentile range, and slope-entropy to quantify melodic movement. On 25 ms/10 ms-hop frames, we computed Mel-frequency cepstral coefficients (MFCCs) 1–13 (means and variances), spectral centroid, bandwidth, roll-off, zero-crossing rate, and spectral-flux (means and variances) to capture brightness, noisiness, and timbral dynamics. Rhythmic patterns were quantified by onset detection (calculating syllable-rate and pause-duration statistics) and by RMS (root mean square) energy envelopes (temporal centroid, mean, and variance). Each clip’s features formed one row in a master data matrix.

We then compared these matrices in two ways: within each language (popular vs. unpopular memes) and across languages (Italian vs. Romanian). All continuous features were tested with Mann–Whitney U and corrected for multiple comparisons using the Benjamini–Hochberg procedure at q < 0.05 [16].

After correction, for the analyses of the popular vs. unpopular groups within each language, no features remained significant, likely due to modest sample sizes and the number of comparisons. Reporting raw p < 0.05 findings (visible in Table 5 from the Appendix) highlights, though, that in the Italian dataset, popular brain rots exhibited significantly greater fundamental-frequency variance (var_f0) and a wider pitch range (range_f0) than their unpopular counterparts. These results suggest that a more expressive, melodic content, characterized by larger and more varied pitch intervals, is associated with higher popularity in Italian clips. In the Romanian corpus, popular audios differed from unpopular ones in three respects: they featured a marginally faster onset rate (syllable_rate), a more stable mid-high spectral texture (mfcc_9_var), and a darker high-frequency timbre (mfcc_11_mean). These patterns imply that a slightly quicker rhythm, smoother structure in the mid-high band, and less pronounced very high frequencies tend to co-occur with popularity. None of these comparisons survived false discovery rate (FDR) correction at q < 0.05, reflecting exploratory sample size considerations.

In addition to our broad comparisons, we also examined popular versus unpopular clips within each thematic category, namely general, scheletro, matteo, and politicians and celebrities in Italian and general, schelet, morcoveață, and regina brain rot in Romanian. After applying Benjamini–Hochberg correction, only the Romanian general subset yielded features that remained significant. In those 60 clips, one clear acoustic hallmark of popularity is a smoother timbre in the mid-to-high frequencies. Specifically, popular memes have a noticeably lower mfcc_9_var (181 vs. 232 in unpopular clips), meaning the shape of the spectrum around 3–4 kHz stays more consistent rather than flitting up and down. This steadiness makes the audio feel more even and less “choppy”. There is also a trend toward lower mfcc_11_var (170 vs. 193), which means fewer abrupt jumps in the very high frequencies (around 5–6 kHz), so sibilance and hiss are kept under tighter control. These findings suggest that Romanian general brain rots become more engaging when their soundtrack maintains a steady, coherent tone.

When we examine the full set of 120 Italian and 120 Romanian brain rots side by side (as can be seen in Table 4 from the Appendix), a clear pattern of prosodic contrast emerges. Italian vocals move through a broader and more dynamic pitch range, peaking and dipping over a span that is markedly larger than in the Romanian samples, which makes them feel more melodically engaging. Romanian clips, on the other hand, swing their pitch contours in a less predictable fashion, with higher entropy of slope suggesting sudden twists in intonation that can come across as more spontaneous or volatile. In the spectral domain, Romanian brain rot tracks push energy into the upper frequencies: their average spectral centroids, bandwidths, and roll-off points all sit higher than in Italian. Yet these high-frequency bands in Romanian audio are less stable over time, fluctuating more from moment to moment and lending a grainier, more restless timbral texture. Italian clips display a darker, more subdued high-end, but they deliver their muted tones with greater consistency and, at the same time, add sharp bursts of change, especially in those same upper bands, so that the audio doesn’t feel monotone. They also sound uniformly louder, amplifying their broad, dramatic pitch gestures and deep spectral shifts, whereas Romanian tracks embrace a leaner, quieter tone. These findings point to two distinct audio “signatures” that likely reflect both the underlying voice synthesis models and the cultural styles of meme production in each language.

5.3. Sentiment Analysis of Sound

Full-audio (voice + music) sentiment was assessed using the pretrained speech-emotion classifier superb/wav2vec2-base-superb-er [17]. It categorizes short audio clips into four emotion labels (angry, happy, neutral, sad). Each brain rot’s waveform was resampled to 16 kHz and fed into the model, after which we captured the full probability distribution over three coarse categories, Negative (-1), Neutral (0), and Positive (+1). These labels were mapped from the original model outputs as such: Negative (-1) for any emotion other than happy or neutral (i.e. angry and sad), Neutral (0) for labels predicted as neutral, and Positive (+1) for labels predicted as happy. This mapping yields a better comparison point for our study, in alignment with the sentiment analysis performed for the brain rots’ texts. Separate sentiment files were generated for four groups: Italian-Popular, Italian-Unpopular, Romanian-Popular, and Romanian-Unpopular. The distributions for each audio in the corresponding groups are shown in Figure 4.

Figure 4 (panels): (a) Italian Popular Brain Rot Audios, (b) Italian Unpopular Brain Rot Audios, (c) Romanian Popular Brain Rot Audios, (d) Romanian Unpopular Brain Rot Audios

When we treat the three sentiment probabilities (Negative, Neutral, and Positive) as a multivariate outcome, neither Italian nor Romanian shows a significant difference between popular and unpopular brain rot audios. A MANOVA on the Italian data yields Wilks’ Λ = 0.992 (F(3,116) = 0.30, p = 0.826), and the same test on Romanian returns Wilks’ Λ = 0.966 (F(3,116) = 1.37, p = 0.257).

However, comparing Italian versus Romanian clips reveals a modest but statistically significant language effect on the combined sentiment vector (Wilks’ Λ = 0.965, F(3,236) = 2.87, p = 0.037). In other words, the full-audio sentiment differs more by language than by popularity within a language.

We also used Mann–Whitney tests to look at each sentiment dimension in isolation and to account for potential concerns about normality and homogeneity of variance. Comparing Italian against Romanian across all 240 clips, the sentiment score distributions show, however, no robust separation: Negative (U=6,536; p=0.217), Neutral (U=8,088; p=0.099), and Positive (U=6,878; p=0.550) all lie above the conventional 0.05 threshold. The Neutral score comes closest (p ≈ 0.10), hinting at a possible tendency for Italian memes to register slightly higher neutral-vibe probabilities than Romanian ones, but this trend remains statistically inconclusive.

6. Video

To analyze the visual content of the 240 videos, we employed the Gemini Flash 2.5 multimodal model. Each video was individually processed by the model with a prompt instructing it to perform a visual-only analysis and extract a series of predefined attributes. The model returned its analysis for each video as a structured JSON object, which allowed for the systematic collection of data for our study. In an initial test, the model demonstrated a notable capability in discerning the language of origin from visual cues alone, achieving an accuracy of 71.67% (Romanian vs. Italian) by identifying culturally specific items (e.g., the Carpathian Mountains). However, it struggled significantly to predict a video’s popularity, achieving an accuracy of only 47.92% (random chance).
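As an illustration of how per-video JSON responses can be collected and scored against known labels, the sketch below parses a few raw responses and computes classification accuracy. The field names (video_id, predicted_language, cutting_speed), the example records, and the gold labels are all hypothetical; the actual attribute schema returned by the model is not reproduced here.

```python
import json

# Hypothetical raw model responses, one JSON object per video.
raw_outputs = [
    '{"video_id": "ro_001", "predicted_language": "Romanian", "cutting_speed": "Fast"}',
    '{"video_id": "it_001", "predicted_language": "Italian", "cutting_speed": "Slow"}',
    '{"video_id": "it_002", "predicted_language": "Romanian", "cutting_speed": "Very Fast"}',
]

# Hypothetical gold labels from the manually curated dataset.
gold_language = {"ro_001": "Romanian", "it_001": "Italian", "it_002": "Italian"}


def parse_outputs(raw):
    """Parse JSON strings into dicts, skipping any malformed response."""
    records = []
    for text in raw:
        try:
            records.append(json.loads(text))
        except json.JSONDecodeError:
            continue
    return records


def accuracy(records, gold, field):
    """Share of records whose `field` value matches the gold label."""
    scored = [r for r in records if r.get("video_id") in gold]
    if not scored:
        return 0.0
    correct = sum(1 for r in scored if r.get(field) == gold[r["video_id"]])
    return correct / len(scored)


records = parse_outputs(raw_outputs)
language_accuracy = accuracy(records, gold_language, "predicted_language")
print(f"{language_accuracy:.2%}")
```

With two balanced classes, an accuracy near 50% is what a coin flip would achieve, which is why the popularity result above is read as chance-level.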

This initial finding suggests that a video’s success is not determined by straightforward, immediately classifiable visual markers of appeal. A deeper statistical analysis was therefore performed to uncover more subtle visual attributes that may correlate with popularity (Table 6 in the Appendix).

A key finding emerged from the analysis of the videos’ dynamic properties, specifically the rate of the shot transitions. The data indicate a clear and statistically significant tendency for popular videos to feature a much faster cutting speed. Videos categorized with Very Fast or Fast transitions were substantially more prevalent among popular content, as can be seen in figure 5. This observation is supported by a positive Pearson correlation of 0.1712 between cutting speed and popularity, which was found to be statistically significant (p=0.0231). This suggests that as the frequency of cuts increases, so does the likelihood of a video being popular, pointing to a dynamic, high-energy visual style as a key component of audience engagement.
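A correlation of this kind can be sketched by encoding the ordinal cutting-speed labels as integers and the popularity group as 0/1, then computing Pearson's r. The numeric encoding, the labels other than Fast and Very Fast, and the toy data below are assumptions made for illustration only, not the study's actual coding scheme or data.

```python
import math

# Assumed ordinal encoding; only "Fast" and "Very Fast" are named in the text.
SPEED_TO_SCORE = {"Very Slow": 0, "Slow": 1, "Moderate": 2, "Fast": 3, "Very Fast": 4}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data: cutting-speed label and popularity flag (1 = popular) per video.
toy_speeds = ["Fast", "Very Fast", "Slow", "Moderate", "Fast", "Slow"]
toy_popular = [1, 1, 0, 0, 1, 0]

toy_r = pearson_r([SPEED_TO_SCORE[s] for s in toy_speeds], toy_popular)
print(round(toy_r, 4))
```

When one variable is binary, Pearson's r is equivalent to the point-biserial correlation, so a positive value means the popular group has a higher average speed score.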

Further reinforcing the importance of dynamism, the overall pacing of the videos also proved to be a significant differentiator. A statistically significant difference was found in the distribution of pacing levels between popular and unpopular videos, as confirmed by a Mann-Whitney U Test (p=0.0492). Popular videos were most frequently described as having a Fast overall pace. Moreover, when the correlation between pacing and popularity was analyzed by language, a significant difference was observed.

For the Romanian videos, we found a positive Pearson correlation of 0.2791, indicating that as overall pacing increases, videos in Romanian tend to be more popular. In stark contrast, the Italian videos showed a negligible correlation of 0.0047, revealing virtually no linear relationship between pacing and popularity. This language-specific finding suggests that the overall positive trend observed in the combined dataset is almost exclusively driven by the Romanian-language content. While the perceived tempo is an important factor, its influence on popularity appears to be culturally moderated.

In contrast to the clear influence of dynamism, thematic elements such as absurdity and narrative structure showed no significant correlation with video popularity.

Although it was hypothesized that surreal, chaotic, or illogical content might be a good indicator of popularity, the analysis did not bear this out. Perceived absurdity levels yielded very low Pearson correlation coefficients and statistically insignificant p-values in the Mann-Whitney tests. For instance, the overall absurdity level, despite being High in the majority of videos, had no discernible statistical relationship with popularity (p = 0.2937).

In conclusion, the image-level analysis indicates that while the kinetic and rhythmic aspects of a video's construction are influential, their effect on popularity is not universal. Visual dynamism, characterized by rapid cutting and a fast overall tempo, appears to be a powerful element in capturing and retaining audience attention, but this trend is highly dependent on the cultural context of the video. Our data show that this relationship holds for Romanian-language content (r = 0.2791) but is essentially absent for Italian-language content (r = 0.0047), suggesting that the preference for such a style is culturally specific rather than a general driver of engagement.

7. Manual Analysis

Some Romanian brain rot content includes political propaganda related to the presidential elections, often carrying extremist nationalist undertones. This suggests that such content goes beyond seemingly harmless absurdist humor and may serve as manipulative material.

The Romanian characters are inspired by Romanian folklore, such as the Balaur (a dragon-like creature) or Morcoveață (a carrot-shaped boy inspired by Jules Renard's Poil de Carotte), by Romanian historical figures such as rulers or poets (Mihai Viteazul, Ion Creangă, Mihai Eminescu), or by global pop culture, such as Hatsune Miku, Sonic the Hedgehog, or Disney characters, portrayed in explicit real-life situations like relationships with siblings or modern dating.

The topics in Italian brain rot also revolve around daily routine and politics, with celebrity characters such as Volodymyr Zelensky or Emmanuel Macron portrayed in funny and ironic ways, frequently with misspelled names in order to undermine their authority and to deglamorize them. There are also very popular characters like Ballerina Cappuccina, a tragic but very graceful figure who sometimes strikes grotesque ballet poses, and the ironic and dreamy Skeletro, marked by emo fragility, self-deprecating depression, and outdated romanticism.

The music is mostly depressive in both popular and unpopular Italian and Romanian brain rot, written in a minor key, which gives the brain rots a serious, sad, and dark sound. Using the Shazam tool, we identified a variety of musical pieces, from classical music by Frederic Chopin or Max Richter, through very popular tunes by Ennio Morricone or Bobby McFerrin, all the way to some trap and rap pieces. Most of the titles of these pieces include the words spooky, scary, or depressive. We noticed that the music in the popular category is slightly less dark than in the unpopular one. Italian popular brain rots showcase a wide array of soundtracks from various genres, while unpopular ones typically feature the same three tracks most used by viral brain rots. At the same time, the Romanian unpopular samples contain diverse soundtracks from obscure sources and underground artists. Conversely, a single track is present in more than half of all Romanian popular brain rots. This suggests that more effort was required at the beginning of this trend in Italy, but after gaining notoriety, the key to viral clips was to exploit the same soundtracks as established by the initial wave, making them more recognizable.
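The soundtrack pattern noted above (a few tracks dominating one group, diverse tracks in another) could be quantified as the share of videos in each group that use one of the group's most frequent tracks. The following is a minimal sketch under assumed inputs: the `(language, popularity, track)` tuples, the function name `top_track_share`, and the track labels are hypothetical placeholders for illustration and are not taken from CRIB or from the authors' procedure, which relied on manual inspection with Shazam.

```python
from collections import Counter, defaultdict


def top_track_share(records, k=1):
    """Per (language, popularity) group, return the fraction of videos
    that use one of the group's k most frequent tracks."""
    by_group = defaultdict(list)
    for language, popularity, track in records:
        by_group[(language, popularity)].append(track)

    shares = {}
    for group, tracks in by_group.items():
        counts = Counter(tracks)
        # Number of videos covered by the k most common tracks in this group.
        top_k_total = sum(n for _, n in counts.most_common(k))
        shares[group] = top_k_total / len(tracks)
    return shares


# Hypothetical toy records (illustrative only, not dataset values).
records = [
    ("ro", "popular", "track_a"),
    ("ro", "popular", "track_a"),
    ("ro", "popular", "track_a"),
    ("ro", "popular", "track_b"),
    ("ro", "unpopular", "track_c"),
    ("ro", "unpopular", "track_d"),
    ("ro", "unpopular", "track_e"),
    ("ro", "unpopular", "track_f"),
]

shares = top_track_share(records)
```

In this toy input, a high share for one group and a low share for the other would correspond to the contrast described for the Romanian popular and unpopular samples; setting `k=3` would mirror the "three most used tracks" observation for the Italian unpopular videos.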

The visuals are overloaded, kitschy, anti-narrative, absurd, over-sized, ironic, cynical, and, in a way, self-destructive. The popular ones are more animated and present more complex and colorful imagery; their cutting is noticeably more dynamic, and they carry more narrative.

One of the most striking brain rot characteristics revealed by the manual scrutiny of all the brain rots is their core element: the dissonance between the tone, the music, and the content. The tone is neutral (an AI-generated voice), but the text is often deviant, triggering extreme emotions. There is no harmony or coherence whatsoever between text, speech, music, and image. Another notable trait is the unsettling blend of baby talk and nursery rhymes with content (topics) and language (slang, pejorative jargon, NSFW words) that are inappropriate for children.

Overall, the brain rots seem to be a good example of E(xtended)-creativity rather than of F(ixed)-creativity, since they tend to break the rules and not merely to create new content using existing rules [18].

The manual analysis of the textual content revealed the prevalent topics and characters used in both languages.

8. Conclusion

This study conducted a multi-modal analysis of Italian and Romanian brain rot, seeking to identify the factors driving popularity and to map the cultural and linguistic differences between the two languages. Our findings show that the popularity of these memes is not primarily determined by their textual content. Features like sentiment, narrative absurdity, or rhyme density showed no significant link to a video's success. The same can be said about standard audio features and speech sentiment. Instead, the analysis revealed that popularity is strongly correlated with visual dynamics, specifically a faster cutting speed and overall pace (for the Romanian-language content).

The research also uncovered clear distinctions between the Italian and Romanian versions of brain rot. Italian brain rots were textually more negative, more unpredictable, and used rhyme more frequently. Acoustically, their vocals were characterized by greater melodic range and loudness. Conversely, Romanian content was more neutral, while its vocals were spectrally brighter and showed more erratic pitch changes. These differences extend to thematic content, with each language favoring culturally specific characters and references.

As this is a global phenomenon, we plan to extend this study to other Romance languages in future work.

Regarding the small sample size, we intend to include samples that would not fit in either the popular or the unpopular category, thus reframing popularity as a continuous problem rather than as a binary classification.

In essence, the success of brain rot appears to depend on a combination of universal and culturally specific elements. While fast-paced, dynamic visuals serve as a universal driver for engagement, the content itself is distinctly shaped by the linguistic, acoustic, and thematic norms of its target culture. What makes the genre unique is the strange mix of message, image, and sound.

9. Limitations

One limitation of this study is that the dataset, while carefully curated, is relatively small and manually collected, focusing only on Italian and Romanian content. This may limit the generalizability of our findings to brain rot in other languages or on a larger scale. Secondly, our analysis identifies strong correlations, such as the link between cutting speed and popularity, but does not establish causation. Other confounding factors not examined here may be at play. Finally, brain rot is a rapidly evolving digital phenomenon, and the features that define it today may change over time, potentially dating our specific observations.

Acknowledgments

This research is supported by the project "Romanian Hub for Artificial Intelligence - HRIA", Smart Growth, Digitization and Financial Instruments Program, 2021-2027, MySMIS no. 334906, and by CNCS/CCCDI UEFISCDI, SiRoLa project, PN-IV-P1-PCE-2023-1701, within PNCDI IV, Romania.

Declaration on Generative AI

During the preparation

References

[1] Thoreau, Walden, Mercer University Press, 2011.

[2] Barthes, Heath, Dove, The Death of the Author, 1977.

[3] A. M. F. Yousef, Alshamy, Tlili, A. H. S. Metwally, Demystifying the new dilemma of brain rot in the digital era: A review, Brain Sciences 15 (2025). URL: https://www.mdpi.com/2076-3425/15/3/283. doi:10.3390/brainsci15030283.

[4] S. A. Satici, E. G. Tekin, M. E. Deniz, Satici, Doomscrolling scale: its association with personality traits, psychological distress, social media use, and wellbeing, Applied Research in Quality of Life 18 (2023) 833-847. URL: https://doi.org/10.1007/s11482-022-10110-7. doi:10.1007/s11482-022-10110-7.

[5] D. S. M. Pandiani, E. T. K. Sang, Ceolin, Toxic memes: A survey of computational perspectives on the detection and explanation of meme toxicities, 2024. URL: https://arxiv.org/abs/2406.07353. arXiv:2406.07353.

[6] Soleymani, Garcia, Jou, Schuller, S.-F. Chang, Pantic, A survey of multimodal sentiment analysis, Image and Vision Computing 65 (2017) 3-14. URL: https://www.sciencedirect.com/science/article/pii/S0262885617301191. doi:10.1016/j.imavis.2017.08.003. Multimodal Sentiment Analysis and Mining in the Wild, Image and Vision Computing.

[7] Gandhi, Adhvaryu, Poria, Cambria, Hussain, Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions, Information Fusion 91 (2023) 424-444. URL: https://www.sciencedirect.com/science/article/pii/S1566253522001634. doi:10.1016/j.inffus.2022.09.025.

[8] Kiela, Firooz, Mohan, Goswami, Singh, Ringshia, Testuggine, The hateful memes challenge: Detecting hate speech in multimodal memes, 2021. URL: https://arxiv.org/abs/2005.04790. arXiv:2005.04790.

[9] Lin, Luo, Gao, Ma, Wang, Yang, Towards explainable harmful meme detection through multimodal debate between large language models, 2024. URL: https://arxiv.org/abs/2401.13298. arXiv:2401.13298.

[10] Bloem, Ilievski, Clustering internet memes through template matching and multi-dimensional similarity, 2025. URL: https://arxiv.org/abs/2505.00056. arXiv:2505.00056.

[11] Barbieri, L. Espinosa Anke, Camacho-Collados, XLM-T: Multilingual language models in Twitter for sentiment analysis and beyond, in: Proceedings of the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2022, pp. 258-266. URL: https://aclanthology.org/2022.lrec-1.27.

[12] Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, in: K. Inui, Jiang, Ng, X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 3982-3992. URL: https://aclanthology.org/D19-1410/. doi:10.18653/v1/D19-1410.

[13] Radford, J. Wu, Child, Luan, Amodei, Sutskever, et al., Language models are unsupervised multitask learners, OpenAI blog (2019). URL: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.

[14] Rouard, Massa, Défossez, Hybrid transformers for music source separation, in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, pp. 1-5. doi:10.1109/ICASSP49357.2023.10096956.

[15] McFee, McVicar, Faronbi, I. Roman, Gover, et al., Librosa, 2025. URL: https://doi.org/10.5281/zenodo.15006942. doi:10.5281/zenodo.15006942.

[16] Benjamini, Hochberg, Controlling the false discovery rate: A practical and powerful approach to multiple testing, Journal of the Royal Statistical Society: Series B (Methodological) 57 (1995) 289-300. URL: https://doi.org/10.1111/j.2517-6161.1995.tb02031.x. doi:10.1111/j.2517-6161.1995.tb02031.x.

[17] S.-w. Yang, P.-H. Chi, Y.-S. Chuang, C.-I. J. Lai, Lakhotia, Y. Y. Lin, A. T. Liu, Shi, Chang, G.-T. Lin, et al., SUPERB: Speech processing Universal PERformance Benchmark, 2021. arXiv:2105.01051.

[18] A. Dinu, A.-M. Florescu, Testing language creativity of large language models and humans, in: M. Hämäläinen, E. Öhman, Y. Bizzoni, S. Miyagawa, K. Alnajjar (Eds.), Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities, Association for Computational Linguistics, Albuquerque, USA, 2025, pp. 426-436. URL: https://aclanthology.org/2025.nlp4dh-1.37/. doi:10.18653/v1/2025.nlp4dh-1.37.

Appendix

Supplementary Statistical Tests Tables

Mann-Whitney comparison of key audio features between Italian and Romanian brain rot that differ at q < 0.05 after FDR correction.

Raw p < 0.05 comparisons of audio features between all popular and unpopular brain rots within the Italian and Romanian corpora.

Image attributes that might have been a good indicator of video popularity.

Feature labels appearing in the tables: zcr_mean, flux_var, mfcc_1_var, spec_centroid_mean (Hz), spec_bandwidth_mean (Hz), spec_rolloff_mean (Hz). Column labels: Language, Italian, Romanian, Mann-Whitney p-value, q-value.
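The FDR-corrected q-values reported in the appendix tables can be illustrated with a short sketch, assuming the correction is the Benjamini-Hochberg procedure [16]. The function name `benjamini_hochberg`, the feature names, and the raw p-values below are hypothetical and chosen only for illustration; in practice the raw p-values would come from per-feature Mann-Whitney U tests (for example via `scipy.stats.mannwhitneyu`), which are not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical raw p-values for four audio features (illustrative only).
raw_p = {
    "feature_a": 0.001,
    "feature_b": 0.02,
    "feature_c": 0.03,
    "feature_d": 0.5,
}

q = dict(zip(raw_p, benjamini_hochberg(list(raw_p.values()))))

# Features that would be retained at the q < 0.05 threshold.
significant = [name for name, value in q.items() if value < 0.05]
```

Under this procedure a feature is kept only if its adjusted value falls below the threshold, which is stricter than filtering on raw p < 0.05 and is the distinction the two audio-feature tables draw.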