<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the Interplay between Musical Preferences and Personality through the Lens of Language</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eliran Shem-Tov</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ella Rabinovich</string-name>
        </contrib>
        <aff>The Academic College of Tel Aviv-Yafo, Tel Aviv-Yafo, Israel</aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>6</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>Music serves as a powerful reflection of individual identity, often aligning with deeper psychological traits. Prior research has established correlations between musical preferences and personality, while separate studies have demonstrated that personality is detectable through linguistic analysis. Our study bridges these two research domains by investigating whether individuals' musical preferences leave traces in their spontaneous language through the lens of the Big Five personality traits (Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism). Using a carefully curated dataset of over 500,000 text samples from nearly 5,000 authors with reliably identified musical preferences, we build advanced models to assess personality characteristics. Our results reveal significant personality differences across fans of five musical genres. We release resources for future research at the intersection of computational linguistics, music psychology and personality analysis.</p>
      </abstract>
      <kwd-group>
        <kwd>natural language processing</kwd>
        <kwd>personality traits</kwd>
        <kwd>music psychology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>"Our language is the reflection of ourselves. A
language is an exact reflection of the character
and growth of its speakers."</p>
      <p>Cesar Chavez (civil rights activist)</p>
      <p>
        Music is a powerful expression of individuality, often reflecting deeper aspects of one’s character and
personality. The relationship between musical preferences and personality has long been a subject of
interest in psychology [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>
        ]. Prior studies indicate that certain genres are associated with specific
personality traits, suggesting that music genre preferences offer valuable insights into individuals’
psychological profile [
        <xref ref-type="bibr" rid="ref3 ref5 ref6">3, 5, 6</xref>
        ].
      </p>
      <p>
        Originally introduced by Goldberg [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the Big Five personality traits — Openness (OPN),
Conscientiousness (CON), Extroversion (EXT), Agreeableness (AGR), and Neuroticism (NEU),
collectively denoted by the acronym "OCEAN" — were established in psychology as one of the most
common ways of assessing one’s personality. Over a decade of computational studies has shown that
our personality is reflected in our language (to the extent detectable by automatic tools), driving the
development of methods for personality assessment in linguistic productions [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8, 9, 10</xref>
        ].
      </p>
      <p>Our study aims to connect the two lines of research – we explore the relationship between musical
preferences and personality traits, as manifested in authentic written language. Specifically, we ask if
there are detectable (similarities and) differences in the Big Five personality traits in the language
of people with various musical preferences, when authoring spontaneous textual content on social
media. We consider five popular and diverse musical genres: Classical music, Hip-Hop, Metal, Indie
and Electronic. The hypothesis driving this research posits that despite the inherent complexity of
automatic personality analysis, the differences in personality traits of people with various musical
preferences "shine through" their language, to the extent that can be automatically captured through
advanced natural language processing techniques.</p>
      <p>We explore the hypothesis on a carefully curated large and diverse dataset, consisting of over 500,000
text samples written by nearly 5,000 authors with reliably identified musical preferences. These samples
were collected from a range of public non-music-related online forums, ensuring a broad topical
representation. Each participant was linked to a single dominant music genre they frequently engage
with, enabling unbiased textual personality extraction and a focused analysis of the relationship between
personality traits and musical preferences. Leveraging text generation capabilities of contemporary
LLMs, we then gather high-quality data for training personality detectors along the five dimensions:
Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism.</p>
      <p>The models were further applied to the text samples authored by fans of different music genres.
Statistical tests reveal significant and reliably detected differences in personality characteristics of
individuals with diverse musical preferences. As an example, Classical music enthusiasts show higher
levels of Agreeableness and lower levels of Extroversion, while Hip-Hop fans have lower Agreeableness and
higher Neuroticism. Additionally, Metal fans exhibit a higher propensity for Neuroticism, and Electronic
music listeners show greater Openness.</p>
      <p>Our contribution in this study is, therefore, threefold: First, we release a carefully collected dataset
of authentic linguistic productions of users with diverse and reliably identified musical preferences.
Second, we train models for accurate personality detection from text, and release their training data.
Finally, through large-scale empirical analysis, we shed a new and interesting light on the association
between musical preferences and personality traits, as manifested in our language. All our data and
code are available at https://github.com/eliranshemtov/Musical-Preferences-And-Textual-Expression.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        <bold>Personality and Musical Preferences.</bold> Although sparse, prior work goes back to the 1950s and reveals
clear ties between musical preferences and personality traits. These studies typically rely on
self-reported personality traits (primarily OCEAN) and are relatively small in scale. Cattell and Anderson
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Cattell and Saunders [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] suggested that music could satisfy deep and unconscious needs, thus
providing insights into personality. Specifically, they developed the IPAT Music Preference Test, which
identified stable music-preference factors reflecting unconscious aspects of personality. Rentfrow and
Gosling [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] found that Openness to experience correlated with a preference for Classical music, while
Extroversion was linked to energetic and rhythmic music. Moreover, Agreeableness was associated
with upbeat and conventional music.
      </p>
      <p>
        Additional studies were consistent with earlier work and found that the Openness to experience trait
is correlated with many music genres, including new age, classical, world, blues, country, folk, jazz,
and alternative [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Furthermore, individuals scoring high in Conscientiousness showed a negative
correlation with folk and alternative music, while Extroversion correlated with R&amp;B (Rhythm and
Blues) and Rap music. Agreeableness positively correlated with Country and Folk, and Neuroticism
only showed a positive correlation with Alternative music [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. A recent study by Greenberg et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
examined musical preferences across 53 countries, revealing that Extroversion was correlated with
stronger reactions to contemporary styles, Openness was correlated with "sophisticated music", whereas
Neuroticism was connected to intense musical styles, reflecting "inner angst and frustration".
      </p>
      <p>
        <bold>Personality Detection from Text.</bold> Automatic detection of personality traits from text traces back to
the pioneering work of Francis Galton in 1884 (reprinted in 1949 – Galton [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]). The author claimed
that personality could be effectively captured through the adjectives found in language and written text,
laying the groundwork for future studies in personality extraction. Since then, multiple (early) works
used the combination of linguistic cues and automatic language processing methods for personality
detection, focusing on the Big Five traits extracted from the Essays dataset [
        <xref ref-type="bibr" rid="ref14">14, 15, 16</xref>
        ].
      </p>
      <p>
        The release of the MyPersonality dataset by Kosinski et al. [17] — not a non-controversial outcome
of data collection through a Facebook application — sparked another line of research, using more
advanced methods, inspired by a range of deep-learning architectures [
        <xref ref-type="bibr" rid="ref9">18, 9, 19, 20, 21</xref>
        ]. Working with
MyPersonality [17], whose support was officially discontinued in 2018, and Essays [15], researchers
achieved higher accuracy: around 0.70 on the binary (high or low) personality trait detection task.
      </p>
      <p>
        Advancements in pre-trained large language models (LLMs) put forward additional opportunities for
the field of personality detection from text. Approaches vary from transfer learning for classification
[21, 22], through graph-based methods [23, 24, 25], to using the most powerful SOTA models (e.g.,
GPT4) in a zero-shot scenario [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Notably, these studies still achieve only moderate accuracy on
the two datasets, with GPT4 obtaining a very weak correlation (Pearson’s r=0.31) with self-reported
scores in the MyPersonality dataset.
      </p>
      <p><bold>Personality, Musical Preferences and Language.</bold> Our study is the first, to the best of our knowledge,
to explore the association between personality traits detected from text and musical preferences.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>We collected a large and diverse dataset of spontaneous and authentic written content from individuals
with distinct musical preferences. We aimed to gather texts that reflect a wide range of commonly
discussed topics, deliberately excluding any direct references to music, to avoid potential confounds
and ensure that the analysis accurately captures the relationship between musical preferences of an
individual and their personality traits.</p>
      <sec id="sec-3-1">
        <title>3.1. Users with Various Music Preferences</title>
        <p>Our dataset in this study was collected from the Reddit discussion platform,<sup>1</sup> a well-known, highly
organized, and topic-categorized home of over 2.3 billion monthly active users<sup>2</sup> in over 130 thousand
communities, also known as subreddits. Reddit maintains various active communities for
discussions of music in different genres. Among the most active musical subreddits are Indie, Electronic,
Hip-Hop, Metal, and Classical music. Fans of these five distinct genres are the main
subject of our study.</p>
        <p>First, we retrieved the ∼1M most recent posts from each of these subreddits and identified the top-K most active
users in each subreddit, where K was set to 1,000, excluding those in the "intersection" with other genres:
users that authored posts or comments in any of the other four music communities. As a concrete
example, the most recent 1,000,154 comments in the Metal subreddit (r/metal) were posted by 74,055
Redditors, of whom the 1,000 most active authors (without any activity in the other subreddits) were considered
for this study. We consider the association of these 1,000 active users with a single genre (e.g., Metal)
reliable enough for the purpose of this work. Similar assumptions about the association between Reddit
authors’ activity and their meta properties were shown to be reliable for the country of origin [26].</p>
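        <p>The user selection step can be sketched as follows. This is a minimal illustration rather than the released code: the function name, the input layout (one author name per comment), and the choice to drop cross-genre authors before ranking by activity are all assumptions.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def top_k_exclusive_users(comments_by_genre, k):
    """Pick the k most active authors per genre, skipping cross-genre authors.

    comments_by_genre: dict mapping a genre name to a list of author names,
    one entry per comment (hypothetical input layout).
    """
    authors = {genre: set(names) for genre, names in comments_by_genre.items()}
    selected = {}
    for genre, names in comments_by_genre.items():
        # Everyone who was active in any of the other genre communities.
        others = set().union(*(authors[g] for g in authors if g != genre))
        counts = Counter(name for name in names if name not in others)
        selected[genre] = [name for name, _ in counts.most_common(k)]
    return selected


comments = {
    "metal": ["a", "a", "a", "b", "b", "c", "x"],
    "classical": ["x", "x", "y"],
}
selected = top_k_exclusive_users(comments, k=2)
```
]]></preformat>
        <p>In this toy input, user "x" posts in both communities and is therefore excluded from both genre sets.</p>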
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset Collection</title>
        <p>Given the Reddit UserID of each of the 5,000 authors, the user’s entire digital footprint can be retrieved
from the platform, providing direct access to a person’s linguistic productions across a variety of
topics, spanning several years. We applied multiple filtering and cleanup steps: (1) we excluded subreddits
(in)directly related to music;<sup>3</sup> (2) aiming at sufficiently long, ideally paragraph-length texts for personality
analysis, we filtered out texts shorter than 40 tokens;<sup>4</sup> and (3) we removed duplicate entries. These steps
resulted in a clean and high-quality dataset; the dataset statistics are reported in Table 1.</p>
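        <p>The three cleanup steps can be sketched as a single pass over the retrieved texts. The sketch below is illustrative only: whitespace tokenization, exact-match deduplication, and the function and parameter names are assumptions and may differ from the released pipeline.</p>
        <preformat><![CDATA[
```python
def clean_posts(posts, music_subreddits, min_tokens=40):
    """Apply the three filters to an iterable of (subreddit, text) pairs.

    music_subreddits: set of lowercase subreddit names to exclude
    (hypothetical stand-in for the manually compiled exclusion list).
    """
    seen = set()
    kept = []
    for subreddit, text in posts:
        if subreddit.lower() in music_subreddits:
            continue  # step 1: drop music-related communities
        if len(text.split()) < min_tokens:
            continue  # step 2: drop short texts (whitespace tokens, an assumption)
        if text in seen:
            continue  # step 3: drop exact duplicates
        seen.add(text)
        kept.append((subreddit, text))
    return kept


posts = [
    ("Metal", "riffs all day long"),
    ("cooking", "too short"),
    ("cooking", "this stew recipe works"),
    ("travel", "this stew recipe works"),
]
kept = clean_posts(posts, music_subreddits={"metal"}, min_tokens=3)
```
]]></preformat>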
        <p>Additionally, Figure 1 reports the distribution of user participation in the top-10 most-popular
communities in our collected data, split by musical preference. Notably, the various topical threads are
represented roughly equally by the fans of the five musical genres in our study, suggestive of a dataset
free of topical confounds.</p>
        <fn-group>
          <fn id="fn1"><label>1</label><p>https://www.reddit.com/</p></fn>
          <fn id="fn2"><label>2</label><p>https://arc.net/l/quote/byjckxas</p></fn>
          <fn id="fn3"><label>3</label><p>About 150 threads were excluded per manual inspection.</p></fn>
          <fn id="fn4"><label>4</label><p>The minimal number of tokens is aligned with the mean paragraph length in our training data (see Section 5).</p></fn>
        </fn-group>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Personality Detection from Language: Datasets and Approaches</title>
      <p>A typical approach to personality detection from text includes the analysis of written or spoken
language to identify and predict various personality traits of individuals; these traits are often defined
by psychological models, with Big Five being the most popular one. In this section we briefly survey
existing tools and datasets used for personality detection, and describe approaches to this task.</p>
      <p>Importantly, we conclude that the existing annotated training datasets for personality classifiers are
not adequate for the task and suffer from various drawbacks, as detailed below.</p>
      <sec id="sec-4-1">
        <title>4.1. Existing Personality Datasets</title>
        <p>Various datasets have been used for personality detection in prior studies. These datasets typically
consist of textual data along with labeled personality traits (discrete or continuous), which are
used to train and evaluate machine learning models. We describe the two datasets most commonly used
for evaluating these models, (1) Essays [15] and (2) MyPersonality [17], in more detail below.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Essays Dataset</title>
          <p>The dataset originates from 2,468 essays (collected between 1997 and 2004, each written by an undergraduate
student), submitted as part of a psychology course assignment [15]. Students’
personality scores were determined using the Big Five inventory, through a self-reported questionnaire
that measures the five personality traits. Each essay was then labeled with five personality scores, based
on the student’s questionnaire, where the mean continuous scores were transformed into binary labels:
"yes" (for high) and "no" (for low) trait presence.</p>
          <p>Despite its relative popularity, this dataset has several obvious limitations. First, annotating the entire
essay (of average length of 663.10 tokens) with a single self-reported score is misleading, since different
sections of the essay written by the same person may exhibit varying levels of each trait. Second,
converting scores into binary labels with a median split oversimplifies the inherently continuous nature
of a personality characteristic. Per our inspection, only a small ratio of texts can be associated with one
of a trait’s extremes ("yes" or "no"), while the vast majority span the more neutral range.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. MyPersonality Dataset</title>
          <p>The MyPersonality dataset contains information from a Facebook application originally used by over 6
million users [17]. Similarly to Essays, personality scores were assigned to authors based on a self-reported
personality questionnaire, along with several social network measures, including network size, density,
brokerage, and transitivity. The final dataset includes textual Facebook status updates assigned
continuous scores for each personality trait per author. MyPersonality has been particularly useful
for research because it combines social media text with personality trait labels, allowing the study of
personality in a more naturalistic setting. However, the project was discontinued in 2018 due to the
heavy burden of its maintenance;<sup>5</sup> its data is no longer publicly available, and only a minor subset of
9,917 texts by 250 users is available online.</p>
          <p>The MyPersonality dataset also has several notable drawbacks. First, the publicly available subset of
the dataset is very limited, which significantly restricts the robustness of any analysis conducted.
Second, the self-reported personality scores are assigned to all of a user’s statuses, regardless of how
each trait manifests in individual texts, thereby posing a limitation similar to that of Essays.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Approaches to Personality Detection</title>
        <p>Prior to developing a novel dataset and model, we performed a thorough assessment of four freely
available off-the-shelf tools: (1) Psychology Insights,<sup>6</sup> (2) Personality Recognizer,<sup>7</sup> (3) Apply Magic
Sauce,<sup>8</sup> and (4) Jkwieser’s Personality Prediction.<sup>9</sup> Notably, tools (1) and (3) were trained on
(undocumented) parts of the MyPersonality dataset, while the training data of tool (4) includes Essays.
Personality Recognizer (tool (2)) showed considerably inferior results; we excluded it from further
experiments. Our choice of these tools was driven by their availability or ease of reproducibility.</p>
        <p>Evaluating the performance of the three remaining tools on both datasets yields disappointingly (but not surprisingly)
low results. We report each tool’s per-trait accuracy: the ratio of data examples assigned the correct
(binary) label out of the total number of examples. Considering the random baseline of 0.5, and despite the exposure
of the tools to the datasets, the accuracy is low (in line with what is reported in prior art), calling into question
the applicability of these approaches to the task at hand.</p>
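        <p>For concreteness, the per-trait accuracy measure can be computed as in the short sketch below. The data layout (one dictionary of binary labels per example) and the function name are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
TRAITS = ("OPN", "CON", "EXT", "AGR", "NEU")


def per_trait_accuracy(gold, predicted):
    """Share of examples whose binary label is predicted correctly, per trait.

    gold, predicted: equally long lists of dicts mapping each trait to 0 or 1
    (hypothetical layout).
    """
    accuracy = {}
    for trait in TRAITS:
        correct = sum(1 for g, p in zip(gold, predicted) if g[trait] == p[trait])
        accuracy[trait] = correct / len(gold)
    return accuracy


gold = [dict.fromkeys(TRAITS, 1), dict.fromkeys(TRAITS, 0)]
predicted = [dict.fromkeys(TRAITS, 1), dict.fromkeys(TRAITS, 1)]
scores = per_trait_accuracy(gold, predicted)
```
]]></preformat>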
        <p>Table 2 reports classification accuracy results of existing tools on the two available datasets. We
attribute the poor results mainly to the limitations of the existing datasets (see Section 4.1), and anticipate
that high-quality training data, coupled with contemporary modeling, will advance the state of the art
in this field. We describe our approach and methodology in the next section.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Methodology</title>
      <p>Considering the detailed drawbacks of the existing personality datasets, we decided to leverage the
capabilities of generative AI for collecting a novel Big Five personality dataset that will be used to
train personality classifiers. Contemporary LLMs excel at creating diverse, highly naturalistic textual
content, indistinguishable from human writing [27, 28]; we used Google’s Gemini<sup>10</sup> and OpenAI’s
GPT<sup>11</sup> for generating high-quality, diverse dataset(s) with short passages, each associated with low
or high personality trait level. The five collected datasets (one per trait) were evaluated by human
annotators, and used for training classification models for traits ranking on the continuous 0-1 scale.
Below we provide details on dataset generation and model training.</p>
      <sec id="sec-5-1">
        <title>5.1. Big Five Generated Datasets (GenBigFive)</title>
        <p>We applied a systematic approach to generate textual content that could be used to train and evaluate
classification models for each of the Big Five personality traits, as described below.</p>
        <sec id="sec-5-1-1">
          <title>5.1.1. Definition Compilation</title>
          <p>We gathered detailed definitions for each of the Big Five personality traits (OPN, CON, EXT, AGR, NEU)
from academic and online resources; these definitions were manually combined into a readable and
comprehensive format to serve as the basis for our prompts.</p>
        </sec>
        <sec id="sec-5-1-2">
          <title>5.1.2. Prompt Creation</title>
          <p>For each personality trait, we wrote a primary prompt that started with the trait’s definition and ended
with a request for the LLM to generate several paragraphs demonstrating a high (and similarly, low)
level of the trait. As a concrete example, for Extroversion, we generated texts "written by" people who
are highly extroverted and then by those who score low on Extroversion, i.e., considered introverts.
Appendix A contains our final prompt definitions for the five traits. We iteratively modified the requests
to ensure the generated texts were unique, authentic, and avoided repetition. We also instructed the
models to write as if they were different personas, iterating over a variety of topics and writer profiles,
e.g., texts written by high school teenagers, by sports lovers (fans or practitioners), politicians, and also
texts about finance, news, traveling, etc.</p>
        </sec>
        <sec id="sec-5-1-3">
          <title>5.1.3. Dataset Generation</title>
          <p>Using GPT3.5 for training set collection, and Gemini for the test set, we collected hundreds of short passages
exhibiting the two extremes of each personality trait. Table 3 reports the final statistics of the collected
GenBigFive dataset, after cleanup and filtering, and Table 4 presents a few generated text examples,
produced as exhibiting "high" and "low" trait level by GPT3.5.</p>
          <table-wrap id="tab-4">
            <label>Table 4</label>
            <caption><p>Examples of generated texts, produced by GPT3.5 as exhibiting "high" and "low" trait level.</p></caption>
            <table>
              <thead>
                <tr><th>trait</th><th>text</th><th>rank</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td></td>
                  <td>Hey there! I absolutely thrive in social settings – the more people, the merrier! It’s like every conversation is an opportunity to learn something new or share a laugh. I simply love being in the middle of it all, soaking up the energy and excitement around me. Whether it’s a party, a networking event, or just a casual get-together, count me in!</td>
                  <td>high</td>
                </tr>
                <tr>
                  <td></td>
                  <td>Social gatherings always leave me drained. It’s not that I dislike people, but the constant buzz of conversation exhausts me. I find solace in quiet moments alone, where I can recharge my energy and gather my thoughts. [...] Being the center of attention is my worst nightmare; I much prefer blending into the background and observing rather than being in the spotlight.</td>
                  <td>low</td>
                </tr>
                <tr>
                  <td></td>
                  <td>Protecting our planet and its delicate ecosystems has always been a priority close to my heart. Every action we take contributes to the health of our environment. Whether it’s reducing our carbon footprint or advocating for renewable energy sources, we have the power to make a positive impact.</td>
                  <td>high</td>
                </tr>
                <tr>
                  <td>AGR</td>
                  <td>It’s utterly baffling how some people can’t seem to handle the simplest tasks without constant hand-holding. I mean, come on, do I look like your personal assistant? Get it together and figure it out yourself for once. I’ve got my own stuff to deal with, and I certainly don’t have time to babysit grown adults who can’t take responsibility for themselves.</td>
                  <td>low</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
        <sec id="sec-5-1-4">
          <title>5.1.4. Human Evaluation</title>
          <p>We evaluated the quality of generated data through human annotation of 250 samples: 50 samples were
randomly selected for each trait, split equally between low and high. We used the Appen platform<sup>12</sup>
to recruit native English speakers, who were presented with a personality trait description (identical
to what the LLM was prompted with, see Appendix A.2) along with a few examples. Quizzes were
integrated into the task to ensure a high-quality outcome. Each sample was annotated by five annotators,
with the exception of 17 samples that were annotated by only three. Each example was assigned a trait
level (low or high) based on the majority vote.</p>
          <p>With sufficient training, the task turned out to be relatively easy: the agreement of the annotations with the GPT3.5
generation labels was 94.7%, and the inter-annotator agreement was 0.76.</p>
        </sec>
        <fn-group>
          <fn id="fn6"><label>6</label><p>https://github.com/fuguixing/psychology-insights-frontend</p></fn>
          <fn id="fn7"><label>7</label><p>https://farm2.user.srcf.net/research/personality/recognizer.html</p></fn>
          <fn id="fn8"><label>8</label><p>https://applymagicsauce.com</p></fn>
          <fn id="fn9"><label>9</label><p>https://github.com/jkwieser/personality-prediction-from-text</p></fn>
          <fn id="fn10"><label>10</label><p>https://gemini.google.com/app</p></fn>
          <fn id="fn11"><label>11</label><p>https://chatgpt.com/</p></fn>
        </fn-group>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Building New Personality Classifier(s)</title>
        <p>We used the collected datasets for each of the five traits for training reliable and accurate personality
classifiers. Here, we follow the approach successfully used in prior studies for classification of emotion
[29] and the degree of concept abstractness [30], training a logistic regression classifier with passage
embeddings as feature vectors. This methodology was chosen for its simplicity, effectiveness,
inherent ability to produce a continuous output score (posterior probability), and efficiency (fast
training and inference).</p>
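        <p>The idea of using the posterior probability of a logistic regression classifier as a continuous trait score can be illustrated with a small, dependency-free sketch. The two-dimensional vectors below are hypothetical stand-ins for passage embeddings, and the plain gradient-descent trainer is an assumption chosen for self-containment; it is not the training setup used in this study.</p>
        <preformat><![CDATA[
```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(vectors, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    dim = len(vectors[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(vectors, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i in range(dim):
                grad_w[i] += error * x[i]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def trait_score(vector, weights, bias):
    """Posterior probability of the 'high' class: a continuous score in (0, 1)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, vector)) + bias)


# Toy 2-dimensional stand-ins for passage embeddings (hypothetical values).
train_vectors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
train_labels = [1, 1, 0, 0]  # 1 = high trait level, 0 = low trait level
weights, bias = train_logistic_regression(train_vectors, train_labels)
```
]]></preformat>
        <p>Calling trait_score on an unseen vector then yields a value between 0 and 1 rather than a hard label, which is the property that later allows averaging scores per user and per community.</p>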
        <p>We trained five logistic regression models, one per trait, where the passages with low and high trait
level were encoded into embeddings using the intfloat/e5-large-v2 encoder [31],<sup>13</sup> due to its
proven benefits on multiple tasks. We trained the models on GPT3.5-generated data (train set) and
tested on Gemini-generated data (test set), in order to avoid possible confounds arising from train and test
data generated by the same LLM. We also tested the off-the-shelf tools (see Section 4.2), showing that
the newly trained classifiers yield superior results on the unseen test set.</p>
        <sec id="sec-5-2-1">
          <title>5.2.1. Evaluation Results</title>
          <p>We report the evaluation accuracy of the five classifiers in Table 5. As a complementary experiment, we
also trained the classifiers on the GPT3.5-generated dataset combined with the MyPersonality dataset,
and tested on the Gemini-generated test set. High test set accuracy is achieved for each of the five traits,
with Openness (OPN) obtaining the lowest accuracy of 0.874. Additional MyPersonality training data
harms the results in all cases but CON. Finally, existing tools yielded inferior, nearly random results.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Experimental Results and Discussion</title>
      <p>We next applied the intfloat/e5-large-v2 encoder (see Section 5.2) to over 500K Reddit posts and
comments, written by nearly 5,000 Redditors with five distinct musical preferences, roughly 1,000 for
each genre. We used the pre-trained classification models for prediction, assigning each text five
continuous scores, one per personality trait dimension.</p>
      <p>Table 6 presents example posts from our dataset, along with their automatically assigned low and
high personality scores.</p>
      <table-wrap id="tab-6">
        <label>Table 6</label>
        <caption><p>Example posts from our dataset, along with their automatically assigned personality scores.</p></caption>
        <table>
          <thead>
            <tr><th>text</th><th>score</th></tr>
          </thead>
          <tbody>
            <tr>
              <td>Wow....there is so much awesome in this artwork. Pretty amazing &lt;redditor username&gt; could render such distinct subjects as the numerous blooming flowers, the Greek vase fragmented story and the bronze relief sculpture. Mind blowing visually. I bet there’s language symbolized in the flowers interacting with the decorative elements.</td>
              <td></td>
            </tr>
            <tr>
              <td>Yup you pretty much hit the nail on the head for me. My partner of 2 years is pretty vocal about marriage and her parents have been married since before she was born. My parents split when I was super young so I pretty much haven’t put too much thought into marriage into my relationship. I just figured I’ll see how I feel in the moment when our relationship hits 5y or something...</td>
              <td></td>
            </tr>
            <tr>
              <td>feel like i’m going crazy. everything is up in the air. no idea where i’ll be in august. i’m applying for jobs and my soul tears apart each time because the process is designed to be fucking excruciating and make you feel worthless (so they can hire you cheaper). [...]</td>
              <td></td>
            </tr>
            <tr>
              <td>This is great. Eve and Amanda and Elza Brabant talk about the 3 reasons we eat: we eat for nutrients, we eat for pleasure, and we eat for community. I like how you are thinking: I think what you are doing enhances your eating. Best.</td>
              <td>0.184</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-6-1">
        <title>6.1. Human Evaluation</title>
        <p>We conducted a second human evaluation, this time labeling actual Redditors’ content scored by the classifiers,
to make sure the predictive ranking fits human intuition. Similarly to the first task, 250 samples in total
were selected for annotation (with the continuous score binarized into two extremes), this
time yielding a lower inter-annotator agreement of 0.67 and an agreement of 86.2% with the scores (low or
high) assigned by the models. We note that the lower agreement (recall the 94.7% for generated content)
is expected, since human annotation is compared to the (imperfect) classifiers’ output.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Authors’ Personality Exhibits Association with their Musical Preferences</title>
        <p>Next, we assigned each user scores reflecting their mean personality ranking on each of the Big
Five traits. Specifically, we averaged the prediction scores across all texts authored by a given user.
A community-level personality trait score was then computed by averaging over the individual
scores of users in the specific community. As a concrete example, the mean OPN score of the Classical
music lovers was calculated by averaging the OPN scores of the 982 fans of this genre in our dataset.</p>
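        <p>The two-level averaging described above (texts to users, then users to communities) can be sketched as follows for a single trait. The triple-based input layout and the function name are assumptions for illustration.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from statistics import mean


def community_trait_scores(text_scores):
    """Average text scores per user, then user means per community.

    text_scores: iterable of (genre, user, score) triples for one trait
    (hypothetical layout).
    """
    per_user = defaultdict(list)
    for genre, user, score in text_scores:
        per_user[(genre, user)].append(score)
    per_genre = defaultdict(list)
    for (genre, _user), scores in per_user.items():
        per_genre[genre].append(mean(scores))
    return {genre: mean(user_means) for genre, user_means in per_genre.items()}


rows = [
    ("classical", "u1", 0.2),
    ("classical", "u1", 0.4),
    ("classical", "u2", 0.9),
    ("metal", "u3", 0.5),
]
community = community_trait_scores(rows)
```
]]></preformat>
        <p>Averaging per user first means that prolific authors do not dominate the community-level score: in the toy input, user "u1" contributes one mean value even though they wrote two texts.</p>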
        <p>
          Figure 2 presents the results. Many of the findings align with common intuition and prior results
in non-linguistic studies [
          <xref ref-type="bibr" rid="ref12 ref6">6, 12</xref>
          ]: Classical music enthusiasts tend to exhibit the highest levels of
Agreeableness, while Hip-Hop fans show the lowest levels, compared to all other genres. Additionally,
Classical music enthusiasts generally show lower levels of Extroversion, while Hip-Hop and Metal fans tend
to have higher levels of Neuroticism.
        </p>
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Statistical Significance of the Findings</title>
        <p>We turn to evaluate the significance of differences between the groups’ scores. Our primary statistical
test is ANOVA [32], a statistical method that compares means across multiple groups to identify
statistically significant differences. It analyzes both within-group and between-group variation to
determine if observed differences are due to actual effects or random variation. Particularly useful for
experiments involving more than two groups, ANOVA extends the capabilities of the two-group t-test.
ANOVA’s primary output, the F-statistic, indicates whether the means of the groups are significantly
different. A p-value lower than 0.05 is typically used as the significance threshold.</p>
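        <p>As an illustration of how within-group and between-group variation combine into the F-statistic, the sketch below computes the one-way ANOVA F value from scratch. It is a minimal sketch with a hypothetical function name; it stops at the statistic itself, since converting F into a p-value requires the F distribution, which is usually taken from a statistics package.</p>
        <preformat><![CDATA[
```python
from statistics import mean


def anova_f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy per-user scores for three groups (hypothetical values).
f_value = anova_f_statistic([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```
]]></preformat>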
        <p>When applied to our per-community personality scores for each dimension, the test yields p-values
that are all virtually zero (p &lt; 5.0e-46), strongly suggesting that there are considerable differences in
the means across the groups for each personality trait; these extremely low p-values confirm that the
differences are statistically significant.</p>
        <p>6.3.1. Pairwise Significance and Effect Size</p>
        <p>While ANOVA provides a broad analysis by testing whether there are any significant differences across
multiple groups, it does not specify where those differences lie. We further applied pairwise group
analysis using a two-tailed unpaired test for difference in means; specifically, we used the non-parametric
Mann-Whitney test [33], which makes no assumptions about the underlying population distribution.</p>
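        <p>As an illustration of the rank-based statistic behind the Mann-Whitney test, the sketch below computes the U statistics of two samples, using midranks for ties. The function name mann_whitney_u and the toy samples are assumptions made for this example; the significance computation (p-value) is omitted.</p>

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, assigning midranks to tied values."""
    pooled = sorted(list(x) + list(y))
    # Midrank of each distinct value: average of the 1-based positions it occupies.
    rank_of = {}
    i = 0
    while len(pooled) > i:
        j = i
        while len(pooled) > j + 1 and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    rank_sum_x = sum(rank_of[v] for v in x)
    n_x, n_y = len(x), len(y)
    # U_x counts pairs in which the x value exceeds the y value (ties count one half).
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y


# Made-up samples, only to show the call shape.
u_separated = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
u_with_tie = mann_whitney_u([1.0, 3.0, 5.0], [2.0, 3.0, 6.0])
```

        <p>If SciPy is available, scipy.stats.mannwhitneyu with alternative="two-sided" additionally provides the two-tailed p-value.</p>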
        <p>Statistical significance tests are sensitive to sample size: with large samples, even extremely small
differences may turn out significant. A common way to address this bias is to additionally report
a measure of effect size. Cohen’s d [34] quantifies the magnitude of
the difference between two groups in units of standard deviations, helping to assess the practical
significance of the observed differences. A higher absolute value of Cohen’s d indicates a larger difference
between the groups; the common interpretation is that an absolute value between
0.2 and 0.5 denotes a small effect size, 0.5 to 0.8 a medium one, and 0.8 or higher a large one.</p>
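        <p>A minimal sketch of the effect-size computation follows, assuming the common pooled-standard-deviation form of Cohen’s d; the text does not state which variant was used. The function names and the "negligible" label for absolute values below 0.2 are illustrative assumptions.</p>

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_x, n_y = len(x), len(y)
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def effect_label(d):
    """Map |d| to the conventional small / medium / large bands."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"  # label assumed here for values below the 0.2 band


# Made-up samples, only to show the call shape.
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
label = effect_label(d)
```

        <p>The sign of d follows the order of the arguments, so a negative value means that the first group scores lower on average than the second.</p>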
        <p>Figure 3 presents our pairwise findings. Evidently, there exist significant associations between musical
preferences and personality traits. Classical music listeners are notably less extroverted compared to
both Hip-Hop and Electronic fans, as indicated by large negative effect sizes. Additionally, Classical
music lovers exhibit higher levels of Agreeableness, particularly when compared to Hip-Hop listeners.
In terms of Openness, Hip-Hop fans show the lowest levels compared to the other genres, especially
Classical and Electronic. Also, Hip-Hop and Metal listeners display higher levels of Neuroticism when
compared to Classical music fans. Our findings suggest that the degree of certain traits, as detected in
authentic writing, is strongly associated with a person’s musical preferences.</p>
      </sec>
      <sec id="sec-6-4">
        <title>6.4. Predicting Genre from Personality</title>
        <p>As an additional experiment, we approach the challenging task of predicting a user’s musical preference
based solely on a five-dimensional personality vector derived from their authentic writing. Specifically,
we trained another logistic regression classifier for a five-class classification task: given a personality
feature vector with five scores (OPN, AGR, CON, EXT, NEU), predict the individual’s musical
preference out of the five genres we consider in this study. Notably, each user is associated in our study
with a single genre (see Section 3). Applied to a random 80-20 train-test split of the 4,683 users,
the classifier achieved an accuracy of 0.424, considerably exceeding the five-class random baseline
of 0.2. While insufficient for any practical purpose, this finding again supports the claim that personality
traits "shine through" our language and, here as well, are associated with our musical preferences.</p>
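        <p>The experimental setup (five input scores, five output classes, an 80-20 split) can be sketched with a small multinomial logistic regression written in plain Python. Everything below is an assumption made for illustration: the training procedure (batch gradient descent), the hyperparameters, and the synthetic data, which is constructed to be easy to separate and therefore says nothing about the reported accuracy of 0.424.</p>

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_softmax(features, labels, n_classes, epochs=100, lr=0.5):
    """Fit multinomial logistic regression with batch gradient descent."""
    n_features = len(features[0])
    # One weight row per class; the last slot of each row is the bias term.
    weights = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grads = [[0.0] * (n_features + 1) for _ in range(n_classes)]
        for x, y in zip(features, labels):
            xb = list(x) + [1.0]
            probs = softmax([sum(w * v for w, v in zip(row, xb)) for row in weights])
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                for j, v in enumerate(xb):
                    grads[c][j] += err * v
        for c in range(n_classes):
            for j in range(n_features + 1):
                weights[c][j] -= lr * grads[c][j] / len(features)
    return weights


def predict(weights, x):
    """Return the index of the highest-scoring class."""
    xb = list(x) + [1.0]
    scores = [sum(w * v for w, v in zip(row, xb)) for row in weights]
    return scores.index(max(scores))


# Synthetic stand-in data: five classes, five features; class c has feature c raised.
rng = random.Random(0)
data = []
for _ in range(500):
    label = rng.randrange(5)
    vec = [rng.gauss(0.0, 1.0) + (2.0 if j == label else 0.0) for j in range(5)]
    data.append((vec, label))
rng.shuffle(data)
cut = int(0.8 * len(data))  # 80-20 train-test split
train, test = data[:cut], data[cut:]
weights = train_softmax([v for v, _ in train], [y for _, y in train], n_classes=5)
accuracy = sum(predict(weights, v) == y for v, y in test) / len(test)
```

        <p>In practice, a library implementation such as scikit-learn's LogisticRegression combined with train_test_split would be the more usual choice; the plain-Python version is used here only to keep the sketch self-contained.</p>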
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>Our study presents a comprehensive analysis of the relationship between musical preferences and
personality traits, as extracted from social media text. Using a carefully curated
dataset and reliably trained models, we reveal significant associations between personality traits, as
defined by the Big Five model and extracted from text, and five distinct musical genres. Ultimately, this
research opens up new possibilities for understanding how our personalities shape, and are shaped
by, the music we engage with, offering a compelling intersection of language, music and psychology.
Additionally, we release resources for future research.</p>
    </sec>
    <sec id="sec-8">
      <title>Ethical Considerations</title>
      <p>We use publicly available data to study how individuals’ musical preferences are reflected in their
spontaneous language through the lens of the Big Five personality traits. The use of publicly available
data from social media platforms, such as Reddit, may raise normative and ethical concerns. These
concerns have been studied extensively by the research community, as reported by, e.g., Proferes et al. [35].</p>
      <p>Here we address two main concerns. Anonymity: data used for this research can only be associated
with participants’ user IDs, which, in turn, cannot be linked to any identifiable information or used to
infer any demographic trait. Consent: Jagfeld et al. [36] debated the need to obtain informed consent
for using social media data, mainly because it is not straightforward to determine whether posts pertain to a
public or private context. Ethical guidelines for social media research [37] and practice in comparable
research projects [38], as well as Reddit’s terms of use,<sup>14</sup> regard it as acceptable to waive explicit consent
if users’ anonymity is protected.</p>
      <p>We hired human annotators (native English speakers from specific geographies) to annotate
samples of data for the presence of personality traits during this study. The annotators were hired via
the Appen annotation platform,<sup>15</sup> and were paid above the US federal minimum wage. Comments left
by our annotators at the end of the task indicate they found the work interesting and enjoyable.</p>
    </sec>
    <sec id="sec-9">
      <title>Limitations</title>
      <p>Despite our efforts to carefully mitigate potential confounds, this study is not without limitations. The
use of Reddit as a data source introduces certain biases, as its demographic skews toward younger
males, which may affect the generalizability of our findings. Additionally, assuming that the most active
participants in genre-specific subreddits are representative of the broader fan base is a strong, though
not unreasonable, assumption, as not all subscribers may strongly identify with the genre. Moreover,
unknown factors such as age and gender were not explicitly controlled for, potentially influencing the
results. Nevertheless, we believe that the large scale of our dataset helps enhance generalizability and
mitigate demographic biases. Furthermore, prior research has validated the approach of using high
activity in topic-related subreddits as a proxy for user affiliation with that topic.</p>
      <p><sup>14</sup> https://www.redditinc.com/policies/user-agreement-september-12-2021</p>
      <p><sup>15</sup> https://www.appen.com/</p>
    </sec>
    <sec id="sec-10">
      <title>Acknowledgments</title>
      <p>We are grateful to our four anonymous reviewers for their useful comments and constructive feedback.</p>
    </sec>
    <sec id="sec-11">
      <title>Declaration on Generative AI</title>
      <sec id="sec-11-1">
        <title>The authors have not employed any Generative AI tools while writing the paper.</title>
        <p>[15] J. W. Pennebaker, L. A. King, Linguistic styles: language use as an individual difference, Journal
of personality and social psychology 77 (1999) 1296.</p>
        <p>[16] F. Mairesse, M. A. Walker, M. R. Mehl, R. K. Moore, Using linguistic cues for the automatic
recognition of personality in conversation and text, Journal of artificial intelligence research 30
(2007) 457–500.</p>
        <p>[17] M. Kosinski, S. C. Matz, S. D. Gosling, V. Popov, D. Stillwell, Facebook as a research tool for
the social sciences: Opportunities, challenges, ethical considerations, and practical guidelines,
American psychologist 70 (2015) 543.</p>
        <p>[18] N. Majumder, S. Poria, A. Gelbukh, E. Cambria, Deep learning-based document modeling for
personality detection from text, IEEE intelligent systems 32 (2017) 74–79.</p>
        <p>[19] H. Zheng, C. Wu, Predicting personality using facebook status based on semi-supervised learning,
in: Proceedings of the 2019 11th international conference on machine learning and computing,
2019, pp. 59–64.</p>
        <p>[20] Z. Ren, Q. Shen, X. Diao, H. Xu, A sentiment-aware deep learning approach for personality
detection from text, Information Processing &amp; Management 58 (2021) 102532.</p>
        <p>[21] K. Yang, R. Y. Lau, A. Abbasi, Getting personal: A deep learning artifact for text-based measurement
of personality, Information Systems Research 34 (2023) 194–222.</p>
        <p>[22] B. Alshouha, J. Serrano-Guerrero, F. Chiclana, F. P. Romero, J. A. Olivas, Personality trait detection
via transfer learning, Computers, Materials &amp; Continua 78 (2024).</p>
        <p>[23] T. Yang, F. Yang, H. Ouyang, X. Quan, Psycholinguistic tripartite graph network for personality
detection, in: Proceedings of the 59th Annual Meeting of the Association for Computational
Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume
1: Long Papers), 2021, pp. 4229–4239.</p>
        <p>[24] Y. Zhu, L. Hu, X. Ge, W. Peng, B. Wu, Contrastive graph transformer network for personality
detection, 2022.</p>
        <p>[25] Y. Zhu, Y. Xia, M. Li, T. Zhang, B. Wu, Data augmented graph neural networks for personality
detection, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, 2024, pp.
664–672.</p>
        <p>[26] E. Rabinovich, Y. Tsvetkov, S. Wintner, Native language cognate effects on second language lexical
choice, Transactions of the Association for Computational Linguistics 6 (2018) 329–342.</p>
        <p>[27] J. Wu, S. Yang, R. Zhan, Y. Yuan, D. F. Wong, L. S. Chao, A survey on llm-generated text detection:
Necessity, methods, and future directions, arXiv preprint arXiv:2310.14724 (2023).</p>
        <p>[28] S. Dathathri, A. See, S. Ghaisas, P.-S. Huang, R. McAdam, J. Welbl, V. Bachani, A. Kaskasoli,
R. Stanforth, T. Matejovicova, et al., Scalable watermarking for identifying large language model
outputs, Nature 634 (2024) 818–823.</p>
        <p>[29] J. Aggarwal, E. Rabinovich, S. Stevenson, Exploration of gender differences in covid-19 discourse
on reddit, in: ACL 2020 Workshop on Natural Language Processing for COVID-19 (NLP-COVID),
2021.</p>
        <p>[30] D. Francis, E. Rabinovich, F. Samir, D. Mortensen, S. Stevenson, Quantifying cognitive factors in
lexical decline, Transactions of the Association for Computational Linguistics 9 (2021) 1529–1545.</p>
        <p>[31] L. Wang, N. Yang, X. Huang, B. Jiao, L. Yang, D. Jiang, R. Majumder, F. Wei, Text embeddings by
weakly-supervised contrastive pre-training, arXiv preprint arXiv:2212.03533 (2022).</p>
        <p>[32] E. R. Girden, ANOVA: Repeated measures, volume 84, Sage, 1992.</p>
        <p>[33] H. B. Mann, D. R. Whitney, On a test of whether one of two random variables is stochastically
larger than the other, The annals of mathematical statistics (1947) 50–60.</p>
        <p>[34] J. Cohen, Statistical power analysis for the behavioral sciences, Routledge, 2013.</p>
        <p>[35] N. Proferes, N. Jones, S. Gilbert, C. Fiesler, M. Zimmer, Studying reddit: A systematic overview of
disciplines, approaches, methods, and ethics, Social Media+ Society 7 (2021) 20563051211019004.
URL: https://journals.sagepub.com/doi/full/10.1177/20563051211019004.</p>
        <p>[36] G. Jagfeld, F. Lobban, P. Rayson, S. H. Jones, Understanding who uses reddit: Profiling individuals
with a self-reported bipolar disorder diagnosis, arXiv preprint arXiv:2104.11612 (2021). URL:
https://arxiv.org/pdf/2104.11612.pdf.</p>
        <p>[37] A. Benton, G. Coppersmith, M. Dredze, Ethical research protocols for social media health research,
in: Proceedings of the first ACL workshop on ethics in natural language processing, 2017, pp.
94–102. URL: https://aclanthology.org/W17-1612/.</p>
        <p>[38] W. Ahmed, P. A. Bath, G. Demartini, Using twitter as a data source: An overview of ethical, legal,
and methodological challenges, The ethics of online research 2 (2017) 79–107.</p>
      </sec>
    </sec>
    <sec id="sec-12">
      <title>A. Appendices</title>
      <sec id="sec-12-1">
        <title>A.1. Personality Traits (Short) Definitions</title>
        <p>Openness: Openness (also referred to as Openness to experience) emphasizes imagination and insight
the most out of all five personality traits. People who are high in Openness tend to have a broad range
of interests. They are curious about the world and other people and are eager to learn new things and
enjoy new experiences. People who are high in this personality trait also tend to be more adventurous
and creative. Conversely, people low in this personality trait are often much more traditional and may
struggle with abstract thinking. Intellect, imagination, and Openness describe your imagination and
how creative you are. It refers to your sense of curiosity about the world and your willingness to try
new things, and to be exposed to new experiences.</p>
        <p>Conscientiousness: Conscientiousness is a trait that refers to how thoughtful and goal-oriented you
are. It reflects the degree of your control over your impulses and your level of organization and work
ethic. Conscientiousness describes a person’s ability to regulate impulse control in order to engage in
goal-directed behaviors. It measures elements such as control and persistence of behavior.</p>
        <p>Extroversion: Extroversion (or extraversion) is a personality trait characterized by excitability,
sociability, talkativeness, assertiveness, and high amounts of emotional expressiveness. People high in
extroversion are outgoing and tend to gain energy in social situations. Being around others helps
them feel energized and excited. People who are low in this personality trait or introverted tend to be
more reserved. They have less energy to expend in social settings and social events can feel draining.
Introverts often require a period of solitude and quiet in order to "recharge". Extroversion reflects how
you interact socially. It describes your emotional expression and how comfortable you are in your
environment.</p>
        <p>Agreeableness: This personality trait includes attributes such as trust, altruism, kindness, affection,
and other prosocial behaviors. People who are high in agreeableness tend to be more cooperative
while those low in this personality trait tend to be more competitive and sometimes even manipulative.
Agreeableness is a personality trait that describes how you treat your relationships with others. It
reflects how kind and helpful you are toward people. Overall, high agreeableness means you desire to
keep things running smoothly and value social harmony.</p>
        <p>Neuroticism: Neuroticism is a personality trait characterized by sadness, moodiness, and emotional
instability. Individuals who are high in neuroticism tend to experience mood swings, anxiety, irritability,
and sadness. People low in this trait tend to be more stable and emotionally resilient. Neuroticism is
a personality trait that refers to your emotional stability. As a personality dimension, neuroticism is
characterized by unsettling thoughts and feelings of sadness or moodiness.</p>
      </sec>
      <sec id="sec-12-2">
        <title>A.2. Guidelines for Human Annotators (used also as Prompts for Generative Models)</title>
        <p>For each personality trait, we wrote a primary prompt that started with the trait’s definition and ended
with a request for the LLM to generate paragraphs demonstrating a high (and similarly, low) level of the
trait. As a concrete example, for Extroversion, we generated texts "written by" people who are highly
extroverted and then by those who rank low on Extroversion, i.e., those considered introverts.</p>
        <p>We iteratively modified the requests to ensure the generated texts were unique, authentic, and avoided
repetition. We also instructed the models to write as if they were different personas, iterating over a
variety of topics and writer profiles, e.g., texts written by high school teenagers, by sports lovers (fans or
practitioners), politicians, and also texts about finance, news, traveling, etc. The models were presented
with multiple examples (found in the literature) of texts with high and low levels of each personality trait.<sup>16</sup></p>
        <p>A.2.1. Prompt Used for Collecting Openness Texts</p>
        <p>We used the following prompt (also used as the trait description for human annotators):</p>
        <p>Openness Trait Description: Openness (also referred to as Openness to experience) emphasizes
imagination and insight the most out of all five personality traits. People who are high in Openness
tend to have a broad range of interests. They are curious about the world and other people and are
eager to learn new things and enjoy new experiences. People who are high in this personality trait also
tend to be more adventurous and creative. Conversely, people low in this personality trait are often
much more traditional and may struggle with abstract thinking. Intellect, imagination, and Openness
describe also how creative you are.</p>
        <p>Openness to experience refers to one’s willingness to try new things as well as engage in imaginative
and intellectual activities. It includes the ability to "think outside of the box", curiosity about and
tolerance for diverse cultural and intellectual experiences.</p>
        <p>People who are considered to have high Openness are more likely to be: Very creative, Open to
trying new things, Focused on tackling new challenges, happy to think about abstract concepts, enjoy
learning and trying new things, have an active imagination, be more creative, be intellectually curious,
think about abstract concepts, enjoy challenges, like to travel, have a wide range of interests, Curious,
Imaginative, Creative, Open to trying new things, Unconventional. They have a basic tendency for
Actions (a need for variety, novelty, and change). Interest in travel, many different hobbies, knowledge
of foreign cuisine, diverse vocational interests, and friends who share tastes. People who are considered
to have low Openness are more likely to Dislike change, Do not enjoy new things, Resist new ideas, Not
very imaginative, Dislike abstract or theoretical concepts, dislike change, be likely to stick to routines,
not be imaginative or creative, have more traditional thinking, be more grounded, Predictable, Not very
imaginative, Dislikes change, Prefer routine, Traditional.</p>
        <p>Openness vs. Closedness to Experience: Those who score high on Openness to experience are
perceived as creative and artistic. They prefer variety and value independence. They are curious about
their surroundings and enjoy traveling and learning new things. People who score low on Openness
to experience prefer routine. They are uncomfortable with change and trying new things, so they
prefer the familiar over the unknown. As they are practical people, they often find it difficult to think
creatively or abstractly.</p>
        <p>&lt;for models&gt; Consider the following task: Please provide 10 paragraphs of 40-150 words each,
"written by" people with high Openness personality. All the paragraphs should be very diverse and
should not be repeated at all. Don’t even repeat sentences. Let’s start with the first paragraph and then
continue with 9 more iterations. Pretend to be different males or females of a variety of ages, with
different socioeconomic statuses with high Openness, i.e. the Openness personality trait is strong and
well noticeable in their texts. Avoid repeating the word Openness in your writing. You can tell personal
details about the writer but don’t introduce them with details in a formal manner (don’t start with
name, age and occupation!). The texts should not be about the writer but written by them.</p>
        <p><sup>16</sup> All five traits’ guidelines and prompts are available in the data that will be released upon acceptance of this paper.</p>
        <p>Alternative Endings: Several alternative closing instructions were further provided in the prompt
to generate diverse content:</p>
        <p>(a) Write distinct 40-150-word length paragraphs. The paragraphs should be written by totally
different people, but all should have in common the strong Openness personality trait. Try to make the
paragraphs unique and avoid repeating yourself.</p>
        <p>(b) Generate 10 paragraphs written by characters with low Openness. Here are the rules: 1. The
paragraph length should be 40 to 150 words. 2. Topics should be random and not repeated. 3. Avoid
repeating sentences. 4. Use casual daily internet language. 5. Texts should not be about Openness at all
but demonstrate low Openness. 6. Don’t introduce the character at the beginning.</p>
        <p>(c) Generate 10 distinct paragraphs, each should be 40-150 words long, about different random topics.
Paragraphs or sentences (or even parts of sentences) should not be repeated as much as possible. The
texts should be "written by" characters with high Openness. Use causal daily language and don’t
mention the term Openness.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Cattell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <article-title>The measurement of personality and behavior disorders by the ipat music preference test</article-title>
          .,
          <source>Journal of Applied Psychology</source>
          <volume>37</volume>
          (
          <year>1953</year>
          )
          <fpage>446</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Rentfrow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Gosling</surname>
          </string-name>
          ,
          <article-title>The do re mi's of everyday life: the structure and personality correlates of music preferences</article-title>
          .,
          <source>Journal of personality and social psychology</source>
          <volume>84</volume>
          (
          <year>2003</year>
          )
          <fpage>1236</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bonneville-Roussy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Rentfrow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Potter</surname>
          </string-name>
          ,
          <article-title>Music through the ages: Trends in musical engagement and preferences from adolescence through middle adulthood</article-title>
          .,
          <source>Journal of personality and social psychology</source>
          <volume>105</volume>
          (
          <year>2013</year>
          )
          <fpage>703</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ferwerda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tkalcic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          ,
          <article-title>Personality traits and music genre preferences: how music taste varies over age groups</article-title>
          ,
          <source>in: 1st Workshop on Temporal Reasoning in Recommender Systems (RecTemp) at the 11th ACM Conference on Recommender Systems, Como, August 31, 2017</source>
          , volume
          <volume>1922</volume>
          ,
          <publisher-name>CEUR-WS</publisher-name>
          ,
          <year>2017</year>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Nave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Minxha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Greenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kosinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stillwell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rentfrow</surname>
          </string-name>
          ,
          <article-title>Musical preferences predict personality: Evidence from active listening and facebook likes</article-title>
          ,
          <source>Psychological science</source>
          <volume>29</volume>
          (
          <year>2018</year>
          )
          <fpage>1145</fpage>
          -
          <lpage>1158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gibson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shapiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Semerci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Greenberg</surname>
          </string-name>
          ,
          <article-title>“just the way you are”: Linking music listening on spotify and personality</article-title>
          ,
          <source>Social Psychological and Personality Science</source>
          <volume>12</volume>
          (
          <year>2021</year>
          )
          <fpage>561</fpage>
          -
          <lpage>572</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <article-title>The development of markers for the big-five factor structure</article-title>
          .,
          <source>Psychological assessment</source>
          <volume>4</volume>
          (
          <year>1992</year>
          )
          <fpage>26</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Greenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Baron-Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Stillwell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kosinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Rentfrow</surname>
          </string-name>
          ,
          <article-title>Musical preferences are linked to cognitive styles</article-title>
          ,
          <source>PloS one</source>
          <volume>10</volume>
          (
          <year>2015</year>
          )
          <elocation-id>e0131151</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <article-title>Who am i? personality detection based on deep learning for texts</article-title>
          ,
          <source>in: 2018 IEEE international conference on communications (ICC)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Matz</surname>
          </string-name>
          ,
          <article-title>Large language models can infer psychological dispositions of social media users</article-title>
          ,
          <source>PNAS nexus</source>
          <volume>3</volume>
          (
          <year>2024</year>
          )
          <elocation-id>pgae231</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Cattell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Saunders</surname>
          </string-name>
          ,
          <article-title>Musical preferences and personality diagnosis: I. a factorization of one hundred and twenty themes</article-title>
          ,
          <source>The Journal of Social Psychology</source>
          <volume>39</volume>
          (
          <year>1954</year>
          )
          <fpage>3</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Greenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Wride</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Snowden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spathis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Potter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Rentfrow</surname>
          </string-name>
          ,
          <article-title>Universals and variations in musical preferences: A study of preferential reactions to western music in 53 countries</article-title>
          .,
          <source>Journal of personality and social psychology</source>
          <volume>122</volume>
          (
          <year>2022</year>
          )
          <fpage>286</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Galton</surname>
          </string-name>
          ,
          <article-title>The measurement of character</article-title>
          .,
          <publisher-name>Prentice-Hall, Inc</publisher-name>
          ,
          <year>1949</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Pennebaker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Francis</surname>
          </string-name>
          ,
          <article-title>Cognitive, emotional, and language processes in disclosure</article-title>
          ,
          <source>Cognition &amp; emotion</source>
          <volume>10</volume>
          (
          <year>1996</year>
          )
          <fpage>601</fpage>
          -
          <lpage>626</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>