<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>MuRS</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Towards Characterising Induced Emotions: Exploiting Physiological Data and Investigating the Effect of Music Familiarity</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ismaël Tankeu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Geoffray Bonnin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ENIT</institution>
          ,
          <addr-line>Tarbes</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Loria</institution>
          ,
          <addr-line>Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>2</volume>
      <fpage>0009</fpage>
      <lpage>0009</lpage>
      <abstract>
        <p>Music recommendation aims to suggest songs that align with listeners' preferences. Recent work has shown a strong correlation between induced emotion, i.e., the emotion actually felt during music listening, and music appreciation. However, the potential of leveraging induced emotions for music recommendation still remains unexplored. This paper explores the use of physiological data and music features to predict discrete emotional responses, as defined by the Geneva Emotional Music Scale (GEMS) model. Our results show that integrating physiological data and music familiarity enhances prediction accuracy compared to integrating music features only. Additionally, feature importance analysis revealed that, although music features remained the primary predictor of induced emotions, physiological data contributed substantially to the prediction model.</p>
      </abstract>
      <kwd-group>
        <kwd>Physiological Data</kwd>
        <kwd>Music Features</kwd>
        <kwd>Music Listening</kwd>
        <kwd>Induced Emotions</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Music appreciation can be influenced by various psychological phenomena. For instance, music that
induces tension is generally associated with lower levels of appreciation, while music that induces joy
tends to be positively received [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Since the goal of music recommenders is to provide users with music
they will enjoy, being able to predict the emotions a piece of music will evoke in a specific user is of
major importance. The research field related to this task is known as Music Emotion Recognition. It
contains extensive research on perceived emotions, i.e., emotions that listeners believe the music is
expressing or conveying, independent of their own emotional state. In contrast, research on induced
emotions, i.e., emotions actually felt by listeners as a direct result of the music, remains relatively scarce
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Although induced and perceived emotions have been found to often align [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the same piece of music can
induce very different emotions in different listeners. This is demonstrated in Figure 1, which shows
the frequency of emotions induced by several tracks in the Music-Mouv’ dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As can be seen,
almost none of these tracks induced a uniform emotional response among all participants. The only
exception was “Djadja” by Aya Nakamura, which consistently induced feelings of “Tension” in all
participants despite the song’s moderately high perceived emotional positivity1. In other words, none of
the participants experienced the perceived positive emotion; rather, they seemed to endure discomfort
throughout its duration.
      </p>
      <p>Research on the inference of induced emotions usually involves physiological data, such as the
heart rate or the pupil size. One of the few works that follow this line of research is [4], which
investigates the effectiveness of using heart rate and skin conductance combined with acoustic features
to classify induced emotions, as well as users’ Big-Five personality traits. Their results show that using
physiological features significantly improved the valence classification (the positivity or negativity
of emotions) when combined with acoustic features. However, these features offered less benefit for
arousal classification (the intensity of emotions), where acoustic features alone were very effective.
One limitation of this approach is that it uses a classifier on oversimplified emotion categories derived
from the dimensional model of affect. In this work, we instead rely on a discrete emotion
framework developed and validated specifically for music, to achieve a more nuanced classification of emotional
responses.</p>
      <p>Another approach proposed by [5] applied regression models that explicitly differentiate between
perceived and induced emotions to predict valence and arousal ratings provided by users during music
listening. More precisely, their assumption was that listeners assigned these ratings by mixing both
types of emotions through a weighted judgement. Their results show a slight improvement compared to
models that assume purely induced or purely perceived emotion ratings. However, it is not possible to
determine the exact accuracy of their assumption, nor is it possible to know the extent to which the
weights used in the weighted judgement models correspond to reality. Our experiments solely focus
on induced emotions, i.e., we use subjective data from participants who were specifically asked to report the
emotions they actually felt through a semi-directed interview approach.</p>
      <p>
        In our previous work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we collected a comprehensive dataset that captures the interplay between
emotions, physiological responses, physical movement, familiarity and liking judgement. Our aim
was to study how the emotional context induced by music listening impacts gait initiation. Our
preliminary findings indicated that the induced emotions significantly influenced gait initiation, with
certain emotional states facilitating or hindering the movement. Additionally, our results suggested
that music familiarity is of major importance and should be taken into account for induced emotion
recognition. In this paper, we therefore also aim to investigate the impact of the familiarity factor
relative to the physiological and musical features to further improve the classification accuracy.
Our research questions are therefore the following:
RQ 1: How effectively can physiological data collected during music listening predict discrete emotional
responses to music, as defined by a music-specific discrete emotion model?
RQ 2: How do music features compare to physiological data in predicting discrete emotional responses?
RQ 3: Does music familiarity further impact the effectiveness of predicting discrete emotional
responses?
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <sec id="sec-2-1">
        <title>2.1. Feature extraction</title>
        <p>
          Our experiments rely on the Music-Mouv’ dataset [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The most important aspect of this dataset in
relation to our research questions is that the induced emotions it contains were collected using a
semi-structured interview approach, ensuring that the participants only reported the emotions they actually
felt. This dataset contains the physiological responses of 35 participants recorded during music listening.
These data were collected using Empatica E4 wristbands and include Blood Volume Pulse (BVP) and
electrodermal activity (EDA). Overall, 223 trials were made, and a total of 188 tracks were played,
i.e., 27 tracks were played twice or more. Participants also indicated the emotions they experienced
during music listening according to a discrete model, the Geneva Emotional Music Scale (GEMS) model
[6], and a dimensional model, the usual valence and arousal scales. The dataset also contains binary
familiarity ratings and the following track features extracted from the Web API of Spotify2: danceability,
energy, loudness, key, mode, speechiness, acousticness, instrumentalness, liveness, valence and tempo.
        </p>
        <p>In order to answer our research questions, we extracted several features from the raw physiological
data of the dataset. We extracted time-domain and frequency-domain Heart Rate Variability (HRV)
features [7, 4] by using the NeuroKit2 library [8] on the recorded BVP signals of the dataset. We also
extracted several EDA-related physiological features. EDA signals include two components: the slowly
changing tonic component, which reflects a person’s general skin conductance level, and the rapidly
changing phasic component, which is often elicited by external stimuli [4]. We also relied on the
NeuroKit2 library to extract several peak-related features from the phasic component. The main extracted
features3 are presented in Table 1.</p>
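        <p>As a hedged illustration of what such time-domain HRV features capture, two of the most common ones, SDNN and RMSSD, can be computed directly from inter-beat intervals. The sketch below uses hypothetical interval values and plain Python rather than the actual NeuroKit pipeline:</p>

```python
import statistics

def hrv_time_features(ibis_ms):
    """Compute two classic time-domain HRV features from
    inter-beat intervals (in milliseconds)."""
    # SDNN: standard deviation of all inter-beat intervals
    sdnn = statistics.stdev(ibis_ms)
    # RMSSD: root mean square of successive interval differences
    diffs = [b - a for a, b in zip(ibis_ms, ibis_ms[1:])]
    rmssd = (sum(d * d for d in diffs) / len(diffs)) ** 0.5
    return {"SDNN": sdnn, "RMSSD": rmssd}

# Hypothetical inter-beat intervals (ms) from a BVP recording
ibis = [812, 798, 830, 845, 801, 790, 822, 815]
features = hrv_time_features(ibis)
```

        <p>In practice, the inter-beat intervals would first be obtained by detecting systolic peaks in the BVP signal; frequency-domain features additionally require a spectral analysis of the interval series.</p>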
        <p>2See https://developer.spotify.com/documentation/web-api/.</p>
        <p>3We only included the 10 main EDA and HRV features for space reasons.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Experimental protocol</title>
        <p>We compared the results of applying Random Forest classification using different types of input attributes:
(1) only physiological features, (2) only music features, (3) both types of features, and (4) both types of
features along with music familiarity. We used scikit-learn’s RandomForestClassifier4 with a number of
estimators of 100.</p>
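        <p>The setup described above can be sketched as follows. The feature matrices are random stand-ins, since the actual dataset columns are not reproduced here, and only configuration (4) is shown:</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-ins: rows are trials, columns mimic the inputs
X_physio = rng.normal(size=(128, 10))            # e.g. HRV and EDA features
X_music = rng.normal(size=(128, 8))              # e.g. Spotify track features
familiarity = rng.integers(0, 2, size=(128, 1))  # binary familiarity rating
y = rng.integers(0, 2, size=128)  # "Joyful Activation" vs "Tension"

# Configuration (4): both feature types plus familiarity
X = np.hstack([X_physio, X_music, familiarity])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="precision")
print(scores.mean())
```

        <p>With real labels instead of random ones, the mean of the per-fold precision scores corresponds to the figures reported in Section 3.</p>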
        <p>Although the dataset also contains subjective ratings of valence and arousal for each track, in this
paper we only focus on the task of classifying discrete emotions. Approximately 70% of these emotions
were represented by “Joyful Activation” and “Tension.” Given the corresponding low number of trials
of the other emotions in the dataset, we decided to focus only on these two emotions for the remainder
of our study, leaving us with 154 trials. An additional 26 trials had to be removed due to missing raw
physiological data or the inability to extract all the corresponding physiological features5.</p>
        <p>We evaluated all four resulting models in terms of precision, which is the ratio of correct predictions
to all predictions6. To assess the importance of each feature in predicting the induced emotions, we
also examined the respective importances of the features as determined by the trees computed by the
algorithm. In scikit-learn, this can be calculated based on the reduction in Gini impurity that the feature
brings within each tree.</p>
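        <p>As a minimal sketch of the quantity underlying this importance measure (the helper names below are ours, not scikit-learn’s), the Gini impurity of a node and the weighted reduction achieved by a split can be computed as follows:</p>

```python
def gini(labels):
    """Gini impurity of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_reduction(parent, left, right):
    """Weighted decrease in Gini impurity produced by a split;
    summing these reductions over a tree, per splitting feature,
    yields that feature's importance."""
    n = len(parent)
    return (gini(parent)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))

# A perfectly separating split removes all impurity:
labels = ["Tension", "Tension", "Joy", "Joy"]
reduction = gini_reduction(labels, ["Tension", "Tension"], ["Joy", "Joy"])
```
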
        <p>In the context of machine learning, the selection of input features generally impacts the performance
of the model. Correlated features can cause issues such as multicollinearity, which can lead to less
reliable predictions. Conversely, focusing on a subset of features reduces the risk of overfitting, especially
when the training data size is limited. Finally, selecting a subset of less correlated features makes it easier to
understand which features are driving the predictions, which can enhance the interpretability
of the model. In order to select our input features, we therefore relied on their correlations. Figure 2
shows the Spearman correlations of the main physiological features we extracted and of the music
features from Spotify. The process for selecting our features involved trying different combinations
while avoiding multicollinearity, as indicated by the correlation matrix.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>
        The precision results obtained through a 5-fold cross-validation procedure with 80/20 splits are shown
in Figure 3. A first interesting observation is that using only musical features resulted in better precision
compared to using only physiological features (64.2% vs 60.1%). This result is expected as induced
emotions often align with perceived emotions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and the collected physiological data may only
partially correspond to the emotional response, given that physiology can be influenced by various
factors. Combining both types of features further increased the precision, which reached 65.7%. This
result confirms the importance of making the distinction between perceived emotions and induced
emotions, with the former being the most prevalent in the domain of music emotion recognition. It
also confirms that integrating both types of data can be instrumental in predicting actual emotional
responses to music, even when those emotions fall into discrete music-specific categories. Finally, also
including the familiarity ratings as input to the classifier further improved the precision, which reached
68.1%. This result further confirms the importance of taking this factor into account for the detection of
induced emotions.
      </p>
      <p>Next, we looked at the respective importance of the input features of the four versions of the model.
As explained in the previous section, we selected a subset of the physiological features and of the
music features based on the correlation matrix to avoid redundancy. The values of the resulting best
combinations we found are shown in Figure 4. Regarding the version with only physiological features,
we can notice that there was no clear winner between EDA and HRV features, as both types of features</p>
      <sec id="sec-3-1">
        <p>4See https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html.</p>
        <p>5Our final dataset and our code can be accessed here: https://homepages.loria.fr/gbonnin/music-mouv/.
6Note that for each prediction, since each trial corresponds to a single induced emotion, the number of false negatives is equal
to the number of false positives, making recall equivalent to precision. In this case, the metric is also often referred to as hit
ratio.
were quite balanced and no strong difference appeared between successive features. The version with
only music features led to a very different outcome, with a strong difference of importance between the
features, except for danceability and speechiness, which were the two most important features.</p>
        <p>Combining both types of features further validated the significance of danceability and speechiness,
with importance values far higher than the rest. This strong difference aligns with the better results
obtained from using music features alone compared to using physiological features alone: since induced
emotions often correspond with perceived emotions, music features are the strongest predictors.
Nevertheless, the importance of physiological features remains substantial, with their importance values
being comparable to, or even exceeding, those of other music features. This tends to further support
the importance of personalising the recognition of induced emotions with physiological features. One
interesting outcome was the reduced importance of acousticness when physiological features were
incorporated. This may suggest a relationship between acousticness and certain physiological features,
which was not evident in the correlation matrix. Finally, the importance of the familiarity feature when
also including it was unexpectedly low, especially given the previously observed importance of this
feature in terms of correlation with induced emotions. A relationship may also exist between familiarity
and certain physiological features.</p>
        <p>Overall, these results confirm that, while musical features are established strong predictors of
perceived emotions, they also are, to a certain extent, good predictors of induced emotions. However,
these factors alone are insufficient, as also incorporating physiological data enables more accurate
predictions tailored to each participant. Finally, the results further validate the importance of music
familiarity in predicting induced emotions, as including this feature in our model further enhanced the
results.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This paper investigated the use of physiological data and music features to predict discrete emotional
responses induced by music listening, as defined by the music-specific discrete GEMS model. While
the use of music features alone led to a precision of 64.2%, incorporating physiological data improved
the results to 65.7%. Additionally, including music familiarity further enhanced the precision to 68.1%,
confirming the importance of this feature. Feature importance analysis revealed danceability and
speechiness as key music features, with physiological features showing balanced but substantial
contributions. In our future work, we will explore additional physiological and contextual factors to refine
our understanding of music-induced emotions. By doing so, we seek to uncover how emotion indicators
can be effectively utilised to enhance music recommendations.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Acknowledgments</title>
      <p>Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a
scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well
as other organizations7.</p>
      <p>Figure 4: Feature importances of each variant of the model.</p>
      <p>[4] X. Hu, F. Li, R. Liu, Detecting Music-Induced Emotion Based on Acoustic Analysis and Physiological Sensing: A Multimodal Approach, Applied Sciences 12 (2022).</p>
      <p>[5] N. Vempala, F. Russo, Modeling Music Emotion Judgments Using Machine Learning Methods, Frontiers in Psychology 8 (2018).</p>
      <p>[6] M. Zentner, D. Grandjean, K. Scherer, Emotions evoked by the sound of music: characterization, classification, and measurement, Emotion 8 (2008).</p>
      <p>[7] F. Shaffer, J. Ginsberg, An Overview of Heart Rate Variability Metrics and Norms, Frontiers in Public Health (2017).</p>
      <p>[8] D. Makowski, T. Pham, Z. J. Lau, J. C. Brammer, F. Lespinasse, H. Pham, C. Schölzel, S. H. A. Chen, NeuroKit2: A Python toolbox for neurophysiological signal processing, Behavior Research Methods 53 (2021) 1689–1696. doi:10.3758/s13428-020-01516-y.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Doumbia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Renard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Coudrat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bonnin</surname>
          </string-name>
          ,
          <article-title>Characterizing the Emotional Context Induced by Music Listening and its Effects on Gait Initiation: Exploiting Physiological and Biomechanical Data</article-title>
          ,
          <source>in: Adjunct Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2023</year>
          . URL: https://doi.org/10.1145/3563359.3596982.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A survey of music emotion recognition</article-title>
          ,
          <source>Frontiers of Computer Science</source>
          <volume>16</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dixon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pearce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Halpern</surname>
          </string-name>
          ,
          <article-title>Perceived and Induced Emotion Responses to Popular Music: Categorical and Dimensional Models</article-title>
          ,
          <source>Music Perception: An Interdisciplinary Journal</source>
          <volume>33</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>