The Influence of Users’ Personality Traits on Satisfaction
and Attractiveness of Diversified Recommendation Lists

Bruce Ferwerda
Johannes Kepler University
Altenberger Str. 69
4040 Linz, AT
bruce.ferwerda@jku.at

Mark Graus
Eindhoven University of Technology
IPO 0.20, P.O. Box 513
5600 MB Eindhoven, NL
m.p.graus@tue.nl

Andreu Vall
Johannes Kepler University
Altenberger Str. 69
4040 Linz, AT
andreu.vall@jku.at

Marko Tkalcic
Free University of Bolzano
Piazza Domenicani, 3
39100 Bozen-Bolzano, IT
marko.tkalcic@unibz.it

Markus Schedl
Johannes Kepler University
Altenberger Str. 69
4040 Linz, AT
markus.schedl@jku.at

ABSTRACT
Diversifying recommendations has been shown to be a good means of counteracting choice difficulties and choice overload, and it can positively influence subjective evaluations such as satisfaction and attractiveness. Personal characteristics (e.g., domain expertise, prior preference strength) have been shown to influence the desired level of diversity in a recommendation list. However, only personal characteristics that are directly related to the domain have been investigated so far. In this work we take personality traits as a general user model and show that specific traits are related to a preference for different levels of diversity (in terms of recommendation satisfaction and attractiveness). Among 103 participants we show that conscientiousness is related to a preference for a higher degree of diversification, while agreeableness is related to a preference for a medium level of diversification. Our results have implications for how to personalize recommendation lists (i.e., the amount of diversity that should be provided) depending on users’ personality.

CCS Concepts
• Human-centered computing → Human computer interaction (HCI); User models; User studies;

Keywords
Diversity; Recommender Systems; User-Centric Evaluation; Personality

1. INTRODUCTION
Providing users with a diversified list of recommendations has been shown to have positive effects on the user experience. With the abundance of choices available nowadays, providing diversity in the recommendations can counteract the negative psychological effects that users may experience, such as choice overload and choice difficulties [26]. These negative effects arise because recommender systems are typically designed to output the recommendations that are closest to the user’s interests. The closer the recommendations are to the user’s interests, the higher the accuracy of the recommender algorithm; however, this also results in recommendations that are often too similar to each other (e.g., equally attractive to the user). This not only increases the chance of choice overload and choice difficulties, but also increases the risk of not covering the full spectrum of the user’s interests [3].

Although prior research has shown that recommendation diversity has positive effects on the user experience, differences in diversity needs between users have received little attention. Domain expertise and prior choice preferences have been shown to play a role in the amount of diversity desired by the user [2, 6, 26]. Others have shown that diversity needs can also be related to cultural dimensions [8, 14]. In this work we consider personality traits as an indicator of users’ satisfaction with, and the perceived attractiveness of, differently diversified music recommendation lists.

The use of personality as a general user model has gained increasing interest. Several works have revealed personality-based relationships with users’ behavior, preferences, and needs (e.g., [10, 15, 25]), have shown how to implicitly acquire users’ personality traits from social media traces (e.g., Facebook [1, 4, 12, 20], Twitter [16, 21], and Instagram [11, 13, 24]), and have shown how personality traits can be incorporated into a personalized system [7, 9]. With our work we contribute to personality research by providing more insight into personality-related diversity needs. Among 103 participants, we found that the conscientiousness and agreeableness personality traits play a role in the desired amount of diversity in a recommendation list. While conscientious participants showed a higher degree of satisfaction and attractiveness with the more diversified recommendations, agreeable participants were more satisfied and found the list more attractive with a medium amount of diversity in the recommendations.

EMPIRE 2016, September 16, 2016, Boston, MA, USA.
Copyright held by the author(s).
2. RELATED WORK
The positive effects of recommendation list diversity have been shown by several researchers. Bollen et al. [2] and Willemsen et al. [26] investigated the influence of diversity on movie recommendations and found that diversity has a positive effect on the attractiveness of the recommendation set, on the difficulty of making a choice, and ultimately on choice satisfaction. Besides diversification itself, personal characteristics also play a role in the attractiveness of a diversified recommendation list (e.g., strength of prior preference or domain expertise [2, 23]). Bollen et al. [2] found that domain expertise had a positive effect on item attractiveness.

The personal characteristics that have been identified so far are specific to the domain of the recommendations. However, a more general personal characteristic may also influence the subjective evaluations of diversified recommendations. Personality has been shown to be an enduring factor that can relate to one’s taste, preferences, and interests (e.g., [5, 10, 25]). Chen et al. [5] and Wu et al. [27] showed relationships between personality and preference for diversification based on different movie characteristics (e.g., genre, artist, director). Ferwerda et al. [10] showed that music preferences can be related to the personality of the listener, whereas Tkalcic et al. [25] found relationships between personality traits and the preference for being exposed to certain amounts of multimedia meta-information.

In this work we investigate whether personality traits can be considered a personal characteristic that influences the subjective evaluations of diversified recommendation lists. To this end, we rely on the widely used five-factor model (FFM), which categorizes personality into five general dimensions: openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism [19].

3. DATA PREPARATION & PROCEDURES
We created differently diversified music recommendation lists in order to investigate the influence of personality traits on the subjective evaluation of the recommendation lists. Since we created the recommendation lists offline, we divided the study into two parts. In the first part, participants were recruited and their complete Last.fm listening histories were crawled in order to create the recommendation lists. After the lists were created, participants from the first part were invited to the second part, where they were asked to assess the diversified recommendation lists.

We recruited 254 participants through Amazon Mechanical Turk for the first part of the study. Participation was restricted to workers located in the United States with a very good reputation (≥95% HIT approval rate and ≥1000 HITs approved) and a Last.fm account with at least 25 listening events. Furthermore, participants were asked to fill in the 44-item Big Five Inventory personality questionnaire [19] to measure the FFM traits. Control questions were asked to filter out fake and careless contributions. A compensation of $1 was provided. We crawled the complete listening history of each participant and aggregated the listening events into artist playcounts (i.e., the number of times the participant listened to each artist).

In order to prepare the music recommendation lists for each participant, we complemented our data with the LFM-1b dataset [22].¹ This dataset consists of the complete listening histories of 120,322 Last.fm users from different countries. Since our participants were all located in the United States, we only used the United States users of the LFM-1b dataset to complement our data. This resulted in 10,255 additional users, whose listening events we also aggregated into artist playcounts. The final dataset consists of (user, artist, artist playcount) triplets covering a total of 387,037 unique artists, from which the recommendation lists were created.

¹ Available at http://www.cp.jku.at/datasets/LFM-1b/

We used the weighted matrix factorization algorithm of [18] on our final dataset to calculate the recommended items. This algorithm is specifically designed for datasets consisting of implicit feedback (e.g., artist playcounts). We optimized the factorization hyper-parameters by grid search, picking the setting that yielded the best 5-fold cross-validated mean percentile rank. Specifically, using 20 factors, confidence scaling factor α=40, regularization weight λ=1000, and 10 iterations of alternating least squares, we achieved the best 5-fold cross-validated mean percentile rank of 1.78%.² Afterwards, we factorized the complete set of user-artist triplets using this set of hyper-parameters.

² See [18] for details on the hyper-parameters and the definition of the mean percentile rank metric.

The recommended items were diversified as in [26], using the method of [28]. Using the latent features as the basis of diversification, instead of additional metadata such as genre information (as is done in content-based recommender systems), ensures that diversity is manipulated in line with user preferences. Previous research demonstrated that this way of diversifying recommendations is perceived accordingly by users [26].

A greedy selection that optimizes intra-list similarity [3] was run on the top 200 recommended artists (i.e., the 200 artists with the highest predicted relevance) to maximize the distances between item vectors in the matrix factorization space. This algorithm starts with a recommendation set consisting of the artist with the highest predicted relevance. Items are then added iteratively to the recommendation set until it contains 10 items.

In each step of the iteration, for each candidate item i, the sum of the distances from its item vector to each item vector in the recommendation set is calculated:

$$c_i = \sum_{j=1}^{z} d(i, j),$$

where z is the number of items in the recommendation set and d(i, j) is the Euclidean distance between the item vectors of i and j. All candidate items are ranked by decreasing value of $c_i$ (rank $P_{c_i}$) and by predicted relevance (rank $P_{r_i}$). A weighting factor β is introduced to balance the trade-off between predicted relevance and diversity. For each candidate item, the combined rank is calculated as

$$w_i^* = \beta \cdot P_{c_i} + (1 - \beta) \cdot P_{r_i}.$$

The item with the highest combined rank is added to the recommendation set, and the next step is taken until 10 items are selected.

β was manipulated to achieve different levels of diversification. In the described implementation, β=1 corresponds to maximum diversity and β=0 to maximum predicted relevance. We compared recommendation lists for different values of β in terms of the sum of distances between the latent feature scores of items in the recommendation set and their average range. The list for β=0.4 turned out to fall halfway between maximum relevance and maximum diversity. Thus, the final β levels for diversification were set at β=0 (low), β=0.4 (medium), and β=1 (high).

After the recommendation lists were created, emails were
sent out to all participants to invite them to the second part of the study. We created a login screen so that we could retrieve the personalized recommendation lists for each participant. After logging in, the participant was sequentially presented with three recommendation lists, each with a different level of diversity (i.e., low, medium, or high). The order of presentation was randomized. Each recommended artist was enriched with metadata from Last.fm (i.e., picture, genre, and Top-10 songs with their numbers of listeners and playcounts), which was shown when hovering over the artist’s name in the list. Additionally, example songs were provided by clicking on the artist name (which opened the artist’s YouTube page in a new browser window). Participants were asked to answer questions about perceived diversity, recommendation satisfaction, and recommendation attractiveness³ before moving on to the next list. These questions had to be answered for each of the three lists.

³ Questions measuring perceived diversity and recommendation attractiveness were adapted from [26].

After the participant had assessed all three recommendation lists, we performed a manipulation check by placing the three lists next to each other (in random order) and asking the participant to rank-order the lists by diversity.

A total of 103 participants returned for the second part of the study. We included several control questions to filter out careless contributions, which left us with 100 participants for the analyses (age: 18-65, median 28; gender: 54 male, 46 female). Participants were compensated with $2.

4. RESULTS

4.1 Manipulation Check
A Wilcoxon signed-rank test was used to compare the perceived diversity levels of the recommendation lists. Results show an increase in perceived diversity when comparing the low diversity condition (M=1.28) against the medium (M=2.05, r=.60, Z=10.370, p<.001) and high conditions (M=2.65, r=.80, Z=13.784, p<.001). A significant increase in diversity was also found between medium and high (r=.45, Z=7.711, p<.001).

4.2 Measures
Items in the questionnaire were assessed using a confirmatory factor analysis (CFA) with repeated ordinal dependent variables and a weighted least squares estimator to determine whether the questions convey the predicted constructs. After deleting questions with high cross-loadings and low communalities, the model consisting of three constructs showed a good fit: χ²(32)=108.6, p<.001, CFI=.99, TLI=.98, RMSEA=.06.⁴ The constructs with their items are shown below (5-point Likert scale; Disagree strongly to Agree strongly). The Cronbach’s alpha (α) and the average variance extracted (AVE) of each construct showed good values (i.e., α>.8, AVE>.5), indicating convergent validity. Also, the square root of the AVE for each construct is higher than any of the factor loadings (FL) of the respective construct, which indicates good discriminant validity.

⁴ Cutoff values for a good model fit are proposed to be: CFI>.96, TLI>.95, and RMSEA<.05 [17].

Perceived Diversity (AVE=.723, α=.887):
• The list of artists was varied. (FL=.858)
• Many of the artists in the list differed from other artists in the list. (FL=.837)
• The artists differed a lot from each other on different aspects. (FL=.855)

Recommendation Satisfaction (AVE=.821, α=.932):
• I am satisfied with the list of recommended artists. (FL=.927)
• In most ways the recommended artists were close to ideal. (FL=.905)
• The list of artist recommendations meet my exact needs. (FL=.885)

Recommendation Attractiveness (AVE=.771, α=.931):
• I would give the recommended artists a high rating. (FL=.874)
• The list of artists showed too many bad items. (FL=-.830)
• The list of artists was attractive. (FL=.914)
• The list of recommendations matched my preferences. (FL=.893)

4.3 Analysis
We used a repeated measures ANOVA in order to investigate the influence of personality traits on the subjective evaluations of the diversified music recommendation lists. The results of the personality traits on the different subjective evaluations are provided below. The effects between diversity levels are all compared against the low diversity condition.

4.3.1 Personality on Perceived Diversity
Mauchly’s test indicates that the sphericity assumption is not violated (χ²(2)=.115, p=.944), so sphericity can be assumed and no correction is needed. The results show no significant main effects of the different personality traits on perceived diversity. However, a general difference in perceived diversity was found (F(2, 22)=51.029, p<.001). Exploring the differences between the levels of the diversified recommendation lists shows an increase in perceived diversity when comparing the low diversified list against the medium (F(1, 11)=11.596, p<.001) and the high diversified lists (F(1, 11)=31.191, p<.001). This confirms once more that our diversification was effective and was perceived as such by the participants.

4.3.2 Personality on Recommendation Satisfaction
Mauchly’s test shows that sphericity is not violated (χ²(2)=1.830, p=.401), and therefore no correction is needed. When assessing the effect of the different personality traits on recommendation satisfaction, two traits show a main effect: conscientiousness (F(4, 22)=2.454, p<.05) and agreeableness (F(4, 22)=3.886, p<.05). Additional analyses of the differences between the diversity levels (i.e., low, medium, and high diversification) show that conscientious participants are increasingly satisfied when provided with a higher degree of diversity: medium diversity (F(2, 11)=3.994, p<.05) and high diversity (F(2, 11)=4.036, p<.05). Agreeable participants, however, show higher satisfaction with the medium diversification (F(2, 11)=9.660, p<.05) than with the high diversification (F(2, 11)=4.036, p<.05).
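To make the greedy latent-feature diversification of Section 3 concrete, the following is a minimal pure-Python sketch, not the authors' implementation. The names `diversify` and `rank_positions` and the toy data are hypothetical. It assumes rank positions start at 1 for the best item (so the smallest weighted rank $w_i^*$ is selected) and breaks ties by relevance rank, a detail the text does not specify. In the study, the item vectors would be the latent factors from the matrix factorization.

```python
import math
from typing import Dict, List, Sequence


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank position of each item by decreasing score (1 = highest score)."""
    ordered = sorted(scores, key=lambda item: scores[item], reverse=True)
    return {item: pos for pos, item in enumerate(ordered, start=1)}


def diversify(vectors: Dict[str, Sequence[float]],
              relevance: Dict[str, float],
              beta: float,
              list_size: int = 10,
              pool_size: int = 200) -> List[str]:
    """Greedy latent-feature diversification (hypothetical helper).

    vectors:   item -> latent factor vector
    relevance: item -> predicted relevance for the target user
    beta:      0 = pure relevance, 1 = pure diversity
    """
    # Candidate pool: the pool_size items with the highest predicted relevance.
    pool = sorted(relevance, key=lambda item: relevance[item], reverse=True)[:pool_size]
    selected = [pool[0]]  # seed the list with the most relevant item
    candidates = pool[1:]
    while len(selected) < list_size and candidates:
        # c_i: sum of Euclidean distances from candidate i to every selected item.
        c = {i: sum(math.dist(vectors[i], vectors[j]) for j in selected)
             for i in candidates}
        p_c = rank_positions(c)                                      # P_{c_i}
        p_r = rank_positions({i: relevance[i] for i in candidates})  # P_{r_i}
        # w_i = beta * P_c + (1 - beta) * P_r; position 1 is best, so the
        # smallest combined value wins (ties broken by relevance rank).
        best = min(candidates,
                   key=lambda i: (beta * p_c[i] + (1 - beta) * p_r[i], p_r[i]))
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy example with hypothetical 2-D item vectors and relevance scores.
toy_vectors = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (10.0, 0.0), "d": (0.0, 3.0)}
toy_relevance = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}
for beta in (0.0, 0.4, 1.0):
    print(beta, diversify(toy_vectors, toy_relevance, beta, list_size=3))
```

With β=0 the toy call returns the three most relevant items (a, b, d), whereas with β=1 the distant but barely relevant item c is picked right after the seed a. Working on rank positions rather than raw distances and scores keeps the two criteria on a common scale, which is what makes a single β meaningful.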
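Section 3 reports a 5-fold cross-validated mean percentile rank of 1.78% for the chosen hyper-parameters. The sketch below illustrates the metric as defined in [18] (the percentile position of each held-out item in the user's predicted list, weighted by its held-out feedback), assuming percentiles normalized by n − 1 so that the top item is at 0% and the bottom item at 100%. The function names and toy data are hypothetical. Lower is better, and a random ordering yields about 50%, so 1.78% means held-out artists were on average ranked very near the top.

```python
from typing import Dict


def percentile_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Percentile position of each item in one user's predicted list.

    0.0 = top of the list, 100.0 = bottom (assumed normalization by n - 1).
    """
    ordered = sorted(scores, key=lambda item: scores[item], reverse=True)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 0.0}
    return {item: 100.0 * pos / (n - 1) for pos, item in enumerate(ordered)}


def mean_percentile_rank(predicted: Dict[str, Dict[str, float]],
                         held_out: Dict[str, Dict[str, float]]) -> float:
    """Playcount-weighted mean percentile rank of held-out items (lower is better)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for user, playcounts in held_out.items():
        ranks = percentile_ranks(predicted[user])
        for item, playcount in playcounts.items():
            weighted_sum += playcount * ranks[item]
            total_weight += playcount
    return weighted_sum / total_weight


# Toy example (hypothetical data): one user, three candidate artists.
toy_predicted = {"u1": {"artist_x": 3.0, "artist_y": 2.0, "artist_z": 1.0}}
toy_held_out = {"u1": {"artist_x": 2, "artist_z": 1}}
print(mean_percentile_rank(toy_predicted, toy_held_out))
```

In the toy example, the heavily played artist_x sits at 0% and artist_z at 100%, so the weighted mean is 100/3, about 33.3%.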
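The convergent-validity statistics of Section 4.2 can be illustrated with a short sketch. AVE is the mean of the squared standardized factor loadings of a construct, so it can be recomputed from the loadings listed there; doing so reproduces the reported values (.723, .821, .771), and squaring makes the sign of the negatively loading attractiveness item irrelevant. Cronbach’s alpha, by contrast, requires the raw item responses, which are not reported, so `cronbach_alpha` is a hypothetical helper for a respondents-by-items matrix and is not applied to study data.

```python
from statistics import pvariance
from typing import Dict, List, Sequence


def average_variance_extracted(loadings: Sequence[float]) -> float:
    """AVE: mean of the squared standardized factor loadings of one construct."""
    return sum(fl * fl for fl in loadings) / len(loadings)


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows = respondents, columns = items of one construct.

    Reverse-coded items (e.g., a negatively loading item) should be recoded first.
    """
    k = len(item_scores[0])
    item_variances = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Standardized factor loadings as listed in Section 4.2.
reported_loadings: Dict[str, List[float]] = {
    "Perceived Diversity": [0.858, 0.837, 0.855],
    "Recommendation Satisfaction": [0.927, 0.905, 0.885],
    "Recommendation Attractiveness": [0.874, -0.830, 0.914, 0.893],
}
for construct, loadings in reported_loadings.items():
    print(construct, round(average_variance_extracted(loadings), 3))
```

Population variances are used throughout `cronbach_alpha`; because the same variance estimator appears in numerator and denominator, the n versus n − 1 choice cancels out.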
4.3.3 Personality on Recommendation Attractiveness
Mauchly’s test shows that sphericity is not violated (χ²(2)=1.860, p=.395). Here, too, results show main effects for the conscientiousness (F(4, 22)=3.157, p<.05) and agreeableness (F(4, 22)=3.469, p<.05) personality traits. Looking at the differences between the levels of diversification, we found patterns similar to those for satisfaction. Conscientious participants were increasingly attracted to more diversified recommendation lists: medium (F(2, 11)=2.955, p<.05) and high (F(2, 11)=7.866, p<.05). Participants scoring high on the agreeableness trait were more attracted to the medium (F(2, 11)=5.933, p<.05) than to the high (F(2, 11)=5.314, p<.05) diversified list.

5. CONCLUSION & DISCUSSION
Our results show that certain personality traits (i.e., conscientiousness and agreeableness) are related to the subjective evaluations of diversified recommendation lists. We found that conscientious people judged a higher degree of diversity as more attractive and were more satisfied with it, whereas agreeable people showed more interest (i.e., list attractiveness and satisfaction) in a medium degree of diversity.

The relationships that we found can be used in personality-based systems as proposed in [7]. With the increased connectedness of applications, such as recommender systems, with social networking sites, users’ personality can be acquired without the need for behavioral data from the application itself (e.g., via Facebook [1, 4, 12, 20], Twitter [16, 21], or Instagram [11, 13, 24]). By identifying relationships with users’ personality traits, such as in this work, cross-domain inferences about users’ preferences and needs can be made and used to provide users with a personalized experience.

6. ACKNOWLEDGMENTS
This research is supported by the Austrian Science Fund (FWF): P25655.

7. REFERENCES
[1] M. D. Back, J. M. Stopfer, S. Vazire, S. Gaddis, S. C. Schmukle, B. Egloff, and S. D. Gosling. Facebook profiles reflect actual personality, not self-idealization. Psychological Science, 21:372–374, 2010.
[2] D. Bollen, B. P. Knijnenburg, M. C. Willemsen, and M. Graus. Understanding choice overload in recommender systems. In Proceedings of the fourth ACM conference on RecSys, pages 63–70. ACM, 2010.
[3] P. Castells, N. J. Hurley, and S. Vargas. Novelty and diversity in recommender systems. In Recommender Systems Handbook, pages 881–918. Springer, 2015.
[4] F. Celli, E. Bruni, and B. Lepri. Automatic personality and interaction style recognition from facebook profile pictures. In Proceedings of the ACM MM, 2014.
[5] L. Chen, W. Wu, and L. He. How personality influences users’ needs for recommendation diversity? In Proceedings of CHI’13 EA. ACM, 2013.
[6] M. D. Ekstrand, F. M. Harper, M. C. Willemsen, and J. A. Konstan. User perception of differences in recommender algorithms. In Proceedings of the 8th ACM Conference on Recommender systems, pages 161–168. ACM, 2014.
[7] B. Ferwerda and M. Schedl. Enhancing Music Recommender Systems with Personality Information and Emotional States: A Proposal. In Proceedings of the 2nd Workshop on EMPIRE, 2014.
[8] B. Ferwerda and M. Schedl. Investigating the relationship between diversity in music consumption behavior and cultural dimensions: A cross-country analysis. In Proc. of the 1st Workshop on SOAP, 2016.
[9] B. Ferwerda and M. Schedl. Personality-Based User Modeling for Music Recommender Systems. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2016), Riva del Garda, Italy, 2016.
[10] B. Ferwerda, M. Schedl, and M. Tkalcic. Personality & emotional states: Understanding users’ music listening needs. UMAP 2015 Extended Proceedings.
[11] B. Ferwerda, M. Schedl, and M. Tkalčič. Predicting Personality Traits with Instagram Pictures. In Proceedings of the 3rd Workshop on EMPIRE, 2015.
[12] B. Ferwerda, M. Schedl, and M. Tkalčič. Personality Traits and the Relationship with (Non-)Disclosure Behavior on Facebook. In Companion of the 25th International WWW Conference, 2016.
[13] B. Ferwerda, M. Schedl, and M. Tkalčič. Using Instagram Picture Features to Predict Users’ Personality. In Proceedings of the 22nd International Conference on MMM, Miami, USA, January 2016.
[14] B. Ferwerda, A. Vall, M. Tkalčič, and M. Schedl. Exploring Music Diversity Needs Across Countries. In Proceedings of the 24th International Conference on UMAP, Halifax, Canada, July 2016.
[15] B. Ferwerda, E. Yang, M. Schedl, and M. Tkalčič. Personality Traits Predict Music Taxonomy Preferences. In ACM CHI ’15 EA, 2015.
[16] J. Golbeck, C. Robles, M. Edmondson, and K. Turner. Predicting Personality from Twitter. In Proceedings of the 3rd International Conference on SocialCom, 2011.
[17] L.-t. Hu and P. M. Bentler. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural equation modeling: a multidisciplinary journal, 6(1):1–55, 1999.
[18] Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for implicit feedback datasets. In ICDM, 2008.
[19] O. P. John, E. M. Donahue, and R. L. Kentle. The big five inventory: Versions 4a and 54, institute of personality and social research. UC Berkeley, 1991.
[20] G. Park, H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, M. Kosinski, D. J. Stillwell, L. H. Ungar, and M. E. Seligman. Automatic Personality Assessment Through Social Media Language. Journal of Personality and Social Psychology, 108, November 2014.
[21] D. Quercia, M. Kosinski, D. Stillwell, and J. Crowcroft. Our twitter profiles, our selves: Predicting personality with twitter. In Proceedings of the 3rd International Conference on SocialCom, 2011.
[22] M. Schedl. The LFM-1b Dataset for Music Retrieval and Recommendation. In Proceedings on ICMR, 2016.
[23] B. Scheibehenne, R. Greifeneder, and P. M. Todd. What moderates the too-much-choice effect? Psychology & Marketing, 26(3):229–253, 2009.
[24] M. Skowron, B. Ferwerda, M. Tkalčič, and M. Schedl. Fusing Social Media Cues: Personality Prediction from Twitter and Instagram. In Companion Proceedings of the 25th International WWW Conference, 2016.
[25] M. Tkalcic, B. Ferwerda, D. Hauger, and M. Schedl. Personality correlates for digital concert program notes. UMAP 2015, Springer LNCS 9146, 2015.
[26] M. C. Willemsen, B. P. Knijnenburg, M. P. Graus, L. C. Velter-Bremmers, and K. Fu. Using latent features diversification to reduce choice difficulty in recommendation lists. RecSys, 11:14–20, 2011.
[27] W. Wu, L. Chen, and L. He. Using personality to adjust diversity in recommender systems. In Proceedings of the 24th ACM Conference on Hypertext and Social Media, pages 225–229. ACM, 2013.
[28] C.-N. Ziegler, S. M. McNee, J. A. Konstan, and G. Lausen. Improving recommendation lists through topic diversification. WWW, page 22, 2005.