The Influence of Users’ Personality Traits on Satisfaction and Attractiveness of Diversified Recommendation Lists

Bruce Ferwerda
Johannes Kepler University, Altenberger Str. 69, 4040 Linz, AT
bruce.ferwerda@jku.at

Mark Graus
Eindhoven University of Technology, IPO 0.20, P.O. Box 513, 5600 MB Eindhoven, NL
m.p.graus@tue.nl

Andreu Vall
Johannes Kepler University, Altenberger Str. 69, 4040 Linz, AT
andreu.vall@jku.at

Marko Tkalcic
Free University of Bolzano, Piazza Domenicani, 3, 39100 Bozen-Bolzano, IT
marko.tkalcic@unibz.it

Markus Schedl
Johannes Kepler University, Altenberger Str. 69, 4040 Linz, AT
markus.schedl@jku.at

EMPIRE 2016, September 16, 2016, Boston, MA, USA. Copyright held by the author(s).

ABSTRACT

Diversifying recommendations has been shown to be a good means to counteract choice difficulties and overload, and is able to positively influence subjective evaluations, such as satisfaction and attractiveness. Personal characteristics (e.g., domain expertise, prior preference strength) have been shown to influence the desired level of diversity in a recommendation list. However, only personal characteristics that are directly related to the domain have been investigated so far. In this work we take personality traits as a general user model and show that specific traits are related to a preference for different levels of diversity (in terms of recommendation satisfaction and attractiveness). Among 103 participants we show that conscientiousness is related to a preference for a higher degree of diversification, while agreeableness is related to a mid-level diversification of the recommendations. Our results have implications on how to personalize recommendation lists (i.e., the amount of diversity that should be provided) depending on users’ personality.

CCS Concepts

• Human-centered computing → Human computer interaction (HCI); User models; User studies;

Keywords

Diversity; Recommender Systems; User-Centric Evaluation; Personality

1. INTRODUCTION

Providing users with a diversified list of recommendations has been shown to have positive effects on the user experience. With an abundance of choices available nowadays, providing diversity in the recommendations can counteract the negative psychological effects that users may experience, such as choice overload and choice difficulties [26]. These negative effects are caused by recommender systems, which are originally designed to output recommendations that are closest to the user’s interest. The closer the recommendations are to the user’s interest, the higher the accuracy of the recommender system algorithm, but this also results in recommendations that are often too similar to each other (e.g., same level of attractiveness to the user). This not only increases the chance of choice overload and choice difficulties for the user, but also increases the possibility of not covering the full spectrum of the user’s interest [3].

Although prior research has shown that recommendation diversity has positive effects on the user experience, differences between the diversity needs of users have not been given a lot of attention. Domain expertise and prior choice preferences have been shown to play a role in the amount of diversity desired by the user [2, 6, 26]. Others have shown that diversity needs can also be related to cultural dimensions [8, 14]. In this work we consider personality traits as an indicator of satisfaction and attractiveness on differently diversified music recommendation lists.

The use of personality as a general model for users has gained increased interest. Several works revealed personality-based relationships with users’ behavior, preferences, and needs (e.g., [10, 15, 25]), how to implicitly acquire personality traits of users from social media trails (e.g., Facebook [1, 4, 12, 20], Twitter [16, 21], and Instagram [11, 13, 24]), and how personality traits can be implemented into a personalized system [7, 9]. With our work we contribute to the personality research by providing more insights into personality-related diversity needs. We found among 103 participants that the conscientiousness and agreeableness personality traits play a role in the desired amount of diversity in a recommendation list. While conscientious participants showed a higher degree of satisfaction and attractiveness with the more diversified recommendations, agreeable participants were more satisfied and found the list more attractive with a medium amount of diversity in the recommendations.

2. RELATED WORK

The positive effects of recommendation list diversity have been shown by several researchers. Bollen et al. [2] and Willemsen et al. [26] investigated the influence of diversity on movie recommendations and found that diversity has a positive effect on the attractiveness of the recommendation set, the difficulty to make a choice, and eventually on the choice satisfaction. Besides the positive effects of diversification, personal characteristics also play a role in the attractiveness of the diversified recommendation list (e.g., strength of prior preference or domain expertise [2, 23]). Bollen et al. [2] found that expertise in the domain showed a positive effect on the item attractiveness.

The personal characteristics that have been identified so far are domain specific to the kind of recommendations. However, a more general personal characteristic may be present that influences the subjective evaluations of the diversified recommendations. Personality has been shown to be an enduring factor, which can relate to one’s taste, preference, and interest (e.g., [5, 10, 25]). Chen et al. [5] and Wu et al. [27] showed relationships between personality and preference for diversification based on different movie characteristics (e.g., genre, artist, director). Ferwerda et al. [10] showed that music preferences can be related to the personality of the listener, whereas Tkalcic et al. [25] found relationships between personality traits and the preference of being exposed to certain amounts of multimedia meta-information.

In this work we investigate whether personality traits can be considered a personal characteristic that influences the subjective evaluations of diversified recommendation lists. To this end, we rely on the widely used five-factor model (FFM), which categorizes personality into five general dimensions: openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism [19].

3. DATA PREPARATION & PROCEDURES

We created differently diversified music recommendation lists in order to investigate the influence of personality traits on the subjective evaluation of the recommendation lists. Since we created the recommendation lists off-line, we separated the study into two parts. In the first part participants were recruited and their complete Last.fm listening history was crawled in order to create the recommendation lists. After the lists were created, participants from the first part were invited for the second part where they were asked to assess the diversified recommendation lists.

We recruited 254 participants through Amazon Mechanical Turk for the first part of the study. Participation was restricted to those located in the United States with a very good reputation (≥95% HIT approval rate and ≥1000 HITs approved) and a Last.fm account with at least 25 listening events. Furthermore, they were asked to fill in the 44-item Big Five Inventory personality questionnaire [19] to measure the FFM. Control questions were asked to filter out fake and careless contributions. A compensation of $1 was provided. We crawled the complete listening history of each participant and aggregated the listening events to represent artist and playcount (i.e., number of times listened to an artist).

In order to prepare the music recommendation lists for each participant, we complemented our data with the LFM-1b dataset [22].¹ This dataset consists of the complete listening histories of 120,322 Last.fm users from different countries. Since our participants were all located in the United States, we only used the United States users of the LFM-1b dataset to complement our dataset. This resulted in 10,255 additional users, which we also aggregated into artist and playcount for each user. The final dataset consists of user, artist, and artist playcount triplets with a total of 387,037 unique artists for the creation of the recommendation lists.

¹ Available at http://www.cp.jku.at/datasets/LFM-1b/

We used the weighted matrix factorization algorithm of [18] on our final dataset to calculate the recommended items. This algorithm is specifically designed to deal with datasets consisting of implicit feedback (e.g., artist playcounts). We optimized the factorization hyper-parameters by conducting grid-search and picking the setting that yielded the best 5-fold cross-validated mean percentile rank. Specifically, using 20 factors, confidence scaling factor α=40, regularization weight λ=1000 and 10 iterations of alternating least squares, we achieved the best 5-fold cross-validated mean percentile rank of 1.78%.² Afterwards we factorized the whole set of user-artist triplets using this set of hyper-parameters.

² See [18] for details on the hyper-parameters and the definition of the mean percentile rank metric.

The recommended items were diversified as was done in [26] by using the method of [28]. Using the latent features as the basis of diversification instead of additional metadata like genre information (as is done in content-based recommender systems) guarantees that diversity is manipulated in line with user preferences. Previous research demonstrated that this way of diversifying recommendations is perceived accordingly by users [26].

A greedy selection to optimize the intra-list similarity [3] was run on the top 200 recommended artists (i.e., the 200 artists with highest predicted relevance) to maximize the distances between item vectors in the matrix factorization space. This algorithm starts with a recommendation set consisting of the artist with highest predicted relevance. In an iterative fashion items are added to the recommendation set until it contains 10 items.

In each step of the iteration, for each candidate item $i$ the sum of all distances from its item vector to each item vector in the recommendation set is calculated: $c_i = \sum_{j=1}^{z} d(i, j)$, where $z$ is the number of items in the recommendation set and $d(i, j)$ is the Euclidean distance between two item vectors $i$ and $j$. All candidate items are ranked based on decreasing value of $c_i$ ($P_{c_i}$) and on predicted relevance ($P_{r_i}$). A weighting factor β is introduced to balance the trade-off between predicted relevance and diversity. For each candidate item the combined rank is calculated following $w_i^* = \beta \cdot P_{c_i} + (1 - \beta) \cdot P_{r_i}$. The item with the highest combined rank is added to the recommendation set and the next step is taken until 10 items are selected.

β was manipulated to achieve different levels of diversification. In the described implementation β=1 corresponds to maximum diversity, and β=0 corresponds to maximum predicted relevance. We compared recommendation lists for different values of β in terms of the sum of distances between the latent feature scores of items in the recommendation set and their average range. The list for β=0.4 fell halfway between maximum relevance and maximum diversity. Thus, the final β levels for diversification were set at β=0 (low), β=0.4 (medium), and β=1 (high).

After the recommendation lists were created, emails were sent out to all participants to invite them for the second part of the study. We created a login screen so that we could retrieve the personalized recommendation lists for each participant. After logging in, the participant was sequentially presented with three recommendation lists, each with a different level of diversity (i.e., low, medium, or high). The order of presentation was randomized. Each recommended artist was enriched with metadata from Last.fm (i.e., picture, genre, Top-10 songs with the number of listeners and playcounts), which was shown when the participant hovered over the name in the list. Additionally, example songs were provided by clicking on the artist name (new browser screen linked to the artist’s YouTube page). Participants were asked to answer questions about perceived diversity, recommendation satisfaction, and recommendation attractiveness³ before moving on to the next list. These questions needed to be answered for each of the three lists.

³ Questions measuring perceived diversity and recommendation attractiveness were adapted from [26].

After the participant assessed all three recommendation lists, we performed a manipulation check by placing the three lists next to each other (randomly ordered) and asking the participant to rank order the lists by diversity.

There were 103 participants who returned for the second part of the study. We included several control questions to filter out careless contributions, which left us with 100 participants for the analyses. Participants were aged 18-65 (median 28), 54 were male and 46 female, and each was compensated with $2.

4. RESULTS

4.1 Manipulation Check

A Wilcoxon signed-rank test was used to test the perceived diversity levels of the recommendation lists. Results show an increase of perceived diversity when comparing the low diversity condition (M=1.28) against the medium (M=2.05, r=.60, Z=10.370, p<.001) and high condition (M=2.65, r=.80, Z=13.784, p<.001). A significant diversity increase was also found between medium and high (r=.45, Z=7.711, p<.001).

4.2 Measures

Items in the questionnaire were assessed using a confirmatory factor analysis (CFA) with repeated ordinal dependent variables and a weighted least squares estimator to determine whether the questions convey the predicted constructs. After deleting questions with high cross-loadings and low communalities, the model consisting of three constructs showed a good fit: χ²(32)=108.6, p<.001, CFI=.99, TLI=.98, RMSEA=.06.⁴ The constructs with their items are shown below (5-point Likert scale; Disagree strongly-Agree strongly). The Cronbach’s alpha (α) and the average variance extracted (AVE) of each construct showed good values (i.e., α>.8, AVE>.5), indicating convergent validity. Also, the square root of the AVE for each construct is higher than any of the factor loadings (FL) of the respective construct, which indicates good discriminant validity.

⁴ Cutoff values for a good model fit are proposed to be: CFI>.96, TLI>.95, and RMSEA<.05 [17].

Perceived Diversity (AVE=.723, α=.887):
• The list of artists was varied. (FL=.858)
• Many of the artists in the lists differed from other artists in the list. (FL=.837)
• The artists differed a lot from each other on different aspects. (FL=.855)

Recommendation Satisfaction (AVE=.821, α=.932):
• I am satisfied with the list of recommended artists. (FL=.927)
• In most ways the recommended artists were close to ideal. (FL=.905)
• The list of artist recommendations meet my exact needs. (FL=.885)

Recommendation Attractiveness (AVE=.771, α=.931):
• I would give the recommended artists a high rating. (FL=.874)
• The list of artists showed too many bad items. (FL=-.830)
• The list of artists was attractive. (FL=.914)
• The list of recommendations matched my preferences. (FL=.893)

4.3 Analysis

We used a repeated measures ANOVA in order to investigate the influence of personality traits on the subjective evaluations of the diversified music recommendation lists. Below, the results of personality traits on the different subjective evaluations are provided. The effects between diversity levels are all compared against the low diversity condition.

4.3.1 Personality on Perceived Diversity

Mauchly’s test shows that sphericity is not violated (χ²(2)=.115, p=.944), and therefore no correction is needed. The results show that there are no significant main effects of the different personality traits on perceived diversity. However, a general difference in perceived diversity can be assumed (F(2, 22)=51.029, p<.001). Exploring the differences between the levels of diversified recommendation lists shows that there is an increase in perceived diversity when comparing the low diversified list against the medium (F(1, 11)=11.596, p<.001) and the high diversified lists (F(1, 11)=31.191, p<.001). This confirms once more that our diversification was effective and was perceived as such by the participants.

4.3.2 Personality on Recommendation Satisfaction

Mauchly’s test shows that sphericity is not violated (χ²(2)=1.830, p=.401), and therefore no correction is needed. Assessing the effect of the different personality traits on the recommendation satisfaction, the following personality traits show a main effect: conscientiousness (F(4, 22)=2.454, p<.05) and agreeableness (F(4, 22)=3.886, p<.05). Additional analyses of the differences between the diversity levels (i.e., low, medium, and high diversification) show that conscientious participants are increasingly satisfied when provided a higher degree of diversity: medium diversity (F(2, 11)=3.994, p<.05) and high diversity (F(2, 11)=4.036, p<.05). However, the satisfaction differences for agreeable participants show a higher satisfaction for the medium diversification (F(2, 11)=9.660, p<.05) than for the high diversification (F(2, 11)=4.036, p<.05).

4.3.3 Personality on Recommendation Attractiveness

Mauchly’s test shows that there is no violation of sphericity (χ²(2)=1.860, p=.395). Also here, results show main effects for the conscientiousness (F(4, 22)=3.157, p<.05) and agreeableness (F(4, 22)=3.469, p<.05) personality traits. By looking at the differences between the levels of diversification, we found similar patterns as with satisfaction. Results show that conscientious participants were increasingly more attracted to more diversified recommendation lists: medium (F(2, 11)=2.955, p<.05), high (F(2, 11)=7.866, p<.05). Participants scoring high on the agreeableness personality trait were more attracted to the medium (F(2, 11)=5.933, p<.05) diversified list than to the high (F(2, 11)=5.314, p<.05) diversified list.

5. CONCLUSION & DISCUSSION

Our results show that certain personality traits (i.e., conscientiousness and agreeableness) are related to the subjective evaluations of diversified recommendation lists. We found that conscientious people judged a higher degree of diversity more attractive and were more satisfied with it, whereas agreeable people showed more interest (i.e., list attractiveness and satisfaction) in a medium degree of diversity.

The relationships that we found can be used in personality-based systems as proposed in [7]. With the increased connectedness of applications, such as recommender systems, with social networking sites, users’ personality can be acquired without the need of behavioral data in the application (e.g., via Facebook [1, 4, 12, 20], Twitter [16, 21], or Instagram [11, 13, 24]). By identifying relationships with users’ personality traits, such as in this work, cross-domain inferences about users’ preferences and needs can be made and implemented to provide a personalized experience to users.

6. ACKNOWLEDGMENTS

This research is supported by the Austrian Science Fund (FWF): P25655.

7. REFERENCES

[1] M. D. Back, J. M. Stopfer, S. Vazire, S. Gaddis, S. C. Schmukle, B. Egloff, and S. D. Gosling. Facebook profiles reflect actual personality, not self-idealization. Psychological Science, 21:372–374, 2010.
[2] D. Bollen, B. P. Knijnenburg, M. C. Willemsen, and M. Graus. Understanding choice overload in recommender systems. In Proceedings of the fourth ACM conference on RecSys, pages 63–70. ACM, 2010.
[3] P. Castells, N. J. Hurley, and S. Vargas. Novelty and diversity in recommender systems. In Recommender Systems Handbook, pages 881–918. Springer, 2015.
[4] F. Celli, E. Bruni, and B. Lepri. Automatic personality and interaction style recognition from facebook profile pictures. In Proceedings of the ACM MM, 2014.
[5] L. Chen, W. Wu, and L. He. How personality influences users’ needs for recommendation diversity? In Proceeding of CHI’13 EA. ACM, 2013.
[6] M. D. Ekstrand, F. M. Harper, M. C. Willemsen, and J. A. Konstan. User perception of differences in recommender algorithms. In Proceedings of the 8th ACM Conference on Recommender systems, pages 161–168. ACM, 2014.
[7] B. Ferwerda and M. Schedl. Enhancing Music Recommender Systems with Personality Information and Emotional States: A Proposal. In Proceedings of the 2nd Workshop on EMPIRE, 2014.
[8] B. Ferwerda and M. Schedl. Investigating the relationship between diversity in music consumption behavior and cultural dimensions: A cross-country analysis. In Proc. of the 1st Workshop on SOAP, 2016.
[9] B. Ferwerda and M. Schedl. Personality-Based User Modeling for Music Recommender Systems. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2016), Riva del Garda, Italy, 2016.
[10] B. Ferwerda, M. Schedl, and M. Tkalcic. Personality & emotional states: Understanding users’ music listening needs. UMAP 2015 Extended Proceedings.
[11] B. Ferwerda, M. Schedl, and M. Tkalčič. Predicting Personality Traits with Instagram Pictures. In Proceedings of the 3rd Workshop on EMPIRE, 2015.
[12] B. Ferwerda, M. Schedl, and M. Tkalčič. Personality Traits and the Relationship with (Non-)Disclosure Behavior on Facebook. In Companion of the 25th International WWW Conference, 2016.
[13] B. Ferwerda, M. Schedl, and M. Tkalčič. Using Instagram Picture Features to Predict Users’ Personality. In Proceedings of the 22nd International Conference on MMM, Miami, USA, January 2016.
[14] B. Ferwerda, A. Vall, M. Tkalčič, and M. Schedl. Exploring Music Diversity Needs Across Countries. In Proceedings of the 24th International Conference on UMAP, Halifax, Canada, July 2016.
[15] B. Ferwerda, E. Yang, M. Schedl, and M. Tkalčič. Personality Traits Predict Music Taxonomy Preferences. In ACM CHI ’15 EA, 2015.
[16] J. Golbeck, C. Robles, M. Edmondson, and K. Turner. Predicting Personality from Twitter. In Proceedings of the 3rd International Conference on SocialCom, 2011.
[17] L.-t. Hu and P. M. Bentler. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural equation modeling: a multidisciplinary journal, 6(1):1–55, 1999.
[18] Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for implicit feedback datasets. In ICDM, 2008.
[19] O. P. John, E. M. Donahue, and R. L. Kentle. The big five inventory: Versions 4a and 54, institute of personality and social research. UC Berkeley, 1991.
[20] G. Park, H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, M. Kosinski, D. J. Stillwell, L. H. Ungar, and M. E. Seligman. Automatic Personality Assessment Through Social Media Language. Journal of Personality and Social Psychology, 108, November 2014.
[21] D. Quercia, M. Kosinski, D. Stillwell, and J. Crowcroft. Our twitter profiles, our selves: Predicting personality with twitter. In Proceedings of the 3rd International Conference on SocialCom, 2011.
[22] M. Schedl. The LFM-1b Dataset for Music Retrieval and Recommendation. In Proceedings on ICMR, 2016.
[23] B. Scheibehenne, R. Greifeneder, and P. M. Todd. What moderates the too-much-choice effect? Psychology & Marketing, 26(3):229–253, 2009.
[24] M. Skowron, B. Ferwerda, M. Tkalčič, and M. Schedl. Fusing Social Media Cues: Personality Prediction from Twitter and Instagram. In Companion Proceedings of the 25th International WWW Conference, 2016.
[25] M. Tkalcic, B. Ferwerda, D. Hauger, and M. Schedl. Personality correlates for digital concert program notes. UMAP 2015, Springer LNCS 9146, 2015.
[26] M. C. Willemsen, B. P. Knijnenburg, M. P. Graus, L. C. Velter-Bremmers, and K. Fu. Using latent features diversification to reduce choice difficulty in recommendation lists. RecSys, 11:14–20, 2011.
[27] W. Wu, L. Chen, and L. He. Using personality to adjust diversity in recommender systems. In Proceedings of the 24th ACM Conference on Hypertext and Social Media, pages 225–229. ACM, 2013.
[28] C.-N. Ziegler, S. M. McNee, J. A. Konstan, and G. Lausen. Improving recommendation lists through topic diversification. WWW, page 22, 2005.
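
The greedy re-ranking described in Section 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `rank_positions` and `diversify`, the tie-breaking rule, and the toy data are hypothetical. It also assumes that rank position 1 is the best position in both rankings, so the item with the "highest combined rank" is read here as the candidate with the smallest weighted sum of rank positions.

```python
import math


def rank_positions(scores):
    """Return 1-based rank positions; position 1 goes to the largest score."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    positions = [0] * len(scores)
    for pos, k in enumerate(order, start=1):
        positions[k] = pos
    return positions


def diversify(item_vectors, relevance, beta, list_size=10, pool_size=200):
    """Greedy re-ranking that trades predicted relevance against diversity.

    item_vectors: latent item vectors (sequences of floats)
    relevance:    predicted relevance score per item
    beta:         0 = pure relevance, 1 = pure diversity
    Returns the indices of the selected items in selection order.
    """
    # Candidate pool: the pool_size items with the highest predicted relevance.
    pool = sorted(range(len(relevance)), key=lambda k: -relevance[k])[:pool_size]
    selected = [pool[0]]          # start with the most relevant item
    candidates = pool[1:]

    while len(selected) < list_size and candidates:
        # c_i: sum of Euclidean distances from candidate i to every selected item.
        c = [
            sum(math.dist(item_vectors[i], item_vectors[j]) for j in selected)
            for i in candidates
        ]
        p_c = rank_positions(c)                                  # diversity rank
        p_r = rank_positions([relevance[i] for i in candidates]) # relevance rank
        # Combined rank w_i = beta * P_c + (1 - beta) * P_r (smaller is better here).
        w = [beta * pc + (1 - beta) * pr for pc, pr in zip(p_c, p_r)]
        # Assumed tie-break: prefer the more relevant candidate.
        best = min(range(len(candidates)), key=lambda k: (w[k], p_r[k]))
        selected.append(candidates.pop(best))

    return selected


if __name__ == "__main__":
    # Toy data (invented): three near-duplicates and two distant items.
    vectors = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (-4.0, 4.0)]
    relevance = [0.9, 0.8, 0.7, 0.6, 0.5]
    for beta in (0.0, 0.4, 1.0):
        print(beta, diversify(vectors, relevance, beta, list_size=3))
```

On this toy input, β=0 returns the three most relevant items, [0, 1, 2], while β=1 keeps the most relevant item and then adds the two distant ones, [0, 3, 4]. Intermediate values such as the paper's β=0.4 blend the two rankings, and the result then depends on how ties in the combined rank are resolved.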