     Recommender System Incorporating User Personality
         Profile through Analysis of Written Reviews

                            Peter Potash
                 Department of Computer Science
                 University of Massachusetts Lowell
                       Lowell, Massachusetts
                       ppotash@cs.uml.edu

                            Anna Rumshisky
                 Department of Computer Science
                 University of Massachusetts Lowell
                       Lowell, Massachusetts
                       arum@cs.uml.edu

ABSTRACT
In this work we directly incorporate user personality profiles into the task of matrix factorization for predicting user ratings. Unlike previous work using personality in recommender systems, we rely only on the presence of written reviews by users. Other work that incorporates text directly into the recommendation framework focuses primarily on insights into products/categories, potentially disregarding important traits of the reviewers themselves. By using the reviews to determine the users' personalities directly, we can acquire key insights into a user's taste. Our ability to create the personality profile is based on a supervised model trained on the MyPersonality dataset. Leveraging a set of linguistic features, we are able to create a predictive model for all Big 5 personality dimensions and apply it to the task of predicting personality dimensions for users in a different dataset. We use Kernelized Probabilistic Matrix Factorization to integrate the personality profiles of the users as side-information. Lastly, we show the empirical effectiveness of using the MyPersonality dataset for predicting user ratings. Our results show that combining the personality model's raw linguistic features with the predicted personality scores provides the best performance. Furthermore, the personality scores alone outperform a dimensionality reduction of the linguistic features.

CCS Concepts
•Human-centered computing → Collaborative filtering; Empirical studies in collaborative and social computing; Social networks;

Keywords
Human-Centered Computing; Collaborative Filtering; Recommender Systems; Social Networks

EMPIRE 2016, September 16, 2016, Boston, MA, USA. Copyright held by the author(s).

1.   INTRODUCTION
Recent work [20, 2, 1] has shown the effectiveness of incorporating user reviews into the matrix factorization framework. Unfortunately, the information derived from the reviews is primarily used to understand items and item categories, as opposed to users. Given that it is the users who provide the reviews, we believe that important information about the reviewers may be lost in these methodologies. Even if the methodologies were modified slightly to glean insight into the users themselves, the representations learned by these methodologies would still require manual inspection to fully understand their meaning. Alternatively, when it comes to understanding users, personality can be an important concept to leverage: the intersection of personality and linguistics dates back decades [8, 33, 14]. Given that personality is a well-researched topic, it is an interpretable aspect to attempt to derive from written reviews. Furthermore, we believe it can be effective side-information for producing more accurate predictions.

More specifically, we use the MyPersonality dataset [18] to build a predictive model for the Big 5 personality traits [13] of reviewers (users). The dataset provides status updates from Facebook users along with the users' personality scores, which are based on the users taking separate psychological tests. Thus, the personality scores in this dataset are grounded in established psychological research. We then take advantage of the Kernelized Probabilistic Matrix Factorization (KPMF) framework to incorporate the personality scores as side-information.

To further motivate the idea of a personality profile as an added signal for user rating prediction, take as an example the following excerpts from two different movie reviews of the film 'Inception'. Both reviewers rated the movie 10 out of 10, but observe how each user begins his/her review. One reviewer writes:

    "My sister has been bothering me to see this movie for more than two months, and I am really glad that she did, because this movie was excellent, E-X-C-E-L-L-E-N-T, EXCELLENT!"

Whereas the other reviewer notes:

    "So far, Christopher Nolan has not disappointed me as a director, and 'Inception' is another good one."
While the two users have given the same numerical rating to the movie, we can obtain deeper insight into the users themselves by examining what they wrote. The first reviewer appears to be a more casual moviegoer, seeing movies people recommend and finding pleasure in them. The second reviewer, in contrast, appears to be more of a movie aficionado. The reviewer immediately identifies who the director is, and indicates that he/she is familiar with the director's work. Such an analysis can indicate that their ratings for other items could diverge substantially.

The rest of this paper is organized as follows. Section 2 provides an overview of the related work on matrix factorization, as well as work at the intersection of recommender systems and natural language processing (NLP). Section 3 describes the KPMF methodology. In Section 4, we explain how the predictive model for the Big 5 personality traits was built, as well as how it is incorporated as the side-information for KPMF. Section 5 describes our experimental design for predicting user ratings that incorporate personality. Finally, in Sections 6 and 7, we present and discuss our results, as well as future research directions based on this work.

2.   BACKGROUND
In this section, we give a brief review of the history of recommender systems using matrix factorization over the course of the past decade, and then discuss examples of previous work where NLP methods have been used to create recommender systems.

2.1  Matrix Factorization Systems
The Netflix Challenge that commenced in 2006 marked a seminal event in the field of recommender systems. As [3] notes, the state-of-the-art system that Netflix was using, Cinematch, was based on a nearest-neighbor technique. The system used an extension of Pearson's correlation, which it produced by analyzing the ratings for each movie. The system then used these correlation values to create neighborhoods for the movies. Finally, the system used these correlations in multivariate regression to produce the final rating prediction.

The team that ultimately took home the million-dollar prize, however, relied on a fundamentally different technique: latent factors via matrix factorization [17]. Rather than calculating neighborhoods for items and/or users, matrix factorization models users and items as latent vectors. Stacking these vectors into two separate matrices, one for users and one for items, produces the latent matrices that represent users and items. The models predict ratings simply by taking the dot-product of the latent vectors of the user and item for which a rating is desired, or simply by multiplying the two matrices to predict all ratings.

During the course of the Netflix Challenge, researchers developed probabilistic extensions of standard matrix factorization [26, 27] that could adapt well to the large, sparse matrices that are generally representative of rating matrices. These models assume a generative process of probability distributions for the latent user/item vectors, as well as for the ratings themselves. Our technique for rating prediction follows the methodology of KPMF, detailed by [36]. KPMF builds upon a probabilistic framework, and we explain the model in full detail in Section 3.

2.2  Recommender Systems and NLP
Various researchers have already completed NLP-related tasks in the overall goal of constructing an effective recommender system. [28] combines topic modeling on plot summaries with probabilistic matrix factorization to predict user ratings for movies. Their paper proposes an expanded generative process for rating prediction that can incorporate the models of Correlated Topic Models [5] and Latent Dirichlet Allocation [6]. In similar fashion, [35] combines topic modeling on the text of scientific articles with probabilistic matrix factorization in the effort of recommending relevant articles/papers to researchers. In an example of a non-matrix-factorization approach, [29] uses sentiment analysis on movie reviews for movie recommendations. Here, the researchers use a recommendation technique more akin to nearest-neighbors, defining a similarity measure among users and items based on how users rate items and how items are rated. Once the similarity is measured, the researchers use the result of the sentiment analysis to produce their final recommendations. In [10], the authors mine users' written reviews to understand both generalized and context-specific user preferences. These two aspects are then combined into a linear regression-based recommendation system. [11] provides a thorough presentation of the intersection between NLP and recommender systems.

In recent years, researchers have established methodologies that integrate the content of text reviews directly into the matrix factorization framework. In [20, 2], the authors fuse together topic modeling with matrix factorization, allowing models to learn representations of users and items, as well as topical distributions related to items and categories. More recently, in [1], the authors add the modeling of distributed language representations to the matrix factorization framework. This allows the authors to learn individual word representations as well as a general language model for the categories in their dataset.

The work that most closely resembles ours is that of [25]. In their work, the authors create a personality-based recommender algorithm for recommending relevant online reviews. The authors train their personality model on a corpus of stream-of-consciousness essays that include an accompanying personality score for each writer [24]. The authors, unfortunately, do not detail what accuracy their personality model achieves in a supervised cross-validation of the dataset. Our own efforts to create a classification model from the same data using similar features produced an accuracy below 60%, which we do not deem accurate enough for use in further applications. Once the authors predicted the users' personalities, they clustered the results in order to provide recommendations for users. While the approach is relevant, the authors are unable to test their recommendations against a gold standard. Furthermore, in the effort of generating recommendations, matrix factorization has been shown to be more accurate than nearest-neighbor approaches.
2.3  Recommender Systems with Personality
Aside from [25], several other researchers have integrated personality profiles into recommender systems. For example, [31] and [22] both use user personality profiles in the process of generating recommendations. However, the important difference between our work and the work of these researchers is that their methodology requires the explicit completion of personality tests by users. The researchers then derive personality scores directly from these tests. Such requirements make it infeasible to use these systems in a large-scale, applied setting. Our work is unique in the fact that we derive personality scores purely from an analysis of the users' written reviews. We require no further action from users aside from allowing them to express their opinion through ratings and reviews. Because of this, we contend that our methodology has the potential for large-scale application.

3.   MATRIX FACTORIZATION
As we have previously mentioned, we use the technique of KPMF to incorporate the information that we generate by analyzing a given user's written reviews. What we generate from the analysis is a personality profile for a given user. We conjecture that by including this information about user personality in our model, we can ultimately produce more accurate movie ratings. We acknowledge that the choice of KPMF to incorporate side-information into the matrix factorization framework is somewhat arbitrary, and the work of [7, 15] could potentially be used instead.

3.1  KPMF
For the purpose of this paper we explain the specifics of KPMF. To understand probabilistic matrix factorization in general and how KPMF is unique in this area, we encourage the reader to refer to the previously cited papers. In KPMF, we assume that the dimensions of the latent vectors representing items and users are drawn from a Gaussian Process (GP). Although in this GP we assume a zero mean function, it is the formulation of the covariance function that allows us to integrate side-information into our model. This covariance function (a covariance matrix in our application) dictates a 'similarity' across the users and/or items. Our notation follows the notation the original authors provided. Here is the notation we will use:

R — N × M data matrix
U — N × D latent matrix for rows of R
V — M × D latent matrix for columns of R
K_U — N × N covariance matrix for rows
K_V — M × M covariance matrix for columns
S_U — N × N inverse of K_U
S_V — M × M inverse of K_V
A — number of non-missing entries in R
δ_{n,m} — indicator variable for rating R_{n,m}

The generative process for KPMF is as follows (refer to Figure 1 for the plate diagram); a sampling sketch follows:

1. Generate U_{:,d} ∼ GP(0, K_U) for d ∈ {1, ..., D}

2. Generate V_{:,d} ∼ GP(0, K_V) for d ∈ {1, ..., D}

3. For each non-missing entry R_{n,m}, generate R_{n,m} ∼ N(U_{n,:} V_{m,:}^T, σ²), where σ is constant

Figure 1: The generative process for KPMF. [Plate diagram: hyperparameters K_U and K_V generate the latent columns U_{:,d} and V_{:,d} (repeated over the D dimensions), which, together with the noise σ², generate the A observed entries R_{n,m}.]
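To make the generative process concrete, the following is a minimal NumPy sketch that samples a synthetic rating matrix from the steps above. It is an illustration under assumed placeholder inputs (in our setting, K_U would come from the personality profiles of Section 4), not code from the KPMF authors:

    import numpy as np

    def sample_kpmf(K_U, K_V, D=10, sigma=0.4, density=0.05, seed=0):
        """Draw one synthetic rating matrix from the KPMF generative process."""
        rng = np.random.default_rng(seed)
        N, M = K_U.shape[0], K_V.shape[0]
        # Steps 1-2: each latent column is a zero-mean GP draw.
        U = rng.multivariate_normal(np.zeros(N), K_U, size=D).T  # N x D
        V = rng.multivariate_normal(np.zeros(M), K_V, size=D).T  # M x D
        # Step 3: observed entries are Gaussian around U_{n,:} V_{m,:}^T.
        delta = rng.random((N, M)) < density  # indicator of observed entries
        R = U @ V.T + rng.normal(0.0, sigma, size=(N, M))
        return np.where(delta, R, 0.0), delta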
The likelihood of the data matrix R given U and V over the observed entries is:

    p(R \mid U, V, \sigma^2) = \prod_{n=1}^{N} \prod_{m=1}^{M} \left[ \mathcal{N}(R_{n,m} \mid U_{n,:} V_{m,:}^T, \sigma^2) \right]^{\delta_{n,m}}    (1)

where the prior probabilities over U and V are:

    p(U \mid K_U) = \prod_{d=1}^{D} \mathcal{GP}(U_{:,d} \mid 0, K_U)    (2)

    p(V \mid K_V) = \prod_{d=1}^{D} \mathcal{GP}(V_{:,d} \mid 0, K_V)    (3)

Combining (1) with (2) and (3), the log-posterior over U and V becomes:

    \log p(U, V \mid R, \sigma^2, K_U, K_V)
      = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} \sum_{m=1}^{M} \delta_{n,m} (R_{n,m} - U_{n,:} V_{m,:}^T)^2
        - \frac{1}{2} \sum_{d=1}^{D} U_{:,d}^T S_U U_{:,d}
        - \frac{1}{2} \sum_{d=1}^{D} V_{:,d}^T S_V V_{:,d}
        - A \log \sigma^2 - \frac{D}{2} \left( \log|K_U| + \log|K_V| \right) + C    (4)

where |K| is the determinant of K and C is a constant that does not depend on U and V.

3.2  Learning KPMF
To learn the matrices U and V we apply a MAP estimate to (4). The result is optimizing the following objective function:

    E = \frac{1}{2\sigma^2} \sum_{n=1}^{N} \sum_{m=1}^{M} \delta_{n,m} (R_{n,m} - U_{n,:} V_{m,:}^T)^2
        + \frac{1}{2} \sum_{d=1}^{D} U_{:,d}^T S_U U_{:,d}
        + \frac{1}{2} \sum_{d=1}^{D} V_{:,d}^T S_V V_{:,d}    (5)

[36] provides implementations of both gradient descent and stochastic gradient descent for minimizing E. For our experiments we used regular gradient descent, as it achieved the highest accuracy in the original work and our rating matrix is of a manageable size. We note that in the authors' work, the accuracy of stochastic gradient descent was lower than that of regular gradient descent by only a small margin, while its speed was hundreds of times faster.

The partial derivatives of our objective function are the following:

    \frac{\partial E}{\partial U_{n,d}} = -\frac{1}{\sigma^2} \sum_{m=1}^{M} \delta_{n,m} (R_{n,m} - U_{n,:} V_{m,:}^T) V_{m,d} + e_{(n)}^T S_U U_{:,d}    (6)

    \frac{\partial E}{\partial V_{m,d}} = -\frac{1}{\sigma^2} \sum_{n=1}^{N} \delta_{n,m} (R_{n,m} - U_{n,:} V_{m,:}^T) U_{n,d} + e_{(m)}^T S_V V_{:,d}    (7)

where e_{(n)} represents an N-dimensional vector of all zeros except for the nth index, which is one.

The update equations for U and V are as follows:

    U_{n,d}^{t+1} = U_{n,d}^{t} - \eta \frac{\partial E}{\partial U_{n,d}}    (8)

    V_{m,d}^{t+1} = V_{m,d}^{t} - \eta \frac{\partial E}{\partial V_{m,d}}    (9)

where η is the learning rate of the algorithm.
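For reference, here is a minimal, vectorized NumPy sketch of these updates (equations (6)-(9)). The default hyperparameters mirror Section 5 where possible; the learning rate and iteration count are illustrative assumptions, and this is our own sketch rather than the implementation of [36]:

    import numpy as np

    def kpmf_gradient_descent(R, delta, S_U, S_V, D=10, sigma=0.4,
                              eta=0.001, n_iters=500, seed=0):
        """Minimize the objective E of (5) with full gradient descent.

        R: N x M rating matrix (0 where missing); delta: boolean mask of
        observed entries; S_U, S_V: inverses of the covariance matrices.
        """
        rng = np.random.default_rng(seed)
        N, M = R.shape
        U = 0.1 * rng.standard_normal((N, D))
        V = 0.1 * rng.standard_normal((M, D))
        for _ in range(n_iters):
            err = delta * (R - U @ V.T)  # residuals on observed entries only
            grad_U = -(1.0 / sigma**2) * (err @ V) + S_U @ U    # eq. (6), all n, d
            grad_V = -(1.0 / sigma**2) * (err.T @ U) + S_V @ V  # eq. (7), all m, d
            U -= eta * grad_U  # eq. (8)
            V -= eta * grad_V  # eq. (9)
        return U, V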
This completes our detailing of KPMF. In the next section we describe our approach for creating the covariance matrix for the users, K_U.

4.   CREATING PERSONALITY PROFILES
Since we are using KPMF as our recommendation model, any vector representation of the written reviews (for a given user, across all users) would suffice to create K_U. However, it is best to generate covariance across a numeric representation that we can interpret. Since personality scores have a long history of analysis, which we detail in this section, personality profiles are an optimal representation for K_U. In this section we cover two topics: first, how we create the personality profile for a given user, and second, how we use this personality profile to generate the user covariance matrix.

4.1  MyPersonality
In 2013, [9] held a workshop on computational personality recognition. For this workshop, the organizers released a subset of the data collected by the MyPersonality project [18]. The dataset for the workshop consists of the Facebook activity of 250 users, roughly 10,000 status updates in all. Along with the status updates, the dataset includes information about the users' social networks. For each user, the dataset includes a personality score as well as a binary classification as to whether the user exhibits a given personality trait. The personality scores/classifications for each user have five dimensions, one for each trait in the Big 5 personality model. The five traits in the model are openness, conscientiousness, extraversion, agreeableness, and neuroticism. Analysis of lexicon and personality has a long-standing tradition [8, 33, 13], and it is [14] who brought the current model to prominence.

The approaches to the dataset in the workshop are varied. [32] focus on predicting a single personality trait, conscientiousness. The authors exploit an analysis of event-based verbs in the status updates to produce features for their model. [34] create an ensemble model for predicting personality traits. In their base model, the authors use the most frequent trigrams as features. The authors then use the predictions of the base model to generate their final predictions. [12] and [19] have similar approaches: using a general textual analysis combined with social network attributes to create features for their predictive models. However, Markovikj et al. [19] report higher precision/recall for their model, so we use their approach to feature selection as the guide for our model for personality prediction.

4.2  Personality Model
In their paper, Markovikj et al. detail a fine-grained feature selection for each personality trait, including social network features. Since, for our recommendation experiment, we will not have social network information, we do not include these features in our model. While most authors who used the MyPersonality data sought to create a classification model for personality prediction, we predict a personality score. We believe having a continuous output from our model will make for a better translation into user covariance. Based on an analysis of the correlation between features and personality traits in Markovikj et al., we use the following features in our personality model, and we encourage a review of the original work for a thorough discussion of the effectiveness of these features (a sketch of the extraction pipeline follows the list):

Punctuation Count: We count the frequency of the following punctuation marks in a user's status updates: . ? ! - , < > / ; : [ ] { } ( ) & ' "

POS Count: We count the frequency of verbs and adjectives appearing in a user's status updates. We used the POS tagger available in NLTK [4].

AFINN Count: We count the frequency of words appearing in a user's status updates that have an emotional valence score between -5 and 5 [21].

"To" Count: We count the number of times the word "to" appears in a user's status updates.

General Inquirer Tags: We process the text using the General Inquirer (GI) tool [30]. This tool has 182 categories for tagging words in a text. We use the frequency of these tags for our feature set.

While Markovikj et al. produced their best results when using a different subset of the GI tags for each personality trait, as well as only AFINN words of a particular score, we did not find that this fine-grained breakdown produced the best results in our own experiments. Instead, we use the same feature space for all the personality traits, which includes all GI tags and all words with any recorded AFINN score. Lastly, all count features are normalized by the total word count (for a given user), and punctuation counts are normalized by the total character count.
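Below is a minimal sketch of this extraction for a single user. The afinn_scores argument is an assumed word-to-score dictionary loaded from the AFINN list, and the General Inquirer tags are omitted, since the GI tool is an external program rather than a standard Python library:

    from collections import Counter
    import nltk  # assumes the punkt tokenizer and POS tagger models are installed

    PUNCT = ".?!-,<>/;:[]{}()&'\""

    def personality_features(status_updates, afinn_scores):
        """Build one normalized feature vector from all of a user's posts."""
        text = " ".join(status_updates)
        tokens = nltk.word_tokenize(text.lower())
        words = [t for t in tokens if t.isalpha()]
        n_words, n_chars = max(len(words), 1), max(len(text), 1)
        tags = [tag for _, tag in nltk.pos_tag(words)]
        # Punctuation frequencies are normalized by character count.
        feats = {f"punct_{c}": text.count(c) / n_chars for c in PUNCT}
        # All other counts are normalized by word count.
        feats["verbs"] = sum(t.startswith("VB") for t in tags) / n_words
        feats["adjectives"] = sum(t.startswith("JJ") for t in tags) / n_words
        feats["to_count"] = words.count("to") / n_words
        counts = Counter(words)
        for w in afinn_scores:  # one frequency feature per AFINN word
            feats[f"afinn_{w}"] = counts[w] / n_words
        return feats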
The personality scores for users in the MyPersonality dataset are in a continuous range from 1 to 5. Thus, linear regression is a natural choice for training our model. We use the Ridge Regression algorithm available in scikit-learn [23]. Ridge Regression implements standard linear regression with a regularization parameter. The optimization task is:

    \min_{w} \|Xw - y\|_2^2 + \alpha \|w\|_2^2    (10)

where w is the weight vector, X is the data matrix, y is the vector of scores, and α is the regularization parameter. The implementation in scikit-learn performs automatic cross-validation over the regularization parameter by allowing us to supply a list of α's as input. While the feature space for each personality trait is the same, we train a different model for each trait. To be clear, we are not testing the personality of a single status update, but rather of a given user, which is the amalgamation of his/her status updates.
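A minimal sketch of this training setup using scikit-learn's RidgeCV, which cross-validates over a supplied list of α values; the α grid below is an illustrative assumption, not the grid from our experiments:

    from sklearn.linear_model import RidgeCV
    from sklearn.preprocessing import StandardScaler

    TRAITS = ["openness", "conscientiousness", "extraversion",
              "agreeableness", "neuroticism"]

    def train_personality_models(X, y_by_trait, alphas=(0.01, 0.1, 1.0, 10.0)):
        """Fit one ridge regressor per Big 5 trait on z-scored features."""
        scaler = StandardScaler().fit(X)  # subtract mean, divide by std
        X_std = scaler.transform(X)
        models = {}
        for trait in TRAITS:
            model = RidgeCV(alphas=alphas)  # cross-validates over the alpha list
            model.fit(X_std, y_by_trait[trait])
            models[trait] = model
        return scaler, models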
To test the utility of our models, we divide the set of Facebook users into an 80%/20% training/test split. Also, we normalize the matrices we use in our models by, for each feature dimension, subtracting the mean and dividing by the standard deviation. We randomly shuffle the set of users and record the root-mean-square error (RMSE) of the resulting trained model on the held-out test set. That is, given a predicted personality score ŷ_i for user i and the true personality score y_i, we calculate the RMSE over all users in the test set. Table 1 shows the accuracy of our model averaged across 5 different shuffles of the dataset. The model is compared to a baseline that predicts the average personality score from the training set. When creating the models that we apply to predicting personality traits from movie reviews, we included all the Facebook users in training.

    Personality Trait     Model    Baseline
    Extraversion          0.785    0.833
    Neuroticism           0.738    0.767
    Agreeableness         0.635    0.661
    Conscientiousness     0.767    0.799
    Openness              0.529    0.563

Table 1: RMSE for the personality model trained on Facebook statuses, compared to the baseline model.

4.3  User Covariance Matrix
Once we have trained the personality models on the Facebook data, we apply them to the movie reviews written by a given user to determine his/her personality profile. We preprocess the movie reviews just as we did the Facebook data, to create the same feature space. The result is a 5-dimensional vector, which we denote p_i for user i. For users i and j, we calculate entry i, j of K_U as follows:

    K_{U_{i,j}} = \frac{CS(p_i, p_j) - \alpha}{\beta - \alpha} \cdot \gamma    (11)

where CS(x, y) denotes the cosine similarity between vectors x and y, calculated as follows:

    CS(x, y) = \frac{x y^T}{\|x\| \|y\|}    (12)

α and β are the minimum and maximum values of the computed cosine similarities across all possible user pairs:

    \alpha = \min_{i,j} CS(p_i, p_j)    (13)

    \beta = \max_{i,j} CS(p_i, p_j)    (14)

γ controls the ceiling of the normalization: K_{U_{i,j}} ∈ [0, γ]. We set γ = 0.4. To compute cosine similarity we use the cosine similarity method provided in scikit-learn. Note that β will always be 1, since CS(p_i, p_i) = 1.

This, however, is not the final covariance matrix we use in our recommender system. Since all the personality scores are in the range [1, 5], the cosine similarity between personality vectors p_i and p_j is very close to one. To accentuate the differences in personality profiles, we create a regularized covariance matrix, \hat{K}_U, as follows:

    \hat{K}_U = K_U^{n}    (15)

where n is a hyperparameter we hand-tune. The proper value of n can greatly influence the accuracy of the model. We take \hat{K}_U as the covariance matrix in our experiments when we use personality profiles to produce the user covariance matrix, but we still refer to it as K_U to avoid confusion.
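The following is a minimal sketch of this construction. Based on the accentuation argument above, we assume the power n of (15) is applied elementwise; the value of n below is a placeholder, since the paper hand-tunes it:

    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    def build_user_covariance(P, gamma=0.4, n=5):
        """Build the user covariance from stacked personality vectors P (users x 5)."""
        cs = cosine_similarity(P)         # CS(p_i, p_j) for all pairs, eq. (12)
        alpha, beta = cs.min(), cs.max()  # eqs. (13)-(14); beta is 1 in practice
        K_U = (cs - alpha) / (beta - alpha) * gamma  # eq. (11), entries in [0, gamma]
        return K_U ** n                   # eq. (15), assumed elementwise power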
5.   EXPERIMENTAL DESIGN
Our goal is to integrate the information contained in the reviews written by a user into a recommender system, and in particular, to investigate whether user personality, as reflected in the text generated by that user, allows us to improve the accuracy of predicted ratings. We crawled IMDB to collect a dataset of scores and written reviews for multiple IMDB users. Our dataset consists of 2,087 users and 3,500 movies. Each user has rated/reviewed as few as 4 movies and as many as 210, with 54 being the average number of ratings/reviews per user. The total rating matrix is 1.55% dense, which reflects the typical sparsity of this type of dataset [16].

We randomly split the ratings of each user into training, evaluation, and test sets, comprising 3/5, 1/5, and 1/5 of the data, respectively. We randomly shuffle the full set of ratings to produce five different training/evaluation/test splits, and report the results averaged over the five runs. We use the ratings from these sets to create the appropriate matrices in our methodology. The training matrix is equivalent to R in our notation.

In all the experiments, we use a diagonal item covariance matrix, K_V. Thus, in our model, we are not assuming any covariance across items. Following the results of Zhou et al. [36], we let D = 10 and σ = 0.4. We use gradient descent to learn the latent matrices U and V. We use the proportional change in RMSE on our evaluation matrix as the stopping criterion for gradient descent. Once the algorithm converges, we calculate the RMSE on our test matrix. When calculating RMSE, we only do so for non-zero entries, i.e., entries with δ_{n,m} = 1.
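For reference, a minimal sketch of this masked RMSE computation on a held-out test matrix:

    import numpy as np

    def masked_rmse(R_test, delta_test, U, V):
        """RMSE of the predictions U V^T over observed test entries only."""
        err = (R_test - U @ V.T)[delta_test]  # keep entries with delta = 1
        return np.sqrt(np.mean(err ** 2))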

6.   RESULTS
For each run, we train six different models and calculate their RMSE on a held-out test set: (1) KPMF with K_U calculated from the user personality profiles, (2) KPMF with K_U calculated from a user's text-based feature space of (10), used as the p vector in equation (11), (3) KPMF with K_U calculated from the combination of the predicted personality scores and the textual features, (4) KPMF with K_U as a diagonal matrix (no similarity across users), (5) matrix factorization (MF) without optimizing U and V according to an objective function, and (6) KPMF with a PCA reduction of the text-based feature space as p. Aside from providing a tangible vector representation of user reviews, the Big 5 personality model also acts as a guided dimensionality reduction of the textual feature space we use to generate personality scores. Therefore, we have compared the 5-dimensional output of our personality model to the result of using PCA to reduce the text-based feature space to 5 dimensions. We used the PCA implementation from scikit-learn. The RMSE values averaged over five runs for each model are shown in Table 2. For the purposes of RMSE calculation, the rating values in our data, which were originally 1-10, have been normalized to fall in the interval [0.1, 1].

    Model                                      RMSE
    KPMF with Personality                      0.2006
    KPMF with Personality Model Features       0.1980
    KPMF with Personality and Model Features   0.1901
    KPMF with Diagonal Matrix                  0.2122
    KPMF with PCA Feature Reduction            0.2087
    MF                                         0.2262

Table 2: RMSE predicting user ratings.
7.   DISCUSSION
As we expected, the KPMF models performed better than the non-optimized MF model, lowering the RMSE by 16.0%, 12.5%, 11.3%, 7.7%, and 6.2%, respectively. Comparing the KPMF models with each other, the personality model improves upon the diagonal model by 5.5%; however, we see that a more accurate model is achieved by applying the textual personality features directly, and the most effective model uses a combination of the textual features and the predicted personality scores. It is important to note the percent difference along with RMSE, especially when the baseline metric performs well. Comparing the two models of 'dimensionality reduction', the personality model performs better than the PCA model. This indicates that the personality scores capture a stronger signal of user similarity than an arbitrary reduction of the raw text features. The personality scores on their own do not perform as well as the raw textual features. We shortly discuss a major added benefit of using personality scores, aside from testing accuracy.

One immediate question that arises is whether a more accurate personality predictive model actually correlates with a more accurate KPMF model when using the personality profile. While our personality predictive model scores reasonably well, it is inconsistent across the personality traits. Future work can have a renewed focus on the MyPersonality data now that the recommendation framework has a solid foundation. Furthermore, as we have previously stated, representing users as personality profiles provides a gateway to a number of interesting analyses relating personality to product recommendation. For example, in our current recommendation model, each personality trait is given equal weight when we use the personality model to generate the covariance matrix. However, it is interesting to imagine a model where each personality trait is weighted differently. For example, similarity in user conscientiousness might be more important than similarity in user agreeableness when determining overall similarity in user preference. We can create a new variable Q, a 5-by-5 diagonal matrix where each entry Q_{i,i} is the weight for a given personality trait. If we stack the personality vectors to form an M × 5 matrix P, the covariance matrix K_U becomes:

    K_U = P Q P^T    (16)

We can learn the diagonal entries of Q along with U and V in our model. The final values of Q would provide a novel outcome as to how important each personality trait is for predicting movie ratings. We leave this approach for future work.
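As an illustration, a minimal sketch of building this weighted covariance; the trait weights are hypothetical placeholders, since learning Q jointly with U and V is left as future work:

    import numpy as np

    def weighted_user_covariance(P, trait_weights):
        """K_U = P Q P^T from eq. (16); Q holds one weight per personality trait."""
        Q = np.diag(trait_weights)  # 5 x 5 diagonal weight matrix
        return P @ Q @ P.T

    # Hypothetical example emphasizing conscientiousness:
    # K_U = weighted_user_covariance(P, [1.0, 2.0, 1.0, 1.0, 1.0])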
8.   REFERENCES
[1] A. Almahairi, K. Kastner, K. Cho, and A. Courville. Learning distributed representations from reviews for collaborative filtering. In Proceedings of the 9th ACM Conference on Recommender Systems, pages 147-154. ACM, 2015.
[2] Y. Bao, H. Fang, and J. Zhang. TopicMF: Simultaneously exploiting ratings and reviews for recommendation. In AAAI, pages 2-8, 2014.
[3] J. Bennett and S. Lanning. The Netflix Prize. In Proceedings of KDD Cup and Workshop, volume 2007, page 35, 2007.
[4] S. Bird, E. Klein, and E. Loper. Natural Language Processing with Python. O'Reilly Media, Inc., 2009.
[5] D. Blei and J. Lafferty. Correlated topic models. Advances in Neural Information Processing Systems, 18:147, 2006.
[6] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993-1022, 2003.
[7] G. Bouchard, D. Yin, and S. Guo. Convex collective matrix factorization. In AISTATS, volume 13, pages 144-152, 2013.
[8] R. B. Cattell. Personality and Motivation Structure and Measurement. 1957.
[9] F. Celli, F. Pianesi, D. Stillwell, and M. Kosinski. Workshop on computational personality recognition: Shared task. In Seventh International AAAI Conference on Weblogs and Social Media, 2013.
[10] G. Chen and L. Chen. Augmenting service recommender systems by incorporating contextual opinions from user reviews. User Modeling and User-Adapted Interaction, 25(3):295-329, 2015.
[11] L. Chen, G. Chen, and F. Wang. Recommender systems based on user reviews: the state of the art. User Modeling and User-Adapted Interaction, 25(2):99-154, 2015.
[12] G. Farnadi, S. Zoghbi, M.-F. Moens, and M. De Cock. Recognising personality traits using Facebook status updates. In Proceedings of the Workshop on Computational Personality Recognition (WCPR13) at the 7th International AAAI Conference on Weblogs and Social Media (ICWSM13), 2013.
[13] L. R. Goldberg. Language and individual differences: The search for universals in personality lexicons. Review of Personality and Social Psychology, 2(1):141-165, 1981.
[14] L. R. Goldberg. The development of markers for the big-five factor structure. Psychological Assessment, 4(1):26, 1992.
[15] S. Gunasekar, M. Yamada, D. Yin, and Y. Chang. Consistent collective matrix completion under joint low rank structure. In AISTATS, 2015.
[16] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426-434. ACM, 2008.
[17] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30-37, 2009.
[18] M. Kosinski, D. Stillwell, and T. Graepel. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15):5802-5805, 2013.
[19] D. Markovikj, S. Gievska, M. Kosinski, and D. Stillwell. Mining Facebook data for predictive personality modeling. In Proceedings of the 7th International AAAI Conference on Weblogs and Social Media (ICWSM 2013), 2013.
[20] J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems, pages 165-172. ACM, 2013.
[21] F. Å. Nielsen. A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. arXiv preprint arXiv:1103.2903, 2011.
[22] M. A. S. N. Nunes. Recommender Systems Based on Personality Traits. PhD thesis, Université Montpellier II-Sciences et Techniques du Languedoc, 2008.
[23] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12:2825-2830, 2011.
[24] J. W. Pennebaker and L. A. King. Linguistic styles: language use as an individual difference. Journal of Personality and Social Psychology, 77(6):1296, 1999.
[25] A. Roshchina, J. Cardiff, and P. Rosso. A comparative evaluation of personality estimation algorithms for the TWIN recommender system. In Proceedings of the 3rd International Workshop on Search and Mining User-Generated Contents, pages 11-18. ACM, 2011.
[26] R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In NIPS, volume 1, pages 2-1, 2007.
[27] R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning, pages 880-887. ACM, 2008.
[28] H. Shan and A. Banerjee. Generalized probabilistic matrix factorizations for collaborative filtering. In Data Mining (ICDM), 2010 IEEE 10th International Conference on, pages 1025-1030. IEEE, 2010.
[29] V. K. Singh, M. Mukherjee, and G. K. Mehta. Combining collaborative filtering and sentiment classification for improved movie recommendations. In Multi-disciplinary Trends in Artificial Intelligence, pages 38-50. Springer, 2011.
[30] P. J. Stone, D. C. Dunphy, and M. S. Smith. The General Inquirer: A Computer Approach to Content Analysis. 1966.
[31] M. Tkalcic, M. Kunaver, J. Tasic, and A. Košir. Personality based user similarity measure for a collaborative recommender system. In Proceedings of the 5th Workshop on Emotion in Human-Computer Interaction - Real World Challenges, pages 30-37, 2009.
[32] M. T. Tomlinson, D. Hinote, and D. B. Bracewell. Predicting conscientiousness through semantic analysis of Facebook posts. Proceedings of WCPR, 2013.
[33] E. C. Tupes and R. E. Christal. Recurrent personality factors based on trait ratings. Technical report, DTIC Document, 1961.
[34] B. Verhoeven, W. Daelemans, and T. De Smedt. Ensemble methods for personality recognition. In Proceedings of the Workshop on Computational Personality Recognition, AAAI Press, Menlo Park, CA, pages 35-38, 2013.
[35] C. Wang and D. M. Blei. Collaborative topic modeling for recommending scientific articles. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 448-456. ACM, 2011.
[36] T. Zhou, H. Shan, A. Banerjee, and G. Sapiro. Kernelized probabilistic matrix factorization: Exploiting graphs and side information. In SDM, volume 12, pages 403-414. SIAM, 2012.