PReFacTO: Preference Relations Based Factor Model with Topic Awareness and Offset

Priyanka Choudhary* (cs16mtech11011@iith.ac.in) and Maunendra Sankar Desarkar (maunendra@iith.ac.in)
Indian Institute of Technology Hyderabad, Hyderabad, Telangana, India

ABSTRACT
Recommendation systems create personalized lists of items that might interest the user by analyzing the user's history of past purchases and/or consumption. For rating-based systems, most traditional recommendation methods focus on the absolute ratings provided by the users to the items. In this paper, we extend the traditional Matrix Factorization approach for recommendation and propose pairwise relation based factor modeling. While modeling the items in the system, the use of pairwise preferences allows information to flow between the items through the preference relations as an additional signal. Apart from the rating information, item feedback is also available in the form of reviews. The reviews contain textual information that can be very helpful in representing an item's latent feature vector appropriately. We perform topic modeling of the item reviews and use the topic vectors to guide the joint factor modeling of the users and items and learn their final representations. The proposed method shows promising results in comparison to the state-of-the-art methods in our experiments.

KEYWORDS
Recommendation System, Pairwise Preferences, Topic Modeling, Latent Factor Models

ACM Reference Format:
Priyanka Choudhary and Maunendra Sankar Desarkar. 2018. PReFacTO: Preference Relations Based Factor Model with Topic Awareness and Offset. In Proceedings of ACM SIGIR Workshop on eCommerce (SIGIR 2018 eCom). ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
*This is the corresponding author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
SIGIR 2018 eCom, July 2018, Ann Arbor, Michigan, USA
© 2018 Copyright held by the owner/author(s). ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. https://doi.org/10.1145/nnnnnnn.nnnnnnn
Copyright © 2018 by the paper's authors. Copying permitted for private and academic purposes. In: J. Degenhardt, G. Di Fabbrizio, S. Kallumadi, M. Kumar, Y.-C. Lin, A. Trotman, H. Zhao (eds.): Proceedings of the SIGIR 2018 eCom workshop, 12 July, 2018, Ann Arbor, Michigan, USA, published at http://ceur-ws.org

1 INTRODUCTION
Users have access to a large variety of items available online for purchase, subscription, consumption, etc. Such a huge list of options often results in choice overload, where it becomes difficult to browse through and/or select the items of interest. Recommendation Systems (RS) make the task of selecting appropriate items easier by finding and suggesting a subset of the items that might be of interest to the user. Many traditional recommendation techniques use only ratings to assess the users' taste and behavior. Given a small subset of rating data containing ratings given to the items by the users, Recommendation Systems try to predict the ratings of the items that are not yet rated/viewed by the user. Based on these predicted rating values, a ranked list of the items that can be of the user's interest is recommended to the user. Latent factor models [1, 8, 13] have been used extensively in the past for this purpose.

There are many recommendation systems where the user feedback comes in the form of ratings. The majority of such systems use these absolute ratings entered by the users for modeling the users and items through latent factor modeling, and use those models for recommendation. Latent factor models like Matrix Factorization [8] are commonly used to represent the users and the items in latent feature spaces. These representations are helpful for explaining the observed ratings and predicting the unknown ratings. The latent factors, e.g., in the case of movie recommendations, can be genres, actors, directors, or something uninterpretable. These factors try to explain the aspects behind the liking of the items by a particular user. The items are modeled in a similar fashion, through a representation of the hidden factors they possess. The rating is then predicted from the degree to which an item possesses these factors and the affinity of the user towards them.

User feedback in the form of reviews, along with the ratings, is also available in many online systems like Amazon, IMDb, TripAdvisor, etc. The review information can be very useful as it contains the users' perception of the items. There can be systems where the item description is also available, and there are algorithms [14] which consider the item description as additional input for latent factor modeling. However, the descriptions are often entered by the item producers or sellers. On the other hand, the feedback in the form of reviews given by the users generally conveys the factors that are liked or disliked in an item. Including this textual information can be helpful for better modeling, interpretation, and visualization of the hidden dimensions [11].

An alternate form of recommendation system can be based on pairwise preferences of the user among the items [3, 4, 7]. Given a pair of items (i, j), user u may give feedback regarding which of the items he prefers over the other. This type of feedback is referred to as pairwise preference or pairwise preference based feedback. A survey in [6] shows that users do prefer comparisons through pairwise scores over providing absolute ratings. Although there is no available dataset where pairwise preferences were directly captured, many approaches in the literature have induced pairwise preferences from absolute ratings [3, 7, 10] and used those relations for developing recommendation algorithms.

The existing pairwise preference based methods from the literature do not consider the item content information in the modeling process. In this paper, we propose approaches that combine the pairwise feedback with the additional review data available. We propose an algorithm that uses latent factor modeling with pairwise preferences to discover the latent dimensions, map users and items to a joint latent feature vector space, and produce recommendations for the end user. The latent feature vector space for the items is derived through topic modeling. In this approach, we construct a proxy document for each item by considering the reviews it has received. If available, the descriptions of the items can also be used to populate this document. We perform probabilistic topic modeling on these item documents using Latent Dirichlet Allocation (LDA). The resulting topics are then used to guide the factorization process for learning the latent representations of the users. We propose two different approaches for this purpose: one in which the LDA topic vectors of the items are directly used as the latent representations of the items, and another where these LDA representations are used to initialize the item vectors in the factorization process. For the second approach, an item-latent offset is introduced alongside the LDA representations. The offset is learned throughout the factorization process and tries to capture the deviations from the LDA representations of the items. We call our approach Preference Relations Based Factor Model with Topic Awareness and Offset, or PReFacTO in short. Experimental evaluation and analysis on a benchmark dataset help to understand the strengths of the pairwise methods and their ability to generate effective recommendations.
We summarize the contributions of our work below:
• We use relative preferences over item pairs in a factor modeling framework for modeling users and items. The models are then used for generating recommendations.
• We incorporate item reviews in the factorization process.
• Detailed experimental evaluation is performed on a benchmark dataset. Analysis of the results is performed to understand the advantages and shortcomings of the methods.

The rest of the paper is organized as follows. After discussing the related work, we present the proposed methods in Section 3. We briefly talk about pairwise preferences and the handling of textual reviews, and then provide a detailed description of the proposed methods. In Section 4, we define the four evaluation metrics used to compare the performance of the proposed methods with the baseline methods, and provide a detailed discussion and analysis of the results obtained. The conclusions and future work are summarized in Section 5.

2 RELATED WORK
Traditional recommendation systems have extensively used latent factor based modeling techniques. Much research employs Matrix Factorization (MF) [8, 9] techniques for predicting the unknown ratings of items not yet seen by the user and providing recommendations by selecting the top-N items. This basic MF model corresponds to the pointwise method used in this paper and acts as a baseline for comparing the proposed methods. The works of [11, 14] include content based modeling to interpret textual labels for the rating dimensions, which explains how users assess the products. Similar work has been done in [5], which tries to improve the rating predictions and provide feature discovery. Different users give different weights to these features. For example, a user who loves horror movies and hates the romantic genre will give higher weightage to "Annabelle" than to "The Notebook", contrary to a romantic movie lover. This weightage affects the overall scores and explains the rating difference.

Recently, researchers have shown keen interest in pairwise preference based recommendation techniques. In [2], a suitable graphical interface is provided to the user to mark his choices over pairs of items. In [7], the pairwise preferences are induced from the available rating values of the items. Both implicit [12] and explicit feedback can be modeled using pairwise preference based latent factor models. In [3], the authors motivate the use of preference relations, or relative feedback, for recommendation systems. Pairwise preferences have been used in [3, 4, 7, 10] in matrix factorization and nearest neighbor latent factor modeling settings to generate recommendations. However, none of these works take the user reviews into account.
3 METHODOLOGY
In this section, we present our proposed recommendation methods that work with pairwise preference information from the user. Apart from the pairwise feedback, we also consider the reviews provided by the users for different items. The methods represent each user and item in a shared latent feature space through a factor modeling approach. Before discussing the proposed methods in detail, we briefly describe the concept of pairwise preferences and the way in which we handle the textual reviews available for the items.

Pairwise Preferences: The ratings in recommendation systems are generally absolute in nature, often in the range of 1-5 or 1-10. However, users behave differently while rating items: the same rating value entered by two different users might correspond to two different satisfaction levels. Moreover, the absolute rating entered by a user for an item may change over time if the same user is asked to rate the same item again. Motivated by observations like these, pairwise preferences have been introduced for modeling users and items in recommendation systems [3]. Pairwise relation based approaches try to capture the relative preference between items. Such feedback, if directly obtained, removes the user bias that may correspond to the leniency or strictness of users when assigning absolute ratings.

Although pairwise preference relations can address some of the problems with absolute ratings mentioned above, there is no publicly available dataset with directly obtained pairwise preferences. In the absence of such data, we work with datasets containing absolute ratings as user feedback and induce relative ratings from those absolute ratings. We then use these relative pairwise preferences as input to the proposed methods.

Handling Textual Reviews - Topic modeling: If item descriptions are available, then the system can learn more about the attributes or aspects that the items possess. This information can be useful in making recommendations; in fact, content-based recommendation algorithms try to exploit these item attributes for generating recommendations. Several systems also allow the users to enter reviews for the items. Item reviews are very useful in making view/purchase decisions, as they often contain reasons or explanations regarding why the item was liked or disliked by the user who wrote the review. The reviews often describe additional details about the items, for example the aspects they possess. An example review for a product from Amazon is given below.

"It seems like just about everybody has made a Christmas Carol movie. This one is the best by far! It seems more realistic than all the others and the time period seems to be perfect. The acting is also far better than any of the others I've seen; my opinion."

We hypothesize that even if item descriptions are not available, the reviews reveal a great deal of information about the different attributes (specified or latent) that the items might contain¹. These attributes can then be useful in modeling the items, and can further aid in generating effective recommendations. Based on this assumption, we use the reviews given by the users to different items as an additional source of information for learning the item representations.

¹The dataset used in our experiments did not have the item descriptions, but contained the reviews.
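The per-item proxy documents described here are simple to assemble. Below is a minimal sketch; the function and variable names are illustrative, not taken from the paper's implementation:

```python
from collections import defaultdict

def build_proxy_documents(reviews):
    """Concatenate all review texts of each item into one proxy document.

    `reviews` is an iterable of (user_id, item_id, review_text) triples.
    Returns a dict mapping item_id -> proxy document string (d_i).
    """
    parts = defaultdict(list)
    for user_id, item_id, text in reviews:
        parts[item_id].append(text)
    # d_i is the concatenation of all reviews received by item i
    return {item: " ".join(texts) for item, texts in parts.items()}

reviews = [
    ("u1", "i1", "great acting"),
    ("u2", "i1", "realistic period drama"),
    ("u1", "i2", "weak plot"),
]
docs = build_proxy_documents(reviews)
```

The resulting document collection would then be passed to an off-the-shelf LDA implementation (e.g. gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation`) to obtain the k-dimensional topic vectors used later as item representations.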
We use the Latent Dirichlet Allocation (LDA) topic modeling technique to learn the topic representations of the items. LDA is an unsupervised method which, given a document collection, identifies a fixed number (say k, an input to the algorithm) of latent topics present in the collection. Each document can then be represented as a k-dimensional vector in that topic space. LDA works on documents, so we need to represent each item as a document. For that purpose, we combine all the reviews assigned to an item to create a proxy document for that item. If d_ui represents a review given by a user u for an item i, then we denote the proxy document d_i for the item i as the concatenation of all the reviews given by the set of users U for i. We thus have a document collection d corresponding to the set of items I as d = ∪_i d_i, where i = 1, ..., |I|.

3.1 Preference Relation based Factor modeling (Pairwise)
Given a pair of items (i, j), users can express their relative preference if such a provision exists. This would allow the user to indicate, for the item pair, which item he prefers more. The user can also indicate that he favors both items equally. This pairwise preference can be captured through an interface where users mark their preferences over a small subset of the data. However, as mentioned earlier, we are not aware of any existing system that allows users to enter pairwise preferences directly. In the absence of that, if rating data is available, pairwise preferences can be obtained as r_uij = r_ui − r_uj, where r_ui denotes the absolute rating given by user u to item i. If the sign of r_uij is positive, we may consider that item i is preferred over item j by user u. If the sign is negative, we may consider that j is preferred over i. If the value of r_uij is zero, then both items are equally preferable to u. A similar approach was adopted in [4] for inducing pairwise preferences from absolute ratings.

We take a different approach for converting absolute ratings to relative preferences. If the ratings given by user u to the two items i and j are r_ui and r_uj respectively, then we define the (actual, or ground truth) preference strength for the triplet (u, i, j) as

    r_uij = exp(r_ui) / (exp(r_ui) + exp(r_uj)) = 1 / (1 + exp(−(r_ui − r_uj)))    (1)

The value of r_uij thus obtained also captures the strength of the preference relation: the larger the difference between r_ui and r_uj, the stronger the relation, as shown in Figure 1.

Figure 1: Graph showing the pairwise relation between the items as a function of the sigmoid.

We model the prediction of the unobserved r_uij's as:

    r̂_uij = exp(p_u(q_i − q_j) + (b_i − b_j)) / (1 + exp(p_u(q_i − q_j) + (b_i − b_j))) = 1 / (1 + exp(−(p_u(q_i − q_j) + (b_i − b_j))))    (2)

where the rating matrix R of user-item interactions gives access to the values r_ui, and b_i represents the bias for item i. The method models each user u by a vector p_u.
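The conversion of Equation 1 is a plain logistic function of the rating difference; a minimal sketch (the function name is illustrative):

```python
import math

def preference_strength(r_ui, r_uj):
    """Ground-truth preference strength r_uij of Equation (1):
    the logistic (sigmoid) function of the rating difference."""
    return 1.0 / (1.0 + math.exp(-(r_ui - r_uj)))

# Equal ratings give 0.5 (no preference); a larger rating gap pushes
# the strength towards 1 (or towards 0 when j is rated higher).
print(preference_strength(4, 4))   # 0.5
print(preference_strength(5, 3))   # ≈ 0.88
print(preference_strength(3, 5))   # ≈ 0.12
```

Note that by construction the strengths of (u, i, j) and (u, j, i) sum to one, which is consistent with reading r_uij as the probability that u prefers i over j.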
This vector captures the user's affinity towards the latent factors. Similarly, each item i is represented by a feature vector q_i, which captures the degree to which the item possesses these factors.

Given the training set, the mean-squared error (MSE) on the training data (with suitable regularization) is used as the objective function. The optimization is generally performed using Stochastic Gradient Descent (SGD), and the algorithm outputs optimized values of the parameters Θ = {B, P, Q}, where B represents the bias values (b_i) for all items i ∈ I, P represents the user latent feature vectors (p_u) for all users u ∈ U, and Q represents the item latent feature vectors (q_i) for all items i ∈ I. The objective function is defined as:

    min_Θ Σ_{(u,i,j)∈T} ( r_uij − s_ij/(1+s_ij) )² + λ||p_u||² + (λ/2)||q_i||² + (λ/2)||q_j||² + (λ/2)||b_i||² + (λ/2)||b_j||²    (3)

where s_ij = exp(p_u(q_i − q_j) + (b_i − b_j)), T represents the training set, and λ is the regularization parameter. The update rules for optimizing the above objective function are given below:

    p_u ← p_u + α ( 2 e_uij s_ij (q_i − q_j) / (1+s_ij)² − 2λ p_u )    (4)
    q_i ← q_i + α ( 2 e_uij s_ij p_u / (1+s_ij)² − λ q_i )    (5)
    q_j ← q_j − α ( 2 e_uij s_ij p_u / (1+s_ij)² + λ q_j )    (6)
    b_i ← b_i + α ( 2 e_uij s_ij / (1+s_ij)² − λ b_i )    (7)
    b_j ← b_j − α ( 2 e_uij s_ij / (1+s_ij)² + λ b_j )    (8)

where e_uij = r_uij − s_ij/(1+s_ij) and α is the learning rate.

After obtaining the model parameters through stochastic gradient descent, we can predict the personalized utility of item i for user u as:

    ρ_ui = b_i + p_u q_i    (9)

The top-N items according to this predicted personalized utility are recommended to the user.

3.2 Preference Relation based Factor modeling with Topics (Pairwise+Topics)
As motivated in the previous section, the review comments about items can be useful in identifying the aspects that the items possess. Moreover, they also help in understanding the reasons behind the liking or disliking of an item by a user. Hence, we extend the previous method to incorporate the reviews about the items. The proxy documents for the items are passed through a Latent Dirichlet Allocation (LDA) framework to identify the latent topics present in the documents.

LDA is a probabilistic topic modeling technique that discovers latent topics in documents. It represents each document d_i by a k-dimensional topic distribution θ_i drawn from a Dirichlet distribution; the k-th dimension of the vector indicates the probability with which the k-th topic is discussed in the document. Each topic is associated with a word distribution ϕ_k, which gives the probabilities of word-topic associations.

We pass the collection of documents D = ∪_{i∈I} d_i to LDA. As output, we get the topic vector q_i corresponding to each document d_i ∈ D. For each item i, the latent representation is now fixed at q_i, and these values of q_i are fed to the factor modeling technique used in Section 3.1. The objective function for this method is given by Equation 10. The optimization variables (parameters) now become Θ = {B, P}, and the solution is again obtained through Stochastic Gradient Descent.

    min_Θ Σ_{(u,i,j)∈T} ( r_uij − s_ij/(1+s_ij) )² + λ||p_u||² + (λ/2)||b_i||² + (λ/2)||b_j||²    (10)

Here, q_i remains fixed throughout the learning process; hence, there is no regularization term for q_i in the objective function. The update rules for p_u, b_i and b_j remain the same as in Equations 4, 7 and 8 respectively. Personalized utility scores of the items are computed using Equation 9 and recommendations are generated.
3.3 Pairwise Relation based Factor modeling with Topics and Offset (PReFacTO)
In the method described in Section 3.2, topic modeling of the reviews provides the seed information for the item latent vector representations, and these representations are kept fixed throughout the learning process. In our next method, we allow the item representations to deviate from their LDA topic vectors. If ϵ_i is the deviation of item i's representation from its topic vector q_i, then the pairwise ratings can be modeled as:

    r̂_uij = exp(p_u((q_i + ϵ_i) − (q_j + ϵ_j)) + (b_i − b_j)) / (1 + exp(p_u((q_i + ϵ_i) − (q_j + ϵ_j)) + (b_i − b_j))) = 1 / (1 + exp(−(p_u((q_i + ϵ_i) − (q_j + ϵ_j)) + (b_i − b_j))))    (11)

The parameters for this model are Θ = {B, P, E}. As earlier, B and P are the collections of item biases and user vectors. E is the collection of deviations, or offsets, of the items from their LDA topic vectors. The objective function for this model can be written as:

    min_Θ Σ_{(u,i,j)∈T} ( r_uij − s_ij/(1+s_ij) )² + λ||p_u||² + (λ/2)||b_i||² + (λ/2)||b_j||² + (λ/2)||ϵ_i||² + (λ/2)||ϵ_j||²    (12)

where s_ij = exp(p_u((q_i + ϵ_i) − (q_j + ϵ_j)) + (b_i − b_j)) and r_uij is as defined in Equation 1.

The model parameters are learned using Stochastic Gradient Descent. The update rules are given below:

    p_u ← p_u + α ( 2 e_uij s_ij ((q_i + ϵ_i) − (q_j + ϵ_j)) / (1+s_ij)² − 2λ p_u )    (13)
    ϵ_i ← ϵ_i + α ( 2 e_uij s_ij p_u / (1+s_ij)² − λ ϵ_i )    (14)
    ϵ_j ← ϵ_j − α ( 2 e_uij s_ij p_u / (1+s_ij)² + λ ϵ_j )    (15)

where e_uij = r_uij − s_ij/(1+s_ij). The update rules for the bias terms remain the same as in Equations 7 and 8. After the optimized values of the parameters are obtained, the personalized utility of item i for user u is computed using the following equation, and top-N recommendations are made for each user.

    ρ_ui = b_i + p_u (q_i + ϵ_i)    (16)
4 PERFORMANCE EVALUATION
4.1 Dataset
We use the Amazon product review dataset² for our experiments. This dataset contains reviews and ratings given to different items by different users. We consider items from the Movies and TV category. All items in this category were released between 1999 and 2013. We divided this timeline into three blocks, each spanning 5 years: (A) 1999-2003, (B) 2004-2008, and (C) 2009-2013. From each block, we removed the items with fewer than 10 reviews in that block and the users who had given fewer than 5 reviews in that block. After this filtering to remove non-prolific users and items, we have 3,513 items, 85,375 users, 725,198 ratings and 725,176 reviews in our dataset. We use 70% of this data for training and the remaining 30% for testing.

²http://jmcauley.ucsd.edu/data/amazon/

4.2 Baseline Methods
We compare our preference relation based models to the following baselines:

(a) Absolute Rating based Factor modeling (Pointwise): Analogous to the standard latent factor model [8], we convert the absolute rating values using the sigmoid function and make predictions using the following objective function:

    min_Θ Σ_{(u,i)∈T} ( ρ_ui − s_i/(1+s_i) )² + λ||p_u||² + (λ/2)||q_i||² + (λ/2)||b_i||²

where ρ_ui = exp(r_ui) / (1 + exp(r_ui)) and s_i = exp(p_u q_i + b_i).

(b) Absolute Rating based Factor modeling with Topics (Pointwise+Topics): We combine the topic modeling technique with the latent factor modeling. The latent vector representations of the items are drawn from the reviews (by passing the reviews of the items as input to LDA) and fed to the latent factor model. Here the item representations remain fixed, and the user latent space is learned using Stochastic Gradient Descent.

(c) Absolute Rating based Factor modeling with Topics and Offset (Pointwise+Topics+Offset): Along with the factor and topic modeling, we introduce an item latent vector offset which captures the deviations of the item feature vectors from the representations drawn from LDA. The objective function used to model the system and learn the user latent and item offset representations can be written as:

    min_Θ Σ_{(u,i)∈T} ( ρ_ui − s_i/(1+s_i) )² + λ||p_u||² + (λ/2)||b_i||² + (λ/2)||ϵ_i||²

where s_i = exp(p_u (q_i + ϵ_i) + b_i).

4.3 Evaluation
For evaluating the models presented in Section 3, we compare the three proposed algorithms with the baseline methods mentioned in Section 4.2. We use Precision@k, Recall@k, IRecall and URecall as the evaluation metrics, with k = 100. The IRecall and URecall metrics are described below.

IRecall: The IRecall of an item is computed as:

    IRecall_i = |Rec(i) ∩ Rated(i)| / |Rated(i)|    (17)

where Rec(i) denotes the set of users to whom item i is recommended, and Rated(i) denotes the set of users who have i in their test set. This metric thus measures the algorithm's ability to recommend an item to the users who have actually rated it. The IRecall of an algorithm is defined as the average of the item-wise IRecall values over the set of concerned items.

URecall: The URecall of a user is computed as:

    URecall_u = |Rec(u) ∩ Rated(u)| / |Rated(u)|    (18)

where Rec(u) denotes the set of items that have been recommended to user u, and Rated(u) denotes the set of items present in the test set of user u.

For experimentation and evaluation purposes, we divide the items into bins based on their number of reviews. For each block, we maintain the item review counts for that time span. We define two bins per block: Bin-0 consists of the items with review count less than 40, and Bin-1 contains the items with review count greater than or equal to 40. We consider Bin-0 as the collection of sparse items, and the items in Bin-1 as dense items. For each bin, we compute the average of the IRecall values of all items in that bin. Analogous to the items, we also divide the users into bins based on the number of reviews given by each user, and take the average of the URecall values of all users in the corresponding bin. We then compare the IRecall and URecall values of the methods proposed in this paper with the baseline approaches.
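The per-item and per-user recall definitions of Equations 17 and 18 are straightforward set operations; a minimal sketch with illustrative names:

```python
def irecall(rec_users, rated_users):
    """IRecall of one item (Eq. 17): the fraction of users who rated
    the item in the test set that were actually recommended it."""
    if not rated_users:
        return 0.0
    return len(rec_users & rated_users) / len(rated_users)

def urecall(rec_items, rated_items):
    """URecall of one user (Eq. 18), defined symmetrically over items."""
    if not rated_items:
        return 0.0
    return len(rec_items & rated_items) / len(rated_items)

# Item i was recommended to u1, u2, u3; u2 and u4 rated it in the test set.
print(irecall({"u1", "u2", "u3"}, {"u2", "u4"}))  # 0.5
```

The algorithm-level scores are then the averages of these per-item (or per-user) values within each bin.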
4.4 Experimental Analysis and Discussion
Setting the parameters for the proposed method: The model hyperparameters λ (regularization parameter) and k (number of topics) need to be tuned to produce the best models for recommendation. Experiments were conducted with different values of λ and k on a small subset of the data. From these experiments, the combination of λ = 4E-05 and k = 10 was found to be the best, and these values were used for further experimentation. The performance of the algorithm on the test set for different values of λ (with k fixed at 10) and different values of k (with λ fixed at 4E-05) is shown in Table 1 and Table 2 respectively.

Table 1: Values of the evaluation metrics for different values of λ. The number of topics was fixed at 10.

λ        | Precision | Recall | IRecall(reviews<40) | IRecall(reviews>=40) | URecall(reviews<40) | URecall(reviews>=40)
4.00E-02 | 0.0076    | 0.1045 | 0.0117              | 0.0673               | 0.1074              | 0.0863
4.00E-03 | 0.0122    | 0.1451 | 0.0013              | 0.0793               | 0.1448              | 0.1456
4.00E-04 | 0.0120    | 0.1398 | 0.0012              | 0.0789               | 0.1390              | 0.1435
4.00E-05 | 0.0125    | 0.1457 | 0.0012              | 0.0792               | 0.1448              | 0.1504
4.00E-06 | 0.0124    | 0.1448 | 0.0011              | 0.0797               | 0.1438              | 0.1495

Table 2: Values of the evaluation metrics for different values of k, the number of topics. The value of λ was fixed at 4.00E-05.

No. of Topics | Precision | Recall | IRecall(reviews<40) | IRecall(reviews>=40) | URecall(reviews<40) | URecall(reviews>=40)
5             | 0.0107    | 0.1229 | 0.0008              | 0.0781               | 0.1221              | 0.1302
10            | 0.0125    | 0.1457 | 0.0012              | 0.0792               | 0.1448              | 0.1504
15            | 0.0108    | 0.1246 | 0.0011              | 0.0778               | 0.1238              | 0.1324
20            | 0.0108    | 0.1244 | 0.0008              | 0.0784               | 0.1233              | 0.1331
Comparison with other methods and discussion: For each method, we run the experiments for the three blocks and compute the average value of each metric over these blocks. These average values are reported in Table 3.

Table 3: Comparing the performances of the different algorithms. The best values for each metric across the algorithms are marked in bold.

Method                  | Precision | Recall | IRecall(reviews<40) | IRecall(reviews>=40) | URecall(reviews<40) | URecall(reviews>=40)
Pointwise               | 0.0106    | 0.1267 | 0.0141              | 0.0635               | 0.1271              | 0.1210
Pointwise+Topics        | 0.0048    | 0.0555 | 0.0256              | 0.0551               | 0.0551              | 0.0568
Pointwise+Topics+Offset | 0.0055    | 0.0650 | 0.0252              | 0.0514               | 0.0651              | 0.0632
Pairwise                | 0.0021    | 0.0254 | 0.0420              | 0.0312               | 0.0255              | 0.0252
Pairwise+Topics         | 0.0038    | 0.0485 | 0.0378              | 0.0399               | 0.0491              | 0.0448
PReFacTO                | 0.0125    | 0.1457 | 0.0012              | 0.0792               | 0.1448              | 0.1504

Figure 2: Comparison of IRecall values of the different algorithms for items with review count less than 40.

Figure 3: Comparison of IRecall values of the different algorithms for items with review count greater than or equal to 40.

It can be seen from the experimental results that the pairwise methods, and in particular PReFacTO, give the best results compared to the other algorithms on the complete dataset. Although PReFacTO and Pointwise are at par in overall performance, PReFacTO slightly surpasses Pointwise in terms of overall precision and recall. Comparing the IRecall values for the sparse items, the Pairwise method outperforms all other approaches, while the IRecall values for the dense items show that PReFacTO performs very well on dense items. The IRecall values for the sparse and dense items for the different blocks are compared in Figure 2 and Figure 3 respectively. There are four groups of columns in each figure: the first three represent the three blocks, and the last one represents the average over the three blocks.
The superior performance of Pairwise and the poor performance of PReFacTO on the sparse items might be due to the sparseness of the reviews. For sparse items with very few reviews, the LDA representations, and the further learning of deviations on top of the LDA vectors, do not provide any additional benefit; on the contrary, they might lead to overfitting. Pairwise, on the other hand, models the system through rating information alone. The preference relations provide some additional information about an item through its comparisons with other items; there is no overfitting in the process, and modeling the sparse items works well. If we look at the URecall values for the sparse users, PReFacTO actually performs well.

However, in the case of dense items, PReFacTO outperforms every other method, including Pointwise. Along with the pairwise preference based learning, the item vector representations derived from the rich textual information of the reviews, and the learned deviations from these vectors, help in better prediction, with reasoning as to why an item will be likeable or dislikeable to a user.

In any real recommendation system, there are both sparse and dense items, and the ratio of sparse to dense items varies with the exact system or domain. In this study, we have explored a few algorithms that consider pairwise feedback instead of absolute ratings. Among the proposed methods, Pairwise works well for the sparse items and PReFacTO works well for the dense items. The experiments show the power of preference relation based feedback for recommendation. However, they do not establish the superiority of any single algorithm across the entire range of data (both sparse and dense zones). Nonetheless, we believe that it might be possible to design algorithms that work well for the entire range of data. It might be an interesting research direction to develop hybrid methods that consider both Pairwise and PReFacTO and fuse the recommendations generated by them from the sparse and dense zones to come up with the final recommendations.

5 CONCLUSIONS AND FUTURE WORK
We have presented the PReFacTO approach in this paper, which aligns the latent factor modeling of the users and the item pairs with the hidden topics in the reviews of the items. The pairwise relations add significant information for the sparse items and provide better modeling of the user-item interactions, while the review-derived item representations and their learned offsets help for the dense items. It might also be possible to develop parameterized algorithms that automatically switch between Pairwise (no consideration of reviews) and PReFacTO (considering the reviews) depending on the availability of data for the item under consideration during modeling.

REFERENCES
[1] Deepak Agarwal and Bee-Chung Chen. 2009. Regression-based latent factor models. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 19–28.
[2] Laura Blédaité and Francesco Ricci. 2015. Pairwise preferences elicitation and exploitation for conversational collaborative filtering. In Proceedings of the 26th ACM Conference on Hypertext & Social Media. ACM, 231–236.
[3] Maunendra Sankar Desarkar, Sudeshna Sarkar, and Pabitra Mitra. 2010. Aggregating Preference Graphs for Collaborative Rating Prediction. In Proceedings of the Fourth ACM Conference on Recommender Systems (RecSys '10). ACM, New York, NY, USA, 21–28. https://doi.org/10.1145/1864708.1864716
[4] Maunendra Sankar Desarkar, Roopam Saxena, and Sudeshna Sarkar. 2012. Preference relation based matrix factorization for recommender systems. In International conference on user modeling, adaptation, and personalization. Springer, 63–75.
[5] Gayatree Ganu, Noemie Elhadad, and Amélie Marian. 2009. Beyond the stars: improving rating predictions using review text content. In WebDB, Vol. 9. Citeseer, 1–6.
[6] Nicolas Jones, Armelle Brun, and Anne Boyer. 2011. Comparisons instead of ratings: Towards more stable preferences. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01. IEEE Computer Society, 451–456.
[7] Saikishore Kalloori, Francesco Ricci, and Marko Tkalcic. 2016. Pairwise preferences based matrix factorization and nearest neighbor recommendation techniques. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 143–146.
[8] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42 (Aug. 2009), 30–37. https://doi.org/10.1109/MC.2009.263
[9] Yehuda Koren and Robert Bell. 2015. Advances in collaborative filtering. In Recommender systems handbook. Springer, 77–118.
[10] Shaowu Liu, Gang Li, Truyen Tran, and Yuan Jiang. 2016. Preference Relation-based Markov Random Fields for Recommender Systems. In Asian Conference on Machine Learning (Proceedings of Machine Learning Research), Geoffrey Holmes and Tie-Yan Liu (Eds.), Vol. 45. PMLR, Hong Kong, 157–172. http://proceedings.mlr.press/v45/Liu15.html
[11] Julian McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM conference on Recommender systems. ACM, 165–172. https://doi.org/10.1145/2507157.2507163
[12] Ladislav Peska and Peter Vojtas. 2017. Using implicit preference relations to improve recommender systems. Journal on Data Semantics 6, 1 (2017), 15–30.
[13] Ruslan Salakhutdinov and Andriy Mnih. 2008. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th international conference on Machine learning. ACM, 880–887.
[14] Chong Wang and David M Blei. 2011. Collaborative topic modeling for recommending scientific articles.
In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 448–456. hidden dimensions are effectively drawn from the reviews. The topic modeling based latent factors of the items along with the pairwise relation between these items (where the latent feature space of the items drawn from the LDA are allowed to change through offset during the learning process) provides significant improvement over the methods considered in isolation. Our algo- rithm runs very effectively on large dataset and comparable with the pointwise approach. In fact, PreFacTO method gives marginal improvements over the pointwise methods. It is also shown that Pairwise method works well for the sparse items and PreFacTO provides better performance in case of dense items. It was observed in the experimental results that Pairwise works well for sparse items and PreFacTO works well for dense items. It might be possible to develop hybrid methods that consider both
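The parameterized switching idea raised in the future-work discussion can be made concrete with a small sketch. This is an illustrative example, not the paper's implementation: the class and method names (`HybridRecommender`, `predict`, `recommend`) and the model interfaces are assumptions; the threshold of 40 reviews mirrors the sparse/dense split used in the evaluation above.

```python
# Illustrative sketch: a hybrid recommender that routes each item either to a
# Pairwise-style model (ratings only) or to a PReFacTO-style model (ratings +
# review topics), based on how many reviews the item has. The 40-review
# threshold follows the sparse/dense split used in the experiments; the model
# objects are assumed to expose a predict(user_id, item_id) -> score method.

SPARSE_THRESHOLD = 40  # items with fewer reviews are treated as sparse

class HybridRecommender:
    def __init__(self, pairwise_model, prefacto_model, review_counts):
        self.pairwise = pairwise_model      # observed to work better on sparse items
        self.prefacto = prefacto_model      # observed to work better on dense items
        self.review_counts = review_counts  # item_id -> number of reviews

    def predict(self, user_id, item_id):
        # Route the item to the model suited to its data regime.
        if self.review_counts.get(item_id, 0) < SPARSE_THRESHOLD:
            return self.pairwise.predict(user_id, item_id)
        return self.prefacto.predict(user_id, item_id)

    def recommend(self, user_id, candidate_items, k=10):
        # Score all candidates with their respective models, then fuse the
        # sparse-zone and dense-zone scores into one ranked list.
        scored = [(item, self.predict(user_id, item)) for item in candidate_items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return [item for item, _ in scored[:k]]
```

A fuller version of this direction would also need to calibrate the two models' score scales before fusing them, since scores produced by different factor models are not directly comparable.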