Deriving item features relevance from collaborative domain knowledge

Maurizio Ferrari Dacrema, Politecnico di Milano, maurizio.ferrari@polimi.it
Alberto Gasparin, Università della Svizzera italiana, alberto.gasparin@usi.ch
Paolo Cremonesi, Politecnico di Milano, paolo.cremonesi@polimi.it

ABSTRACT
An item based recommender system works by computing a similarity between items, which can exploit past user interactions (collaborative filtering) or item features (content based filtering). Collaborative algorithms have been proven to achieve better recommendation quality than content based algorithms in a variety of scenarios, being more effective in modeling user behaviour. However, they cannot be applied when items have no interactions at all, i.e. cold start items. Content based algorithms, which are applicable to cold start items, often require a lot of feature engineering in order to generate useful recommendations. This issue is especially relevant as the content descriptors become large and heterogeneous. The focus of this paper is on how to use a collaborative model's domain-specific knowledge to build a wrapper feature weighting method which embeds collaborative knowledge in a content based algorithm. We present a comparative study of different state-of-the-art algorithms and a more general model. This machine learning approach to feature weighting shows promising results and high flexibility.

ACM Reference Format:
Maurizio Ferrari Dacrema, Alberto Gasparin, and Paolo Cremonesi. 2019. Deriving item features relevance from collaborative domain knowledge. In Proceedings of Knowledge-aware and Conversational Recommender Systems (KaRS) Workshop 2018 (co-located with RecSys 2018). ACM, New York, NY, USA, 4 pages.

Knowledge-aware and Conversational Recommender Systems (KaRS) Workshop 2018 (co-located with RecSys 2018), October 7, 2018, Vancouver, Canada. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION
Recommender systems aim at guiding the user through the navigation of vast catalogs, and in recent years they have become widespread. Among item based algorithms, content based ones are the most widely used, as they provide good performance and explainability. Content based algorithms recommend items based on similarities computed via item attributes. Although applicable in any circumstance in which at least some information about the items is available, they suffer from many drawbacks related to the quality of the item features. It is hard and expensive to provide an accurate and exhaustive description of the items. In recent years the amount of machine-readable data on the web has increased substantially, making it possible to build heterogeneous and complex representations for each item (e.g. textual features extracted from web pages). Those representations, however, often comprise a high number of features and are sparse and noisy, to the point where adding new data hampers the recommendation quality. Feature weighting, which can be considered a generalization of feature selection, is a useful tool to improve content based algorithms. Traditional Information Retrieval methods like TF-IDF and BM25 [10], while often leading to accuracy improvements, cannot take into account how important those features are from the user point of view. Collaborative filtering, on the other hand, determines item similarity by taking into account the user interactions. It is known that collaborative filtering generally outperforms content based filtering even when few ratings for each user are available [5]. The main disadvantage of collaborative systems is their inability to compute predictions for new items or new users due to the lack of interactions; this problem is referred to as item cold-start. In cold start scenarios only content based algorithms are applicable; this is the case in which improving the item description would be most beneficial, and it is therefore the main focus of this article.

The most common approach to tackle the cold-start item problem is to rely on content-based algorithms, whose accuracy is sometimes much poorer. In this paper we further investigate the cold-start item problem, focusing on how to learn feature weights able to better represent feature importance from the user point of view. We provide a comparative study of state-of-the-art algorithms and present a more general model, demonstrating its applicability on a wide range of collaborative models. Moreover, we describe a two step approach that:
   (1) exploits the capability of a generic collaborative algorithm to model domain-specific user behaviour and achieve state-of-the-art performance for warm items
   (2) embeds the collaborative knowledge into feature weights

The rest of the paper is organized as follows. In Section 2 we briefly review the literature in the cold-start recommendation domain, in Section 3 our framework is presented, and a comparison of the different algorithms is discussed in Section 4. Finally, conclusions and future works are highlighted in Section 5.

2 RELATED WORKS
Various tools are at our disposal to assess the relevance of a feature. We can distinguish feature weighting algorithms in three categories: filtering, embedding and wrappers [4].

Filtering methods usually rely on information retrieval. Methods like TF-IDF or BM25 are not optimized with respect to a predictive model, therefore the resulting weights are not domain-specific and cannot take into account the rich collaborative information, even when available.

Embedding methods learn feature weights as a part of the model training; examples of this are UFSM [3] and FBSM [11]. The main drawbacks of embedded methods are a complex training phase and noise sensitivity, due to the strong coupling of features and interactions.

User-Specific Feature-based Similarity Models (UFSM) learns a personalized linear combination of similarity functions, known as global similarity functions, and can be considered a special case of Factorization Machines. Factorized Bilinear Similarity Models (FBSM) was proposed as an evolution of UFSM and aims to discover relations among item features. The FBSM similarity matrix is computed as follows:

    sim(i, j) = f_i^T W f_j    (1)

where f_i is the feature vector of item i and W is the matrix of parameters whose diagonal elements represent how well a feature of item i interacts with the same feature of item j, while the off-diagonal elements determine the correlation among different features. In order to reduce the number of parameters, W is represented as the summation of diagonal weights and a low rank approximation of the off-diagonal values:

    W = D + V^T V    (2)

where D is a diagonal matrix whose dimension is the number of features n_F, and V ∈ R^{n_L × n_F}. The number of latent factors n_L is treated as a parameter.
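To make Equations (1) and (2) concrete, the following is a minimal NumPy sketch of how such a bilinear similarity with W = D + V^T V can be computed; it is our illustration, not the authors' implementation, and the array shapes and names are assumptions.

    import numpy as np

    def bilinear_similarity(F, d, V):
        """sim(i, j) = f_i^T (D + V^T V) f_j for all item pairs.

        F : (n_items, n_F) item-feature matrix (rows are the f_i)
        d : (n_F,) diagonal of D
        V : (n_L, n_F) latent factor matrix
        """
        # Diagonal term: f_i^T D f_j, i.e. a feature-weighted dot product
        sim_diag = (F * d) @ F.T
        # Low rank term: (V f_i)^T (V f_j), computed via the projected items
        P = F @ V.T                    # (n_items, n_L)
        return sim_diag + P @ P.T

Setting V to zero recovers the diagonal-only model of Equation (3) below.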
Wrapper methods rely on a two step approach, learning feature weights on top of an already available model; an example of this is Least-square Feature Weights (LFW) [1]. LFW learns feature weights from a SLIM similarity matrix using a simpler model with respect to FBSM:

    sim(i, j) = f_i^T D f_j    (3)

All these algorithms, to the best of our knowledge, have never been subject to a comparative study.
3 SIMILARITY BASED FEATURE WEIGHTING
Feature weighting can be formulated as a minimization problem whose objective function is:

    argmin_W || S^CF − S(W) ||_F^2 + λ ||D||_F^2 + β ||V||_F^2    (4)

where S^CF is any item-item collaborative similarity, S(W) is the similarity function described in Equation (1), W is the feature weight matrix which captures the relationships between item features, and λ and β are the regularization coefficients. We call this model Collaborative boosted Feature Weighting (CFW). This model can either use the latent factors (CFW D+V), as in FBSM, or not (CFW D), as in LFW.

The advantages of learning from a similarity matrix, instead of using the user interactions, are several:
   • High flexibility in choosing the collaborative algorithm, which can be treated as a black box
   • Similarity values are less noisy than user interactions
   • The model is simpler and convergence is faster
In this paper a two step hybrid method is presented, in order to easily embed domain-specific user behaviour, as represented by a collaborative model, in a weighted content based recommender. The presented model is easily extendable to other algorithms and domains. The learning phase is composed of two steps. The goal of the first step is to find the optimal parameters for the collaborative algorithm; to this end a collaborative algorithm is trained and tuned on warm items. The second step applies an embedded method to learn the optimal item feature weights that better approximate the item-item collaborative similarity obtained before.

3.1 Parameter estimation
We solve (4) via SGD, applying Adam [6], which is well suited for problems with noisy and sparse gradients. Note that here our goal is to find weights that approximate the collaborative similarity as well as possible; this is why we optimize MSE and not BPR, which was used in FBSM. The objective function is therefore:

    L_MSE(D, V) = 1/2 Σ_{i∈I} Σ_{j∈I} (ŝ_ij − s^CF_ij)^2 + λ ||D||_F^2 + β ||V||_F^2    (5)

where ŝ_ij is the similarity predicted via Equation (1) and s^CF_ij is the collaborative similarity.
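As an illustration of the optimization in Equations (4) and (5), here is a compact PyTorch sketch; it is a full-batch, dense version for brevity (whereas the paper samples entries via SGD), and all names and default values are our assumptions.

    import torch

    def fit_cfw(F, S_cf, n_latent=10, lam=1e-4, beta=1e-4, epochs=100, lr=1e-3):
        """Minimize Eq. (5): 1/2 * sum_ij (s_hat_ij - s_cf_ij)^2
        + lam * ||D||_F^2 + beta * ||V||_F^2, using Adam.

        F    : (n_items, n_F) item-feature matrix
        S_cf : (n_items, n_items) collaborative similarity to approximate
        """
        F = torch.as_tensor(F, dtype=torch.float32)
        S_cf = torch.as_tensor(S_cf, dtype=torch.float32)
        d = torch.zeros(F.shape[1], requires_grad=True)              # diagonal of D
        V = (0.01 * torch.randn(n_latent, F.shape[1])).requires_grad_()
        opt = torch.optim.Adam([d, V], lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            P = F @ V.T                                  # (n_items, n_L)
            S_hat = (F * d) @ F.T + P @ P.T              # Eq. (1) with W = D + V^T V
            loss = (0.5 * (S_hat - S_cf).pow(2).sum()
                    + lam * d.pow(2).sum() + beta * V.pow(2).sum())
            loss.backward()
            opt.step()
        return d.detach(), V.detach()

Note that dropping V (the CFW D variant) leaves a loss that is a convex least-squares problem in d.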
4 EVALUATION
We performed experiments to confirm that our approach is capable of embedding collaborative knowledge in a content based algorithm, improving its recommendation quality in an item cold-start scenario.

4.1 Dataset
In order to evaluate our approach we used only item descriptors accessible via the web, which we assume will be available for new items, excluding user generated content. The datasets are the following:

Netflix. Enriched with structured and unstructured attributes extracted from IMDB. This dataset has 250k users, 6.5k movies, 51k features and 8.8M ratings on a 1-5 scale. The rating data is enriched with 6.6k binary editorial attributes such as director, actor and genres.

The Movies Database (https://www.kaggle.com/rounakbanik/the-movies-dataset). 45k movies with 190k TMDB editorial features and ratings for 270k users. This dataset has been built from the original one by extracting its 70-cores.

For all the listed datasets, features belonging to less than 5 items or to more than 30% of the items have been removed, as done in [11].
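A small sketch of this preprocessing step, assuming the item features are stored in a binary SciPy item-content matrix (ICM); the function and variable names are ours.

    import numpy as np
    import scipy.sparse as sps

    def filter_features(icm, min_items=5, max_share=0.30):
        """Drop features occurring in fewer than min_items items
        or in more than max_share of all items."""
        icm = sps.csc_matrix(icm)
        items_per_feature = np.asarray((icm > 0).sum(axis=0)).ravel()
        keep = (items_per_feature >= min_items) & \
               (items_per_feature <= max_share * icm.shape[0])
        return icm[:, keep]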

4.2 Evaluation procedure
The evaluation procedure consists of two steps.

[Figure 1: URM split. Users on the rows; the items are partitioned into train (A), validation (B) and test (C). A and B contain the warm items, while C contains the cold items.]

Step 1 - Collaborative algorithm: In this step the training of the collaborative algorithm is performed on warm items, which are defined as the union of splits A and B, see Figure 1. Hyper-parameters are tuned with a 20% holdout validation.
Step 2 - Feature weights: In this case a cold-item validation is chosen, as it better represents the end goal of the algorithm: finding weights that perform well on cold items. The collaborative similarity will be learned using only split A, with the hyper-parameters found in the previous step, see Figure 1. An embedded method is then used to learn the optimal item feature weights that better approximate the item-item collaborative similarity obtained before. The hyper-parameters of the machine learning model are tuned using split B, while set C is used for pure testing.
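A minimal sketch of how such an item split can be built; the split proportions shown here are assumptions, not values stated in the paper.

    import numpy as np

    def split_items(n_items, pct_a=0.6, pct_b=0.2, seed=42):
        """Partition the item ids into splits A (train), B (validation
        for the feature weights) and C (cold-item test), as in Figure 1."""
        rng = np.random.default_rng(seed)
        items = rng.permutation(n_items)
        n_a, n_b = int(pct_a * n_items), int(pct_b * n_items)
        a = items[:n_a]
        b = items[n_a:n_a + n_b]
        c = items[n_a + n_b:]
        warm = np.concatenate([a, b])    # step 1 trains on A ∪ B
        return a, b, c, warm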
4.3 Collaborative Similarity
The collaborative similarity matrices used in our approach are computed using different algorithms: KNN collaborative (using the best-performing similarity among Cosine, Pearson, Adjusted Cosine and Jaccard), P3alpha [2] (a graph based algorithm which models a random walk), RP3beta [9] (a reranked version of P3alpha), SLIM BPR [8] and SLIM MSE [7].
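As an example of one such interchangeable collaborative model, here is a minimal item-KNN cosine similarity over the URM (a standard formulation, shown dense for brevity, not the authors' exact implementation):

    import numpy as np
    import scipy.sparse as sps

    def cosine_item_knn(urm, k=100):
        """Item-item cosine similarity with top-k neighborhood pruning.

        urm : (n_users, n_items) sparse user rating matrix
        """
        urm = sps.csr_matrix(urm)
        norms = np.sqrt(np.asarray(urm.power(2).sum(axis=0)).ravel()) + 1e-6
        sim = np.asarray((urm.T @ urm).todense()) / np.outer(norms, norms)
        np.fill_diagonal(sim, 0.0)            # no self similarity
        for row in sim:                       # keep only the k largest per row
            row[np.argsort(row)[:-k]] = 0.0
        return sim

Any of the listed algorithms can take this role, since the second step only consumes the resulting similarity matrix.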
4.4 Results
Table 1 shows the recommendation quality of both pure content based and hybrid baselines, as well as CFW D evaluated on all collaborative similarity models. Table 2 shows the performance of the two components of CFW and FBSM on Netflix; results for the other dataset are omitted as they behave in the same way.

                                              Netflix                                      The Movies
                     Algorithm      Precision  Recall   MRR     MAP     NDCG     Precision  Recall   MRR     MAP     NDCG
  Content            CBF KNN         0.0439   0.0405   0.1177  0.0390  0.0449    0.3885   0.0916   0.6909  0.3166  0.1641
  baselines          CBF KNN IDF     0.0439   0.0405   0.1177  0.0390  0.0449    0.3931   0.0930   0.6956  0.3215  0.1662
                     CBF KNN BM25    0.0466   0.0410   0.1237  0.0414  0.0462    0.3931   0.0930   0.6956  0.3215  0.1662
  hybrid feature     FBSM            0.0244   0.0240   0.0476  0.0162  0.0199    0.2957   0.0727   0.5503  0.2114  0.1192
  weights baselines  LFW             0.0679   0.0573   0.1632  0.0631  0.0646    0.4135   0.0959   0.7073  0.3442  0.1736
                     CF KNN          0.0688   0.0597   0.1585  0.0609  0.0645    0.3891   0.0906   0.6939  0.3192  0.1597
                     P3alpha         0.0714   0.0679   0.1707  0.0664  0.0716    0.3847   0.0882   0.6911  0.3179  0.1578
  CFW - D            RP3beta         0.0656   0.0624   0.1643  0.0610  0.0669    0.4281   0.1010   0.7233  0.3588  0.1806
                     SLIM RMSE       0.0643   0.0529   0.1572  0.0583  0.0604    0.4058   0.0923   0.7017  0.3372  0.1669
                     SLIM BPR        0.0685   0.0539   0.1583  0.0618  0.0619    0.4170   0.0967   0.7218  0.3455  0.1723

Table 1: Performance of CFW and baselines evaluated on cold items.

From Table 1 we can see that FBSM performs poorly, which indicates that while it has the power to model complex relations, it is more sensitive to noise and data sparsity than other algorithms. Learning from a similarity matrix, as LFW and CFW D+V do, yields much better results than FBSM. In Table 2 it is possible to see that the latent factor component was able to learn very little; this suggests that, while rendering the model more expressive, it introduces noise and numerical instability. Note that the performance of the diagonal component alone is higher than the one obtained by adding the V component. However, the effectiveness of the latent component could be influenced by the feature structure and therefore it might be relevant in some specific cases. From Table 1 we can see that by using only the diagonal and discarding the latent factor component, the performance improves significantly.

  Model        Precision  Recall    MRR     MAP     NDCG
  FBSM  D+V     0.0244    0.0240   0.0476  0.0162  0.0199
        D       0.0348    0.0366   0.0954  0.0312  0.0379
        V       0.0138    0.0112   0.0247  0.0071  0.0086
  CFW   D+V     0.0475    0.0424   0.1336  0.0430  0.0489
        D       0.0635    0.0579   0.1653  0.0602  0.0641
        V       0.0412    0.0346   0.1146  0.0335  0.0392

Table 2: Model component contribution on the result of FBSM and CFW on Netflix.

While LFW only learned feature weights using a SLIM similarity matrix, our results indicate that it is possible to learn from a wide variety of item based algorithms, even those not relying on machine learning. This means that machine learned feature weights can be used on top of an already available collaborative algorithm with little effort. Using an intermediate similarity matrix, while offering additional degrees of freedom in the selection of the collaborative model, also simplifies the training phase and improves overall performance.

5 CONCLUSIONS
In this paper we presented different state-of-the-art feature weighting methods, compared their performance and proposed a more general framework to effectively apply machine learning feature weighting to boost the recommendation quality of content based algorithms, embedding domain-specific user behaviour. We also demonstrated high flexibility in the choice of which collaborative algorithm to use. Future work directions include testing the proposed approach on different datasets and domains, as well as exploring the symmetric problem of using the collaborative similarity to discover item features or to reduce feature noise.

REFERENCES
[1] Leonardo Cella, Stefano Cereda, Massimo Quadrana, and Paolo Cremonesi. 2017. Deriving Item Features Relevance from Past User Interactions. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. ACM, 275–279.

[2] Colin Cooper, Sang Hyuk Lee, Tomasz Radzik, and Yiannis Siantos. 2014. Random walks in recommender systems: exact computation and simulations. In Proceedings of the 23rd International Conference on World Wide Web. ACM, 811–816.
[3] Asmaa Elbadrawy and George Karypis. 2015. User-Specific Feature-Based Similarity Models for Top-n Recommendation of New Items. ACM Trans. Intell. Syst. Technol. 6, 3, Article 33 (April 2015), 20 pages. https://doi.org/10.1145/2700495
[4] Isabelle Guyon and André Elisseeff. 2003. An introduction to variable and feature selection. Journal of Machine Learning Research 3, Mar (2003), 1157–1182.
[5] István Pilászy and Domonkos Tikk. 2009. Recommending new movies: even a few ratings are more valuable than metadata. In Proceedings of the third ACM conference on Recommender systems (RecSys '09). ACM, 93–100. https://doi.org/10.1145/1639714.1639731
[6] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[7] Mark Levy and Kris Jack. 2013. Efficient top-n recommendation by linear regression. In RecSys Large Scale Recommender Systems Workshop.
[8] Xia Ning and George Karypis. 2011. SLIM: Sparse linear methods for top-n recommender systems. In Data Mining (ICDM), 2011 IEEE 11th International Conference on. IEEE, 497–506.
[9] Bibek Paudel, Fabian Christoffel, Chris Newell, and Abraham Bernstein. 2017. Updatable, Accurate, Diverse, and Scalable Recommendations for Interactive Applications. ACM Transactions on Interactive Intelligent Systems (TiiS) 7, 1 (2017), 1.
[10] Stephen Robertson, Hugo Zaragoza, and Michael Taylor. 2004. Simple BM25 extension to multiple weighted fields. In Proceedings of the thirteenth ACM international conference on Information and knowledge management. ACM, 42–49.
[11] Mohit Sharma, Jiayu Zhou, Junling Hu, and George Karypis. 2015. Feature-based factorized bilinear similarity model for cold-start top-n item recommendation. In Proceedings of the 2015 SIAM International Conference on Data Mining. SIAM, 190–198.