=Paper=
{{Paper
|id=Vol-2290/kars2018_paper1
|storemode=property
|title=Deriving Item Features Relevance from Collaborative Domain Knowledge
|pdfUrl=https://ceur-ws.org/Vol-2290/kars2018_paper1.pdf
|volume=Vol-2290
|authors=Maurizio Ferrari Dacrema,Alberto Gasparin,Paolo Cremonesi
|dblpUrl=https://dblp.org/rec/conf/recsys/DacremaGC18
}}
==Deriving Item Features Relevance from Collaborative Domain Knowledge==
Maurizio Ferrari Dacrema (Politecnico di Milano, maurizio.ferrari@polimi.it), Alberto Gasparin (Università della Svizzera italiana, alberto.gasparin@usi.ch), Paolo Cremonesi (Politecnico di Milano, paolo.cremonesi@polimi.it)

ABSTRACT

An item-based recommender system works by computing a similarity between items, which can exploit past user interactions (collaborative filtering) or item features (content-based filtering). Collaborative algorithms have been proven to achieve better recommendation quality than content-based algorithms in a variety of scenarios, being more effective in modeling user behaviour. However, they cannot be applied when items have no interactions at all, i.e. cold-start items. Content-based algorithms, which are applicable to cold-start items, often require a lot of feature engineering in order to generate useful recommendations. This issue is specifically relevant as the content descriptors become large and heterogeneous. The focus of this paper is on how to use a collaborative model's domain-specific knowledge to build a wrapper feature weighting method which embeds collaborative knowledge in a content-based algorithm. We present a comparative study of different state-of-the-art algorithms and present a more general model. This machine learning approach to feature weighting shows promising results and high flexibility.

ACM Reference Format: Maurizio Ferrari Dacrema, Alberto Gasparin, and Paolo Cremonesi. 2019. Deriving item features relevance from collaborative domain knowledge. In Proceedings of Knowledge-aware and Conversational Recommender Systems (KaRS) Workshop 2018 (co-located with RecSys 2018). ACM, New York, NY, USA, 4 pages.

Knowledge-aware and Conversational Recommender Systems (KaRS) Workshop 2018 (co-located with RecSys 2018), October 7, 2018, Vancouver, Canada. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION

Recommender systems aim at guiding the user through the navigation of vast catalogs, and in recent years they have become widespread. Among item-based algorithms, content-based ones are the most widely used, as they provide good performance and explainability. Content-based algorithms recommend items based on similarities computed via item attributes. Although applicable in any circumstance in which at least some information about the items is available, they suffer from many drawbacks related to the quality of the item features. It is hard and expensive to provide an accurate and exhaustive description of the items. In recent years the amount of machine-readable data on the web has increased substantially, making it possible to build heterogeneous and complex representations for each item (e.g. textual features extracted from web pages). Those representations, however, often comprise a high number of features and are sparse and noisy, to the point where adding new data hampers the recommendation quality. Feature weighting, which can be considered a generalization of feature selection, is a useful tool to improve content-based algorithms. Traditional Information Retrieval methods like TF-IDF and BM25 [10], while often leading to accuracy improvements, cannot take into account how important those features are from the user's point of view. Collaborative filtering, on the other hand, determines item similarity by taking into account the user interactions. It is known that collaborative filtering generally outperforms content-based filtering even when few ratings for each user are available [5]. The main disadvantage of collaborative systems is their inability to compute predictions for new items or new users due to the lack of interactions; this problem is referred to as the item cold-start problem. In cold-start scenarios only content-based algorithms are applicable; this is the case in which improving the item description would be most beneficial, and it is therefore the main focus of this article.

The most common approach to tackle the cold-start item problem is to rely on content-based algorithms, whose accuracy is sometimes much poorer. In this paper we further investigate the cold-start item problem, focusing on how to learn feature weights able to better represent feature importance from the user's point of view. We provide a comparative study of state-of-the-art algorithms and present a more general model, demonstrating its applicability to a wide range of collaborative models. Moreover, we describe a two-step approach that:
(1) exploits the capability of a generic collaborative algorithm to model domain-specific user behaviour and achieve state-of-the-art performance for warm items;
(2) embeds the collaborative knowledge into feature weights.

The rest of the paper is organized as follows. In Section 2 we briefly review the literature in the cold-start recommendation domain, in Section 3 our framework is presented, and a comparison of the different algorithms is discussed in Section 4. Finally, conclusions and future work are highlighted in Section 5.

2 RELATED WORKS

Various tools are at our disposal to assess the relevance of a feature. We can distinguish feature weighting algorithms in three categories: filtering, embedding and wrappers [4].

Filtering methods usually rely on information retrieval. Methods like TF-IDF or BM25 are not optimized with respect to a predictive model, therefore the resulting weights are not domain-specific and cannot take into account the rich collaborative information, even when available.

Embedding methods learn feature weights as a part of the model training; examples of this are UFSM [3] and FBSM [11]. Among the main drawbacks of embedded methods are a complex training phase and noise sensitivity, due to the strong coupling of features and interactions. User-Specific Feature-based Similarity Models (UFSM) learns a personalized linear combination of similarity functions known as global similarity functions and can be considered a special case of Factorization Machines. Factorized Bilinear Similarity Models (FBSM) was proposed as an evolution of UFSM and aims to discover relations among item features. The FBSM similarity is computed as follows:

$sim(i, j) = f_i^T W f_j$    (1)

where $f_i$ is the feature vector of item $i$, and $W$ is the matrix of parameters whose diagonal elements represent how well a feature of item $i$ interacts with the same feature of item $j$, while the off-diagonal elements determine the correlation among different features. In order to reduce the number of parameters, $W$ is represented as the summation of diagonal weights and a low-rank approximation of the off-diagonal values:

$W = D + V^T V$    (2)

where $D$ is a diagonal matrix whose dimension is the number of features $n_F$, and $V \in \mathbb{R}^{n_L \times n_F}$. The number of latent factors $n_L$ is treated as a parameter.
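To make the shape of this model concrete, here is a minimal NumPy sketch of the bilinear similarity of Equations (1) and (2); all names, dimensions and values are illustrative assumptions, not the authors' implementation. Writing the product as $f_i^T D f_j + (V f_i)^T (V f_j)$ avoids materializing the full $n_F \times n_F$ matrix $W$.

```python
import numpy as np

n_features = 1000   # n_F: number of item features (illustrative)
n_latent = 50       # n_L: number of latent factors (illustrative)

rng = np.random.default_rng(seed=42)
d = rng.random(n_features)                     # diagonal of D: per-feature weights
V = rng.random((n_latent, n_features)) * 0.01  # low-rank factors for off-diagonal terms

def fbsm_similarity(f_i: np.ndarray, f_j: np.ndarray) -> float:
    """Bilinear similarity between two item feature vectors.

    Computed as f_i^T D f_j + (V f_i)^T (V f_j), equivalent to
    f_i^T (D + V^T V) f_j without forming the n_F x n_F matrix W.
    """
    diagonal_term = np.sum(d * f_i * f_j)  # f_i^T D f_j
    latent_term = (V @ f_i) @ (V @ f_j)    # f_i^T V^T V f_j
    return diagonal_term + latent_term
```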
Wrapper methods rely on a two-step approach, learning feature weights on top of an already available model; an example of this is Least-square Feature Weights (LFW) [1]. LFW learns feature weights from a SLIM similarity matrix, using a simpler model with respect to FBSM:

$sim(i, j) = f_i^T D f_j$    (3)

All these algorithms, to the best of our knowledge, have never been subject to a comparative study.

3 SIMILARITY BASED FEATURE WEIGHTING

Feature weighting can be formulated as a minimization problem whose objective function is:

$\operatorname{argmin}_W \| S^{(CF)} - S^{(W)} \|_F^2 + \lambda \|D\|_F^2 + \beta \|V\|_F^2$    (4)

where $S^{(CF)}$ is any item-item collaborative similarity, $S^{(W)}$ is the similarity function described in Equation (1), $W$ is the feature weight matrix which captures the relationships between item features, and $\beta$ and $\lambda$ are the regularization terms. We call this model Collaborative boosted Feature Weighting (CFW). The model can either use the latent factors (CFW D+V), as in FBSM, or not (CFW D), as in LFW. The advantages of learning from a similarity matrix, instead of using the user interactions, are several:

• High flexibility in choosing the collaborative algorithm, which can be treated as a black box
• Similarity values are less noisy than user interactions
• The model is simpler and convergence is faster

In this paper a two-step hybrid method is presented, in order to easily allow embedding domain-specific user behaviour, as represented by a collaborative model, in a weighted content-based recommender. The presented model is easily extendable to other algorithms and domains. The learning phase is composed of two steps. The goal of the first step is to find the optimal parameters for the collaborative algorithm; to this end, a collaborative algorithm is trained and tuned on warm items. The second step applies an embedded method to learn the optimal item feature weights that best approximate the item-item collaborative similarity obtained before.

3.1 Parameter estimation

We solve (4) via SGD applying Adam [6], which is well suited for problems with noisy and sparse gradients. Note that here our goal is to find weights that approximate the collaborative similarity as well as possible; this is why we optimize MSE and not BPR, which was used in FBSM. The objective function is therefore:

$L_{MSE}(D, V) = \frac{1}{2} \sum_{i \in I} \sum_{j \in I} \left( \hat{s}_{ij} - s^{CF}_{ij} \right)^2 + \lambda \|D\|_F^2 + \beta \|V\|_F^2$    (5)
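As an illustration of this optimization step, the following is a minimal PyTorch sketch of fitting CFW D+V to a precomputed collaborative similarity with Adam, following Equations (4) and (5). The shapes, learning rate and regularization values are illustrative assumptions, and a real implementation would exploit the sparsity of the similarity matrix rather than forming it densely.

```python
import torch

# Illustrative shapes and hyper-parameters (not the authors' values).
n_items, n_features, n_latent = 500, 1000, 50
lambda_reg, beta_reg = 1e-4, 1e-4

F = torch.rand(n_items, n_features)  # item-feature matrix, one row per item
S_cf = torch.rand(n_items, n_items)  # stand-in for the target collaborative similarity

d = torch.zeros(n_features, requires_grad=True)                  # diagonal of D
V = (0.01 * torch.randn(n_latent, n_features)).requires_grad_()  # small random init
# (V must not start at exactly zero: the loss is quadratic in V, so a zero
# initialization would receive zero gradient and never move.)

optimizer = torch.optim.Adam([d, V], lr=1e-3)

for epoch in range(100):
    optimizer.zero_grad()
    # s_hat_ij = f_i^T (D + V^T V) f_j, computed without forming W explicitly
    S_hat = (F * d) @ F.T + (F @ V.T) @ (V @ F.T)
    loss = (0.5 * ((S_hat - S_cf) ** 2).sum()
            + lambda_reg * (d ** 2).sum()
            + beta_reg * (V ** 2).sum())
    loss.backward()
    optimizer.step()
```

Dropping the V term and its regularizer recovers the CFW D variant, which the experiments below show to be the stronger of the two.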
4 EVALUATION

We performed experiments to confirm that our approach is capable of embedding collaborative knowledge in a content-based algorithm, improving its recommendation quality in an item cold-start scenario.

4.1 Dataset

In order to evaluate our approach we used only item descriptors accessible via the web, which we assume will be available for new items, excluding user-generated content. The datasets are the following:

Netflix. Enriched with structured and unstructured attributes extracted from IMDB. This dataset has 250k users, 6.5k movies, 51k features and 8.8M ratings on a 1-5 scale. The rating data is enriched with 6.6k binary editorial attributes such as director, actor and genres.

The Movies Database (https://www.kaggle.com/rounakbanik/the-movies-dataset). 45k movies with 190k TMDB editorial features and ratings for 270k users. This dataset has been built from the original one by extracting its 70-cores.

For all the listed datasets, features belonging to fewer than 5 items or more than 30% of the items have been removed, as done in [11].

4.2 Evaluation procedure

The evaluation procedure consists of two steps.

[Figure 1: URM split. Users on the rows; the items are split into A, B (warm items) and C (cold items), corresponding to train, validation and test respectively.]

Step 1 - Collaborative algorithm: In this step the training of the collaborative algorithm is performed on warm items, which are defined as the union of splits A and B, see Figure 1. Hyper-parameters are tuned with a 20% holdout validation.

Step 2 - Feature weights: In this case a cold-item validation is chosen, as it better represents the end goal of the algorithm: to find weights that perform well on cold items. The collaborative similarity is learned using only split A, with the hyper-parameters found in the previous step, see Figure 1. An embedded method is then used to learn the optimal item feature weights that best approximate the item-item collaborative similarity obtained before. The hyper-parameters of the machine learning model are tuned using split B, while set C is used for pure testing.

4.3 Collaborative Similarity

The collaborative similarity matrices used in our approach are computed using different algorithms: KNN collaborative (using the best-performing similarity among Cosine, Pearson, Adjusted Cosine and Jaccard), P3alpha [2] (a graph-based algorithm which models a random walk), RP3beta [9] (a reranked version of P3alpha), SLIM BPR [8] and SLIM MSE [7].
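To give a concrete sense of what the first step produces, here is a minimal SciPy/NumPy sketch of the simplest option listed above, an item-item cosine KNN over the user-rating matrix (URM). The shrink term and k are illustrative hyper-parameters, and the dense intermediate is for clarity only.

```python
import numpy as np
from scipy.sparse import csr_matrix

def cosine_knn_similarity(urm: csr_matrix, k: int = 100, shrink: float = 10.0) -> csr_matrix:
    """Item-item cosine similarity from a user-rating matrix, top-k per item.

    Sketch for clarity: the item-item matrix is densified here, which a real
    implementation on a large catalog would avoid.
    """
    item_norms = np.sqrt(np.asarray(urm.multiply(urm).sum(axis=0)).ravel())
    dot = np.asarray((urm.T @ urm).todense())                # item-item dot products
    sim = dot / (np.outer(item_norms, item_norms) + shrink)  # shrunk cosine
    np.fill_diagonal(sim, 0.0)                               # drop self-similarity
    for row in sim:                                          # keep top-k neighbours per item
        row[np.argsort(row)[:-k]] = 0.0
    return csr_matrix(sim)

# The resulting matrix can serve as the black-box target S^(CF) in Equation (4).
```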
While LFW only learned feature weights using a SLIM similarity matrix, our results indicate that it is possible to learn from a wide variety of item-based algorithms, even those not relying on machine learning. This means that machine-learned feature weights can be used on top of already available collaborative algorithms with little effort. Using an intermediate similarity matrix, while offering additional degrees of freedom in the selection of the collaborative model, also simplifies the training phase and improves overall performance.

4.4 Results

Table 1 shows the recommendation quality of both pure content-based and hybrid baselines, as well as CFW D evaluated on all collaborative similarity models. Table 2 shows the performance of the two components of CFW and FBSM on Netflix; results for the other dataset are omitted as they behave in the same way.

                               Netflix                                        The Movies
Algorithm           Precision  Recall   MRR     MAP     NDCG       Precision  Recall   MRR     MAP     NDCG

Content baselines
CBF KNN             0.0439     0.0405   0.1177  0.0390  0.0449     0.3885     0.0916   0.6909  0.3166  0.1641
CBF KNN IDF         0.0439     0.0405   0.1177  0.0390  0.0449     0.3931     0.0930   0.6956  0.3215  0.1662
CBF KNN BM25        0.0466     0.0410   0.1237  0.0414  0.0462     0.3931     0.0930   0.6956  0.3215  0.1662

Hybrid feature weights baselines
FBSM                0.0244     0.0240   0.0476  0.0162  0.0199     0.2957     0.0727   0.5503  0.2114  0.1192
LFW                 0.0679     0.0573   0.1632  0.0631  0.0646     0.4135     0.0959   0.7073  0.3442  0.1736

CFW - D (per collaborative similarity)
CF KNN              0.0688     0.0597   0.1585  0.0609  0.0645     0.3891     0.0906   0.6939  0.3192  0.1597
P3alpha             0.0714     0.0679   0.1707  0.0664  0.0716     0.3847     0.0882   0.6911  0.3179  0.1578
RP3beta             0.0656     0.0624   0.1643  0.0610  0.0669     0.4281     0.1010   0.7233  0.3588  0.1806
SLIM RMSE           0.0643     0.0529   0.1572  0.0583  0.0604     0.4058     0.0923   0.7017  0.3372  0.1669
SLIM BPR            0.0685     0.0539   0.1583  0.0618  0.0619     0.4170     0.0967   0.7218  0.3455  0.1723

Table 1: Performance of CFW and baselines evaluated on cold items.

Model               Precision  Recall   MRR     MAP     NDCG
FBSM  D+V           0.0244     0.0240   0.0476  0.0162  0.0199
FBSM  D             0.0348     0.0366   0.0954  0.0312  0.0379
FBSM  V             0.0138     0.0112   0.0247  0.0071  0.0086
CFW   D+V           0.0475     0.0424   0.1336  0.0430  0.0489
CFW   D             0.0635     0.0579   0.1653  0.0602  0.0641
CFW   V             0.0412     0.0346   0.1146  0.0335  0.0392

Table 2: Model component contribution on the results of FBSM and CFW on Netflix.

From Table 1 we can see that FBSM performs poorly, which indicates that while it has the power to model complex relations, it is more sensitive to noise and data sparsity than other algorithms. Learning from a similarity matrix, as LFW and CFW D+V do, yields much better results than FBSM. From Table 1 we can also see that by using only the diagonal and discarding the latent factor component, the performance improves significantly. In Table 2 it is possible to see that the latent factor component was able to learn very little; this suggests that while rendering the model more expressive, it introduces noise and numerical instability. Note that the performance of the diagonal component alone is higher than that obtained by adding the V component. However, the effectiveness of the latent factor component could be influenced by the feature structure, and it might therefore be relevant in some specific cases.

5 CONCLUSIONS

In this paper we presented different state-of-the-art feature weighting methods, compared their performance, and proposed a more general framework to effectively apply machine-learned feature weighting to boost the recommendation quality of content-based algorithms, embedding domain-specific user behaviour. We also demonstrated high flexibility in the choice of which collaborative algorithm to use. Future work directions include testing the proposed approach on different datasets and domains, as well as exploring the symmetric problem of using the collaborative similarity to discover item features or to reduce feature noise.
REFERENCES

[1] Leonardo Cella, Stefano Cereda, Massimo Quadrana, and Paolo Cremonesi. 2017. Deriving Item Features Relevance from Past User Interactions. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. ACM, 275–279.
[2] Colin Cooper, Sang Hyuk Lee, Tomasz Radzik, and Yiannis Siantos. 2014. Random walks in recommender systems: exact computation and simulations. In Proceedings of the 23rd International Conference on World Wide Web. ACM, 811–816.
[3] Asmaa Elbadrawy and George Karypis. 2015. User-Specific Feature-Based Similarity Models for Top-n Recommendation of New Items. ACM Trans. Intell. Syst. Technol. 6, 3, Article 33 (April 2015), 20 pages. https://doi.org/10.1145/2700495
[4] Isabelle Guyon and André Elisseeff. 2003. An introduction to variable and feature selection. Journal of Machine Learning Research 3, Mar (2003), 1157–1182.
[5] István Pilászy and Domonkos Tikk. 2009. Recommending new movies: even a few ratings are more valuable than metadata. In Proceedings of the third ACM Conference on Recommender Systems (RecSys '09). ACM, 93–100. https://doi.org/10.1145/1639714.1639731
[6] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[7] Mark Levy and Kris Jack. 2013. Efficient top-n recommendation by linear regression. In RecSys Large Scale Recommender Systems Workshop.
[8] Xia Ning and George Karypis. 2011. SLIM: Sparse linear methods for top-n recommender systems. In Data Mining (ICDM), 2011 IEEE 11th International Conference on. IEEE, 497–506.
[9] Bibek Paudel, Fabian Christoffel, Chris Newell, and Abraham Bernstein. 2017. Updatable, Accurate, Diverse, and Scalable Recommendations for Interactive Applications. ACM Transactions on Interactive Intelligent Systems (TiiS) 7, 1 (2017), 1.
[10] Stephen Robertson, Hugo Zaragoza, and Michael Taylor. 2004. Simple BM25 extension to multiple weighted fields. In Proceedings of the thirteenth ACM International Conference on Information and Knowledge Management. ACM, 42–49.
[11] Mohit Sharma, Jiayu Zhou, Junling Hu, and George Karypis. 2015. Feature-based factorized bilinear similarity model for cold-start top-n item recommendation. In Proceedings of the 2015 SIAM International Conference on Data Mining. SIAM, 190–198.