Semantic Interpretability of Latent Factors for Recommendation*

Vito Walter Anelli, Tommaso Di Noia, Eugenio Di Sciascio, Claudio Pomo
Polytechnic University of Bari, Bari, Italy
firstname.lastname@poliba.it

Azzurra Ragone
Independent Researcher, Milan, Italy
azzurra.ragone@gmail.com

* An extended version of this work will be presented at the International Semantic Web Conference (ISWC 2019) [2].

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). IIR 2019, September 16-18, 2019, Padova, Italy.

ABSTRACT

Model-based approaches to recommendation have proven to be very accurate. Unfortunately, by working in a latent space they lose any reference to the actual semantics of the recommended items. In this extended abstract, we show how to initialize the latent factors of Factorization Machines with semantic features coming from a knowledge graph, in order to train an interpretable model. Finally, we introduce and evaluate semantic accuracy and robustness as measures of the knowledge-aware interpretability of the model.

1 INTRODUCTION

Transparency and interpretability of predictive models are gaining momentum, since they have been recognized as a key element in the next generation of recommendation algorithms. When equipped with interpretable recommendation results, a system ceases to be a black box, and users are more willing to rely extensively on its predictions [6]. However, powerful and accurate Deep Learning and model-based recommendation algorithms project items and users into a new vector space of latent features, thus making the final result not directly interpretable. In recent years, many approaches have been proposed that take advantage of side information to enhance the performance of latent factor models. Interestingly, in [7] the authors argue for a new generation of knowledge-aware recommendation engines able to exploit the information encoded in knowledge graphs (KG) to produce meaningful recommendations. In this work, we propose knowledge-aware Hybrid Factorization Machines (kaHFM) to train interpretable models in recommendation scenarios by taking advantage of semantics-aware information. kaHFM relies on Factorization Machines (FM) [4] and extends them in several key aspects by making use of the semantic information encoded in a knowledge graph. We show how kaHFM may exploit data coming from knowledge graphs as side information to build a recommender system whose final results are accurate and, at the same time, semantically interpretable.

2 KNOWLEDGE-AWARE HYBRID FACTORIZATION MACHINES

In [1], the authors proposed to encode a Linked Data knowledge graph in a Vector Space Model (VSM) to develop a content-based recommender system. Given a set of items I = {i_1, i_2, ..., i_N} in a catalog and their associated triples ⟨i, ρ, ω⟩ in a knowledge graph KG, we may build the set of all possible features as F = {⟨ρ, ω⟩ | ⟨i, ρ, ω⟩ ∈ KG with i ∈ I}. Each item can then be represented as a vector of weights i = [v(i, ⟨ρ, ω⟩_1), ..., v(i, ⟨ρ, ω⟩_|F|)], where the generic element v(i, ⟨ρ, ω⟩) is the normalized TF-IDF value for ⟨ρ, ω⟩. Since the numerator of TF_KG can only take the values 0 or 1, and each feature under the root in the denominator has value 0 or 1, v(i, ⟨ρ, ω⟩) is zero if ⟨i, ρ, ω⟩ ∉ KG, and otherwise:

v(i, \langle \rho, \omega \rangle) = \frac{\log |I| - \log |\{ j \in I \mid \langle j, \rho, \omega \rangle \in KG \}|}{\sqrt{\sum_{\langle \rho, \omega \rangle \in F} |\{ \langle \rho, \omega \rangle \mid \langle i, \rho, \omega \rangle \in KG \}|}}    (1)
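As a concrete reference, the following Python sketch computes the item vectors of Equation (1) from a set of triples. It is a minimal sketch under the assumptions stated in the comments; all names are illustrative and not part of the authors' released implementation.

    import math
    from collections import defaultdict

    def build_item_vectors(triples, items):
        """Illustrative sketch of Eq. (1).
        triples: iterable of (item, rho, omega) pairs taken from the KG;
        items: the catalog I. Returns, per item, a dict mapping each
        feature (rho, omega) to its normalized TF-IDF weight."""
        holders = defaultdict(set)   # feature -> items exposing it (for the IDF)
        feats = defaultdict(set)     # item -> its features (binary TF)
        for i, rho, omega in triples:
            if i in items:
                holders[(rho, omega)].add(i)
                feats[i].add((rho, omega))
        vectors = {}
        for i in items:
            # Denominator of Eq. (1): root of the sum of the binary feature
            # indicators, i.e., the square root of the number of features of i.
            norm = math.sqrt(len(feats[i])) or 1.0
            vectors[i] = {f: (math.log(len(items)) - math.log(len(holders[f]))) / norm
                          for f in feats[i]}
        return vectors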
Analogously, when we have a set U of users, we may represent them using the features describing the items they enjoyed in the past. We use f to denote a feature ⟨ρ, ω⟩ ∈ F. Given a user u, if we denote with I_u the set of items enjoyed by u, we may introduce the vector u = [v(u, f_1), ..., v(u, f_|F|)], where the generic element v(u, f) is computed as:

v(u, f) = \frac{\sum_{i \in I_u} v(i, f)}{|\{ i \mid i \in I_u \text{ and } v(i, f) \neq 0 \}|}

Given the vectors u_j, with j ∈ [1..|U|], and i_p, with p ∈ [1..|I|], we build a matrix V ∈ R^{n×|F|}, where n = |U| + |I|: the first |U| rows have a one-to-one mapping with the u_j, while the last ones correspond to the i_p. In second-degree Factorization Machines, the score is computed as:

\hat{y}(x_{ui}) = w_0 + \sum_{j=1}^{n} w_j \cdot x_j + \sum_{j=1}^{n} \sum_{p=j+1}^{n} x_j \cdot x_p \cdot \sum_{f=1}^{k} v(j, f) \cdot v(p, f)    (2)

We may see that, for each x, the term \sum_{j=1}^{n} \sum_{p=j+1}^{n} x_j \cdot x_p \cdot \sum_{f=1}^{k} v(j, f) \cdot v(p, f) is non-zero only when both x_j and x_p are equal to 1. In a recommendation scenario, this happens when there is an interaction between a user and an item. Moreover, the summation \sum_{f=1}^{k} v(j, f) \cdot v(p, f) is the dot product between two vectors v_j and v_p of size k. Hence, v_j represents a latent representation of a user and v_p that of an item within the same latent space, and their interaction is evaluated through their dot product.

In order to inject the knowledge coming from KG into kaHFM, we set k = |F| in Equation (2). In other words, we impose a number of latent factors equal to the number of features describing all the items in our catalog. Since we formulated our problem as a top-N recommendation task, kaHFM can be trained with a learning-to-rank approach such as the Bayesian Personalized Ranking criterion (BPR) [5], obtaining V̂. We extract the item vectors v_j from V̂ and use them to implement an Item-kNN recommendation approach, measuring the similarity between each pair of items i and j as the cosine similarity of their corresponding vectors in V̂. In an RDF knowledge graph we usually find different types of encoded information; here we extracted the categorical information, which is mainly used to state something about the subject of an entity.
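To make the effect of setting k = |F| concrete, here is a minimal Python sketch of the score in Equation (2) for a single (user, item) pair: with the usual one-hot encoding of the pair, only x_u and x_i are non-zero, so the second-order term collapses to a dot product. The array layout and all names are assumptions made for illustration.

    import numpy as np

    def fm_score(w0, w, V, u_idx, i_idx):
        """Illustrative sketch of Eq. (2) for one (user, item) pair.
        w0: global bias; w: bias vector of length n = |U| + |I|;
        V: n x |F| factor matrix whose first |U| rows are user vectors and
        whose remaining rows are item vectors initialized from the KG."""
        linear = w[u_idx] + w[i_idx]    # surviving part of sum_j w_j * x_j
        pairwise = V[u_idx] @ V[i_idx]  # sum_f v(u, f) * v(p, f)
        return w0 + linear + pairwise

Training with BPR [5] then updates w and V so that, for each user, observed items score higher than unobserved ones; since every column of V still corresponds to a named feature ⟨ρ, ω⟩, the learned vectors remain inspectable.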
3 EXPERIMENTAL EVALUATION

We evaluated the performance of our method on two well-known datasets for recommender systems in the movie domain: Yahoo!Movies (Yahoo! Webscope dataset ydata-ymovies-user-movie-ratings-content-v1_0, http://research.yahoo.com/Academic_Relations) and Facebook Movies (https://2015.eswc-conferences.org/program/semwebeval.html). Experiments were conducted adopting the "All Unrated Items" protocol and a temporal 80-20 hold-out split [3]. All the items in the datasets come with a DBpedia link. We retrieved all the ⟨ρ, ω⟩ pairs (available at https://github.com/sisinflab/LinkedDatasets/), excluding noisy features based on the following predicates: owl:sameAs, dbo:thumbnail, foaf:depiction, prov:wasDerivedFrom, foaf:isPrimaryTopicOf.

Accuracy Evaluation. The goal of this evaluation is to assess whether the controlled injection of Linked Data positively affects the training of FM. We compared kaHFM (https://github.com/sisinflab/HybridFactorizationMachines/) with a canonical second-degree FM optimized via BPR (BPR-FM); in order to preserve the expressiveness of the model, we used the same number of hidden factors as in kaHFM. Since we use item similarity in the last step of our approach, we also compared kaHFM against an Attribute-Based Item-kNN (ABItem-kNN) algorithm, where each item is represented as a vector of weights computed through a TF-IDF model, as well as against Item-kNN and User-kNN based on cosine similarity, Most-Popular, and a knowledge-graph-based VSM adopting the representation formulated in [1]. We measured accuracy through Precision@N and Normalized Discounted Cumulative Gain (nDCG@N). Table 1 shows the results for the categorical setting (CS): kaHFM achieves the best result on every metric, and statistically significant differences with respect to the best performing method are denoted with a ∗ mark (Student's paired t-test, 0.05 level).

                 Facebook Movies   Yahoo!Movies
                 Precision@10      Precision@10   nDCG@10
ABItem-kNN       0.0173∗           0.0421∗        0.1174∗
BPR-FM           0.0158∗           0.0189∗        0.0344∗
MostPopular      0.0118∗           0.0154∗        0.0271∗
ItemKnn          0.0262∗           0.0203∗        0.0427∗
UserKnn          0.0168∗           0.0231∗        0.0474∗
VSM              0.0185∗           0.0385∗        0.1129∗
kaHFM            0.0296            0.0524         0.1399

Table 1: Accuracy results for Facebook Movies and Yahoo!Movies considering Top-10 recommendations and a relevance threshold of 4 out of 5 stars.

Semantic Accuracy. The main idea behind Semantic Accuracy is to evaluate, given an item i, how well kaHFM ranks the item's original features at the top of the computed vector v_i. In other words, given the subset of features describing i, F_i = {f_1^i, ..., f_m^i, ..., f_M^i} with F_i ⊆ F, we check whether the values in v_i corresponding to f_m^i ∈ F_i are higher than those corresponding to f ∉ F_i. For the set of M features initially describing i, we count how many of them appear in the set top(v_i, M) of the top-M features of v_i; we then normalize this number by the size of F_i and average over all the items in the catalog I. Table 2 shows the results for SA@nM with n ∈ {1, 2, 3, 4, 5} and M = 10, i.e., it reports how many of the ground features are available among the top-nM elements of v_i for each dataset.

                 SA@M    SA@2M   SA@3M   SA@4M   SA@5M   F.A.
Yahoo!Movies     0.847   0.863   0.865   0.868   0.873   12.143
Facebook Movies  0.864   0.883   0.889   0.894   0.899   12.856

Table 2: Semantic Accuracy results for different values of M. F.A. denotes the average number of features per item.

Generative Robustness. To check whether kaHFM promotes important features for an item i, we propose a new measure: Generative Robustness. Suppose that a particular feature ⟨ρ, ω⟩ is useful to describe an item i but the corresponding triple ⟨i, ρ, ω⟩ is not represented in the knowledge graph. If kaHFM is robust in generating weights for unknown features, it should discover the importance of that feature and modify its value so that it enters the top-K features of v_i. Starting from this observation, the idea behind measuring robustness is to "forget" a triple involving i and check whether kaHFM can generate it. Given a catalog I, we define the Robustness for 1 removed feature @M (1-Rob@M) as the fraction of items for which the removed feature is in the top-M after training; similarly to SA@nM, we may define 1-Rob@nM. The previous experiment showed that kaHFM was able to guess, on average, 10 out of 12 different features for Yahoo!Movies. In this experiment we remove one of those ten features (thus, based on Table 2, kaHFM will guess an average of 10 − 1 = 9 features); since the average number of features is 12, we have 3 remaining "slots". Table 3 measures how often kaHFM is able to place the removed feature in these "slots".

                 1-Rob@M   1-Rob@2M   1-Rob@3M   1-Rob@4M   1-Rob@5M   F.A.
Yahoo!Movies     0.487     0.645      0.713      0.756      0.793      12.143
Facebook Movies  0.821     0.945      0.970      0.980      0.984      12.856

Table 3: 1-Robustness for different values of M. Column F.A. denotes the average number of features per item.
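For reference, a minimal sketch of both measures follows, assuming the trained item vectors are available as feature-to-weight dictionaries and that, for 1-Rob, the model has been retrained once per item with the chosen triple removed; this layout and the helper names are our own illustration, not the paper's code.

    import numpy as np

    def top_feats(vec, m):
        # Features carrying the m largest learned weights in v_i
        return {f for f, _ in sorted(vec.items(), key=lambda kv: -kv[1])[:m]}

    def sa_at_nm(vectors, ground, n, M=10):
        """SA@nM: share of an item's ground features found among the top-nM
        entries of its trained vector, averaged over the catalog."""
        return np.mean([len(top_feats(vectors[i], n * M) & ground[i]) / len(ground[i])
                        for i in ground])

    def one_rob_at_nm(vectors, removed, n, M=10):
        """1-Rob@nM: fraction of items whose removed feature re-enters the
        top-nM features of the retrained vector."""
        return np.mean([removed[i] in top_feats(vectors[i], n * M)
                        for i in removed])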
4 CONCLUSION AND FUTURE WORK

We have proposed kaHFM, an interpretable method for recommendation scenarios in which we bind the meaning of the latent factors of a Factorization Machine to data coming from a knowledge graph. We considered categorical information coming from DBpedia and showed, on two publicly available datasets, that the generated recommendations are more precise and personalized. We also showed that the computed features are semantically meaningful and that the model is robust with respect to the computed features. In the future, we want to test the performance of kaHFM in classical Information Retrieval and knowledge graph completion tasks.

REFERENCES

[1] Vito Walter Anelli, Tommaso Di Noia, Pasquale Lops, and Eugenio Di Sciascio. 2017. Feature Factorization for Top-N Recommendation: From Item Rating to Features Relevance. In Proc. of the 1st Workshop on Intelligent Recommender Systems by Knowledge Transfer & Learning, co-located with the ACM Conf. on Recommender Systems (RecSys 2017), Como, Italy, August 27, 2017 (CEUR Workshop Proceedings), Vol. 1887. CEUR-WS.org, 16–21.
[2] Vito Walter Anelli, Tommaso Di Noia, Eugenio Di Sciascio, Azzurra Ragone, and Joseph Trotta. 2019. How to Make Latent Factors Interpretable by Feeding Factorization Machines with Knowledge Graphs. In The Semantic Web – ISWC 2019 – 18th International Semantic Web Conference, Auckland, NZ, October 26–30, 2019.
[3] Vito Walter Anelli, Tommaso Di Noia, Eugenio Di Sciascio, Azzurra Ragone, and Joseph Trotta. 2019. Local Popularity and Time in Top-N Recommendation. In Advances in Information Retrieval – 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14–18, 2019, Proceedings, Part I. 861–868.
[4] Steffen Rendle. 2010. Factorization Machines. In Data Mining (ICDM), 2010 IEEE 10th Int. Conf. on. IEEE, 995–1000.
[5] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI 2009, Proc. of the Twenty-Fifth Conf. on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, June 18–21, 2009. 452–461.
[6] Markus Zanker. 2012. The Influence of Knowledgeable Explanations on Users' Perception of a Recommender System. In Sixth ACM Conf. on Recommender Systems, RecSys '12, Dublin, Ireland, September 9–13, 2012. 269–272.
[7] Yongfeng Zhang and Xu Chen. 2018. Explainable Recommendation: A Survey and New Perspectives. CoRR abs/1804.11192 (2018).