Sparse Embeddings for Recommender Systems with Knowledge Graphs
Discussion Paper

Vito Walter Anelli¹, Tommaso Di Noia¹, Eugenio Di Sciascio¹, Antonio Ferrara¹ and Alberto Mancino¹
¹ Politecnico di Bari, Bari, Italy

11th Italian Information Retrieval Workshop (IIR 2021), September 13–15, 2021, Bari, Italy
antonio.ferrara@poliba.it (A. Ferrara)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
Collaborative filtering models have undoubtedly dominated the scene of recommender systems. However, these methods do not take into account valuable item characteristics. On the other hand, content-based algorithms use only this kind of information and may fail to generalize. Some collaborative filtering techniques have recently used side information about items, but they end up being huge models that use thousands of features to model a single user-item interaction. In this paper, we present KGFlex, a sparse and expressive model based on feature embeddings. KGFlex studies which features are considered by each user when consuming an item. Then, it models each user-item interaction as a factorized entropy-driven combination of the only item features relevant to the user. An extensive experimental evaluation shows the approach's effectiveness, considering the accuracy, diversity, and induced bias of the recommendation results.

Keywords
recommender systems, knowledge graphs, feature embeddings, feature factorization

1. Introduction

The outstanding accuracy of collaborative filtering techniques has undoubtedly contributed to the popularity of recommender systems. However, these methods are based on the simple idea of recommending certain items because "similar users have experienced those items" or "other users, who have experienced the same items, have also experienced those items". On the contrary, content-based recommendation algorithms aim to recommend new items that share the same patterns of features as the items liked in the past. The use of content features can make the model interpretable [1], but these techniques may fail to recommend items whose characteristics differ from those of the items enjoyed in the past. To get the benefits of the two approaches and mitigate their drawbacks, researchers have worked to integrate into collaborative filtering the side information used in content-based approaches, such as tags [2], demographic data [3], and structured knowledge [4]. However, this may lead to very large models that need to take into account hundreds or thousands of features to predict user-item interactions.

In this work, we introduce KGFlex, a knowledge-aware recommender system that tackles this issue with a sparse and expressive model based on feature embeddings. KGFlex describes the catalog using features extracted from publicly available knowledge graphs, one of the most impactful and relevant sources for knowledge-aware recommender systems. Then, low-dimensional embeddings are adopted to represent the semantic item features. Using an entropy-based strategy, KGFlex analyzes the users' history to study the user-specific decision-making process of consuming or not consuming an item. Thus, the subsets of item features relevant to the user in her decision-making process are adopted to model the user-item interaction.
To evaluate the performance of KGFlex, we conduct extensive experiments on two different publicly available datasets. We evaluate the accuracy and diversity of the recommendation results and analyze whether the algorithm produces biased recommendations. The results show that KGFlex has competitive accuracy performance and, at the same time, generates highly diversified recommendations with a low induced bias.

2. Basics of KGFlex

KGFlex exploits the knowledge encoded in a knowledge graph as side information to characterize both items and users. One of the main assumptions is that users decide to enjoy an item based on a subset of its characteristics, implying that not all the item features are equally important. In the following, we show how KGFlex describes each user and item with a set of features. Taking a cue from information theory, KGFlex exploits the notion of information gain to measure the relevance of a feature for a user in deciding whether or not to consume an item.

From $\mathcal{KG}$s to Decision-Making. A knowledge graph $\mathcal{KG}$ can be represented as a set of triples where entities are linked to each other by binary relations. Each connection in $\mathcal{KG}$ is then a triple $\sigma \xrightarrow{\rho} \omega$, where $\sigma$ is a subject entity, $\rho$ is a relation (predicate), and $\omega$ is an object entity. If we consider chains of predicates that connect two entities at a higher depth, an $n$-hop predicate can be defined as $\rho = \langle \rho_1, \dots, \rho_n \rangle$ if $\sigma \xrightarrow{\rho_1} \omega_1 \xrightarrow{\rho_2} \dots \xrightarrow{\rho_n} \omega_n \in \mathcal{KG}$. For convenience, $h(\rho) = n$ for $\rho : \sigma \xrightarrow{\rho} \omega_n \in \mathcal{KG}$ denotes the depth of the predicate chain. When no confusion arises, from now on we will use $\sigma \xrightarrow{\rho} \omega$ to denote a generic chain with $h(\rho) \in \{1, \dots, n\}$.

Given a collection of items $\mathcal{I}$ and a knowledge graph $\mathcal{KG}$, we assume each item $i \in \mathcal{I}$ has a mapping to a corresponding entity in $\mathcal{KG}$. Under this assumption, an item $i$ can be explored, at depth $n$, to identify the set $\mathcal{F}_i^{(n)}$ of the semantic features describing it:

$$\mathcal{F}_i^{(n)} = \{\langle \rho, \omega \rangle \mid i \xrightarrow{\rho} \omega \in \mathcal{KG},\ h(\rho) \in \{1, \dots, n\}\} \qquad (1)$$

We describe each user $u \in \mathcal{U}$ with the set $\mathcal{F}_u^{(n)} = \bigcup_{i \in \mathcal{I}_u} \mathcal{F}_i^{(n)}$, i.e., all the features representing the items $\mathcal{I}_u \subseteq \mathcal{I}$ enjoyed by $u$. Finally, we define the overall set of features in the system as $\mathcal{F}^{(n)} = \bigcup_{i \in \mathcal{I}} \mathcal{F}_i^{(n)}$. In the following, the $(n)$ superscript is omitted whenever it is not relevant in the context.

Once items and users have been associated with their sets of features, we use the notion of information gain to measure the importance of each feature for a user in deciding whether to consume or not consume an item, i.e., in distinguishing positive from negative items in the dataset. Indeed, given a dataset $\mathcal{D}$ with a certain extent of entropy (uncertainty) on the target attribute, the information gain $IG(\mathcal{D}, x_d)$ measures the expected reduction in information entropy obtained from the observation of the value of the $d$-th attribute of a sample $\mathbf{x}$. To this aim, we build, for each user $u$, a balanced dataset $\mathcal{D}_u$ with all the consumed items from $\mathcal{I}_u$ and the same number of negative items randomly picked from $\bigcup_{v \in \mathcal{U}, v \neq u} \mathcal{I}_v \setminus \mathcal{I}_u$. For each of these positive and negative items, $\mathcal{D}_u$ contains a sample whose attributes correspond to the features in $\mathcal{F}_u$ and indicate the presence ($f = 1$) or the absence ($f = 0$) of the corresponding feature $f$ in $\mathcal{F}_i$. Therefore, the attribute $f$ provides an information gain in distinguishing positive from negative samples equal to $IG(\mathcal{D}_u, f) = 1 - H(\mathcal{D}_u \mid f = 1) - H(\mathcal{D}_u \mid f = 0)$. We finally associate a weight $k_{uf} = IG(\mathcal{D}_u, f)$ with each pair of user $u$ and feature $f$ to represent the influence of the feature, from the user's point of view, in the prediction of user-item interactions.
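To make the entropy-driven weighting concrete, the sketch below shows one possible way to compute the weights $k_{uf}$ from a user's positive items and the per-item feature sets of Eq. (1). It is a minimal illustration under simplifying assumptions (e.g., negatives are sampled from the whole catalog rather than from other users' profiles); the names `binary_entropy` and `entropy_weights` and their arguments are ours, not the authors' implementation.

```python
import random
from math import log2


def binary_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def entropy_weights(user_items, item_features, catalog, seed=42):
    """Compute the per-user feature weights k_uf = IG(D_u, f).

    user_items:    set of items consumed by the user (I_u)
    item_features: dict mapping each item to its feature set F_i
    catalog:       iterable with all items in the system (I)
    """
    rng = random.Random(seed)
    positives = list(user_items)
    # In the paper, negatives come from items consumed by other users;
    # here, for simplicity, we sample from the whole catalog minus I_u.
    candidates = [i for i in catalog if i not in user_items]
    negatives = rng.sample(candidates, k=min(len(positives), len(candidates)))
    samples = [(i, 1) for i in positives] + [(i, 0) for i in negatives]

    # F_u: union of the feature sets of the items consumed by the user
    user_features = set().union(*(item_features[i] for i in positives))

    weights = {}
    for f in user_features:
        with_f = [label for item, label in samples if f in item_features[item]]
        without_f = [label for item, label in samples if f not in item_features[item]]
        h1 = binary_entropy(sum(with_f) / len(with_f)) if with_f else 0.0
        h0 = binary_entropy(sum(without_f) / len(without_f)) if without_f else 0.0
        # The balanced dataset D_u has unit entropy, hence IG = 1 - H(D_u|f=1) - H(D_u|f=0)
        weights[f] = 1.0 - h1 - h0
    return weights
```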
Sparse Embeddings. KGFlex models the features in $\mathcal{F}$ as collaboratively learned embeddings in a latent space. Since KGFlex promotes the idea of having user fine-tuned versions of the same model, we have both a global representation of the features in $\mathcal{F}$ and a personal view, for each user $u$, of the features in $\mathcal{F}_u \subseteq \mathcal{F}$. Notably, the model is structured into two distinct parts. On the one hand, KGFlex keeps a set $\mathcal{G}$ of global trainable embeddings and biases shared among all the users, with $\mathcal{G} = \{(\mathbf{g}_f \in \mathbb{R}^E, b_f \in \mathbb{R}),\ \forall f \in \mathcal{F}\}$. On the other hand, each user in KGFlex also has his/her personal representation of the features he/she interacted with, i.e., the features in $\mathcal{F}_u$. These embeddings are collected within the set $\mathcal{P}^u$, defined as $\mathcal{P}^u = \{\mathbf{p}_f^u \in \mathbb{R}^E,\ \forall f \in \mathcal{F}_u\}$. Then, the inner product between the personal representation $\mathbf{p}_f^u$ and the global representation $\mathbf{g}_f$, plus a bias value $b_f$, estimates the affinity of user $u$ to feature $f$. The sum of such affinities for all the features in $\mathcal{F}_{ui} = \mathcal{F}_u \cap \mathcal{F}_i$, weighted according to the pre-computed entropy-based coefficients, estimates the interaction $\hat{x}_{ui}$ between user $u$ and item $i$:

$$\hat{x}_{ui} = \sum_{f \in \mathcal{F}_{ui}} k_{uf} \left( \mathbf{p}_f^u \cdot \mathbf{g}_f + b_f \right) \qquad (2)$$

Eq. (2) encodes the strategy KGFlex exploits to handle the features: it takes advantage of the user profile to involve only a small subset of them in the estimation of the user-item affinity. To learn the model parameters, KGFlex adopts Bayesian Personalized Ranking (BPR), the most common pair-wise learning-to-rank strategy, which, given a training set $\mathcal{T} = \{(u, i^+, i^-) \mid i^+ \in \mathcal{I}_u \wedge i^- \in \mathcal{I} \setminus \mathcal{I}_u,\ \forall u \in \mathcal{U}\}$, optimizes the objective $L = \sum_{(u, i^+, i^-) \in \mathcal{T}} \ln \sigma(\hat{x}_{ui^+} - \hat{x}_{ui^-})$, with the assumption that a user $u$ prefers a consumed item $i^+$ over a non-consumed item $i^-$.
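The following sketch illustrates how the scoring function of Eq. (2) and a stochastic BPR update could be implemented; it is a hypothetical re-implementation under simplifying assumptions (class and attribute names are ours, regularization and negative sampling are omitted), not the authors' code.

```python
import numpy as np


class KGFlexSketch:
    """Illustrative sketch of the KGFlex scoring (Eq. 2) and a stochastic BPR step."""

    def __init__(self, num_features, embedding_size, seed=0):
        self.rng = np.random.default_rng(seed)
        # Global trainable embeddings g_f and biases b_f, shared among all users
        self.global_emb = self.rng.normal(scale=0.1, size=(num_features, embedding_size))
        self.bias = np.zeros(num_features)
        # Personal embeddings p_f^u, created lazily for the features in F_u
        self.personal_emb = {}

    def _personal(self, user, feature):
        key = (user, feature)
        if key not in self.personal_emb:
            self.personal_emb[key] = self.rng.normal(scale=0.1, size=self.global_emb.shape[1])
        return self.personal_emb[key]

    def score(self, user, item_features, user_weights):
        """Eq. (2): sum over f in F_u ∩ F_i of k_uf * (p_f^u · g_f + b_f)."""
        return sum(
            user_weights[f] * (self._personal(user, f) @ self.global_emb[f] + self.bias[f])
            for f in item_features if f in user_weights
        )

    def bpr_step(self, user, pos_features, neg_features, user_weights, lr=0.05):
        """One stochastic gradient-ascent step on ln sigma(x_ui+ - x_ui-)."""
        x_uij = self.score(user, pos_features, user_weights) \
              - self.score(user, neg_features, user_weights)
        scale = 1.0 / (1.0 + np.exp(x_uij))  # sigma(-x_uij)
        for features, sign in ((pos_features, 1.0), (neg_features, -1.0)):
            for f in features:
                if f not in user_weights:
                    continue  # only features in F_u contribute to the score
                k = user_weights[f]
                p = self._personal(user, f).copy()
                g = self.global_emb[f].copy()
                self.personal_emb[(user, f)] += lr * scale * sign * k * g
                self.global_emb[f] += lr * scale * sign * k * p
                self.bias[f] += lr * scale * sign * k
```

A full training loop would repeatedly sample triples $(u, i^+, i^-)$ from $\mathcal{T}$ and call `bpr_step` with the item feature sets and the entropy-based weights $k_{uf}$ computed as in Section 2.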
3. Exploratory Evaluation

Experimental Setting. The evaluation of the performance of KGFlex is conducted on two well-known datasets: Yahoo! Movies and Facebook Books. The datasets have been binarized, retaining ratings of 3 or higher, and have been preprocessed with an iterative 10-core and 5-core filtering, respectively. The semantic features have been retrieved through a 2-depth exploration of the DBpedia $\mathcal{KG}$, removing uninformative features [5]. Finally, we removed the features associated with fewer than ten items, and we kept, for each user, the 100 most informative features from the 1-hop and 2-hop explorations. We compare KGFlex with BPR-MF [6], a latent factor model based on the same pair-wise optimization criterion used in KGFlex; MF, a batch version of the matrix factorization model revisited by Rendle et al. [7]; NeuMF [8]; and kaHFM [4], a factorization-based model making use of knowledge graphs. For the sake of reproducibility, we provide our code and all the details about the experiments¹.

¹ https://split.to/kgflex

Table 1: Comparison of KGFlex with baselines. The best result is in boldface, the second-best result is underlined. For all the metrics, the cutoff is 10.

                        Yahoo! Movies                                Facebook Books
          nDCG     IC    Gini    ACLT   PopREO  PopRSP    nDCG     IC    Gini    ACLT   PopREO  PopRSP
BPR-MF   0.1857   151  0.0219  0.0006  0.9954  0.9999    0.0947    17  0.0132  0.0000  1.0000  1.0000
MF       0.2897   455  0.0902  0.0823  0.8735  0.9865    0.0956    87  0.0238  0.0000  1.0000  1.0000
NeuMF    0.0918    50  0.0113  0.0006  1.0000  0.9999    0.0714    17  0.0125  0.0000  1.0000  1.0000
kaHFM    0.3006   757  0.1659  0.4624  0.7610  0.9234    0.1267   540  0.1387  0.3294  0.8766  0.9420
KGFlex   0.2464   851  0.2802  2.1447  0.4477  0.6336    0.0853   606  0.3070  3.0264  0.1521  0.4485

We have measured the recommendation accuracy with nDCG [9]. We have also evaluated the diversity, adopting Item Coverage (IC) [10] and the Gini Index (Gini) [11]. Finally, three bias metrics have been used to evaluate how the algorithms treat the items from the long tail: ACLT [12], and PopREO and PopRSP, which are specific applications of REO and RSP [13], respectively. PopREO estimates the equal opportunity of items, encouraging the True Positive Rate of popular and unpopular items to be the same. PopRSP measures statistical parity, assessing whether the ranking probability distributions for popular and unpopular items are the same in the recommendation.

Main Results. Table 1 reports the evaluation outcome for the aforementioned metrics with a cutoff of 10. For Yahoo! Movies, KGFlex is outperformed in accuracy only by kaHFM and MF, but it still shows acceptable accuracy results. It is noteworthy that KGFlex significantly outperforms BPR-MF, although both are learned with a pair-wise BPR optimization, hence underlining the beneficial role of the extracted knowledge. Moreover, examining the item coverage and Gini values, we note the high degree of personalization provided by KGFlex. We link this result to the personalized view of the knowledge granted by the framework. Moreover, in KGFlex the collaborative signal on explicit user interests ensures that diverse items are recommended among the ones sharing characteristics of interest for the user. The accuracy behavior is not confirmed on Facebook Books, where KGFlex remains below the performance of the other approaches. However, the diversity results show how BPR-MF, MF, and NeuMF may have been overwhelmed by the popularity signal, which led them to perform poorly regarding the item coverage and Gini metrics. Instead, KGFlex does not suffer from this problem and provides the best diversity performance among the compared approaches.

Oftentimes, recommender systems fail to recommend unpopular items, which tend to remain underrepresented [14], thus causing a fairness issue for items and inappropriate recommendations for users who do not prefer very popular items. From Table 1, it is noteworthy that KGFlex outperforms all the other approaches with respect to popularity bias. This is shown by the values of ACLT (the higher, the better) and is further supported by the values of PopREO and PopRSP (the smaller, the better), for which KGFlex grants the least biased recommendations. Interestingly, while both exploit the same optimization criterion, we notice how KGFlex consistently improves over BPR-MF, which is known to be vulnerable to imbalanced data and to produce biased recommendations [13].
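For reference, the diversity and long-tail metrics discussed above can be computed from the generated top-N lists alone, as in the sketch below (`rec_lists` maps each user to her top-N items, `catalog` is the item set, and `long_tail` is the set of unpopular items; all names are ours). The exact definitions in the cited works may differ in normalization details, and PopREO/PopRSP are omitted since they also require test-set relevance information.

```python
import numpy as np


def item_coverage(rec_lists):
    """IC: number of distinct items appearing in at least one top-N list."""
    return len({item for items in rec_lists.values() for item in items})


def gini_index(rec_lists, catalog):
    """Gini index of item exposure across all top-N lists
    (0 = exposure evenly spread over the catalog, 1 = concentrated on one item)."""
    exposure = {item: 0 for item in catalog}
    for items in rec_lists.values():
        for item in items:
            exposure[item] += 1
    x = np.sort(np.array(list(exposure.values()), dtype=float))
    n = len(x)
    if x.sum() == 0.0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(np.sum((2 * ranks - n - 1) * x) / (n * x.sum()))


def aclt(rec_lists, long_tail):
    """ACLT: average number of long-tail items per user's top-N list."""
    return float(np.mean([sum(item in long_tail for item in items)
                          for items in rec_lists.values()]))
```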
References

[1] Y. Zhang, X. Chen, Explainable recommendation: A survey and new perspectives, CoRR abs/1804.11192 (2018). URL: http://arxiv.org/abs/1804.11192. arXiv:1804.11192.
[2] Y. Zhu, Z. Guan, S. Tan, H. Liu, D. Cai, X. He, Heterogeneous hypergraph embedding for document recommendation, Neurocomputing 216 (2016) 150–162. URL: https://doi.org/10.1016/j.neucom.2016.07.030. doi:10.1016/j.neucom.2016.07.030.
[3] W. X. Zhao, S. Li, Y. He, L. Wang, J. Wen, X. Li, Exploring demographic information in social media for product recommendation, Knowl. Inf. Syst. 49 (2016) 61–89. URL: https://doi.org/10.1007/s10115-015-0897-5. doi:10.1007/s10115-015-0897-5.
[4] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ragone, J. Trotta, How to make latent factors interpretable by feeding factorization machines with knowledge graphs, in: C. Ghidini, O. Hartig, M. Maleshkova, V. Svátek, I. F. Cruz, A. Hogan, J. Song, M. Lefrançois, F. Gandon (Eds.), The Semantic Web - ISWC 2019 - 18th International Semantic Web Conference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part I, volume 11778 of Lecture Notes in Computer Science, Springer, 2019, pp. 38–56. URL: https://doi.org/10.1007/978-3-030-30793-6_3. doi:10.1007/978-3-030-30793-6_3.
[5] T. Di Noia, C. Magarelli, A. Maurino, M. Palmonari, A. Rula, Using ontology-based data summarization to develop semantics-aware recommender systems, in: A. Gangemi, R. Navigli, M. Vidal, P. Hitzler, R. Troncy, L. Hollink, A. Tordai, M. Alam (Eds.), The Semantic Web - 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3-7, 2018, Proceedings, volume 10843 of Lecture Notes in Computer Science, Springer, 2018, pp. 128–144. URL: https://doi.org/10.1007/978-3-319-93417-4_9. doi:10.1007/978-3-319-93417-4_9.
[6] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, in: J. A. Bilmes, A. Y. Ng (Eds.), UAI 2009, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, June 18-21, 2009, AUAI Press, 2009, pp. 452–461. URL: https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=1630&proceeding_id=25.
[7] S. Rendle, W. Krichene, L. Zhang, J. R. Anderson, Neural collaborative filtering vs. matrix factorization revisited, in: R. L. T. Santos, L. B. Marinho, E. M. Daly, L. Chen, K. Falk, N. Koenigstein, E. S. de Moura (Eds.), RecSys 2020: Fourteenth ACM Conference on Recommender Systems, Virtual Event, Brazil, September 22-26, 2020, ACM, 2020, pp. 240–248. URL: https://doi.org/10.1145/3383313.3412488. doi:10.1145/3383313.3412488.
[8] X. He, T. Chua, Neural factorization machines for sparse predictive analytics, in: N. Kando, T. Sakai, H. Joho, H. Li, A. P. de Vries, R. W. White (Eds.), Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017, ACM, 2017, pp. 355–364. URL: https://doi.org/10.1145/3077136.3080777. doi:10.1145/3077136.3080777.
[9] W. Krichene, S. Rendle, On sampled metrics for item recommendation, in: R. Gupta, Y. Liu, J. Tang, B. A. Prakash (Eds.), KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020, ACM, 2020, pp. 1748–1757. URL: https://doi.org/10.1145/3394486.3403226. doi:10.1145/3394486.3403226.
[10] G. Adomavicius, Y. Kwon, Improving aggregate recommendation diversity using ranking-based techniques, IEEE Trans. Knowl. Data Eng. 24 (2012) 896–911. URL: https://doi.org/10.1109/TKDE.2011.15. doi:10.1109/TKDE.2011.15.
[11] P. Castells, N. J. Hurley, S. Vargas, Novelty and diversity in recommender systems, in: F. Ricci, L. Rokach, B. Shapira (Eds.), Recommender Systems Handbook, Springer, 2015, pp. 881–918. URL: https://doi.org/10.1007/978-1-4899-7637-6_26. doi:10.1007/978-1-4899-7637-6_26.
[12] H. Abdollahpouri, R. Burke, B. Mobasher, Managing popularity bias in recommender systems with personalized re-ranking, in: R. Barták, K. W. Brawner (Eds.), Proceedings of the Thirty-Second International Florida Artificial Intelligence Research Society Conference, Sarasota, Florida, USA, May 19-22, 2019, AAAI Press, 2019, pp. 413–418. URL: https://aaai.org/ocs/index.php/FLAIRS/FLAIRS19/paper/view/18199.
[13] Z. Zhu, J. Wang, J. Caverlee, Measuring and mitigating item under-recommendation bias in personalized ranking systems, in: J. Huang, Y. Chang, X. Cheng, J. Kamps, V. Murdock, J. Wen, Y. Liu (Eds.), Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020, Virtual Event, China, July 25-30, 2020, ACM, 2020, pp. 449–458. URL: https://doi.org/10.1145/3397271.3401177. doi:10.1145/3397271.3401177.
[14] H. Abdollahpouri, M. Mansoury, R. Burke, B. Mobasher, The unfairness of popularity bias in recommendation, in: R. Burke, H. Abdollahpouri, E. C. Malthouse, K. P. Thai, Y. Zhang (Eds.), Proceedings of the Workshop on Recommendation in Multi-stakeholder Environments co-located with the 13th ACM Conference on Recommender Systems (RecSys 2019), Copenhagen, Denmark, September 20, 2019, volume 2440 of CEUR Workshop Proceedings, CEUR-WS.org, 2019. URL: http://ceur-ws.org/Vol-2440/paper4.pdf.