Sparse Embeddings for Recommender Systems with Knowledge Graphs
Discussion Paper

Vito Walter Anelli¹, Tommaso Di Noia¹, Eugenio Di Sciascio¹, Antonio Ferrara¹ and Alberto Mancino¹
¹ Politecnico di Bari, Bari, Italy

11th Italian Information Retrieval Workshop (IIR 2021), September 13–15, 2021, Bari, Italy
antonio.ferrara@poliba.it (A. Ferrara)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
Collaborative filtering models have undoubtedly dominated the scene of recommender systems. However, these methods do not take into account valuable item characteristics. On the other hand, content-based algorithms use only this kind of information and may fail to generalize. Some collaborative filtering techniques have recently used side information about items, but they end up being huge models that use thousands of features to model a single user-item interaction. In this paper, we present KGFlex, a sparse and expressive model based on feature embeddings. KGFlex studies which features are considered by each user when consuming an item. Then, it models each user-item interaction as a factorized entropy-driven combination of the only item features relevant to the user. An extensive experimental evaluation shows the approach's effectiveness, considering the accuracy, diversity, and induced bias of the recommendation results.

Keywords
recommender systems, knowledge graphs, feature embeddings, feature factorization

1. Introduction

The outstanding accuracy of collaborative filtering techniques has undoubtedly contributed to the popularity of recommender systems. However, these methods are based on the simple idea of recommending certain items because "similar users have experienced those items" or "other users, who have experienced the same items, have also experienced those items". On the contrary, content-based recommendation algorithms aim to recommend new items that share the same patterns of features as the items liked in the past. The use of content features can make the model interpretable [1], but these techniques may fail to recommend items whose characteristics differ from those of the items enjoyed in the past. To get the benefits of the two approaches and mitigate their drawbacks, researchers have worked to integrate into collaborative filtering the side information used in content-based approaches, such as tags [2], demographic data [3], and structured knowledge [4]. However, this may lead to very large models that need to take into account hundreds or thousands of features to predict user-item interactions.

In this work, we introduce KGFlex, a knowledge-aware recommender system that tackles this issue with a sparse and expressive model based on feature embeddings. KGFlex describes the catalog using features extracted from publicly available knowledge graphs, one of the most impactful and relevant sources for knowledge-aware recommender systems. Then, low-dimensional embeddings are adopted to represent the semantic item features. Using an entropy-based strategy, KGFlex analyzes the users' history to study the user-specific decision-making process of consuming or not consuming an item. Thus, the subsets of item features relevant to the user in her decision-making process are adopted to model the user-item interaction.
To evaluate the performance of KGFlex, we conduct extensive experiments on two different publicly available datasets. We evaluate the accuracy and diversity of the recommendation results and analyze whether the algorithm produces biased recommendations. The results show that KGFlex has competitive accuracy performance and, at the same time, generates highly diversified recommendations with a low induced bias.

2. Basics of KGFlex

KGFlex exploits the knowledge encoded in a knowledge graph as side information to characterize both items and users. One of the main assumptions is that users decide to enjoy an item based on a subset of its characteristics, implying that not all the item features are equally important. In the following, we show how KGFlex describes each user and item with a set of features. Taking a cue from information theory, KGFlex exploits the notion of information gain to measure the relevance of a feature for a user in deciding whether or not to consume an item.

From $\mathcal{KG}$s to Decision-Making. A knowledge graph $\mathcal{KG}$ can be represented as a set of triples where entities are linked to each other by binary relations. Each connection in $\mathcal{KG}$ is then a triple $\sigma \xrightarrow{\rho} \omega$, where $\sigma$ is a subject entity, $\rho$ is a relation (predicate), and $\omega$ is an object entity. If we consider chains of predicates that connect two entities at a higher depth, an $n$-hop predicate can be defined as $\rho = \langle \rho_1, \dots, \rho_n \rangle$ if $\sigma \xrightarrow{\rho_1} \omega_1 \xrightarrow{\rho_2} \dots \xrightarrow{\rho_n} \omega_n \in \mathcal{KG}$. For convenience, $h(\rho) = n$ for $\rho : \sigma \xrightarrow{\rho} \omega_n \in \mathcal{KG}$ denotes the depth of the predicate chain. When no confusion arises, from now on we will use $\sigma \xrightarrow{\rho} \omega$ to denote a generic chain with $h(\rho) \in \{1, \dots, n\}$.

Given a collection of items $\mathcal{I}$ and a knowledge graph $\mathcal{KG}$, we assume each item $i \in \mathcal{I}$ has a mapping to a corresponding entity in $\mathcal{KG}$. Under this assumption, an item $i$ can be explored, at depth $n$, to identify the set $\mathcal{F}_i^{(n)}$ of the semantic features describing it:

$$\mathcal{F}_i^{(n)} = \{\langle \rho, \omega \rangle \mid i \xrightarrow{\rho} \omega \in \mathcal{KG},\ h(\rho) \in \{1, \dots, n\}\} \qquad (1)$$

We describe each user $u \in \mathcal{U}$ with the set $\mathcal{F}_u^{(n)} = \bigcup_{i \in \mathcal{I}_u} \mathcal{F}_i^{(n)}$, i.e., all the features representing the items $\mathcal{I}_u \subseteq \mathcal{I}$ enjoyed by $u$. Finally, we define the overall set of features in the system as $\mathcal{F}^{(n)} = \bigcup_{i \in \mathcal{I}} \mathcal{F}_i^{(n)}$. In the following, the $(n)$ superscript is omitted whenever it is not relevant in the context.

Once items and users have been associated with their sets of features, we use the notion of information gain to measure the importance of each feature for a user in deciding whether to consume or not consume an item, i.e., in distinguishing positive from negative items in the dataset. Indeed, given a dataset $\mathcal{D}$ with a certain extent of entropy (uncertainty) on the target attribute, the information gain $IG(\mathcal{D}, x_d)$ measures the expected reduction in information entropy obtained from the observation of the value of the $d$-th attribute of a sample $\mathbf{x}$. To this aim, we build, for each user $u$, a balanced dataset $\mathcal{D}_u$ with all the consumed items from $\mathcal{I}_u$ and the same number of negative items randomly picked from $\bigcup_{v \in \mathcal{U}, v \neq u} \mathcal{I}_v \setminus \mathcal{I}_u$. For each of these positive and negative items, $\mathcal{D}_u$ contains a sample whose attributes correspond to the features in $\mathcal{F}_u$ and indicate the presence ($f = 1$) or the absence ($f = 0$) of the corresponding feature $f$ in $\mathcal{F}_i$. Therefore, the attribute $f$ provides an information gain in distinguishing positive from negative samples equal to $IG(\mathcal{D}_u, f) = 1 - H(\mathcal{D}_u \mid f = 1) - H(\mathcal{D}_u \mid f = 0)$. We finally associate a weight $k_{uf} = IG(\mathcal{D}_u, f)$ with each pair of user $u$ and feature $f$ to represent the influence of the feature, from the user's point of view, in the prediction of user-item interactions.
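To make the entropy-driven weighting concrete, the sketch below shows one possible way to compute the weights $k_{uf}$ from a user's positive items and the per-item feature sets of Eq. (1). It is a minimal illustration under simplifying assumptions (e.g., negatives are sampled from the whole catalog rather than from other users' profiles); the names `binary_entropy` and `entropy_weights` and their arguments are ours, not the authors' implementation.

```python
import random
from math import log2


def binary_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def entropy_weights(user_items, item_features, catalog, seed=42):
    """Compute the per-user feature weights k_uf = IG(D_u, f).

    user_items:    set of items consumed by the user (I_u)
    item_features: dict mapping each item to its feature set F_i
    catalog:       iterable with all items in the system (I)
    """
    rng = random.Random(seed)
    positives = list(user_items)
    # In the paper, negatives come from items consumed by other users;
    # here, for simplicity, we sample from the whole catalog minus I_u.
    candidates = [i for i in catalog if i not in user_items]
    negatives = rng.sample(candidates, k=min(len(positives), len(candidates)))
    samples = [(i, 1) for i in positives] + [(i, 0) for i in negatives]

    # F_u: union of the feature sets of the items consumed by the user
    user_features = set().union(*(item_features[i] for i in positives))

    weights = {}
    for f in user_features:
        with_f = [label for item, label in samples if f in item_features[item]]
        without_f = [label for item, label in samples if f not in item_features[item]]
        h1 = binary_entropy(sum(with_f) / len(with_f)) if with_f else 0.0
        h0 = binary_entropy(sum(without_f) / len(without_f)) if without_f else 0.0
        # The balanced dataset D_u has unit entropy, hence IG = 1 - H(D_u|f=1) - H(D_u|f=0)
        weights[f] = 1.0 - h1 - h0
    return weights
```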
Sparse Embeddings. KGFlex models the features in $\mathcal{F}$ as collaboratively learned embeddings in a latent space. Since KGFlex promotes the idea of having user fine-tuned versions of the same model, we have both a global representation of the features in $\mathcal{F}$ and a personal view, for each user $u$, of the features in $\mathcal{F}_u \subseteq \mathcal{F}$. Notably, the model is structured into two distinct parts. On the one hand, KGFlex keeps a set $\mathcal{G}$ of global trainable embeddings and biases shared among all the users, with $\mathcal{G} = \{(\mathbf{g}_f \in \mathbb{R}^E, b_f \in \mathbb{R}),\ \forall f \in \mathcal{F}\}$. On the other hand, each user in KGFlex also has his/her personal representation of the features he/she interacted with, i.e., the features in $\mathcal{F}_u$. These embeddings are collected within the set $\mathcal{P}^u$, defined as $\mathcal{P}^u = \{\mathbf{p}_f^u \in \mathbb{R}^E,\ \forall f \in \mathcal{F}_u\}$. Then, the inner product between the personal representation $\mathbf{p}_f^u$ and the global representation $\mathbf{g}_f$, plus a bias value $b_f$, estimates the affinity of user $u$ to feature $f$. The sum of such affinities for all the features in $\mathcal{F}_{ui} = \mathcal{F}_u \cap \mathcal{F}_i$, weighted according to the pre-computed entropy-based coefficients, estimates the interaction $\hat{x}_{ui}$ between user $u$ and item $i$:

$$\hat{x}_{ui} = \sum_{f \in \mathcal{F}_{ui}} k_{uf} \left( \mathbf{p}_f^u \cdot \mathbf{g}_f + b_f \right) \qquad (2)$$

Eq. (2) encodes the strategy KGFlex exploits to handle the features: it takes advantage of the user profile to involve only a small subset of them in the estimation of the user-item affinity. To learn the model parameters, KGFlex adopts Bayesian Personalized Ranking (BPR), the most common pair-wise learning-to-rank strategy, which, given a training set $\mathcal{T} = \{(u, i^+, i^-) \mid i^+ \in \mathcal{I}_u \wedge i^- \in \mathcal{I} \setminus \mathcal{I}_u,\ \forall u \in \mathcal{U}\}$, optimizes the objective $L = \sum_{(u, i^+, i^-) \in \mathcal{T}} \ln \sigma(\hat{x}_{ui^+} - \hat{x}_{ui^-})$, with the assumption that a user $u$ prefers a consumed item $i^+$ over a non-consumed item $i^-$.
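The following sketch illustrates how the scoring function of Eq. (2) and a stochastic BPR update could be implemented; it is a hypothetical re-implementation under simplifying assumptions (class and attribute names are ours, regularization and negative sampling are omitted), not the authors' code.

```python
import numpy as np


class KGFlexSketch:
    """Illustrative sketch of the KGFlex scoring (Eq. 2) and a stochastic BPR step."""

    def __init__(self, num_features, embedding_size, seed=0):
        self.rng = np.random.default_rng(seed)
        # Global trainable embeddings g_f and biases b_f, shared among all users
        self.global_emb = self.rng.normal(scale=0.1, size=(num_features, embedding_size))
        self.bias = np.zeros(num_features)
        # Personal embeddings p_f^u, created lazily for the features in F_u
        self.personal_emb = {}

    def _personal(self, user, feature):
        key = (user, feature)
        if key not in self.personal_emb:
            self.personal_emb[key] = self.rng.normal(scale=0.1, size=self.global_emb.shape[1])
        return self.personal_emb[key]

    def score(self, user, item_features, user_weights):
        """Eq. (2): sum over f in F_u ∩ F_i of k_uf * (p_f^u · g_f + b_f)."""
        return sum(
            user_weights[f] * (self._personal(user, f) @ self.global_emb[f] + self.bias[f])
            for f in item_features if f in user_weights
        )

    def bpr_step(self, user, pos_features, neg_features, user_weights, lr=0.05):
        """One stochastic gradient-ascent step on ln sigma(x_ui+ - x_ui-)."""
        x_uij = self.score(user, pos_features, user_weights) \
              - self.score(user, neg_features, user_weights)
        scale = 1.0 / (1.0 + np.exp(x_uij))  # sigma(-x_uij)
        for features, sign in ((pos_features, 1.0), (neg_features, -1.0)):
            for f in features:
                if f not in user_weights:
                    continue  # only features in F_u contribute to the score
                k = user_weights[f]
                p = self._personal(user, f).copy()
                g = self.global_emb[f].copy()
                self.personal_emb[(user, f)] += lr * scale * sign * k * g
                self.global_emb[f] += lr * scale * sign * k * p
                self.bias[f] += lr * scale * sign * k
```

A full training loop would repeatedly sample triples $(u, i^+, i^-)$ from $\mathcal{T}$ and call `bpr_step` with the item feature sets and the entropy-based weights $k_{uf}$ computed as in Section 2.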
3. Exploratory Evaluation

Experimental Setting. The evaluation of the performance of KGFlex is conducted on two well-known datasets: Yahoo! Movies and Facebook Books. The datasets have been binarized, retaining ratings of 3 or higher, and have been preprocessed with an iterative 10-core and 5-core filtering, respectively. The semantic features have been retrieved through a 2-depth exploration of the DBpedia $\mathcal{KG}$, removing uninformative features [5]. Finally, we removed the features associated with fewer than ten items, and we kept, for each user, the 100 most informative features from the 1-hop and 2-hop explorations. We compare KGFlex with BPR-MF [6], a latent factor model based on the same pair-wise optimization criterion used in KGFlex; MF, a batch version of the matrix factorization model revisited by Rendle et al. [7]; NeuMF [8]; and kaHFM [4], a factorization-based model making use of knowledge graphs. For the sake of reproducibility, we provide our code and all the details about the experiments¹.

¹ https://split.to/kgflex

Table 1: Comparison of KGFlex with baselines. The best result is in boldface, the second-best result is underlined. For all the metrics, the cutoff is 10.

                        Yahoo! Movies                                Facebook Books
          nDCG     IC    Gini    ACLT   PopREO  PopRSP    nDCG     IC    Gini    ACLT   PopREO  PopRSP
BPR-MF   0.1857   151  0.0219  0.0006  0.9954  0.9999    0.0947    17  0.0132  0.0000  1.0000  1.0000
MF       0.2897   455  0.0902  0.0823  0.8735  0.9865    0.0956    87  0.0238  0.0000  1.0000  1.0000
NeuMF    0.0918    50  0.0113  0.0006  1.0000  0.9999    0.0714    17  0.0125  0.0000  1.0000  1.0000
kaHFM    0.3006   757  0.1659  0.4624  0.7610  0.9234    0.1267   540  0.1387  0.3294  0.8766  0.9420
KGFlex   0.2464   851  0.2802  2.1447  0.4477  0.6336    0.0853   606  0.3070  3.0264  0.1521  0.4485

We have measured the recommendation accuracy with nDCG [9]. We have also evaluated the diversity, adopting Item Coverage (IC) [10] and the Gini Index (Gini) [11]. Finally, three bias metrics have been used to evaluate how the algorithms treat the items from the long tail: ACLT [12], and PopREO and PopRSP, which are specific applications of REO and RSP [13], respectively. PopREO estimates the equal opportunity of items, encouraging the True Positive Rate of popular and unpopular items to be the same. PopRSP measures statistical parity, assessing whether the ranking probability distributions for popular and unpopular items are the same in the recommendation.

Main Results. Table 1 reports the evaluation outcome for the aforementioned metrics with a cutoff of 10. For Yahoo! Movies, KGFlex is outperformed in accuracy only by kaHFM and MF, but it still shows acceptable accuracy results. It is noteworthy that KGFlex significantly outperforms BPR-MF, although both are learned with a pair-wise BPR optimization, hence underlining the beneficial role of the extracted knowledge. Moreover, examining the item coverage and Gini values, we note the high degree of personalization provided by KGFlex. We link this result to the personalized view of the knowledge granted by the framework. Moreover, in KGFlex the collaborative signal on explicit user interests ensures that diverse items are recommended among the ones sharing characteristics of interest for the user. The accuracy behavior is not confirmed on Facebook Books, where KGFlex remains below the performance of the other approaches. However, the diversity results show how BPR-MF, MF, and NeuMF may have been overwhelmed by the popularity signal, which led them to perform poorly regarding the item coverage and Gini metrics. Instead, KGFlex does not suffer from this problem and provides the best diversity performance among the compared approaches.

Oftentimes, recommender systems fail to recommend unpopular items, which tend to remain underrepresented [14], thus causing a fairness issue for items and inappropriate recommendations for users who do not prefer very popular items. From Table 1, it is noteworthy that KGFlex outperforms all the other approaches with respect to popularity bias. This is shown by the values of ACLT (the higher, the better) and is further supported by the values of PopREO and PopRSP (the smaller, the better), for which KGFlex grants the least biased recommendations. Interestingly, while both exploit the same optimization criterion, we notice how KGFlex consistently improves over BPR-MF, which is known to be vulnerable to imbalanced data and to produce biased recommendations [13].
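For reference, the diversity and long-tail metrics discussed above can be computed from the generated top-N lists alone, as in the sketch below (`rec_lists` maps each user to her top-N items, `catalog` is the item set, and `long_tail` is the set of unpopular items; all names are ours). The exact definitions in the cited works may differ in normalization details, and PopREO/PopRSP are omitted since they also require test-set relevance information.

```python
import numpy as np


def item_coverage(rec_lists):
    """IC: number of distinct items appearing in at least one top-N list."""
    return len({item for items in rec_lists.values() for item in items})


def gini_index(rec_lists, catalog):
    """Gini index of item exposure across all top-N lists
    (0 = exposure evenly spread over the catalog, 1 = concentrated on one item)."""
    exposure = {item: 0 for item in catalog}
    for items in rec_lists.values():
        for item in items:
            exposure[item] += 1
    x = np.sort(np.array(list(exposure.values()), dtype=float))
    n = len(x)
    if x.sum() == 0.0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(np.sum((2 * ranks - n - 1) * x) / (n * x.sum()))


def aclt(rec_lists, long_tail):
    """ACLT: average number of long-tail items per user's top-N list."""
    return float(np.mean([sum(item in long_tail for item in items)
                          for items in rec_lists.values()]))
```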
References

[1] Y. Zhang, X. Chen, Explainable recommendation: A survey and new perspectives, CoRR abs/1804.11192 (2018). URL: http://arxiv.org/abs/1804.11192. arXiv:1804.11192.
[2] Y. Zhu, Z. Guan, S. Tan, H. Liu, D. Cai, X. He, Heterogeneous hypergraph embedding for document recommendation, Neurocomputing 216 (2016) 150–162. URL: https://doi.org/10.1016/j.neucom.2016.07.030. doi:10.1016/j.neucom.2016.07.030.
[3] W. X. Zhao, S. Li, Y. He, L. Wang, J. Wen, X. Li, Exploring demographic information in social media for product recommendation, Knowl. Inf. Syst. 49 (2016) 61–89. URL: https://doi.org/10.1007/s10115-015-0897-5. doi:10.1007/s10115-015-0897-5.
[4] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ragone, J. Trotta, How to make latent factors interpretable by feeding factorization machines with knowledge graphs, in: C. Ghidini, O. Hartig, M. Maleshkova, V. Svátek, I. F. Cruz, A. Hogan, J. Song, M. Lefrançois, F. Gandon (Eds.), The Semantic Web - ISWC 2019 - 18th International Semantic Web Conference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part I, volume 11778 of Lecture Notes in Computer Science, Springer, 2019, pp. 38–56. URL: https://doi.org/10.1007/978-3-030-30793-6_3. doi:10.1007/978-3-030-30793-6_3.
[5] T. Di Noia, C. Magarelli, A. Maurino, M. Palmonari, A. Rula, Using ontology-based data summarization to develop semantics-aware recommender systems, in: A. Gangemi, R. Navigli, M. Vidal, P. Hitzler, R. Troncy, L. Hollink, A. Tordai, M. Alam (Eds.), The Semantic Web - 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3-7, 2018, Proceedings, volume 10843 of Lecture Notes in Computer Science, Springer, 2018, pp. 128–144. URL: https://doi.org/10.1007/978-3-319-93417-4_9. doi:10.1007/978-3-319-93417-4_9.
[6] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, in: J. A. Bilmes, A. Y. Ng (Eds.), UAI 2009, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, June 18-21, 2009, AUAI Press, 2009, pp. 452–461. URL: https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=1630&proceeding_id=25.
[7] S. Rendle, W. Krichene, L. Zhang, J. R. Anderson, Neural collaborative filtering vs. matrix factorization revisited, in: R. L. T. Santos, L. B. Marinho, E. M. Daly, L. Chen, K. Falk, N. Koenigstein, E. S. de Moura (Eds.), RecSys 2020: Fourteenth ACM Conference on Recommender Systems, Virtual Event, Brazil, September 22-26, 2020, ACM, 2020, pp. 240–248. URL: https://doi.org/10.1145/3383313.3412488. doi:10.1145/3383313.3412488.
[8] X. He, T. Chua, Neural factorization machines for sparse predictive analytics, in: N. Kando, T. Sakai, H. Joho, H. Li, A. P. de Vries, R. W. White (Eds.), Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017, ACM, 2017, pp. 355–364. URL: https://doi.org/10.1145/3077136.3080777. doi:10.1145/3077136.3080777.
[9] W. Krichene, S. Rendle, On sampled metrics for item recommendation, in: R. Gupta, Y. Liu, J. Tang, B. A. Prakash (Eds.), KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020, ACM, 2020, pp. 1748–1757. URL: https://doi.org/10.1145/3394486.3403226. doi:10.1145/3394486.3403226.
[10] G. Adomavicius, Y. Kwon, Improving aggregate recommendation diversity using ranking-based techniques, IEEE Trans. Knowl. Data Eng. 24 (2012) 896–911. URL: https://doi.org/10.1109/TKDE.2011.15. doi:10.1109/TKDE.2011.15.
[11] P. Castells, N. J. Hurley, S. Vargas, Novelty and diversity in recommender systems, in: F. Ricci, L. Rokach, B. Shapira (Eds.), Recommender Systems Handbook, Springer, 2015, pp. 881–918. URL: https://doi.org/10.1007/978-1-4899-7637-6_26. doi:10.1007/978-1-4899-7637-6_26.
[12] H. Abdollahpouri, R. Burke, B. Mobasher, Managing popularity bias in recommender systems with personalized re-ranking, in: R. Barták, K. W. Brawner (Eds.), Proceedings of the Thirty-Second International Florida Artificial Intelligence Research Society Conference, Sarasota, Florida, USA, May 19-22, 2019, AAAI Press, 2019, pp. 413–418. URL: https://aaai.org/ocs/index.php/FLAIRS/FLAIRS19/paper/view/18199.
[13] Z. Zhu, J. Wang, J. Caverlee, Measuring and mitigating item under-recommendation bias in personalized ranking systems, in: J. Huang, Y. Chang, X. Cheng, J. Kamps, V. Murdock, J. Wen, Y. Liu (Eds.), Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020, Virtual Event, China, July 25-30, 2020, ACM, 2020, pp. 449–458. URL: https://doi.org/10.1145/3397271.3401177. doi:10.1145/3397271.3401177.
[14] H. Abdollahpouri, M. Mansoury, R. Burke, B. Mobasher, The unfairness of popularity bias in recommendation, in: R. Burke, H. Abdollahpouri, E. C. Malthouse, K. P. Thai, Y. Zhang (Eds.), Proceedings of the Workshop on Recommendation in Multi-stakeholder Environments co-located with the 13th ACM Conference on Recommender Systems (RecSys 2019), Copenhagen, Denmark, September 20, 2019, volume 2440 of CEUR Workshop Proceedings, CEUR-WS.org, 2019. URL: http://ceur-ws.org/Vol-2440/paper4.pdf.