Computing recommendations via a Knowledge Graph-aware Autoencoder

Vito Bellini⋆, Angelo Schiavone⋆, Tommaso Di Noia⋆, Azzurra Ragone•, Eugenio Di Sciascio⋆

⋆ Polytechnic University of Bari, Bari, Italy
firstname.lastname@poliba.it
• Independent Researcher
azzurra.ragone@gmail.com

ABSTRACT
In the last years, deep learning has shown to be a game-changing technology in artificial intelligence thanks to the numerous successes it has achieved in diverse application fields. Among others, the use of deep learning for the recommendation problem, although new, looks quite promising due to its positive performance in terms of accuracy of recommendation results. In a recommendation setting, in order to predict user ratings on unknown items, a possible configuration of a deep neural network is that of autoencoders, typically used to produce a lower-dimensionality representation of the original data. In this paper we present KG-AUTOENCODER, an autoencoder that bases the structure of its neural network on the semantics-aware topology of a knowledge graph, thus providing a label for the neurons in the hidden layer that are eventually used to build a user profile and then compute recommendations. We show the effectiveness of KG-AUTOENCODER in terms of accuracy, diversity and novelty by comparing it with state-of-the-art recommendation algorithms.
ACM Reference Format:
Vito Bellini, Angelo Schiavone, Tommaso Di Noia, Azzurra Ragone, Eugenio Di Sciascio. 2019. Computing recommendations via a Knowledge Graph-aware Autoencoder. In Proceedings of Knowledge-aware and Conversational Recommender Systems (KaRS) Workshop 2018 (co-located with RecSys 2018). ACM, New York, NY, USA, 7 pages.

Knowledge-aware and Conversational Recommender Systems (KaRS) Workshop 2018 (co-located with RecSys 2018), October 7, 2018, Vancouver, Canada. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.
1 INTRODUCTION
Recommender systems (RS) have become pervasive tools that we experience in our everyday life. While we browse a catalog of items, RSs exploit our past preferences in order to suggest new items we might be interested in.

Knowledge Graphs have recently been adopted to represent items, compute their similarity and relatedness [6], as well as to feed Content-Based (CB) and hybrid recommendation engines [15]. The publication and spread of freely available Knowledge Graphs in the form of Linked Open Data datasets, such as DBpedia [1], has paved the way for the development of knowledge-aware recommendation engines in many application domains and, moreover, makes it possible to easily switch from one domain to another by just feeding the system with a different subset of the original graph.

Another technology that has surely boosted the development of a new generation of smarter and more accurate recommender systems is deep learning [2]. Starting from the basic notion of an artificial neural network (ANN), several configurations of deep ANNs have been proposed over the years, such as autoencoders.

In this paper, we show how autoencoder technology can benefit from the existence of a Knowledge Graph to create a representation of a user profile that can eventually be exploited to predict ratings for unknown items. The main intuition behind the approach is that both ANNs and Knowledge Graphs expose a graph-based structure. Hence, we may imagine building the topology of the inner layers of the ANN by mimicking that of a Knowledge Graph.

The remainder of this paper is structured as follows: in the next section, we discuss related work on recommender systems exploiting deep learning, knowledge graphs and Linked Open Data. Then, the basic notions of the technologies we adopted are introduced in Section 3. The proposed recommendation model is described in Section 4, while in Section 5 we present the experimental setting and evaluation. Conclusions and Future Work close the paper.
2 RELATED WORK
Autoencoders and Deep Learning for RS. The adoption of deep learning techniques is for sure one of the main advances of the last years in the field of recommender systems. In [23], the authors propose the usage of a denoising autoencoder to perform a top-N recommendation task by exploiting a corrupted version of the input data.

A pure Collaborative Filtering (CF) model based on autoencoders is described in [18], in which the authors develop both user-based and item-based autoencoders to tackle the recommendation task. Stacked Denoising Autoencoders are combined with collaborative filtering techniques in [20], where the authors leverage autoencoders to get a smaller and non-linear representation of the user-item interactions. This representation is eventually used to feed a deep neural network which can alleviate the cold-start problem thanks to the integration of side information. A hybrid recommender system is finally built.

Wang et al. [22] suggest applying deep learning methods to side information to reduce the sparsity of the rating matrix in collaborative approaches. In [21] the authors propose a deep learning approach to build a high-dimensional semantic space based on the substitutability of items; then, a user-specific transformation is learned in order to get a ranking of items from such a space.

Analyses of the impact of deep learning on both recommendation quality and system scalability are presented in [7], where the authors first represent users and items through a rich feature set spanning different domains and then map them to a latent space. Finally, a content-based recommender system is built.

Knowledge Graphs and Linked Open Data for RS. Several works have been proposed that exploit side information coming from knowledge graphs and Linked Open Data (LOD) to enhance the performance of recommender systems. Most of them rely on the usage of DBpedia as knowledge graph. In [10], for the very first time, a LOD-based recommender system was proposed to alleviate some of the major problems that affect collaborative techniques, mainly the high sparsity of the user-item matrix. The effectiveness of such an approach seems to be confirmed by the large number of methods that have been proposed afterward. A detailed review of LOD-based recommender systems is presented in [3]. By leveraging the knowledge encoded in DBpedia, it is possible to build an accurate content-based recommender system [5].

[Figure 1: Part of a knowledge graph. DBpedia entities such as 12_Monkeys, Cloud_Atlas_(film), Ghostwritten and Sense8 are connected to the category Post-apocalyptic_films via subject edges, to Drama via genre edges, and to The_Wachowskis via director and creator edges.]
                                                                                   KGs; among them, the most important is for sure DBpedia2 . We may
3      BACKGROUND TECHNOLOGIES                                                     investigate relationships among entities in the KG by exploiting
                                                                                   graph data sources and hence discover meaningful paths within
                                                                                   the graph.
    Autoencoders. An Artificial Neural Network (ANN) is a math-
                                                                                      In figure 1 we show an excerpt of the DBpedia graph, involving
ematical model used to learn the relationships which underlie in
                                                                                   some entities in the movie domain. Interestingly we see that DB-
a given set of data. Starting from them, after a training phase, an
                                                                                   pedia encodes both factual information, e.g. “Cloud_Atlas_(film) has
ANN can be used to predict a single value or a vector, for regression
                                                                                   director The_Wachowskis”, and categorical one such as “Cloud_Atlas_(film)
or classification tasks.
                                                                                   has subject Post-apocalyptic_films”.
Basically, an ANN consists of a bunch of nodes, called neurons,
distributed among three different kinds of layers: the input layer,
one or more hidden layers, and the output layer. Typically, a neuron
                                                                                   4     SEMANTICS-AWARE AUTOENCODERS FOR
of a layer is connected to all the neurons of the next layer, making                     RATING PREDICTION
the ANN a fully connected network.                                                 The main idea of our approach is to map the connections in a
Autoencoders are ANNs which try to set the output values equal to                  knowledge graph (KG) with those between units from layer i to
the input ones, modeling an approximation of the identity function                 layer i+1, as shown in Figure 2. There we see that we injected
y = f (xx ) = x . Roughly, they are forced to predict the same values              only categorical information in the autoencoder and we left out
they are fed with. Therefore, the number of output units and one                   factual one. As a matter of fact, if we analyze these two kinds of
of the input nodes is the same, i.e. |xx | = |yy |. The aim of such a task         information in DBpedia we may notice that:
is to obtain a new representation of the original data based on the                     • the quantity of categorical information is higher than the
values of the hidden layer neurons. In fact, each of these layers                         factual one. If we consider movies, the overall number of
projects the input data in a new Euclidean space whose dimensions                         entities they are related with is lower than the overall number
depend on the number of the nodes in the hidden layer.                                    of categories;
    Therefore, when we use an autoencoder, we are not interested                        • categorical information is more distributed over the items
at its output, but at the encoded representation it computes: in this                     than the factual one. Going back to movies we see that they
way, we can leverage the implicit knowledge behind the original                           are more connected with each other via categories than via
data, performing the so-called feature extraction task. The actual                        other entities.
meaning of each dimension (represented by hidden nodes) in the                     Hence, we may argue that for a recommendation task where we are
new space is unknown, but we can be sure that they are based on                    looking for commonalities among items, categorical data may result
latent patterns binding the training cases.                                        in more meaningful than the factual one. The main assumption
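As an illustration (ours, not part of the original implementation), a minimal fully connected autoencoder and the readout of its hidden-layer encoding might look as follows in Keras; the catalog size, bottleneck width and training data are toy values:

```python
# Minimal fully connected autoencoder sketch; sizes and data are illustrative.
import numpy as np
from tensorflow import keras

n_items = 100       # hypothetical catalog size
hidden_dim = 16     # bottleneck: the learned low-dimensional representation

inputs = keras.Input(shape=(n_items,))
hidden = keras.layers.Dense(hidden_dim, activation="sigmoid")(inputs)
outputs = keras.layers.Dense(n_items, activation="sigmoid")(hidden)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Train the network to reproduce its own input: y = f(x) = x
x = np.random.rand(32, n_items)            # toy rating vectors in [0, 1]
autoencoder.fit(x, x, epochs=5, verbose=0)

# What we actually use is the encoded representation, not the output
encoder = keras.Model(inputs, hidden)
codes = encoder.predict(x, verbose=0)      # shape: (32, hidden_dim)
```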
KG. In 2012, Google announced its Knowledge Graph1 as a new tool to improve the identification and retrieval of entities in response to a search query. A Knowledge Graph is a form of representation of knowledge through a semantic (labelled) network that allows a system to store human knowledge in a structured format well understandable by a computer agent. In the meantime, several communities have started to build their own KGs; among them, the most important is for sure DBpedia2. We may investigate relationships among entities in a KG by exploiting graph data sources and hence discover meaningful paths within the graph.

In Figure 1 we show an excerpt of the DBpedia graph involving some entities in the movie domain. Interestingly, we see that DBpedia encodes both factual information, e.g. "Cloud_Atlas_(film) has director The_Wachowskis", and categorical information, such as "Cloud_Atlas_(film) has subject Post-apocalyptic_films".

1 https://googleblog.blogspot.it/2012/05/introducing-knowledge-graph-things-not.html
2 http://dbpedia.org
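As an illustration of how such categorical information can be retrieved, the following snippet (ours, not part of the paper's pipeline) queries the public DBpedia SPARQL endpoint for the dct:subject categories of one movie entity:

```python
# Illustrative DBpedia lookup of the categories (dct:subject) of one film.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?category WHERE {
        <http://dbpedia.org/resource/Cloud_Atlas_(film)>
            <http://purl.org/dc/terms/subject> ?category .
    }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["category"]["value"])  # e.g. ...Category:Post-apocalyptic_films
```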

4 SEMANTICS-AWARE AUTOENCODERS FOR RATING PREDICTION
The main idea of our approach is to map the connections of a knowledge graph (KG) to those between the units of layer i and layer i+1, as shown in Figure 2. There we see that we inject only categorical information into the autoencoder and leave out the factual one. As a matter of fact, if we analyze these two kinds of information in DBpedia, we may notice that:
• the quantity of categorical information is higher than that of factual information. If we consider movies, the overall number of entities they are related to is lower than the overall number of categories;
• categorical information is more evenly distributed over the items than factual information. Going back to movies, we see that they are more connected with each other via categories than via other entities.
Hence, we may argue that for a recommendation task, where we look for commonalities among items, categorical data may turn out to be more meaningful than factual data. The main assumption behind this choice is that, for instance, if a user rated Cloud_Atlas positively, this may be interpreted as a positive rating for the connected category Post-apocalyptic_films.

In order to test our assumption, we mapped the autoencoder network topology to the categorical information related to the items rated by users. As we build a different autoencoder for each user, depending on the items she rated in the past, the mapping with a KG makes the hidden layer variable in the number of units, depending on how much categorical information is available for the items rated by the specific user.

[Figure 2: Architecture of a semantic autoencoder. The items rated by a user (e.g., The Matrix: 5.0, The Karate Kid (2010): 4.0, Astro Boy: 2.0) form both the input and the output layer; each item unit is connected, via subject edges, only to the hidden units representing its DBpedia categories (e.g., Kung fu films, Post-apocalyptic films, Chinese films, Computer-animated films).]
Let $n$ be the number of items rated by $u$ available in the graph and $C_i = \{c_{i1}, c_{i2}, \ldots, c_{im}\}$ be the set of $m$ categorical nodes associated in the KG with the item $i$. Then, $F_u = \bigcup_{i=1}^{n} C_i$ is the set of features mapped into the hidden layer for the user, and the overall number of hidden units is equal to $|F_u|$. Once the neural network setup is done, the training process takes place, feeding the neural network with the ratings provided by the user, normalized in the interval $[0, 1]$. It is worth noticing that, since the autoencoder we build mimics the structure of the connections available in the Knowledge Graph, the resulting neural network is not fully connected. Moreover, it does not need bias nodes, because these would not represent any semantic data in the graph.

Nodes in the hidden layer correspond to categorical information in the knowledge graph. At every iteration of the training process, backpropagation changes the weights on the edges between units in the layers, such that the sum of the entering edges in an output unit reconstructs the user rating for the item represented by that unit. Regarding the nodes in the hidden layer, we may interpret the sum of the weights of the entering edges, computed at the end of the training process, as the importance of that feature in the generation of the output which, in our case, consists of the ratings provided by the user.
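The paper does not report this construction as code, so the following is only a sketch of how the KG-shaped topology could be enforced in Keras: a bias-free Dense layer whose kernel is multiplied by a fixed 0/1 mask, so that an item unit is connected exclusively to the hidden units of its own categories. The sigmoid activation, the 0.03 learning rate and the Glorot-style near-zero initialization come from the text below; the MaskedDense layer, the toy item-category matrix and the number of epochs are our assumptions:

```python
# Sketch of a non-fully-connected, bias-free autoencoder whose topology
# follows a knowledge graph; the MaskedDense layer is our illustration.
import numpy as np
import tensorflow as tf
from tensorflow import keras

class MaskedDense(keras.layers.Layer):
    """Dense layer without bias whose connections are pruned by a 0/1 mask."""
    def __init__(self, mask, **kwargs):
        super().__init__(**kwargs)
        self.mask = tf.constant(mask, dtype=tf.float32)  # (in_dim, out_dim)

    def build(self, input_shape):
        # Weights start close to zero (Glorot initialization, as in [9])
        self.kernel = self.add_weight(
            shape=self.mask.shape, initializer="glorot_uniform", trainable=True)

    def call(self, x):
        # Only the edges that exist in the KG survive; sigmoid, no bias
        return tf.sigmoid(tf.matmul(x, self.kernel * self.mask))

# Toy user with 3 rated items and 4 categories:
# mask[i][c] = 1 iff item i has category c in the knowledge graph
item_cat = np.array([[1, 1, 0, 0],
                     [0, 1, 1, 0],
                     [0, 0, 1, 1]], dtype=np.float32)

inputs = keras.Input(shape=(3,))
hidden = MaskedDense(item_cat)(inputs)      # items -> categories
outputs = MaskedDense(item_cat.T)(hidden)   # categories -> items

model = keras.Model(inputs, outputs)
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.03), loss="mse")

ratings = np.array([[1.0, 0.8, 0.4]])       # the user's ratings, scaled to [0, 1]
model.fit(ratings, ratings, epochs=1000, verbose=0)   # paper: 10,000 epochs
```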
4.1 User Profiles
Once the network converges, we have a latent representation of the features associated with a user profile, together with their weights. However, very interestingly, this time the features represented by the nodes in the hidden layer also have an explicit meaning, as they are in a one-to-one mapping with the categories of the knowledge graph. Our autoencoder is therefore able to learn the semantics behind the ratings of each user and to weigh them through backpropagation. In our current implementation we use the well-known sigmoid activation function $\sigma(x) = \frac{1}{1+e^{-x}}$, since we normalized the design matrix to be within $[0, 1]$. We trained each autoencoder for 10,000 epochs with a learning rate of $r = 0.03$; weights are initialized to values close to zero, as Glorot and Bengio suggest in [9].

Starting from the trained autoencoder, we may build a user profile by considering as features the categories associated with the items the user rated in the past, and by assigning each category a value according to the weights of the edges entering the corresponding hidden unit. Given a user $u$, the weight associated with a feature $c$ is the summation of the weights $w_k^u(c)$ of the edges entering the hidden node representing the Knowledge Graph category $c$, after training the autoencoder with the ratings of $u$. More formally, we have:

$$\omega_u(c) = \sum_{k=1}^{|In(c)|} w_k^u(c)$$

where $In(c)$ is the set of edges entering the node representing the feature $c$. We recall that, since the autoencoder is not fully connected, $|In(c)|$ varies depending on the connections of the category $c$ in the knowledge graph.

By means of the weights associated with each feature, we can now model a user profile composed of a vector of weighted categorical features. Given $F_u$ as the set of categories belonging to all the items rated by $u$, and $F = \bigcup_{u \in U} F_u$ as the set of all features among all the users in the system, we have for each user $u \in U$ and for each feature $c \in F$:

$$P(u) = \{\langle c, \omega \rangle \mid \omega = \omega_u(c) \text{ if } c \in F_u\}$$
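Continuing the sketch above, $\omega_u(c)$ can be read directly off the trained encoder weights by summing, for each category node, the masked weights entering it; the category labels below are toy values of ours:

```python
# Sum of the (masked) weights entering each hidden/category node: omega_u(c).
kernel = model.layers[1].kernel.numpy()   # trained item -> category weights
omega = (kernel * item_cat).sum(axis=0)   # one weight per category

categories = ["Kung_fu_films", "Post-apocalyptic_films",
              "Chinese_films", "Computer-animated_films"]  # toy labels
P_u = dict(zip(categories, omega))        # the user profile P(u)
```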
Considering that users provide different numbers of ratings, we have an unbalanced distribution in the dimension of user profiles. Moreover, as a user usually rates only a small subset of the entire catalog, we have a massive number of missing features belonging to items not rated by $u$. In order to compute values for the missing features, we leverage an unsupervised deep learning model inspired by the word2vec approach [13]. word2vec is an efficient technique originally conceived to compute word embeddings (i.e., numerical representations of words) by capturing the semantic distribution of textual words in a latent space, starting from their distribution within the sentences composing the original text. Given a corpus, e.g., an excerpt from a book, it projects each word into a multidimensional space such that words that are similar from a semantic point of view end up closer to each other. In this way, we are able to evaluate the semantic similarity between two words even if they never appear in the same sentence. Given a sequence of words $[x_1, \ldots, x_n]$ within a window, word2vec computes the probability for a new word $x'$ to be the next one in the sequence; more formally, it computes $p(x' \mid [x_1, \ldots, x_n])$.

In our scenario, we may imagine replacing sentences represented by sequences of words with user profiles represented by sequences of categories $c \in F_u$, and then using the word2vec approach to compute, for a given user $u$, the weight of the missing features $c' \notin F_u$.

We need to prepare the user profiles $P(u)$ to be processed by word2vec. Hence, we first generate a corpus made of sequences of ordered features, where the order is given by $\omega$. The very preliminary step is that of selecting an order among the elements $c \in F_u$ which is coherent for all $u \in U$, thus moving from the set $P(u)$ to a representative sequence of elements $s(u)$.

For each $\langle c, \omega \rangle \in P(u)$ we create a corresponding pair $\langle c, norm(\omega) \rangle$, with $norm$ being the mapping function

$$norm : [0, 1] \mapsto \{0.1, 0.2, 0.3, \ldots, 1\}$$

that linearly maps3 a value in the interval $[0, 1]$ to a value in the set $\{0.1, 0.2, 0.3, \ldots, 1\}$.
The new pairs form the set

$$P^{norm}(u) = \{\langle c, norm(\omega) \rangle \mid \langle c, \omega \rangle \in P(u)\}$$

For each normalized user profile set $P^{norm}(u)$ we then build the corresponding sequence

$$s(u) = [\ldots, \langle c_i, norm(\omega_i^u) \rangle, \ldots, \langle c_j, norm(\omega_j^u) \rangle, \ldots]$$

with $\omega_i^u \geq \omega_j^u$.

3 In our current implementation we use a standard min-max normalization.
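A small sketch of this step, under our own reading of how values are snapped onto the ten levels, mapping a profile to its normalized, ordered sequence $s(u)$:

```python
# Min-max normalize profile weights onto {0.1, ..., 1} and order by weight.
def norm(w, lo, hi):
    """Linearly map w in [lo, hi] onto the ten levels {0.1, 0.2, ..., 1}."""
    scaled = (w - lo) / (hi - lo) if hi > lo else 1.0
    return max(1, round(scaled * 10)) / 10

def to_sequence(profile):
    """Build s(u): (category, level) pairs sorted by descending weight."""
    lo, hi = min(profile.values()), max(profile.values())
    ordered = sorted(profile.items(), key=lambda kv: -kv[1])
    return [(c, norm(w, lo, hi)) for c, w in ordered]

s_u = to_sequence({"dbc:Kung_fu_films": 0.93, "dbc:Chinese_films": 0.41})
# [('dbc:Kung_fu_films', 1.0), ('dbc:Chinese_films', 0.1)]
```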
Once we have the set $S = \{s(u) \mid u \in U\}$, we can feed the word2vec algorithm with this corpus in order to find patterns of features according to their distribution across all users. In the prediction phase, by using each user's sequence of features $s(u)$ as input for the trained word2vec model, we estimate the probability of each $\langle c', norm(\omega') \rangle \in \bigcup_{v \in U} P^{norm}(v) - P^{norm}(u)$ to belong to the given context, or rather to be relevant for $u$. In other words, we compute $p(\langle c', norm(\omega') \rangle \mid s(u))$.

It is worth noticing that, given $c' \notin F_u$, we may have multiple pairs with $c'$ as first element in $\bigcup_{v \in U} P^{norm}(v) - P^{norm}(u)$. For instance, given the category dbc:Kung_fu_films, we may have both ⟨dbc:Kung_fu_films, 0.2⟩ and ⟨dbc:Kung_fu_films, 0.5⟩, with the corresponding probabilities p(⟨dbc:Kung_fu_films, 0.2⟩ | s(u)) and p(⟨dbc:Kung_fu_films, 0.5⟩ | s(u)). Still, as we want to add the category dbc:Kung_fu_films, together with its corresponding weight, only once to the user profile, we select only the pair with the highest probability. The new user profile is then

$$\hat{P}(u) = P(u) \cup \{\langle c, \omega \rangle \mid \omega = \operatorname{argmax}_{\omega \in \{0.1, \ldots, 1\}} p(\langle c, \omega \rangle \mid s(u)) \text{ and } \langle c, \omega \rangle \notin P^{norm}(u)\}$$

We point out that, while the original $P(u)$ is built by exploiting only content-based information, the enhanced user profile $\hat{P}(u)$ also considers collaborative information, as it is also based on the set $S$ containing a representation of the profiles of all the users in $U$.
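A hedged sketch of this completion step with gensim's word2vec implementation; encoding every ⟨category, level⟩ pair as a single "category_level" token is our assumption about how the pairs can be fed to the model:

```python
# Treat each ordered profile s(u) as a "sentence" of category_level tokens.
from gensim.models import Word2Vec

corpus = [
    ["dbc:Kung_fu_films_1.0", "dbc:Chinese_films_0.6", "dbc:Drama_films_0.2"],
    ["dbc:Kung_fu_films_0.8", "dbc:Post-apocalyptic_films_0.5"],
    # ... one sequence s(u) per user, ordered by decreasing weight
]

model = Word2Vec(corpus, vector_size=32, window=5,
                 min_count=1, sg=0, negative=5)

# Rank candidate pairs by p(<c', norm(w')> | s(u)) given the user's context
user_context = ["dbc:Kung_fu_films_1.0", "dbc:Chinese_films_0.6"]
for token, prob in model.predict_output_word(user_context, topn=5):
    print(token, prob)  # keep, per category, the pair with highest probability
```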
                                                                                            by computing a score for every item i not rated by u, whether i
4.2     Computing Recommendations                                                           appears in the user test set or not. Then, recommendation lists are
                                                                                            compared with the test set by computing both performance and
Given the user profiles represented as vectors of weighted features,
                                                                                            diversity metrics such as Precision, Recall, F-1 score, nDCG [12],
recommendations are then computed by using a well-known k-
                                                                                            aggregate diversity, and Gini index as a measure of sales diversity
nearest neighbors approach. User similarities are found through
                                                                                            [8].
projecting their user profile in a Vector Space Model, and then
similarities between each pair of users u and v is computed using
                                                                                           5.3      Results Discussion
the cosine similarity.
   For each user u we find the top-k similar neighbors to infer                            In our experiments, we compared our approach with three differ-
the rate r for the item i as the weighted average rate that the                            ent states of the art techniques widely used in recommendation
neighborhood gave to it:                                                                   scenarios: BPRMF, WRMF and a single-layer autoencoder for rat-
                                                                                           ing prediction. BPRMF [17] is a Matrix Factorization algorithm
                                 Ík
                                    j=1 sim(u, v j ) · r (v j , i)                         4 https://github.com/sisinflab/SEMAUTO-2.0
                    r (u, i) =        Ík                                         (1)       5 https://dbpedia.org
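A compact sketch of Equation (1): cosine similarities between the profile vectors, then a similarity-weighted average of the neighbors' ratings (the data shapes and the scikit-learn helper are our choices):

```python
# k-nearest-neighbors rating prediction, Equation (1), on toy data.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

profiles = np.random.rand(5, 8)   # one weighted-feature vector per user
ratings = np.random.rand(5, 10)   # user x item matrix (0 = unrated)

def predict(u, i, k=2):
    sims = cosine_similarity(profiles[u:u + 1], profiles)[0]
    sims[u] = -1.0                             # exclude the user herself
    rated = np.where(ratings[:, i] > 0)[0]     # neighbors who rated item i
    top = rated[np.argsort(-sims[rated])][:k]  # top-k most similar of them
    if len(top) == 0 or sims[top].sum() <= 0:
        return 0.0
    return float(np.dot(sims[top], ratings[top, i]) / sims[top].sum())

print(predict(u=0, i=3))
```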
5 EXPERIMENTS
In this section, we present the experimental evaluation performed on three different datasets. We first describe the structure of the datasets used in the experiments and the evaluation protocol; we then move to the metrics adopted for the evaluation and the discussion of the obtained results. Our experiments can be reproduced through the implementation available in our public repository4.

5.1 Dataset
In order to validate our approach, we performed experiments on the three datasets summarized in Table 1.

Table 1: Datasets
                       #users    #items    #ratings     sparsity
MovieLens 20M          138,493   26,744    20,000,263   99.46%
Amazon Digital Music   478,235   266,414   836,006      99.99%
LibraryThing           7,279     37,232    626,000      99.77%

In our experiments, we referred to the freely available DBpedia knowledge graph5. The mapping contains 22,959 mapped items for MovieLens 20M6, 4,077 mapped items for Amazon Digital Music7 and 9,926 mapped items for LibraryThing8. For our experiments, we removed from the datasets all the items without a mapping in DBpedia.

4 https://github.com/sisinflab/SEMAUTO-2.0
5 https://dbpedia.org
6 https://grouplens.org/datasets/movielens/20m/
7 http://jmcauley.ucsd.edu/data/amazon/
8 https://www.librarything.com
5.2 Evaluation protocol
Here, we show how we evaluated the performance of our method in recommending items. We split each dataset using Hold-Out 80/20, ensuring that every user has 80% of her ratings in the training set and the remaining 20% in the test set. For the evaluation of our approach we adopted the "all unrated items" protocol described in [19]: for each user u, a top-N recommendation list is produced by computing a score for every item i not rated by u, whether i appears in the user's test set or not. Then, the recommendation lists are compared with the test set by computing both accuracy and diversity metrics: Precision, Recall, F1 score, nDCG [12], aggregate diversity, and the Gini index as a measure of sales diversity [8].
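A minimal sketch of the per-user Hold-Out 80/20 split described above (the dictionary-based layout of the ratings is our assumption):

```python
# Per-user 80/20 hold-out split: 80% of each user's ratings go to training.
import random

def holdout_split(user_ratings, train_share=0.8, seed=42):
    rng = random.Random(seed)
    train, test = {}, {}
    for user, items in user_ratings.items():
        items = list(items)
        rng.shuffle(items)
        cut = int(len(items) * train_share)
        train[user], test[user] = items[:cut], items[cut:]
    return train, test

ratings = {"u1": [("i1", 5), ("i2", 4), ("i3", 2), ("i4", 3), ("i5", 1)]}
train, test = holdout_split(ratings)  # 4 ratings for training, 1 for testing
```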
5.3 Results Discussion
In our experiments, we compared our approach with three state-of-the-art techniques widely used in recommendation scenarios: BPRMF, WRMF and a single-layer autoencoder for rating prediction. BPRMF [17] is a Matrix Factorization algorithm which leverages Bayesian Personalized Ranking as its objective function. WRMF [11, 16] is a Weighted Regularized Matrix Factorization method which exploits users' implicit feedback to provide recommendations. In their basic version, both strategies rely exclusively on the user-item matrix, in a pure collaborative filtering approach. They can be hybridized by exploiting side information, i.e. additional data associated with the items. In our experiments, we adopted the categorical information found in the DBpedia Knowledge Graph as side information. We used the implementations of BPRMF and WRMF available in MyMediaLite9 and implemented the autoencoder in Keras10. We verified the statistical significance of our experiments by using the Wilcoxon Signed Rank test: we obtained a p-value very close to zero, which supports the validity of our results.
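For reference, such a paired Wilcoxon signed-rank test can be run with SciPy as sketched below; the per-user score vectors here are toy values, not the actual experimental data:

```python
# Paired Wilcoxon signed-rank test over per-user scores of two systems.
from scipy.stats import wilcoxon

kg_autoencoder_ndcg = [0.29, 0.31, 0.27, 0.33, 0.30, 0.28]
baseline_ndcg       = [0.24, 0.26, 0.25, 0.27, 0.23, 0.25]

stat, p_value = wilcoxon(kg_autoencoder_ndcg, baseline_ndcg)
print(p_value)  # a small p-value indicates a statistically significant gap
```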
In Table 3 we report the results gathered on the three datasets by applying the methods discussed above. As for our approach, KG-AUTOENCODER, we tested it with different neighborhood sizes by varying k.

In terms of accuracy, we can see that KG-AUTOENCODER outperforms our baselines on both the MovieLens 20M and Amazon Digital Music datasets, while on LibraryThing the achieved results are almost the same. In particular, on the LibraryThing dataset, only the fully connected autoencoder performs better than our approach with regard to accuracy.

Concerning diversity, we get much better results on all the datasets. Furthermore, by analyzing the gathered results, it seems that our approach provides very discriminative descriptions for each user, letting us identify the most effective neighborhood and compute both accurate and diversified recommendations. As a matter of fact, we achieve the same accuracy as the baselines while suggesting many more items.

9 http://mymedialite.net
10 https://keras.io
As shown in Table 2, KG-AUTOENCODER performs better on those datasets whose items can be associated with a large amount of categorical information, which implies the usage of many hidden units. This occurs because very complex functions can be modeled by ANNs if enough hidden units are provided, as the Universal Approximation Theorem points out. For this reason, our approach turns out to work better on the MovieLens 20M dataset (whose related neural networks have a high number of hidden units) than on the others. In particular, the experiments on the LibraryThing dataset show that the performance gets worse as the number of neurons decreases, i.e. when the available categories are not enough.

Table 2: Summary of hidden units for mapped items only.
                       avg #features   std      avg #features/avg #items
MovieLens 20M          1015.87         823.26   8.82
Amazon Digital Music   7.22            9.77     5.17
LibraryThing           206.88          196.64   1.96

Table 3: Experimental Results
MOVIELENS 20M
                   k     F1        Prec.     Recall    nDCG      Gini      aggrdiv
AUTOENCODER        −     0.21306   0.21764   0.20868   0.24950   0.01443   1587
BPRMF              −     0.14864   0.15315   0.14438   0.17106   0.00375   3263
BPRMF + SI         −     0.16838   0.17112   0.16572   0.19500   0.00635   3552
WRMF               −     0.19514   0.19806   0.19231   0.22768   0.00454   766
WRMF + SI          −     0.19494   0.19782   0.19214   0.22773   0.00450   759
KG-AUTOENCODER     5     0.18857   0.18551   0.19173   0.21941   0.01835   5214
                   10    0.21268   0.21009   0.21533   0.24945   0.01305   3350
                   20    0.22886   0.22684   0.23092   0.27147   0.01015   2417
                   40    0.23675   0.23534   0.23818   0.28363   0.00827   1800
                   50    0.23827   0.23686   0.23970   0.28605   0.00780   1653
                   100   0.23961   0.23832   0.24090   0.28924   0.00662   1310
AMAZON DIGITAL MUSIC
AUTOENCODER        −     0.00060   0.00035   0.00200   0.00102   0.33867   3559
BPRMF              −     0.01010   0.00565   0.04765   0.02073   0.00346   539
BPRMF + SI         −     0.00738   0.00413   0.03480   0.01624   0.06414   2374
WRMF               −     0.02189   0.01236   0.09567   0.05511   0.01061   103
WRMF + SI          −     0.02151   0.01216   0.09325   0.05220   0.01168   111
KG-AUTOENCODER     5     0.01514   0.00862   0.06233   0.04365   0.03407   3378
                   10    0.01920   0.01091   0.07994   0.05421   0.05353   3449
                   20    0.02233   0.01267   0.09385   0.06296   0.08562   3523
                   40    0.02572   0.01460   0.10805   0.06980   0.14514   3549
                   50    0.02618   0.01486   0.10974   0.07032   0.17192   3549
                   100   0.02835   0.01608   0.11964   0.07471   0.24859   3448
LIBRARYTHING
AUTOENCODER        −     0.01562   0.01375   0.01808   0.01758   0.07628   2328
BPRMF              −     0.01036   0.00954   0.01134   0.01001   0.06764   3140
BPRMF + SI         −     0.01065   0.00994   0.01148   0.01041   0.10753   4946
WRMF               −     0.01142   0.01071   0.01223   0.01247   0.00864   439
WRMF + SI          −     0.01116   0.01030   0.01217   0.01258   0.00868   442
KG-AUTOENCODER     5     0.00840   0.00764   0.00931   0.00930   0.13836   4895
                   10    0.01034   0.00930   0.01163   0.01139   0.07888   3558
                   20    0.01152   0.01029   0.01310   0.01248   0.04586   2245
                   40    0.01195   0.01073   0.01347   0.01339   0.02800   1498
                   50    0.01229   0.01110   0.01378   0.01374   0.02403   1312
                   100   0.01278   0.01136   0.01461   0.01503   0.01521   873
6 CONCLUSION AND FUTURE WORK
In this paper, we have presented a recommendation approach that combines the computational power of deep learning with the representational expressiveness of knowledge graphs. As in classical applications of autoencoders to feature selection, we compute a latent representation of items but, in our case, we attach an explicit semantics to the selected features. This allows our system to exploit the power of deep learning techniques and, at the same time, to have a meaningful and understandable representation of the trained model. We used our approach to autoencode user ratings in a recommendation scenario via the DBpedia knowledge graph, and we proposed an algorithm to compute user profiles, which are then adopted to provide recommendations based on the semantic features extracted by our autoencoder. Experimental results show that we are able to outperform state-of-the-art recommendation algorithms in terms of accuracy and diversity. As future work, we will compare our approach with other competitive baselines, as suggested in more recent works [14].

The results presented in this paper pave the way for various further investigations in different directions. From a methodological and algorithmic point of view, we can surely investigate the augmentation of further deep learning techniques via the injection of explicit and structured knowledge coming from external sources of information. Giving an explicit meaning to the neurons of an ANN, as well as to their connections, can fill the semantic gap in describing models trained via deep learning algorithms. Moreover, having an explicit representation of latent features opens the door to better and more explicit user modeling. We are currently investigating how to exploit the structure of a Knowledge Graph-enabled autoencoder to infer qualitative preferences represented by means of expressive languages such as CP-theories [4]. Providing such a powerful representation may also turn out to be a key factor in the automatic generation of explanations for recommendation results.



REFERENCES
[1] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. In Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC'07/ASWC'07). Springer-Verlag, 722–735.
[2] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 191–198.
[3] Marco de Gemmis, Pasquale Lops, Cataldo Musto, Fedelucio Narducci, and Giovanni Semeraro. 2015. Semantics-Aware Content-Based Recommender Systems. In Recommender Systems Handbook. 119–159.
[4] Tommaso Di Noia, Thomas Lukasiewicz, Maria Vanina Martínez, Gerardo I. Simari, and Oana Tifrea-Marciuska. 2015. Combining Existential Rules with the Power of CP-Theories. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), Buenos Aires, Argentina, July 25-31, 2015. 2918–2925.
[5] Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, and Markus Zanker. 2012. Linked Open Data to Support Content-based Recommender Systems. In Proceedings of the 8th International Conference on Semantic Systems (I-SEMANTICS '12). ACM, New York, NY, USA, 1–8.
[6] T. Di Noia, V.C. Ostuni, J. Rosati, P. Tomeo, E. Di Sciascio, R. Mirizzi, and C. Bartolini. 2016. Building a relatedness graph from Linked Open Data: A case study in the IT domain. Expert Systems with Applications 44 (2016), 354–366.
[7] Ali Mamdouh Elkahky, Yang Song, and Xiaodong He. 2015. A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems. In Proceedings of the 24th International Conference on World Wide Web (WWW '15). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 278–288.
[8] Daniel M. Fleder and Kartik Hosanagar. 2007. Recommender systems and their impact on sales diversity. In Proceedings of the 8th ACM Conference on Electronic Commerce. ACM, 192–199.
[9] Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 249–256.
[10] Benjamin Heitmann and Conor Hayes. 2010. Using Linked Data to Build Open, Collaborative Recommender Systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence.
[11] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining (ICDM '08). IEEE Computer Society, Washington, DC, USA, 263–272.
[12] Kalervo Järvelin and Jaana Kekäläinen. 2000. IR Evaluation Methods for Retrieving Highly Relevant Documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00). ACM, New York, NY, USA, 41–48.
[13] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, Nevada, United States. 3111–3119.
[14] Cataldo Musto, Tiziano Franza, Giovanni Semeraro, Marco de Gemmis, and Pasquale Lops. 2018. Deep Content-based Recommender Systems Exploiting Recurrent Neural Networks and Linked Open Data. In Adjunct Publication of the 26th Conference on User Modeling, Adaptation and Personalization (UMAP '18). ACM, New York, NY, USA, 239–244.
[15] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, and E. Di Sciascio. 2016. Sound and music recommendation with knowledge graphs. ACM Transactions on Intelligent Systems and Technology 8, 2 (2016).
[16] Rong Pan, Yunhong Zhou, Bin Cao, Nathan N. Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-Class Collaborative Filtering. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining (ICDM '08). IEEE Computer Society, Washington, DC, USA, 502–511.
[17] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI '09). AUAI Press, Arlington, Virginia, United States, 452–461.
[18] Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015. AutoRec: Autoencoders Meet Collaborative Filtering. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 111–112.
[19] Harald Steck. 2013. Evaluation of Recommendations: Rating-prediction and Ranking. In Proceedings of the 7th ACM Conference on Recommender Systems (RecSys '13). ACM, New York, NY, USA, 213–220.

[20] Florian Strub, Romaric Gaudel, and Jérémie Mary. 2016. Hybrid Recommender System Based on Autoencoders. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (DLRS 2016). ACM, New York, NY, USA, 11–16.
[21] Jeroen B. P. Vuurens, Martha Larson, and Arjen P. de Vries. 2016. Exploring Deep Space: Learning Personalized Ranking in a Semantic Space. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (DLRS 2016). ACM, New York, NY, USA, 23–28.
[22] Hao Wang, Naiyan Wang, and Dit-Yan Yeung. 2015. Collaborative Deep Learning for Recommender Systems. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '15). ACM, New York, NY, USA, 1235–1244.
[23] Yao Wu, Christopher DuBois, Alice X. Zheng, and Martin Ester. 2016. Collaborative Denoising Auto-Encoders for Top-N Recommender Systems. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM '16). ACM, New York, NY, USA, 153–162.