Computing recommendations via a Knowledge Graph-aware Autoencoder

Vito Bellini⋆, Angelo Schiavone⋆, Tommaso Di Noia⋆, Azzurra Ragone•, Eugenio Di Sciascio⋆

⋆ Polytechnic University of Bari, Bari, Italy
firstname.lastname@poliba.it
• Independent Researcher
azzurra.ragone@gmail.com

ABSTRACT
In the last years, deep learning has shown to be a game-changing technology in artificial intelligence thanks to the numerous successes it has achieved in diverse application fields. Among others, the use of deep learning for the recommendation problem, although new, looks quite promising due to its positive performance in terms of accuracy of recommendation results. In a recommendation setting, in order to predict user ratings on unknown items, a possible configuration of a deep neural network is that of autoencoders, typically used to produce a lower-dimensionality representation of the original data. In this paper we present KG-AUTOENCODER, an autoencoder that bases the structure of its neural network on the semantics-aware topology of a knowledge graph, thus providing a label for the neurons in the hidden layer that are eventually used to build a user profile and then compute recommendations. We show the effectiveness of KG-AUTOENCODER in terms of accuracy, diversity and novelty by comparing it with state-of-the-art recommendation algorithms.
ACM Reference Format:
Vito Bellini, Angelo Schiavone, Tommaso Di Noia, Azzurra Ragone, Eugenio Di Sciascio. 2019. Computing recommendations via a Knowledge Graph-aware Autoencoder. In Proceedings of Knowledge-aware and Conversational Recommender Systems (KaRS) Workshop 2018 (co-located with RecSys 2018). ACM, New York, NY, USA, 7 pages.

Knowledge-aware and Conversational Recommender Systems (KaRS) Workshop 2018 (co-located with RecSys 2018), October 7, 2018, Vancouver, Canada. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.
1 INTRODUCTION
Recommender systems (RS) have become pervasive tools that we experience in our everyday life. While we browse a catalog of items, RSs exploit our past preferences in order to suggest new items we might be interested in.

Knowledge Graphs have recently been adopted to represent items, compute their similarity and relatedness [6], as well as to feed Content-Based (CB) and hybrid recommendation engines [15]. The publication and spread of freely available Knowledge Graphs in the form of Linked Open Data datasets, such as DBpedia [1], has paved the way for the development of knowledge-aware recommendation engines in many application domains and, moreover, makes it possible to easily switch from one domain to another by just feeding the system with a different subset of the original graph.

Another technology that has surely boosted the development of a new generation of smarter and more accurate recommender systems is deep learning [2]. Starting from the basic notion of an artificial neural network (ANN), several configurations of deep ANNs have been proposed over the years, such as autoencoders.

In this paper, we show how autoencoder technology can benefit from the existence of a Knowledge Graph to create a representation of a user profile that can eventually be exploited to predict ratings for unknown items. The main intuition behind the approach is that both ANNs and Knowledge Graphs expose a graph-based structure. Hence, we may imagine building the topology of the inner layers of the ANN by mimicking that of a Knowledge Graph.

The remainder of this paper is structured as follows: in the next section, we discuss related work on recommender systems exploiting deep learning, knowledge graphs and Linked Open Data. Then, the basic notions of the technologies we adopted are introduced in Section 3. The proposed recommendation model is described in Section 4, while in Section 5 we present the experimental setting and evaluation. Conclusions and Future Work close the paper.
2 RELATED WORK
Autoencoders and Deep Learning for RS. The adoption of deep learning techniques is for sure one of the main advances of the last years in the field of recommender systems. In [23], the authors propose the usage of a denoising autoencoder to perform a top-N recommendation task by exploiting a corrupted version of the input data.

A pure Collaborative Filtering (CF) model based on autoencoders is described in [18], in which the authors develop both user-based and item-based autoencoders to tackle the recommendation task. Stacked Denoising Autoencoders are combined with collaborative filtering techniques in [20], where the authors leverage autoencoders to get a smaller and non-linear representation of the user-item interactions. This representation is eventually used to feed a deep neural network which can alleviate the cold-start problem thanks to the integration of side information. A hybrid recommender system is finally built.

Wang et al. [22] suggest applying deep learning methods to side information to reduce the sparsity of the rating matrix in collaborative approaches. In [21] the authors propose a deep learning approach to build a high-dimensional semantic space based on the substitutability of items; then, a user-specific transformation is learned in order to get a ranking of items from such a space.

Analyses of the impact of deep learning on both recommendation quality and system scalability are presented in [7], where the authors first represent users and items through a rich feature set spanning different domains and then map them to a latent space. Finally, a content-based recommender system is built.

Knowledge Graphs and Linked Open Data for RS. Several works have been proposed that exploit side information coming from knowledge graphs and Linked Open Data (LOD) to enhance the performance of recommender systems. Most of them rely on the usage of DBpedia as knowledge graph. In [10], for the very first time, a LOD-based recommender system was proposed to alleviate some of the major problems that affect collaborative techniques, mainly the high sparsity of the user-item matrix. The effectiveness of such an approach seems to be confirmed by the large number of methods that have been proposed afterward. A detailed review of LOD-based recommender systems is presented in [3]. By leveraging the knowledge encoded in DBpedia, it is possible to build an accurate content-based recommender system [5].

[Figure 1: Part of a knowledge graph. DBpedia entities such as 12_Monkeys, Cloud_Atlas_(film), Ghostwritten and Sense8 are connected to the category Post-apocalyptic_films via subject edges, to Drama via genre edges, and to The_Wachowskis via director and creator edges.]
                                                                                   KGs; among them, the most important is for sure DBpedia2 . We may
3      BACKGROUND TECHNOLOGIES                                                     investigate relationships among entities in the KG by exploiting
                                                                                   graph data sources and hence discover meaningful paths within
                                                                                   the graph.
    Autoencoders. An Artificial Neural Network (ANN) is a math-
                                                                                      In figure 1 we show an excerpt of the DBpedia graph, involving
ematical model used to learn the relationships which underlie in
                                                                                   some entities in the movie domain. Interestingly we see that DB-
a given set of data. Starting from them, after a training phase, an
                                                                                   pedia encodes both factual information, e.g. “Cloud_Atlas_(film) has
ANN can be used to predict a single value or a vector, for regression
                                                                                   director The_Wachowskis”, and categorical one such as “Cloud_Atlas_(film)
or classification tasks.
                                                                                   has subject Post-apocalyptic_films”.
Basically, an ANN consists of a bunch of nodes, called neurons,
distributed among three different kinds of layers: the input layer,
one or more hidden layers, and the output layer. Typically, a neuron
                                                                                   4     SEMANTICS-AWARE AUTOENCODERS FOR
of a layer is connected to all the neurons of the next layer, making                     RATING PREDICTION
the ANN a fully connected network.                                                 The main idea of our approach is to map the connections in a
Autoencoders are ANNs which try to set the output values equal to                  knowledge graph (KG) with those between units from layer i to
the input ones, modeling an approximation of the identity function                 layer i+1, as shown in Figure 2. There we see that we injected
y = f (xx ) = x . Roughly, they are forced to predict the same values              only categorical information in the autoencoder and we left out
they are fed with. Therefore, the number of output units and one                   factual one. As a matter of fact, if we analyze these two kinds of
of the input nodes is the same, i.e. |xx | = |yy |. The aim of such a task         information in DBpedia we may notice that:
is to obtain a new representation of the original data based on the                     • the quantity of categorical information is higher than the
values of the hidden layer neurons. In fact, each of these layers                         factual one. If we consider movies, the overall number of
projects the input data in a new Euclidean space whose dimensions                         entities they are related with is lower than the overall number
depend on the number of the nodes in the hidden layer.                                    of categories;
    Therefore, when we use an autoencoder, we are not interested                        • categorical information is more distributed over the items
at its output, but at the encoded representation it computes: in this                     than the factual one. Going back to movies we see that they
way, we can leverage the implicit knowledge behind the original                           are more connected with each other via categories than via
data, performing the so-called feature extraction task. The actual                        other entities.
meaning of each dimension (represented by hidden nodes) in the                     Hence, we may argue that for a recommendation task where we are
new space is unknown, but we can be sure that they are based on                    looking for commonalities among items, categorical data may result
latent patterns binding the training cases.                                        in more meaningful than the factual one. The main assumption
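As an illustration (ours, not part of the original implementation), a minimal fully connected autoencoder and the readout of its hidden-layer encoding might look as follows in Keras; the catalog size, bottleneck width and training data are toy values:

```python
# Minimal fully connected autoencoder sketch; sizes and data are illustrative.
import numpy as np
from tensorflow import keras

n_items = 100       # hypothetical catalog size
hidden_dim = 16     # bottleneck: the learned low-dimensional representation

inputs = keras.Input(shape=(n_items,))
hidden = keras.layers.Dense(hidden_dim, activation="sigmoid")(inputs)
outputs = keras.layers.Dense(n_items, activation="sigmoid")(hidden)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Train the network to reproduce its own input: y = f(x) = x
x = np.random.rand(32, n_items)            # toy rating vectors in [0, 1]
autoencoder.fit(x, x, epochs=5, verbose=0)

# What we actually use is the encoded representation, not the output
encoder = keras.Model(inputs, hidden)
codes = encoder.predict(x, verbose=0)      # shape: (32, hidden_dim)
```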
KG. In 2012, Google announced its Knowledge Graph1 as a new tool to improve the identification and retrieval of entities in response to a search query. A Knowledge Graph is a form of representation of knowledge through a semantic (labelled) network that allows a system to store human knowledge in a structured format well understandable by a computer agent. In the meantime, several communities have started to build their own KGs; among them, the most important is for sure DBpedia2. We may investigate relationships among entities in a KG by exploiting graph data sources and hence discover meaningful paths within the graph.

In Figure 1 we show an excerpt of the DBpedia graph involving some entities in the movie domain. Interestingly, we see that DBpedia encodes both factual information, e.g. "Cloud_Atlas_(film) has director The_Wachowskis", and categorical information, such as "Cloud_Atlas_(film) has subject Post-apocalyptic_films".

1 https://googleblog.blogspot.it/2012/05/introducing-knowledge-graph-things-not.html
2 http://dbpedia.org
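As an illustration of how such categorical information can be retrieved, the following snippet (ours, not part of the paper's pipeline) queries the public DBpedia SPARQL endpoint for the dct:subject categories of one movie entity:

```python
# Illustrative DBpedia lookup of the categories (dct:subject) of one film.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?category WHERE {
        <http://dbpedia.org/resource/Cloud_Atlas_(film)>
            <http://purl.org/dc/terms/subject> ?category .
    }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["category"]["value"])  # e.g. ...Category:Post-apocalyptic_films
```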

4 SEMANTICS-AWARE AUTOENCODERS FOR RATING PREDICTION
The main idea of our approach is to map the connections of a knowledge graph (KG) to those between the units of layer i and layer i+1, as shown in Figure 2. There we see that we inject only categorical information into the autoencoder and leave out the factual one. As a matter of fact, if we analyze these two kinds of information in DBpedia, we may notice that:
• the quantity of categorical information is higher than that of factual information. If we consider movies, the overall number of entities they are related to is lower than the overall number of categories;
• categorical information is more evenly distributed over the items than factual information. Going back to movies, we see that they are more connected with each other via categories than via other entities.
Hence, we may argue that for a recommendation task, where we look for commonalities among items, categorical data may turn out to be more meaningful than factual data. The main assumption behind this choice is that, for instance, if a user rated Cloud_Atlas positively, this may be interpreted as a positive rating for the connected category Post-apocalyptic_films.

In order to test our assumption, we mapped the autoencoder network topology to the categorical information related to the items rated by users. As we build a different autoencoder for each user, depending on the items she rated in the past, the mapping with a KG makes the hidden layer variable in the number of units, depending on how much categorical information is available for the items rated by the specific user.

[Figure 2: Architecture of a semantic autoencoder. The items rated by a user (e.g., The Matrix: 5.0, The Karate Kid (2010): 4.0, Astro Boy: 2.0) form both the input and the output layer; each item unit is connected, via subject edges, only to the hidden units representing its DBpedia categories (e.g., Kung fu films, Post-apocalyptic films, Chinese films, Computer-animated films).]
Let $n$ be the number of items rated by $u$ available in the graph and $C_i = \{c_{i1}, c_{i2}, \ldots, c_{im}\}$ be the set of $m$ categorical nodes associated in the KG with the item $i$. Then, $F_u = \bigcup_{i=1}^{n} C_i$ is the set of features mapped into the hidden layer for the user, and the overall number of hidden units is equal to $|F_u|$. Once the neural network setup is done, the training process takes place, feeding the neural network with the ratings provided by the user, normalized in the interval $[0, 1]$. It is worth noticing that, since the autoencoder we build mimics the structure of the connections available in the Knowledge Graph, the resulting neural network is not fully connected. Moreover, it does not need bias nodes, because these would not represent any semantic data in the graph.

Nodes in the hidden layer correspond to categorical information in the knowledge graph. At every iteration of the training process, backpropagation changes the weights on the edges between units in the layers, such that the sum of the entering edges in an output unit reconstructs the user rating for the item represented by that unit. Regarding the nodes in the hidden layer, we may interpret the sum of the weights of the entering edges, computed at the end of the training process, as the importance of that feature in the generation of the output which, in our case, consists of the ratings provided by the user.
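The paper does not report this construction as code, so the following is only a sketch of how the KG-shaped topology could be enforced in Keras: a bias-free Dense layer whose kernel is multiplied by a fixed 0/1 mask, so that an item unit is connected exclusively to the hidden units of its own categories. The sigmoid activation, the 0.03 learning rate and the Glorot-style near-zero initialization come from the text below; the MaskedDense layer, the toy item-category matrix and the number of epochs are our assumptions:

```python
# Sketch of a non-fully-connected, bias-free autoencoder whose topology
# follows a knowledge graph; the MaskedDense layer is our illustration.
import numpy as np
import tensorflow as tf
from tensorflow import keras

class MaskedDense(keras.layers.Layer):
    """Dense layer without bias whose connections are pruned by a 0/1 mask."""
    def __init__(self, mask, **kwargs):
        super().__init__(**kwargs)
        self.mask = tf.constant(mask, dtype=tf.float32)  # (in_dim, out_dim)

    def build(self, input_shape):
        # Weights start close to zero (Glorot initialization, as in [9])
        self.kernel = self.add_weight(
            shape=self.mask.shape, initializer="glorot_uniform", trainable=True)

    def call(self, x):
        # Only the edges that exist in the KG survive; sigmoid, no bias
        return tf.sigmoid(tf.matmul(x, self.kernel * self.mask))

# Toy user with 3 rated items and 4 categories:
# mask[i][c] = 1 iff item i has category c in the knowledge graph
item_cat = np.array([[1, 1, 0, 0],
                     [0, 1, 1, 0],
                     [0, 0, 1, 1]], dtype=np.float32)

inputs = keras.Input(shape=(3,))
hidden = MaskedDense(item_cat)(inputs)      # items -> categories
outputs = MaskedDense(item_cat.T)(hidden)   # categories -> items

model = keras.Model(inputs, outputs)
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.03), loss="mse")

ratings = np.array([[1.0, 0.8, 0.4]])       # the user's ratings, scaled to [0, 1]
model.fit(ratings, ratings, epochs=1000, verbose=0)   # paper: 10,000 epochs
```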
4.1 User Profiles
Once the network converges, we have a latent representation of the features associated with a user profile, together with their weights. However, very interestingly, this time the features represented by the nodes in the hidden layer also have an explicit meaning, as they are in a one-to-one mapping with the categories of the knowledge graph. Our autoencoder is therefore able to learn the semantics behind the ratings of each user and to weigh them through backpropagation. In our current implementation we use the well-known sigmoid activation function $\sigma(x) = \frac{1}{1+e^{-x}}$, since we normalized the design matrix to be within $[0, 1]$. We trained each autoencoder for 10,000 epochs with a learning rate of $r = 0.03$; weights are initialized to values close to zero, as Glorot and Bengio suggest in [9].

Starting from the trained autoencoder, we may build a user profile by considering as features the categories associated with the items the user rated in the past, and by assigning each category a value according to the weights of the edges entering the corresponding hidden unit. Given a user $u$, the weight associated with a feature $c$ is the summation of the weights $w_k^u(c)$ of the edges entering the hidden node representing the Knowledge Graph category $c$, after training the autoencoder with the ratings of $u$. More formally, we have:

$$\omega_u(c) = \sum_{k=1}^{|In(c)|} w_k^u(c)$$

where $In(c)$ is the set of edges entering the node representing the feature $c$. We recall that, since the autoencoder is not fully connected, $|In(c)|$ varies depending on the connections of the category $c$ in the knowledge graph.

By means of the weights associated with each feature, we can now model a user profile composed of a vector of weighted categorical features. Given $F_u$ as the set of categories belonging to all the items rated by $u$, and $F = \bigcup_{u \in U} F_u$ as the set of all features among all the users in the system, we have for each user $u \in U$ and for each feature $c \in F$:

$$P(u) = \{\langle c, \omega \rangle \mid \omega = \omega_u(c) \text{ if } c \in F_u\}$$
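Continuing the sketch above, $\omega_u(c)$ can be read directly off the trained encoder weights by summing, for each category node, the masked weights entering it; the category labels below are toy values of ours:

```python
# Sum of the (masked) weights entering each hidden/category node: omega_u(c).
kernel = model.layers[1].kernel.numpy()   # trained item -> category weights
omega = (kernel * item_cat).sum(axis=0)   # one weight per category

categories = ["Kung_fu_films", "Post-apocalyptic_films",
              "Chinese_films", "Computer-animated_films"]  # toy labels
P_u = dict(zip(categories, omega))        # the user profile P(u)
```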
Considering that users provide different numbers of ratings, we have an unbalanced distribution in the dimension of user profiles. Moreover, as a user usually rates only a small subset of the entire catalog, we have a massive number of missing features belonging to items not rated by $u$. In order to compute values for the missing features, we leverage an unsupervised deep learning model inspired by the word2vec approach [13]. word2vec is an efficient technique originally conceived to compute word embeddings (i.e., numerical representations of words) by capturing the semantic distribution of textual words in a latent space, starting from their distribution within the sentences composing the original text. Given a corpus, e.g., an excerpt from a book, it projects each word into a multidimensional space such that words that are similar from a semantic point of view end up closer to each other. In this way, we are able to evaluate the semantic similarity between two words even if they never appear in the same sentence. Given a sequence of words $[x_1, \ldots, x_n]$ within a window, word2vec computes the probability for a new word $x'$ to be the next one in the sequence; more formally, it computes $p(x' \mid [x_1, \ldots, x_n])$.

In our scenario, we may imagine replacing sentences represented by sequences of words with user profiles represented by sequences of categories $c \in F_u$, and then using the word2vec approach to compute, for a given user $u$, the weight of the missing features $c' \notin F_u$.

We need to prepare the user profiles $P(u)$ to be processed by word2vec. Hence, we first generate a corpus made of sequences of ordered features, where the order is given by $\omega$. The very preliminary step is that of selecting an order among the elements $c \in F_u$ which is coherent for all $u \in U$, thus moving from the set $P(u)$ to a representative sequence of elements $s(u)$.

For each $\langle c, \omega \rangle \in P(u)$ we create a corresponding pair $\langle c, norm(\omega) \rangle$, with $norm$ being the mapping function

$$norm : [0, 1] \mapsto \{0.1, 0.2, 0.3, \ldots, 1\}$$

that linearly maps3 a value in the interval $[0, 1]$ to a value in the set $\{0.1, 0.2, 0.3, \ldots, 1\}$.
The new pairs form the set

$$P^{norm}(u) = \{\langle c, norm(\omega) \rangle \mid \langle c, \omega \rangle \in P(u)\}$$

For each normalized user profile set $P^{norm}(u)$ we then build the corresponding sequence

$$s(u) = [\ldots, \langle c_i, norm(\omega_i^u) \rangle, \ldots, \langle c_j, norm(\omega_j^u) \rangle, \ldots]$$

with $\omega_i^u \geq \omega_j^u$.

3 In our current implementation we use a standard min-max normalization.
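A small sketch of this step, under our own reading of how values are snapped onto the ten levels, mapping a profile to its normalized, ordered sequence $s(u)$:

```python
# Min-max normalize profile weights onto {0.1, ..., 1} and order by weight.
def norm(w, lo, hi):
    """Linearly map w in [lo, hi] onto the ten levels {0.1, 0.2, ..., 1}."""
    scaled = (w - lo) / (hi - lo) if hi > lo else 1.0
    return max(1, round(scaled * 10)) / 10

def to_sequence(profile):
    """Build s(u): (category, level) pairs sorted by descending weight."""
    lo, hi = min(profile.values()), max(profile.values())
    ordered = sorted(profile.items(), key=lambda kv: -kv[1])
    return [(c, norm(w, lo, hi)) for c, w in ordered]

s_u = to_sequence({"dbc:Kung_fu_films": 0.93, "dbc:Chinese_films": 0.41})
# [('dbc:Kung_fu_films', 1.0), ('dbc:Chinese_films', 0.1)]
```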
Once we have the set $S = \{s(u) \mid u \in U\}$, we can feed the word2vec algorithm with this corpus in order to find patterns of features according to their distribution across all users. In the prediction phase, by using each user's sequence of features $s(u)$ as input for the trained word2vec model, we estimate the probability of each $\langle c', norm(\omega') \rangle \in \bigcup_{v \in U} P^{norm}(v) - P^{norm}(u)$ to belong to the given context, or rather to be relevant for $u$. In other words, we compute $p(\langle c', norm(\omega') \rangle \mid s(u))$.

It is worth noticing that, given $c' \notin F_u$, we may have multiple pairs with $c'$ as first element in $\bigcup_{v \in U} P^{norm}(v) - P^{norm}(u)$. For instance, given the category dbc:Kung_fu_films, we may have both ⟨dbc:Kung_fu_films, 0.2⟩ and ⟨dbc:Kung_fu_films, 0.5⟩, with the corresponding probabilities p(⟨dbc:Kung_fu_films, 0.2⟩ | s(u)) and p(⟨dbc:Kung_fu_films, 0.5⟩ | s(u)). Still, as we want to add the category dbc:Kung_fu_films, together with its corresponding weight, only once to the user profile, we select only the pair with the highest probability. The new user profile is then

$$\hat{P}(u) = P(u) \cup \{\langle c, \omega \rangle \mid \omega = \operatorname{argmax}_{\omega \in \{0.1, \ldots, 1\}} p(\langle c, \omega \rangle \mid s(u)) \text{ and } \langle c, \omega \rangle \notin P^{norm}(u)\}$$

We point out that, while the original $P(u)$ is built by exploiting only content-based information, the enhanced user profile $\hat{P}(u)$ also considers collaborative information, as it is also based on the set $S$ containing a representation of the profiles of all the users in $U$.
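A hedged sketch of this completion step with gensim's word2vec implementation; encoding every ⟨category, level⟩ pair as a single "category_level" token is our assumption about how the pairs can be fed to the model:

```python
# Treat each ordered profile s(u) as a "sentence" of category_level tokens.
from gensim.models import Word2Vec

corpus = [
    ["dbc:Kung_fu_films_1.0", "dbc:Chinese_films_0.6", "dbc:Drama_films_0.2"],
    ["dbc:Kung_fu_films_0.8", "dbc:Post-apocalyptic_films_0.5"],
    # ... one sequence s(u) per user, ordered by decreasing weight
]

model = Word2Vec(corpus, vector_size=32, window=5,
                 min_count=1, sg=0, negative=5)

# Rank candidate pairs by p(<c', norm(w')> | s(u)) given the user's context
user_context = ["dbc:Kung_fu_films_1.0", "dbc:Chinese_films_0.6"]
for token, prob in model.predict_output_word(user_context, topn=5):
    print(token, prob)  # keep, per category, the pair with highest probability
```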
                                                                                            by computing a score for every item i not rated by u, whether i
4.2     Computing Recommendations                                                           appears in the user test set or not. Then, recommendation lists are
                                                                                            compared with the test set by computing both performance and
Given the user profiles represented as vectors of weighted features,
                                                                                            diversity metrics such as Precision, Recall, F-1 score, nDCG [12],
recommendations are then computed by using a well-known k-
                                                                                            aggregate diversity, and Gini index as a measure of sales diversity
nearest neighbors approach. User similarities are found through
                                                                                            [8].
projecting their user profile in a Vector Space Model, and then
similarities between each pair of users u and v is computed using
                                                                                           5.3      Results Discussion
the cosine similarity.
   For each user u we find the top-k similar neighbors to infer                            In our experiments, we compared our approach with three differ-
the rate r for the item i as the weighted average rate that the                            ent states of the art techniques widely used in recommendation
neighborhood gave to it:                                                                   scenarios: BPRMF, WRMF and a single-layer autoencoder for rat-
                                                                                           ing prediction. BPRMF [17] is a Matrix Factorization algorithm
                                 Ík
                                    j=1 sim(u, v j ) · r (v j , i)                         4 https://github.com/sisinflab/SEMAUTO-2.0
                    r (u, i) =        Ík                                         (1)       5 https://dbpedia.org
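A compact sketch of Equation (1): cosine similarities between the profile vectors, then a similarity-weighted average of the neighbors' ratings (the data shapes and the scikit-learn helper are our choices):

```python
# k-nearest-neighbors rating prediction, Equation (1), on toy data.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

profiles = np.random.rand(5, 8)   # one weighted-feature vector per user
ratings = np.random.rand(5, 10)   # user x item matrix (0 = unrated)

def predict(u, i, k=2):
    sims = cosine_similarity(profiles[u:u + 1], profiles)[0]
    sims[u] = -1.0                             # exclude the user herself
    rated = np.where(ratings[:, i] > 0)[0]     # neighbors who rated item i
    top = rated[np.argsort(-sims[rated])][:k]  # top-k most similar of them
    if len(top) == 0 or sims[top].sum() <= 0:
        return 0.0
    return float(np.dot(sims[top], ratings[top, i]) / sims[top].sum())

print(predict(u=0, i=3))
```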
5 EXPERIMENTS
In this section, we present the experimental evaluation performed on three different datasets. We first describe the structure of the datasets used in the experiments and the evaluation protocol; we then move to the metrics adopted for the evaluation and the discussion of the obtained results. Our experiments can be reproduced through the implementation available in our public repository4.

5.1 Dataset
In order to validate our approach, we performed experiments on the three datasets summarized in Table 1.

Table 1: Datasets
                       #users    #items    #ratings     sparsity
MovieLens 20M          138,493   26,744    20,000,263   99.46%
Amazon Digital Music   478,235   266,414   836,006      99.99%
LibraryThing           7,279     37,232    626,000      99.77%

In our experiments, we referred to the freely available DBpedia knowledge graph5. The mapping contains 22,959 mapped items for MovieLens 20M6, 4,077 mapped items for Amazon Digital Music7 and 9,926 mapped items for LibraryThing8. For our experiments, we removed from the datasets all the items without a mapping in DBpedia.

4 https://github.com/sisinflab/SEMAUTO-2.0
5 https://dbpedia.org
6 https://grouplens.org/datasets/movielens/20m/
7 http://jmcauley.ucsd.edu/data/amazon/
8 https://www.librarything.com
5.2 Evaluation protocol
Here, we show how we evaluated the performance of our method in recommending items. We split each dataset using Hold-Out 80/20, ensuring that every user has 80% of her ratings in the training set and the remaining 20% in the test set. For the evaluation of our approach we adopted the "all unrated items" protocol described in [19]: for each user u, a top-N recommendation list is produced by computing a score for every item i not rated by u, whether i appears in the user's test set or not. Then, the recommendation lists are compared with the test set by computing both accuracy and diversity metrics: Precision, Recall, F1 score, nDCG [12], aggregate diversity, and the Gini index as a measure of sales diversity [8].
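A minimal sketch of the per-user Hold-Out 80/20 split described above (the dictionary-based layout of the ratings is our assumption):

```python
# Per-user 80/20 hold-out split: 80% of each user's ratings go to training.
import random

def holdout_split(user_ratings, train_share=0.8, seed=42):
    rng = random.Random(seed)
    train, test = {}, {}
    for user, items in user_ratings.items():
        items = list(items)
        rng.shuffle(items)
        cut = int(len(items) * train_share)
        train[user], test[user] = items[:cut], items[cut:]
    return train, test

ratings = {"u1": [("i1", 5), ("i2", 4), ("i3", 2), ("i4", 3), ("i5", 1)]}
train, test = holdout_split(ratings)  # 4 ratings for training, 1 for testing
```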
5.3 Results Discussion
In our experiments, we compared our approach with three state-of-the-art techniques widely used in recommendation scenarios: BPRMF, WRMF and a single-layer autoencoder for rating prediction. BPRMF [17] is a Matrix Factorization algorithm which leverages Bayesian Personalized Ranking as its objective function. WRMF [11, 16] is a Weighted Regularized Matrix Factorization method which exploits users' implicit feedback to provide recommendations. In their basic version, both strategies rely exclusively on the user-item matrix, in a pure collaborative filtering approach. They can be hybridized by exploiting side information, i.e. additional data associated with the items. In our experiments, we adopted the categorical information found in the DBpedia Knowledge Graph as side information. We used the implementations of BPRMF and WRMF available in MyMediaLite9 and implemented the autoencoder in Keras10. We verified the statistical significance of our experiments by using the Wilcoxon Signed Rank test: we obtained a p-value very close to zero, which supports the validity of our results.
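For reference, such a paired Wilcoxon signed-rank test can be run with SciPy as sketched below; the per-user score vectors here are toy values, not the actual experimental data:

```python
# Paired Wilcoxon signed-rank test over per-user scores of two systems.
from scipy.stats import wilcoxon

kg_autoencoder_ndcg = [0.29, 0.31, 0.27, 0.33, 0.30, 0.28]
baseline_ndcg       = [0.24, 0.26, 0.25, 0.27, 0.23, 0.25]

stat, p_value = wilcoxon(kg_autoencoder_ndcg, baseline_ndcg)
print(p_value)  # a small p-value indicates a statistically significant gap
```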
In Table 3 we report the results gathered on the three datasets by applying the methods discussed above. As for our approach, KG-AUTOENCODER, we tested it with different neighborhood sizes by varying k.

In terms of accuracy, we can see that KG-AUTOENCODER outperforms our baselines on both the MovieLens 20M and Amazon Digital Music datasets, while on LibraryThing the achieved results are almost the same. In particular, on the LibraryThing dataset, only the fully connected autoencoder performs better than our approach with regard to accuracy.

Concerning diversity, we get much better results on all the datasets. Furthermore, by analyzing the gathered results, it seems that our approach provides very discriminative descriptions for each user, letting us identify the most effective neighborhood and compute both accurate and diversified recommendations. As a matter of fact, we achieve the same accuracy as the baselines while suggesting many more items.

9 http://mymedialite.net
10 https://keras.io
As shown in Table 2, KG-AUTOENCODER performs better on those datasets whose items can be associated with a large amount of categorical information, which implies the usage of many hidden units. This occurs because very complex functions can be modeled by ANNs if enough hidden units are provided, as the Universal Approximation Theorem points out. For this reason, our approach turns out to work better on the MovieLens 20M dataset (whose related neural networks have a high number of hidden units) than on the others. In particular, the experiments on the LibraryThing dataset show that the performance gets worse as the number of neurons decreases, i.e. when the available categories are not enough.

Table 2: Summary of hidden units for mapped items only.
                       avg #features   std      avg #features/avg #items
MovieLens 20M          1015.87         823.26   8.82
Amazon Digital Music   7.22            9.77     5.17
LibraryThing           206.88          196.64   1.96

Table 3: Experimental Results
MOVIELENS 20M
                   k     F1        Prec.     Recall    nDCG      Gini      aggrdiv
AUTOENCODER        −     0.21306   0.21764   0.20868   0.24950   0.01443   1587
BPRMF              −     0.14864   0.15315   0.14438   0.17106   0.00375   3263
BPRMF + SI         −     0.16838   0.17112   0.16572   0.19500   0.00635   3552
WRMF               −     0.19514   0.19806   0.19231   0.22768   0.00454   766
WRMF + SI          −     0.19494   0.19782   0.19214   0.22773   0.00450   759
KG-AUTOENCODER     5     0.18857   0.18551   0.19173   0.21941   0.01835   5214
                   10    0.21268   0.21009   0.21533   0.24945   0.01305   3350
                   20    0.22886   0.22684   0.23092   0.27147   0.01015   2417
                   40    0.23675   0.23534   0.23818   0.28363   0.00827   1800
                   50    0.23827   0.23686   0.23970   0.28605   0.00780   1653
                   100   0.23961   0.23832   0.24090   0.28924   0.00662   1310
AMAZON DIGITAL MUSIC
AUTOENCODER        −     0.00060   0.00035   0.00200   0.00102   0.33867   3559
BPRMF              −     0.01010   0.00565   0.04765   0.02073   0.00346   539
BPRMF + SI         −     0.00738   0.00413   0.03480   0.01624   0.06414   2374
WRMF               −     0.02189   0.01236   0.09567   0.05511   0.01061   103
WRMF + SI          −     0.02151   0.01216   0.09325   0.05220   0.01168   111
KG-AUTOENCODER     5     0.01514   0.00862   0.06233   0.04365   0.03407   3378
                   10    0.01920   0.01091   0.07994   0.05421   0.05353   3449
                   20    0.02233   0.01267   0.09385   0.06296   0.08562   3523
                   40    0.02572   0.01460   0.10805   0.06980   0.14514   3549
                   50    0.02618   0.01486   0.10974   0.07032   0.17192   3549
                   100   0.02835   0.01608   0.11964   0.07471   0.24859   3448
LIBRARYTHING
AUTOENCODER        −     0.01562   0.01375   0.01808   0.01758   0.07628   2328
BPRMF              −     0.01036   0.00954   0.01134   0.01001   0.06764   3140
BPRMF + SI         −     0.01065   0.00994   0.01148   0.01041   0.10753   4946
WRMF               −     0.01142   0.01071   0.01223   0.01247   0.00864   439
WRMF + SI          −     0.01116   0.01030   0.01217   0.01258   0.00868   442
KG-AUTOENCODER     5     0.00840   0.00764   0.00931   0.00930   0.13836   4895
                   10    0.01034   0.00930   0.01163   0.01139   0.07888   3558
                   20    0.01152   0.01029   0.01310   0.01248   0.04586   2245
                   40    0.01195   0.01073   0.01347   0.01339   0.02800   1498
                   50    0.01229   0.01110   0.01378   0.01374   0.02403   1312
                   100   0.01278   0.01136   0.01461   0.01503   0.01521   873
6 CONCLUSION AND FUTURE WORK
In this paper, we have presented a recommendation approach that combines the computational power of deep learning with the representational expressiveness of knowledge graphs. As in classical applications of autoencoders to feature selection, we compute a latent representation of items but, in our case, we attach an explicit semantics to the selected features. This allows our system to exploit the power of deep learning techniques and, at the same time, to have a meaningful and understandable representation of the trained model. We used our approach to autoencode user ratings in a recommendation scenario via the DBpedia knowledge graph, and we proposed an algorithm to compute user profiles, which are then adopted to provide recommendations based on the semantic features extracted by our autoencoder. Experimental results show that we are able to outperform state-of-the-art recommendation algorithms in terms of accuracy and diversity. As future work, we will compare our approach with other competitive baselines, as suggested in more recent works [14].

The results presented in this paper pave the way for various further investigations in different directions. From a methodological and algorithmic point of view, we can surely investigate the augmentation of further deep learning techniques via the injection of explicit and structured knowledge coming from external sources of information. Giving an explicit meaning to the neurons of an ANN, as well as to their connections, can fill the semantic gap in describing models trained via deep learning algorithms. Moreover, having an explicit representation of latent features opens the door to better and more explicit user modeling. We are currently investigating how to exploit the structure of a Knowledge Graph-enabled autoencoder to infer qualitative preferences represented by means of expressive languages such as CP-theories [4]. Providing such a powerful representation may also turn out to be a key factor in the automatic generation of explanations for recommendation results.



REFERENCES
[1] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. In Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC'07/ASWC'07). Springer-Verlag, 722–735.
[2] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 191–198.
[3] Marco de Gemmis, Pasquale Lops, Cataldo Musto, Fedelucio Narducci, and Giovanni Semeraro. 2015. Semantics-Aware Content-Based Recommender Systems. In Recommender Systems Handbook. 119–159.
[4] Tommaso Di Noia, Thomas Lukasiewicz, Maria Vanina Martínez, Gerardo I. Simari, and Oana Tifrea-Marciuska. 2015. Combining Existential Rules with the Power of CP-Theories. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), Buenos Aires, Argentina, July 25-31, 2015. 2918–2925.
[5] Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, and Markus Zanker. 2012. Linked Open Data to Support Content-based Recommender Systems. In Proceedings of the 8th International Conference on Semantic Systems (I-SEMANTICS '12). ACM, New York, NY, USA, 1–8.
[6] T. Di Noia, V.C. Ostuni, J. Rosati, P. Tomeo, E. Di Sciascio, R. Mirizzi, and C. Bartolini. 2016. Building a relatedness graph from Linked Open Data: A case study in the IT domain. Expert Systems with Applications 44 (2016), 354–366.
[7] Ali Mamdouh Elkahky, Yang Song, and Xiaodong He. 2015. A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems. In Proceedings of the 24th International Conference on World Wide Web (WWW '15). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 278–288.
[8] Daniel M. Fleder and Kartik Hosanagar. 2007. Recommender systems and their impact on sales diversity. In Proceedings of the 8th ACM Conference on Electronic Commerce. ACM, 192–199.
[9] Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 249–256.
[10] Benjamin Heitmann and Conor Hayes. 2010. Using Linked Data to Build Open, Collaborative Recommender Systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence.
[11] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining (ICDM '08). IEEE Computer Society, Washington, DC, USA, 263–272.
[12] Kalervo Järvelin and Jaana Kekäläinen. 2000. IR Evaluation Methods for Retrieving Highly Relevant Documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00). ACM, New York, NY, USA, 41–48.
[13] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, Nevada, United States. 3111–3119.
[14] Cataldo Musto, Tiziano Franza, Giovanni Semeraro, Marco de Gemmis, and Pasquale Lops. 2018. Deep Content-based Recommender Systems Exploiting Recurrent Neural Networks and Linked Open Data. In Adjunct Publication of the 26th Conference on User Modeling, Adaptation and Personalization (UMAP '18). ACM, New York, NY, USA, 239–244.
[15] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, and E. Di Sciascio. 2016. Sound and music recommendation with knowledge graphs. ACM Transactions on Intelligent Systems and Technology 8, 2 (2016).
[16] Rong Pan, Yunhong Zhou, Bin Cao, Nathan N. Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-Class Collaborative Filtering. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining (ICDM '08). IEEE Computer Society, Washington, DC, USA, 502–511.
[17] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI '09). AUAI Press, Arlington, Virginia, United States, 452–461.
[18] Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015. AutoRec: Autoencoders Meet Collaborative Filtering. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 111–112.
[19] Harald Steck. 2013. Evaluation of Recommendations: Rating-prediction and Ranking. In Proceedings of the 7th ACM Conference on Recommender Systems (RecSys '13). ACM, New York, NY, USA, 213–220.

[20] Florian Strub, Romaric Gaudel, and Jérémie Mary. 2016. Hybrid Recommender System Based on Autoencoders. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (DLRS 2016). ACM, New York, NY, USA, 11–16.
[21] Jeroen B. P. Vuurens, Martha Larson, and Arjen P. de Vries. 2016. Exploring Deep Space: Learning Personalized Ranking in a Semantic Space. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (DLRS 2016). ACM, New York, NY, USA, 23–28.
[22] Hao Wang, Naiyan Wang, and Dit-Yan Yeung. 2015. Collaborative Deep Learning for Recommender Systems. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '15). ACM, New York, NY, USA, 1235–1244.
[23] Yao Wu, Christopher DuBois, Alice X. Zheng, and Martin Ester. 2016. Collaborative Denoising Auto-Encoders for Top-N Recommender Systems. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM '16). ACM, New York, NY, USA, 153–162.