=Paper=
{{Paper
|id=Vol-2290/kars2018_paper3
|storemode=property
|title=Computing Recommendations via a Knowledge Graph-aware Autoencoder
|pdfUrl=https://ceur-ws.org/Vol-2290/kars2018_paper3.pdf
|volume=Vol-2290
|authors=Vito Bellini,Angelo Schiavone,Tommaso Di Noia,Azzurra Ragone,Eugenio Di Sciascio
|dblpUrl=https://dblp.org/rec/conf/recsys/BelliniSNRS18a
}}
==Computing Recommendations via a Knowledge Graph-aware Autoencoder==
Computing Recommendations via a Knowledge Graph-aware Autoencoder

Vito Bellini (Polytechnic University of Bari, Italy), Angelo Schiavone (Polytechnic University of Bari, Italy), Tommaso Di Noia (Polytechnic University of Bari, Italy), Azzurra Ragone (Independent Researcher), Eugenio Di Sciascio (Polytechnic University of Bari, Italy)
firstname.lastname@poliba.it, azzurra.ragone@gmail.com

ABSTRACT
In recent years, deep learning has proven to be a game-changing technology in artificial intelligence, thanks to the numerous successes it has achieved in diverse application fields. Among others, the use of deep learning for the recommendation problem, although new, looks quite promising due to its positive performance in terms of accuracy of recommendation results. In a recommendation setting, in order to predict user ratings on unknown items, a possible configuration of a deep neural network is that of autoencoders, which are typically used to produce a lower-dimensionality representation of the original data. In this paper we present KG-AUTOENCODER, an autoencoder that bases the structure of its neural network on the semantics-aware topology of a knowledge graph, thus providing labels for the neurons in the hidden layer that are eventually used to build a user profile and then compute recommendations. We show the effectiveness of KG-AUTOENCODER in terms of accuracy, diversity and novelty by comparing it with state-of-the-art recommendation algorithms.

ACM Reference Format:
Vito Bellini, Angelo Schiavone, Tommaso Di Noia, Azzurra Ragone, Eugenio Di Sciascio. 2019. Computing Recommendations via a Knowledge Graph-aware Autoencoder. In Proceedings of the Knowledge-aware and Conversational Recommender Systems (KaRS) Workshop 2018 (co-located with RecSys 2018). ACM, New York, NY, USA, 7 pages.

1 INTRODUCTION
Recommender systems (RSs) have become pervasive tools that we experience in our everyday life. While a user browses a catalog of items, RSs exploit her past preferences in order to suggest new items she might be interested in.

Knowledge Graphs have recently been adopted to represent items, to compute their similarity and relatedness [6], and to feed content-based (CB) and hybrid recommendation engines [15]. The publication and spread of freely available Knowledge Graphs in the form of Linked Open Data datasets, such as DBpedia [1], has paved the way for the development of knowledge-aware recommendation engines in many application domains; moreover, it makes it possible to switch easily from one domain to another simply by feeding the system with a different subset of the original graph.

Another technology that has surely boosted the development of a new generation of smarter and more accurate recommender systems is deep learning [2]. Starting from the basic notion of an artificial neural network (ANN), several configurations of deep ANNs have been proposed over the years, such as autoencoders.

In this paper, we show how autoencoder technology can benefit from the existence of a Knowledge Graph to create a representation of a user profile that can eventually be exploited to predict ratings for unknown items. The main intuition behind the approach is that both ANNs and Knowledge Graphs expose a graph-based structure. Hence, we may imagine building the topology of the inner layers of the ANN by mimicking that of a Knowledge Graph.

The remainder of this paper is structured as follows: in the next section, we discuss related work on recommender systems exploiting deep learning, knowledge graphs and Linked Open Data. Then, the basic notions of the technologies we adopted are introduced in Section 3. The proposed recommendation model is described in Section 4, while in Section 5 we present the experimental setting and evaluation. Conclusions and future work close the paper.

2 RELATED WORK
Autoencoders and Deep Learning for RS. The adoption of deep learning techniques is surely one of the main advances of recent years in the field of recommender systems. In [23], the authors propose the usage of a denoising autoencoder that performs a top-N recommendation task by exploiting a corrupted version of the input data.
A pure collaborative filtering (CF) model based on autoencoders is described in [18], in which the authors develop both user-based and item-based autoencoders to tackle the recommendation task. Stacked denoising autoencoders are combined with collaborative filtering techniques in [20], where the authors leverage autoencoders to get a smaller, non-linear representation of the user-item interactions. This representation is eventually used to feed a deep neural network which can alleviate the cold-start problem thanks to the integration of side information; a hybrid recommender system is finally built.

Wang et al. [22] suggest applying deep learning methods to side information in order to reduce the sparsity of the rating matrix in collaborative approaches. In [21] the authors propose a deep learning approach to build a high-dimensional semantic space based on the substitutability of items; a user-specific transformation is then learned in order to get a ranking of items from such a space. Analyses of the impact of deep learning on both recommendation quality and system scalability are presented in [7], where the authors first represent users and items through a rich feature set spanning different domains and then map them to a latent space. Finally, a content-based recommender system is built.

Knowledge Graphs and Linked Open Data for RS. Several works have been proposed that exploit side information coming from knowledge graphs and Linked Open Data (LOD) to enhance the performance of recommender systems. Most of them rely on DBpedia as knowledge graph. In [10], for the very first time, a LOD-based recommender system was proposed to alleviate some of the major problems affecting collaborative techniques, mainly the high sparsity of the user-item matrix. The effectiveness of such an approach seems to be confirmed by the large number of methods that have been proposed afterward. A detailed review of LOD-based recommender systems is presented in [3]. By leveraging the knowledge encoded in DBpedia, it is possible to build an accurate content-based recommender system [5].
3 BACKGROUND TECHNOLOGIES
Autoencoders. An artificial neural network (ANN) is a mathematical model used to learn the relationships which underlie a given set of data. After a training phase, an ANN can be used to predict a single value or a vector, for regression or classification tasks. Basically, an ANN consists of a set of nodes, called neurons, distributed among three different kinds of layers: the input layer, one or more hidden layers, and the output layer. Typically, a neuron of one layer is connected to all the neurons of the next layer, making the ANN a fully connected network.

Autoencoders are ANNs which try to set the output values equal to the input ones, modeling an approximation of the identity function y = f(x) = x. Roughly speaking, they are forced to predict the same values they are fed with; therefore, the number of output units and the number of input units are the same, i.e. |x| = |y|. The aim of such a task is to obtain a new representation of the original data based on the values of the hidden-layer neurons. In fact, each hidden layer projects the input data into a new Euclidean space whose dimensionality depends on the number of nodes in that layer. Hence, when we use an autoencoder, we are not interested in its output but in the encoded representation it computes: in this way, we can leverage the implicit knowledge behind the original data, performing the so-called feature extraction task. The actual meaning of each dimension (represented by a hidden node) of the new space is unknown, but we can be sure that the dimensions are based on latent patterns binding the training cases.

KG. In 2012, Google announced its Knowledge Graph (https://googleblog.blogspot.it/2012/05/introducing-knowledge-graph-things-not.html) as a new tool to improve the identification and retrieval of entities in return to a search query. A Knowledge Graph is a form of representation of knowledge through a semantic (labelled) network that allows a system to store human knowledge in a structured format well understandable by a computer agent. In the meantime, some communities have started to build their own KGs; among them, the most important is surely DBpedia (http://dbpedia.org). We may investigate relationships among entities in the KG by exploiting graph data sources and hence discover meaningful paths within the graph. In Figure 1 we show an excerpt of the DBpedia graph involving some entities in the movie domain. Interestingly, we see that DBpedia encodes both factual information, e.g. "Cloud_Atlas_(film) has director The_Wachowskis", and categorical information, such as "Cloud_Atlas_(film) has subject Post-apocalyptic_films".

[Figure 1: Part of a knowledge graph. The excerpt connects the entities 12_Monkeys, Cloud_Atlas_(film), Ghostwritten and Sense8 to Post-apocalyptic_films, The_Wachowskis and Drama via subject, director, creator and genre edges.]

4 SEMANTICS-AWARE AUTOENCODERS FOR RATING PREDICTION
The main idea of our approach is to map the connections in a knowledge graph (KG) to those between the units of layer i and layer i+1, as shown in Figure 2. There we see that we injected only categorical information into the autoencoder and left out the factual one. As a matter of fact, if we analyze these two kinds of information in DBpedia, we may notice that:
• the quantity of categorical information is higher than that of factual information: if we consider movies, the overall number of entities they are related to is lower than the overall number of categories;
• categorical information is more evenly distributed over the items than factual information: going back to movies, we see that they are more connected with each other via categories than via other entities.
Hence, we may argue that for a recommendation task, where we look for commonalities among items, categorical data may turn out to be more meaningful than factual data. The main assumption behind this choice is that, for instance, if a user rated Cloud_Atlas positively, this may be interpreted as a positive rating for the connected category Post-apocalyptic_films.
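To make the distinction concrete, the following snippet shows how the categorical information (dct:subject edges) of a movie entity can be retrieved from the public DBpedia SPARQL endpoint. This is an illustrative sketch of ours, not part of the authors' implementation; it assumes the SPARQLWrapper Python library.

```python
# Illustrative sketch (not the authors' code): fetch the DBpedia
# categories (dct:subject) attached to a movie entity.
from SPARQLWrapper import SPARQLWrapper, JSON

def movie_categories(entity_uri):
    """Return the URIs of the categories a DBpedia entity is subject of."""
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        PREFIX dct: <http://purl.org/dc/terms/>
        SELECT ?category WHERE {{ <{entity_uri}> dct:subject ?category . }}
    """)
    bindings = sparql.query().convert()["results"]["bindings"]
    return [b["category"]["value"] for b in bindings]

# e.g. contains http://dbpedia.org/resource/Category:Post-apocalyptic_films
print(movie_categories("http://dbpedia.org/resource/Cloud_Atlas_(film)"))
```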
In order to test our assumption, we mapped the autoencoder network topology to the categorical information related to the items rated by users. As we build a different autoencoder for each user, depending on the items she rated in the past, the mapping with a KG makes the hidden layer of variable length in the number of units, depending on how much categorical information is available for the items rated by the specific user.

[Figure 2: Architecture of a semantic autoencoder. Rated movies (e.g. The Matrix: 5.0, The Karate Kid (2010): 4.0, Astro Boy: 2.0) form the input and output layers, while their DBpedia categories (e.g. Kung fu films, Post-apocalyptic films, Chinese films, Computer-animated films) form the hidden layer, with edges only where a subject relation exists in the KG.]

Let n be the number of items rated by u and $C_i = \{c_{i1}, c_{i2}, \ldots, c_{im}\}$ be the set of m categorical nodes associated in the KG with the item i. Then $F^u = \bigcup_{i=1}^{n} C_i$ is the set of features mapped into the hidden layer for the user, and the overall number of hidden units is equal to |F^u|. Once the neural network setup is done, the training process takes place, feeding the neural network with the ratings provided by the user, normalized in the interval [0, 1]. It is worth noticing that, as the autoencoder we build mimics the structure of the connections available in the Knowledge Graph, the resulting neural network is not fully connected. Moreover, it does not need bias nodes, because these would not represent any semantic data in the graph.

Nodes in the hidden layer correspond to categorical information in the knowledge graph. At every iteration of the training process, backpropagation changes the weights on the edges between units accordingly, such that the sum over the edges entering an output unit reconstructs the user rating for the item represented by that unit. Regarding the nodes in the hidden layer, we may interpret the sum of the weights of the entering edges, computed at the end of the training process, as the importance of that feature in the generation of the output which, in our case, is the set of ratings provided by the user. Our autoencoder is, therefore, able to learn the semantics behind the ratings of each user and to weight them through backpropagation. In our current implementation we used the well-known sigmoid activation function $\sigma(x) = \frac{1}{1 + e^{-x}}$, since we normalized the design matrix to be within [0, 1]. We trained each autoencoder for 10,000 epochs with a learning rate of r = 0.03; weights are initialized to values close to zero, as Glorot and Bengio suggest in [9].

4.1 User Profiles
Once the network converges, we have a latent representation of the features associated with a user profile, together with their weights. However, very interestingly, this time the features represented by the nodes in the hidden layer also have an explicit meaning, as they are in a one-to-one mapping with categories in a knowledge graph.

Starting from the trained autoencoder, we may build a user profile by considering as features the categories associated with the items she rated in the past, and by assigning each of them a value according to the weights of the edges entering the corresponding hidden unit. Given a user u, the weight associated with a feature c is the summation of the weights $w_k^u(c)$ of the edges entering the hidden node representing the Knowledge Graph category c, after training the autoencoder with the ratings of u. More formally, we have:

$\omega^u(c) = \sum_{k=1}^{|In(c)|} w_k^u(c)$

where In(c) is the set of the edges entering the node representing the feature c. We recall that, since the autoencoder is not fully connected, |In(c)| varies depending on the connections of the category c in the knowledge graph.

By means of the weights associated with each feature, we can now model a user profile composed of a vector of weighted categorical features. Given F^u as the set of categories belonging to all the items rated by u and $F = \bigcup_{u \in U} F^u$ as the set of all features among all the users in the system, we have, for each user u ∈ U and each feature c ∈ F:

$P(u) = \{\langle c, \omega \rangle \mid \omega = \omega^u(c) \text{ if } c \in F^u\}$
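The per-user network just described can be condensed into a small NumPy sketch. This is our reconstruction under the assumptions stated in the paper (one autoencoder per user, KG-shaped sparse connectivity, no biases, sigmoid units, 10,000 epochs, learning rate 0.03); the function names, the 5-star rating scale and the plain squared-error backpropagation are illustrative choices of ours, not the authors' Keras code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_user_autoencoder(user_ratings, item_categories,
                           epochs=10_000, lr=0.03, seed=0):
    """user_ratings: {item: rating on an assumed 1-5 scale} for one user u.
    item_categories: {item: set of KG categories}. Returns omega_u."""
    items = sorted(user_ratings)
    cats = sorted({c for i in items for c in item_categories[i]})
    # Binary mask M[i, c] = 1 iff item i is linked to category c in the KG:
    # the network keeps only these edges, so it is not fully connected.
    M = np.array([[c in item_categories[i] for c in cats]
                  for i in items], dtype=float)
    rng = np.random.default_rng(seed)
    scale = np.sqrt(6.0 / (len(items) + len(cats)))      # Glorot-style init [9]
    W_in = rng.uniform(-scale, scale, M.shape) * M       # input -> hidden
    W_out = rng.uniform(-scale, scale, M.T.shape) * M.T  # hidden -> output
    x = np.array([user_ratings[i] for i in items]) / 5.0  # ratings in [0, 1]
    for _ in range(epochs):                # no bias nodes, as in the paper
        h = sigmoid(x @ W_in)              # hidden (category) activations
        y = sigmoid(h @ W_out)             # reconstructed ratings
        d_out = (y - x) * y * (1.0 - y)    # squared-error gradient
        d_in = (d_out @ W_out.T) * h * (1.0 - h)
        W_out -= lr * np.outer(h, d_out) * M.T   # masked updates preserve
        W_in -= lr * np.outer(x, d_in) * M       # the KG topology
    # omega_u(c): sum of the trained weights entering the hidden node of c.
    return {c: W_in[:, j].sum() for j, c in enumerate(cats)}
```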
Considering that users provide different numbers of ratings, the dimension of the user profiles is unbalanced. Moreover, as a user usually rates only a small subset of the entire catalog, a massive number of features are missing, namely those belonging to items not rated by u. In order to compute the values associated with missing features, we leverage an unsupervised deep learning model inspired by the word2vec approach [13]. This is an efficient technique originally conceived to compute word embeddings (i.e., numerical representations of words) by capturing the semantic distribution of textual words in a latent space, starting from their distribution within the sentences composing the original text. Given a corpus, e.g., an excerpt from a book, it projects each word into a multidimensional space such that words that are semantically similar end up closer to each other. In this way, we are able to evaluate the semantic similarity between two words even if they never appear in the same sentence. Given a sequence of words [x_1, ..., x_n] within a window, word2vec computes the probability for a new word x′ to be the next one in the sequence; more formally, it computes p(x′ | [x_1, ..., x_n]).

In our scenario, we may imagine replacing sentences, represented by sequences of words, with user profiles, represented by sequences of categories c ∈ F^u, and then using the word2vec approach to compute, for a given user u, the weight of each missing feature c′ ∉ F^u.

We need to prepare the user profiles P(u) to be processed by word2vec. Hence, we first generate a corpus made of sequences of ordered features, where the order is given by ω. The very preliminary step is that of selecting an order among the elements c ∈ F^u which is coherent across all u ∈ U, thus moving from the set P(u) to a representative sequence of elements s(u). For each ⟨c, ω⟩ ∈ P(u) we create a corresponding pair ⟨c, norm(ω)⟩, with norm being the mapping function

$norm : [0, 1] \mapsto \{0.1, 0.2, 0.3, \ldots, 1\}$

that linearly maps a value in the interval [0, 1] to a value in the set {0.1, 0.2, 0.3, ..., 1} (in our current implementation we use a standard min-max normalization). The new pairs form the set

$P_{norm}(u) = \{\langle c, norm(\omega) \rangle \mid \langle c, \omega \rangle \in P(u)\}$

For each normalized user profile set P_norm(u) we then build the corresponding sequence

$s(u) = [\ldots, \langle c_i, norm(\omega_i^u) \rangle, \ldots, \langle c_j, norm(\omega_j^u) \rangle, \ldots]$

with $\omega_i^u \geq \omega_j^u$.

Once we have the set S = {s(u) | u ∈ U}, we can feed the word2vec algorithm with this corpus in order to find patterns of features according to their distribution across all users. In the prediction phase, by using each user's sequence of features s(u) as input for the trained word2vec model, we estimate the probability of each $\langle c', norm(\omega') \rangle \in \bigcup_{v \in U} P_{norm}(v) \setminus P_{norm}(u)$ belonging to the given context, or rather being relevant for u. In other words, we compute p(⟨c′, norm(ω′)⟩ | s(u)).

It is worth noticing that, given c′ ∉ F^u, we may have multiple pairs with c′ as their first element in $\bigcup_{v \in U} P_{norm}(v) \setminus P_{norm}(u)$. For instance, given the category dbc:Kung_fu_films, we may have both ⟨dbc:Kung_fu_films, 0.2⟩ and ⟨dbc:Kung_fu_films, 0.5⟩, with the corresponding probabilities p(⟨dbc:Kung_fu_films, 0.2⟩ | s(u)) and p(⟨dbc:Kung_fu_films, 0.5⟩ | s(u)). Still, as we want to add the category dbc:Kung_fu_films together with its corresponding weight only once to the user profile, we select only the pair with the highest probability. The new user profile is then

$\hat{P}(u) = P(u) \cup \{\langle c, \omega \rangle \mid \omega = \operatorname{argmax}_{\omega \in \{0.1, \ldots, 1\}} p(\langle c, \omega \rangle \mid s(u)) \text{ and } \langle c, \omega \rangle \notin P_{norm}(u)\}$

We point out that while the original P(u) is built by exploiting only content-based information, the enhanced user profile P̂(u) also takes collaborative information into account, as it is also based on the set S containing a representation of the profiles of all the users in U.
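A compact sketch of this profile-completion step, assuming the gensim implementation of word2vec (the paper does not name a library): each profile becomes a "sentence" of category|weight tokens ordered by decreasing ω, and gensim's predict_output_word is used as an approximation of p(⟨c′, norm(ω′)⟩ | s(u)). Function names and parameters are illustrative, not from the authors' repository.

```python
from gensim.models import Word2Vec

def norm(w):
    """Linearly map a weight in [0, 1] onto {0.1, 0.2, ..., 1.0}."""
    return max(1, min(10, round(w * 10))) / 10.0

def to_sequence(profile):
    """Encode P(u) as s(u): category|weight tokens sorted by decreasing w."""
    return [f"{c}|{norm(w)}"
            for c, w in sorted(profile.items(), key=lambda p: -p[1])]

def enhance_profile(profile, all_profiles, topn=100):
    """Add the most probable missing <category, weight> pairs to a profile."""
    corpus = [to_sequence(p) for p in all_profiles]          # the set S
    model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, seed=0)
    best = {}                                # category -> (probability, weight)
    for token, prob in model.predict_output_word(to_sequence(profile),
                                                 topn=topn) or []:
        cat, w = token.rsplit("|", 1)
        # Keep only missing categories; for duplicates such as
        # <dbc:Kung_fu_films, 0.2> vs <dbc:Kung_fu_films, 0.5>,
        # retain the pair with the highest probability.
        if cat not in profile and prob > best.get(cat, (0.0, 0.0))[0]:
            best[cat] = (prob, float(w))
    return {**profile, **{c: w for c, (_, w) in best.items()}}
```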
4.2 Computing Recommendations
Given the user profiles, represented as vectors of weighted features, recommendations are computed by using the well-known k-nearest-neighbors approach. User similarities are found by projecting the user profiles into a Vector Space Model; the similarity between each pair of users u and v is then computed via cosine similarity. For each user u, we find the top-k similar neighbors and infer the rating r for an item i as the weighted average of the ratings the neighborhood gave to it:

$r(u, i) = \frac{\sum_{j=1}^{k} sim(u, v_j) \cdot r(v_j, i)}{\sum_{j=1}^{k} sim(u, v_j)}$    (1)

where r(v_j, i) is the rating assigned to i by the user v_j. We then use the ratings from Equation (1) to provide a top-N recommendation list for each user.
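A direct transcription of Equation (1), assuming user profiles are stored as rows of a matrix over the global feature set F; the variable names and data layout are ours, not from the authors' repository.

```python
import numpy as np

def predict_rating(u, item, profiles, ratings, k=10):
    """Equation (1): weighted average of the item's ratings over the k
    neighbors of user u most similar by cosine similarity.
    profiles: |U| x |F| array; ratings: list of {item: rating} dicts."""
    norms = np.linalg.norm(profiles, axis=1)
    denom = norms * norms[u]
    sims = (profiles @ profiles[u]) / np.where(denom == 0.0, 1.0, denom)
    sims[u] = -np.inf                      # exclude u from her own neighbors
    # Keep the k most similar users who actually rated the item.
    neighbors = [v for v in np.argsort(-sims) if item in ratings[v]][:k]
    num = sum(sims[v] * ratings[v][item] for v in neighbors)
    den = sum(sims[v] for v in neighbors)
    return num / den if den else 0.0
```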
5 EXPERIMENTS
In this section, we present the experimental evaluation performed on three different datasets. We first describe the structure of the datasets used in the experiments and the evaluation protocol, and then move to the metrics adopted for the evaluation and the discussion of the obtained results. Our experiments can be reproduced through the implementation available in our public repository (https://github.com/sisinflab/SEMAUTO-2.0).

5.1 Dataset
In order to validate our approach, we performed experiments on the three datasets summarized in Table 1.

Table 1: Datasets
                        #users    #items    #ratings     sparsity
MovieLens 20M           138,493   26,744    20,000,263   99.46%
Amazon Digital Music    478,235   266,414   836,006      99.99%
LibraryThing            7,279     37,232    626,000      99.77%

In our experiments, we referred to the freely available knowledge graph of DBpedia (https://dbpedia.org). The mapping contains 22,959 mapped items for MovieLens 20M (https://grouplens.org/datasets/movielens/20m/), 4,077 mapped items for Amazon Digital Music (http://jmcauley.ucsd.edu/data/amazon/) and 9,926 mapped items for LibraryThing (https://www.librarything.com). For our experiments, we removed from the datasets all the items without a mapping in DBpedia.

5.2 Evaluation protocol
Here, we show how we evaluated the performance of our method in recommending items. We split each dataset using an 80/20 hold-out strategy, ensuring that every user has 80% of her ratings in the training set and the remaining 20% in the test set. For the evaluation of our approach we adopted the "all unrated items" protocol described in [19]: for each user u, a top-N recommendation list is produced by computing a score for every item i not rated by u, whether i appears in the user's test set or not. Recommendation lists are then compared with the test set by computing both accuracy and diversity metrics: Precision, Recall, F1-score and nDCG [12], as well as aggregate diversity and the Gini index as measures of sales diversity [8].
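For reference, a minimal sketch of the accuracy metrics under the "all unrated items" protocol; these are our implementations of the standard definitions (binary-relevance nDCG with log2 discounting), not code from the authors' repository.

```python
import math

def topn_metrics(recommended, relevant, n=10):
    """recommended: ranked item list for a user; relevant: test-set items."""
    rec = recommended[:n]
    hits = sum(1 for i in rec if i in relevant)
    precision = hits / n
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    # Binary-relevance nDCG with log2 discounting [12].
    dcg = sum(1.0 / math.log2(r + 2) for r, i in enumerate(rec) if i in relevant)
    idcg = sum(1.0 / math.log2(r + 2) for r in range(min(len(relevant), n)))
    ndcg = dcg / idcg if idcg else 0.0
    return precision, recall, f1, ndcg

def aggregate_diversity(all_lists):
    """Number of distinct items recommended across all users' top-N lists."""
    return len({i for lst in all_lists for i in lst})
```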
5.3 Results Discussion
In our experiments, we compared our approach with three different state-of-the-art techniques widely used in recommendation scenarios: BPRMF, WRMF and a single-layer autoencoder for rating prediction. BPRMF [17] is a matrix factorization algorithm which leverages Bayesian Personalized Ranking as its objective function. WRMF [11, 16] is a Weighted Regularized Matrix Factorization method which exploits users' implicit feedback to provide recommendations. In their basic versions, both strategies rely exclusively on the user-item matrix, in a pure collaborative filtering approach. They can be hybridized by exploiting side information, i.e. additional data associated with items; in our experiments, we adopted the categorical information found in the DBpedia Knowledge Graph as side information. We used the implementations of BPRMF and WRMF available in MyMediaLite (http://mymedialite.net) and implemented the autoencoder in Keras (https://keras.io). We verified the statistical significance of our experiments by using the Wilcoxon signed-rank test: we obtained a p-value very close to zero, which supports the validity of our results.

In Table 3 we report the results gathered on the three datasets by applying the methods discussed above. As for our approach, KG-AUTOENCODER, we tested it with different neighborhood sizes by varying k.

In terms of accuracy, we can see that KG-AUTOENCODER outperforms our baselines on both the MovieLens 20M and Amazon Digital Music datasets, while on LibraryThing the achieved results are roughly equivalent. In particular, on the LibraryThing dataset, only the fully connected autoencoder performs better than our approach with regard to accuracy.

Concerning diversity, we get much better results on all the datasets. Furthermore, by analyzing the gathered results, it seems that our approach provides very discriminative descriptions of each user, letting us identify the most effective neighborhood and compute both accurate and diversified recommendations. As a matter of fact, we achieve the same accuracy as the baselines while suggesting many more distinct items.

As shown in Table 2, KG-AUTOENCODER performs better on those datasets whose items can be associated with a large amount of categorical information, which implies the usage of many hidden units. This occurs because very complex functions can be modeled by ANNs if enough hidden units are provided, as the Universal Approximation Theorem points out. For this reason, our approach turned out to work better on the MovieLens 20M dataset (whose related neural networks have a high number of hidden units) than on the others. In particular, the experiments on the LibraryThing dataset show that the performance gets worse as the number of neurons decreases, i.e. when the available categories are not enough.

Table 2: Summary of hidden units for mapped items only.
                        avg #features   std      avg #features / avg #items
MovieLens 20M           1015.87         823.26   8.82
Amazon Digital Music    7.22            9.77     5.17
LibraryThing            206.88          196.64   1.96

Table 3: Experimental Results
MOVIELENS 20M
                    k     F1        Prec.     Recall    nDCG      Gini      aggrdiv
AUTOENCODER         −     0.21306   0.21764   0.20868   0.24950   0.01443   1587
BPRMF               −     0.14864   0.15315   0.14438   0.17106   0.00375   3263
BPRMF + SI          −     0.16838   0.17112   0.16572   0.19500   0.00635   3552
WRMF                −     0.19514   0.19806   0.19231   0.22768   0.00454   766
WRMF + SI           −     0.19494   0.19782   0.19214   0.22773   0.00450   759
KG-AUTOENCODER      5     0.18857   0.18551   0.19173   0.21941   0.01835   5214
                    10    0.21268   0.21009   0.21533   0.24945   0.01305   3350
                    20    0.22886   0.22684   0.23092   0.27147   0.01015   2417
                    40    0.23675   0.23534   0.23818   0.28363   0.00827   1800
                    50    0.23827   0.23686   0.23970   0.28605   0.00780   1653
                    100   0.23961   0.23832   0.24090   0.28924   0.00662   1310
AMAZON DIGITAL MUSIC
AUTOENCODER         −     0.00060   0.00035   0.00200   0.00102   0.33867   3559
BPRMF               −     0.01010   0.00565   0.04765   0.02073   0.00346   539
BPRMF + SI          −     0.00738   0.00413   0.03480   0.01624   0.06414   2374
WRMF                −     0.02189   0.01236   0.09567   0.05511   0.01061   103
WRMF + SI           −     0.02151   0.01216   0.09325   0.05220   0.01168   111
KG-AUTOENCODER      5     0.01514   0.00862   0.06233   0.04365   0.03407   3378
                    10    0.01920   0.01091   0.07994   0.05421   0.05353   3449
                    20    0.02233   0.01267   0.09385   0.06296   0.08562   3523
                    40    0.02572   0.01460   0.10805   0.06980   0.14514   3549
                    50    0.02618   0.01486   0.10974   0.07032   0.17192   3549
                    100   0.02835   0.01608   0.11964   0.07471   0.24859   3448
LIBRARYTHING
AUTOENCODER         −     0.01562   0.01375   0.01808   0.01758   0.07628   2328
BPRMF               −     0.01036   0.00954   0.01134   0.01001   0.06764   3140
BPRMF + SI          −     0.01065   0.00994   0.01148   0.01041   0.10753   4946
WRMF                −     0.01142   0.01071   0.01223   0.01247   0.00864   439
WRMF + SI           −     0.01116   0.01030   0.01217   0.01258   0.00868   442
KG-AUTOENCODER      5     0.00840   0.00764   0.00931   0.00930   0.13836   4895
                    10    0.01034   0.00930   0.01163   0.01139   0.07888   3558
                    20    0.01152   0.01029   0.01310   0.01248   0.04586   2245
                    40    0.01195   0.01073   0.01347   0.01339   0.02800   1498
                    50    0.01229   0.01110   0.01378   0.01374   0.02403   1312
                    100   0.01278   0.01136   0.01461   0.01503   0.01521   873

6 CONCLUSION AND FUTURE WORK
In this paper, we have presented a recommendation approach that combines the computational power of deep learning with the representational expressiveness of knowledge graphs. As in classical applications of autoencoders to feature selection, we compute a latent representation of items but, in our case, we attach an explicit semantics to the selected features. This allows our system to exploit the power of deep learning techniques and, at the same time, to have a meaningful and understandable representation of the trained model. We used our approach to autoencode user ratings in a recommendation scenario via the DBpedia knowledge graph, and we proposed an algorithm to compute user profiles that are then adopted to provide recommendations based on the semantic features extracted by our autoencoder. Experimental results show that we are able to outperform state-of-the-art recommendation algorithms in terms of accuracy and diversity. As future work, we will compare our approach with other competitive baselines, as suggested in more recent works [14].

The results presented in this paper pave the way to further investigations in various directions. From a methodological and algorithmic point of view, we can surely investigate the augmentation of further deep learning techniques via the injection of explicit and structured knowledge coming from external sources of information. Giving an explicit meaning to the neurons in an ANN, as well as to their connections, can fill the semantic gap in describing models trained via deep learning algorithms. Moreover, having an explicit representation of latent features opens the door to better and explicit user modeling. We are currently investigating how to exploit the structure of a Knowledge Graph-enabled autoencoder to infer qualitative preferences represented by means of expressive languages such as CP-theories [4]. Providing such a powerful representation may also turn out to be a key factor in the automatic generation of explanations for recommendation results.

REFERENCES
[1] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. In Proceedings of the 6th International Semantic Web and 2nd Asian Semantic Web Conference (ISWC'07/ASWC'07). Springer-Verlag, 722–735.
[2] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 191–198.
[3] Marco de Gemmis, Pasquale Lops, Cataldo Musto, Fedelucio Narducci, and Giovanni Semeraro. 2015. Semantics-Aware Content-Based Recommender Systems. In Recommender Systems Handbook. 119–159.
[4] Tommaso Di Noia, Thomas Lukasiewicz, Maria Vanina Martínez, Gerardo I. Simari, and Oana Tifrea-Marciuska. 2015. Combining Existential Rules with the Power of CP-Theories. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), Buenos Aires, Argentina, July 25-31, 2015. 2918–2925.
[5] Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, and Markus Zanker. 2012. Linked Open Data to Support Content-based Recommender Systems. In Proceedings of the 8th International Conference on Semantic Systems (I-SEMANTICS '12). ACM, New York, NY, USA, 1–8.
[6] T. Di Noia, V.C. Ostuni, J. Rosati, P. Tomeo, E. Di Sciascio, R. Mirizzi, and C. Bartolini. 2016. Building a relatedness graph from Linked Open Data: A case study in the IT domain. Expert Systems with Applications 44 (2016), 354–366.
[7] Ali Mamdouh Elkahky, Yang Song, and Xiaodong He. 2015. A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems. In Proceedings of the 24th International Conference on World Wide Web (WWW '15). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 278–288.
[8] Daniel M. Fleder and Kartik Hosanagar. 2007. Recommender systems and their impact on sales diversity. In Proceedings of the 8th ACM Conference on Electronic Commerce. ACM, 192–199.
[9] Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 249–256.
[10] Benjamin Heitmann and Conor Hayes. 2010. Using Linked Data to Build Open, Collaborative Recommender Systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence.
[11] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining (ICDM '08). IEEE Computer Society, Washington, DC, USA, 263–272.
[12] Kalervo Järvelin and Jaana Kekäläinen. 2000. IR Evaluation Methods for Retrieving Highly Relevant Documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00). ACM, New York, NY, USA, 41–48.
[13] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems 26 (NIPS 2013). 3111–3119.
[14] Cataldo Musto, Tiziano Franza, Giovanni Semeraro, Marco de Gemmis, and Pasquale Lops. 2018. Deep Content-based Recommender Systems Exploiting Recurrent Neural Networks and Linked Open Data. In Adjunct Publication of the 26th Conference on User Modeling, Adaptation and Personalization (UMAP '18). ACM, New York, NY, USA, 239–244.
[15] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, and E. Di Sciascio. 2016. Sound and Music Recommendation with Knowledge Graphs. ACM Transactions on Intelligent Systems and Technology 8, 2 (2016).
[16] Rong Pan, Yunhong Zhou, Bin Cao, Nathan N. Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-Class Collaborative Filtering. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining (ICDM '08). IEEE Computer Society, Washington, DC, USA, 502–511.
[17] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI '09). AUAI Press, Arlington, Virginia, United States, 452–461.
[18] Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015. AutoRec: Autoencoders Meet Collaborative Filtering. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 111–112.
[19] Harald Steck. 2013. Evaluation of Recommendations: Rating-prediction and Ranking. In Proceedings of the 7th ACM Conference on Recommender Systems (RecSys '13). ACM, New York, NY, USA, 213–220.
[20] Florian Strub, Romaric Gaudel, and Jérémie Mary. 2016. Hybrid Recommender System Based on Autoencoders. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (DLRS 2016). ACM, New York, NY, USA, 11–16.
[21] Jeroen B. P. Vuurens, Martha Larson, and Arjen P. de Vries. 2016. Exploring Deep Space: Learning Personalized Ranking in a Semantic Space. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (DLRS 2016). ACM, New York, NY, USA, 23–28.
[22] Hao Wang, Naiyan Wang, and Dit-Yan Yeung. 2015. Collaborative Deep Learning for Recommender Systems. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '15). ACM, New York, NY, USA, 1235–1244.
[23] Yao Wu, Christopher DuBois, Alice X. Zheng, and Martin Ester. 2016. Collaborative Denoising Auto-Encoders for Top-N Recommender Systems. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM '16). ACM, New York, NY, USA, 153–162.