<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Tag-based embedding representations in neural collaborative filtering approaches</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Tahar-Rafik</forename><surname>Boudiba</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory" key="lab1">IRIS/IRIT</orgName>
								<orgName type="laboratory" key="lab2">UMR 5505 CNRS</orgName>
								<address>
									<addrLine>118 Route de Narbonne</addrLine>
									<postCode>F-31062</postCode>
									<settlement>TOULOUSE CEDEX 9</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
<orgName type="institution">ADBI Accelerator Data &amp; Business Intelligence</orgName>
								<address>
<addrLine>8 rue Rossini</addrLine>
									<postCode>75009</postCode>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Taoufiq</forename><surname>Dkaki</surname></persName>
							<email>taoufiq.dkaki@irit.fr</email>
							<affiliation key="aff0">
								<orgName type="laboratory" key="lab1">IRIS/IRIT</orgName>
								<orgName type="laboratory" key="lab2">UMR 5505 CNRS</orgName>
								<address>
									<addrLine>118 Route de Narbonne</addrLine>
									<postCode>F-31062</postCode>
									<settlement>TOULOUSE CEDEX 9</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Tag-based embedding representations in neural collaborative filtering approaches</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">2540C623459F5EF4DCE37A988312E232</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T06:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Learning representation</term>
					<term>folksonomies</term>
					<term>deep learning</term>
					<term>word embedding</term>
					<term>social tagging</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Learning user-item interactions in collaborative systems has become a promising way to improve the performance of collaborative filtering approaches. In such systems, the content surrounding users and items, particularly user tags, plays a key role since it is leveraged by collaborative filtering approaches. Tags are commonly represented using the bag-of-words paradigm, although this representation is subject to ambiguity, due principally to the poor semantic relations between tags. Recent methods suggest the use of deep neural architectures, as they attempt to learn semantic and contextual word representations. On this basis, we address how to integrate such content semantically into different neural collaborative filtering models for rating prediction. Building on effective models initially developed to learn user-item interactions, in this paper we extend different neural collaborative filtering models for rating prediction to evaluate the impact of using static or contextualized word embeddings within a neural collaborative filtering strategy. The presented models use dense tag-based user and item representations extracted from pre-trained static Word2vec and contextual BERT models. In addition, the paper emphasizes the impact of using contextualized tag-embedding neighbors in a neural graph collaborative filtering approach that learns an aggregation function. Finally, to determine whether the use of different neural architectures can influence recommendation quality, we adapt three popular end-to-end learning architectures: an MLP, an autoencoder, and a Graph Neural Network. We evaluated and compared all the models with recent baselines on several MovieLens datasets.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Deep learning (DL) techniques are the milestones of several recent recommendation engines. Platforms such as Facebook <ref type="foot" target="#foot_0">1</ref> and Pinterest 2 have already shared their experience in using DL for recommender systems (RS). In such platforms, Collaborative Filtering (CF) approaches are mainly exploited. Such methods enable users to get recommendations on favourite items. Putting them into practice in RS implies being able to predict how users will rate a particular item. Classical CF approaches are based either on Matrix Factorization (MF) techniques or on simple user-item vector similarity methods. However, these models share the property of being essentially linear, since they combine user and item latent factors linearly. In contrast, DL models for RS have the key property of learning multiple levels of representation and have hence enabled the deep integration of several types of content. As a result, recent neural collaborative filtering approaches capture more complex user-item interactions and enable high-level abstractions for content description. Such content often refers to users' tags, since they are commonly used to describe items and user profiles using the bag-of-words representation. Although such representations, commonly appearing as one-hot vectors, are efficient for computing user-item similarity, many problems such as ambiguity and vocabulary mismatch have been raised <ref type="bibr" target="#b0">[1]</ref>. In this sense, common NLP techniques suggest the use of dense representations, in the form of either user or item aggregated semantic embedding vectors extracted from the pre-trained Word2vec neural language model <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>. However, how can such embedding vectors be efficiently included at the top layer of a neural CF architecture? 
A design choice is to combine the two embedding vectors and then feed them through multiple fully connected layers to get the likelihood that a user interacts with an item. In that way, multiplying the embedding vectors element-wise with each other, or simply concatenating them, is a reasonable technique for integrating both user and item dense representations in a neural CF model. Some works have discussed text-embedding aggregation techniques <ref type="bibr" target="#b3">[4]</ref>; others have suggested the concatenation of mean word embeddings, since they compute average word-embedding representations <ref type="bibr" target="#b2">[3]</ref>. Recent neural approaches for recommendation additionally consider other relationships, such as neighborhood proximity in graph-based approaches. Such approaches have been proposed to explore multi-layer neighbor embedding representations. Since these embeddings are integrated into neural CF architectures, this has resulted in Neural Graph CF (NGCF) approaches <ref type="bibr" target="#b4">[5]</ref>. In this paper, we consider tag embeddings as the starting point for explicitly integrating a tag-based vocabulary within neural collaborative filtering models. However, such an initiative raises some research issues, such as determining the most efficient neural architecture to use or defining the best tag-embedding representations. To this end, we handle dense tag-based representations that we exploit within effective neural CF models for rating prediction. We have developed several neural models that combine neural CF with tagging information integrated into the training process. For this purpose, we handled word vector representations to include more valuable tag semantics and thus to enhance the neural CF models' ability to generalize. We compared different tag-embedding representations from pre-trained static (Word2vec) and contextual (BERT) models. 
Furthermore, we evaluated the impact of using such tag embeddings through several neural architectures, namely an MLP, an autoencoder, and a graph-based neural collaborative architecture. We provide empirical results on the MovieLens 10M, 20M, and 25M datasets. The main contributions of this paper are summarized as follows:</p><p>• Integrating tag-embedding representations efficiently into several neural CF models. The remainder of the paper is organized as follows. The next section presents some background and reviews recent research works related to content-based recommendation using neural networks and word vector representations. We gather works that describe neural approaches from a collaborative filtering point of view, specifying the most used neural architectures. Section 3 highlights the basis of our proposed models. Section 4 details the datasets, evaluation metrics, and experimental settings. Section 5 gives the evaluation results and discusses performance comparison with baselines. Following these sections, we draw our conclusion in the final section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background and related works</head><p>DL methods have made breakthroughs in learning data representations from various data sources. As a result, recent neural recommendation models have been able to learn representations of user preferences, item features and textual interactions <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b0">1]</ref>. Moreover, neural recommendation models attempt to additionally introduce semantic-aware tag representations based on distributional tag semantics used as features <ref type="bibr" target="#b5">[6]</ref>. In this area, Musto et al. <ref type="bibr" target="#b6">[7]</ref> exploit the Word2vec approach to learn a low-dimensional word vector space and use it to represent both items and user profiles in a recommendation scenario. Zhang et al. <ref type="bibr" target="#b7">[8]</ref> proposed to integrate traditional matrix factorization with Word2vec for user profiling and rating prediction. Liang et al. <ref type="bibr" target="#b8">[9]</ref> exploited pre-trained word embeddings from Word2vec to represent user tags and to construct item and user profiles based on the items' tag sets and users' tagging behaviors. They use deep neural networks (DNNs) and recurrent neural networks (RNNs) to extract the latent features of items and users to predict ratings. Moreover, TagEmbedSVD <ref type="bibr" target="#b9">[10]</ref> uses pre-trained Word2vec embeddings for tags, integrated into an SVD model, to enhance personalized recommendations in the context of cross-domain CF. Other works <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b0">1]</ref> take advantage of network embedding techniques to propose embedding-based recommendation models that exploit CF approaches. 
Along with learning content representations for recommendation, exploiting rating patterns often requires a neural network-based embedding model that is first pre-trained. Features are extracted and integrated into a CF model by fusing them with latent factors through non-linear transformations, which better leverage abstract content representations and thus produce higher-quality recommendations. Since pre-training word embeddings on large-scale corpora became widely used in different information retrieval tasks, it was also exploited to generate recommendations by ranking the user-item matrix from users' similar tag vocabularies. Models such as Word2vec <ref type="bibr" target="#b11">[12]</ref> or GloVe <ref type="bibr" target="#b12">[13]</ref>, for instance, learn meaningful user-tag representations by modeling tag co-occurrences. However, these methods do not consider the deep contextual information that single content words may lack, and they do not handle unknown words. In contrast, contextualized word representations such as BERT <ref type="bibr" target="#b13">[14]</ref> have been proposed to overcome the limitations of static word embeddings, since such contextual neural language models have been shown to improve the performance of many downstream tasks. Furthermore, graph-based neural approaches <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b16">17]</ref> have considered heterogeneous graphs, as they try to overcome the lack of relationship modeling in feature-based neural recommendation models. Such approaches have been proposed to explore multi-layer neighbor embedding representations <ref type="bibr" target="#b17">[18]</ref>. Neural graph network models consider content features either extracted from graph properties <ref type="bibr" target="#b18">[19]</ref> or learned from node embedding representations <ref type="bibr" target="#b19">[20]</ref>. 
In particular, Neural Graph Collaborative Filtering (NGCF) approaches exploit feature representations of the user-item graph structure by propagating either user-based or item-based content embeddings on it <ref type="bibr" target="#b20">[21]</ref>. This process is often the result of learning aggregation functions that allow deep relationship modeling among both user-item interactions and content features. In this way, Graph Convolutional Networks (GCNs) have also been exploited through learned aggregator functions, which require additional layers to obtain a convolutional neighborhood aggregation from the neighborhood's embeddings at these layers <ref type="bibr" target="#b21">[22]</ref>. As a result, deep semantic representations are extracted using embedding propagation on the user-item graph structure. An instance of such a method is used by Ying et al. <ref type="bibr" target="#b22">[23]</ref>, who employ multiple graph convolution layers on an item-item graph for image recommendation at Pinterest<ref type="foot" target="#foot_2">3</ref>.</p><p>In the following, we introduce some recommendation models from the literature that have handled neural CF approaches <ref type="bibr" target="#b23">[24,</ref><ref type="bibr" target="#b24">25,</ref><ref type="bibr" target="#b25">26]</ref>. Those models address user rating prediction. Some of them have been adapted to include tagging content <ref type="bibr" target="#b26">[27,</ref><ref type="bibr" target="#b27">28,</ref><ref type="bibr" target="#b28">29]</ref>; they are mostly composite, in that multiple neural building modules compose a single differentiable function that is trained end-to-end. Here, we introduce some summary definitions related to tagging that will allow us to address the most common architectures and topologies, giving recommendation strategies for each of them. 
A folksonomy 𝐹 can be defined as a 4-tuple 𝐹 = (𝑈, 𝑇, 𝐼, 𝐴), where 𝑈 = {𝑢 1 , 𝑢 2 , ..., 𝑢 𝑀 } is the set of users annotating the set of items 𝐼. 𝑇 is the set of tags, which constitutes the vocabulary expressed by the folksonomy. 𝐼 = {𝑖 1 , 𝑖 2 , ..., 𝑖 𝑁 } is the set of items tagged by the users. 𝐴 = {(𝑢 𝑚 , 𝑡 𝑘 , 𝑖 𝑗 )} ⊆ 𝑈 × 𝑇 × 𝐼 is the set of annotations, each assigning a tag 𝑡 𝑘 to an item 𝑖 𝑗 by a user 𝑢 𝑚 . We also consider 𝑅 as the set of user ratings 𝑟 𝑢,𝑖 .</p></div>
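As a minimal illustration, the folksonomy 4-tuple above can be sketched with plain Python structures; the annotation triples and identifiers below are hypothetical toy data:

```python
from collections import defaultdict

# Annotation triples (u, t, i): user u applied tag t to item i.
annotations = [("u1", "sci-fi", "i1"), ("u1", "space", "i2"), ("u2", "sci-fi", "i1")]
ratings = {("u1", "i1"): 5.0, ("u2", "i1"): 3.5}   # the set R of ratings r_{u,i}

users = {u for u, _, _ in annotations}             # U
tags = {t for _, t, _ in annotations}              # T
items = {i for _, _, i in annotations}             # I

# Per-user and per-item tag sets T_u and T_i, used later for feature vectors.
user_tags, item_tags = defaultdict(set), defaultdict(set)
for u, t, i in annotations:
    user_tags[u].add(t)
    item_tags[i].add(t)
```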
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">MLP-based neural collaborative filtering for Recommendation</head><p>Neural collaborative filtering (NCF) approaches for rating prediction often involve dealing with the binary property of implicit data. Some works <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31,</ref><ref type="bibr" target="#b25">26]</ref> have additionally discussed the choice of the neural architecture to be implemented. A possible instance of the neural CF approach can be formulated using a multi-layer perceptron (MLP). As addressed in <ref type="bibr" target="#b29">[30]</ref>, the input layer (the embedding layer) is a fully connected layer that maps sparse representations to dense feature vectors. It consists of two feature vectors 𝑣 𝑓 (𝑢) and 𝑣 𝑓 (𝑖) that describe the user 𝑣 𝑈 (𝑢) and the item 𝑣 𝐼 (𝑖), represented initially through one-hot encoding. The obtained user (item) embedding can be seen as the latent vector for the user (item). The user and item embeddings are then fed into neural CF layers to map the latent vectors to prediction scores. The final output layer produces the predicted score 𝑟 ^𝑢,𝑖 , and training is performed by minimizing the pointwise loss between 𝑟 ^𝑢,𝑖 and its target value 𝑟 𝑢,𝑖 . The NCF predictive model can be formulated as:</p><formula xml:id="formula_0">𝑟 ^𝑢,𝑖 = MLP(𝑃 𝑇 𝑢 . 𝑣 𝑓 (𝑢) , 𝑄 𝑇 𝑖 . 𝑣 𝑓 (𝑖) |𝑃 𝑢 , 𝑄 𝑖 , Γ MLP )<label>(1)</label></formula><p>𝑃 𝑢 ∈ R 𝑀 ×𝐾 and 𝑄 𝑖 ∈ R 𝑁 ×𝐾 are the latent factor matrices for users and items, respectively. Γ MLP denotes the parameters of the interaction function, which is defined as a multi-layer neural network.</p></div>
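The forward pass of equation (1) can be sketched with NumPy as follows; the toy dimensions, random initialization, and the name `predict_rating` are illustrative assumptions, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 4, 5, 8            # users, items, latent dimension (toy sizes)
P = rng.normal(size=(M, K))  # user latent factor matrix P_u
Q = rng.normal(size=(N, K))  # item latent factor matrix Q_i

# Two dense CF layers on the concatenated embeddings, then a scalar output.
W1, b1 = rng.normal(size=(2 * K, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def predict_rating(u: int, i: int) -> float:
    """NCF forward pass: one-hot index -> embedding lookup -> MLP -> score."""
    x = np.concatenate([P[u], Q[i]])   # embedding-layer output
    h = np.tanh(x @ W1 + b1)           # hidden neural CF layer
    return float(h @ W2 + b2)          # output layer: predicted r_hat(u, i)
```

Here the one-hot multiplication 𝑃ᵀ𝑣𝑓(𝑢) reduces to an embedding row lookup `P[u]`, which is how it is implemented in practice.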
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Autoencoder-Based collaborative filtering for Recommendation</head><p>Another way to consider neural CF is to treat the user-item ratings as a matrix 𝑋 ∈ R 𝑚×𝑛 with partially observed entries, where each user 𝑢 in the set of users 𝑈 = {1...𝑚} is given by a row vector of ratings 𝑟 (𝑢) = (𝑋 𝑢1 , ..., 𝑋 𝑢𝑛 ) and each item 𝑖 in the set of items 𝐼 = {1...𝑛} by a column vector of ratings 𝑟 (𝑖) = (𝑋 1𝑖 , ..., 𝑋 𝑚𝑖 ). An efficient neural method to encode each partially observed vector into a low-dimensional latent space is to use an autoencoder architecture, as suggested in <ref type="bibr" target="#b24">[25]</ref>, which reconstructs the output space to predict missing ratings for recommendation <ref type="bibr" target="#b24">[25,</ref><ref type="bibr" target="#b31">32,</ref><ref type="bibr" target="#b23">24]</ref>. Given a set of rating vectors 𝑟 (𝑢) and 𝑟 (𝑖) ∈ R 𝑑 , the autoencoder solves:</p><formula xml:id="formula_1">min 𝜃 ∑︁ 𝑟∈𝑅 ||𝑟 − ℎ(𝑟; 𝜃)|| 2<label>(2)</label></formula><p>where ℎ(𝑟; 𝜃) is the reconstruction of the input 𝑟 ∈ R 𝑑 , defined as:</p><formula xml:id="formula_2">ℎ(𝑟; 𝜃) = 𝑓 (𝑊.𝑔(𝑉 𝑟 + 𝜇) + 𝑏)<label>(3)</label></formula><p>𝑓 (.) and 𝑔(.) are the activation functions associated with the decoder and encoder respectively, and 𝜃 gathers the model parameters; 𝑊 ∈ R 𝑑×𝑘 and 𝑉 ∈ R 𝑘×𝑑 are weight matrices, and 𝜇 ∈ R 𝑘 , 𝑏 ∈ R 𝑑 are biases. In an item-based recommendation perspective, the autoencoder takes the vectors 𝑟 (𝑖) as input. The weights associated with those vectors are updated during backpropagation.</p></div>
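A minimal NumPy sketch of the reconstruction ℎ(𝑟; 𝜃) of equation (3), assuming tanh for the encoder activation 𝑔 and the identity for the decoder activation 𝑓 (both choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 6, 3                                    # rating-vector and latent dims
V, mu = rng.normal(size=(k, d)), np.zeros(k)   # encoder weights V, bias mu
W, b = rng.normal(size=(d, k)), np.zeros(d)    # decoder weights W, bias b

def reconstruct(r):
    """h(r; theta) = f(W . g(V r + mu) + b) with g = tanh, f = identity."""
    return W @ np.tanh(V @ r + mu) + b

r = rng.normal(size=d)                         # one partially observed vector
loss = np.sum((r - reconstruct(r)) ** 2)       # one term of objective (2)
```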
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Neural Graph Collaborative Filtering for Recommendation</head><p>NGCF approaches are particular in the sense that they exploit embeddings of users and items initially represented as a graph structure. Most of them adopt a user-item bipartite graph, as it naturally represents user-item interactions <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b15">16]</ref>. Promising recent methods suggest learning user and item representations from their associated bipartite graph by stacking multiple embedding propagation layers to capture high-order connectivity from user-item interactions <ref type="bibr" target="#b20">[21]</ref>. Other works <ref type="bibr" target="#b14">[15]</ref> learn aggregator functions that induce the embedding of a new node given its features and neighborhood. In the following, we formalize what can be associated with a neural graph-based collaborative filtering approach for user rating prediction based on multiple embedding aggregation layers. This neural graph-oriented approach is designed to exploit node embeddings from neighborhood aggregation. Given a bipartite weighted user-item graph 𝒢 = (𝒱, ℰ, 𝐴, 𝒳 ), with 𝒱 = {𝒱 𝑢 ∪ 𝒱 𝑖 }, ℰ denotes the set of undirected weighted edges representing user ratings, 𝐴 is the adjacency matrix, and 𝒳 ∈ R 𝑚×𝑛 is the node feature matrix. Let ℎ 0 𝑣 = 𝑥 𝑢 𝑣 with 𝑣 ∈ 𝒱 𝑢 be the user node feature at the 0-th layer. Then, at the 𝑘-th layer:</p><formula xml:id="formula_3">ℎ 𝑘 𝑣 = 𝛿(𝑊 𝑘 1 |𝑁 (𝑣)| ∑︁ 𝑢∈𝑁 (𝑣) ℎ 𝑘−1 𝑢 + 𝐴 𝑘 ℎ 𝑘−1 𝑣 )<label>(4)</label></formula><p>ℎ 𝑘−1 𝑣 is the embedding of node 𝑣 ∈ 𝒱 𝑢 from the previous layer. |𝑁 (𝑣)| is the number of neighbors of node 𝑣. The sum in the equation aggregates the neighboring features of node 𝑣 from the previous layer. 𝛿 is the activation function (tanh), which introduces non-linearity. 𝑊 𝑘 and 𝐴 𝑘 are trainable parameters. 
The final embedding after 𝐾 layers (𝑘 ∈ {1...𝐾}) is extracted from the output layer: 𝑧 𝑢 𝑣 = ℎ 𝐾 𝑣 . This can be expressed in matrix multiplication form for the whole graph as:</p><formula xml:id="formula_4">𝐻 𝑙+1 = 𝛿(𝐻 𝑙 𝑊 𝑙 0 + 𝐴 ˜𝐻 𝑙 𝑊 𝑙 1 )<label>(5)</label></formula><p>where 𝐴 ˜= 𝐷 −1/2 𝐴𝐷 −1/2 , with 𝐴 the adjacency matrix and 𝐷 the degree matrix. After applying a similar process to the item node embeddings to obtain 𝑧 𝑖 𝑣 with 𝑣 ∈ 𝒱 𝑖 , one way forward is to apply a concatenation operator ⊕ to both the user and item final embeddings to obtain 𝑧 𝑢⊕𝑖 𝑒 = 𝑧 𝑢 𝑣 ⊕ 𝑧 𝑖 𝑣 , which represents the embedding of the edge 𝑒 𝑢,𝑖 between a user node 𝑣 𝑢 and an item node 𝑣 𝑖 , with 𝑒 𝑢,𝑖 = [𝑣 𝑢 , 𝑣 𝑖 ]. These edge embeddings are passed through a link regression layer to obtain predicted user-item ratings. The model is trained end-to-end by minimizing a regression loss (the root mean square error, RMSE, between predicted and true ratings) using stochastic gradient descent (SGD) updates of the model parameters, with mini-batches of user-item training edges fed into the model.</p></div>
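One propagation layer of equation (5) can be sketched in NumPy as follows; the toy random graph, random weights, and tanh activation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5, 4                                   # number of nodes, feature dim
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                # symmetric adjacency, no self-loops
deg = np.maximum(A.sum(axis=1), 1.0)          # guard against isolated nodes
D_inv_sqrt = np.diag(deg ** -0.5)
A_norm = D_inv_sqrt @ A @ D_inv_sqrt          # normalized A~ = D^-1/2 A D^-1/2

H = rng.normal(size=(n, d))                   # layer-l node embeddings H^l
W0, W1 = rng.normal(size=(d, d)), rng.normal(size=(d, d))

# One propagation layer: H^{l+1} = tanh(H^l W0 + A~ H^l W1)
H_next = np.tanh(H @ W0 + A_norm @ H @ W1)
```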
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Overview of the proposed models</head><p>In this section, we introduce our tag-aware neural models for recommendation. More explicitly, we integrate tag-based embeddings into neural CF architectures, namely a multilayer perceptron, an autoencoder and a neural graph-based model. To integrate side information into predictive neural models, a naive approach consists of appending additional user/item biases to the rating prediction. We consider that computing those biases can be handled either by hand-crafted feature engineering or by implementing an appropriate CF strategy. A simple neural collaborative filtering framework architecture treats the input layer (embedding layer) as a fully connected layer that projects the sparse representations of users and items to dense vectors. To integrate the tag vocabulary explicitly in a neural model for rating prediction, we use feature vectors that we consider as tag vector representations sharing a common embedding space through projection matrices. The obtained user (item) embedding can be seen as the latent vector for the user (item) in the tag latent space. The feature vectors 𝑣 𝑓 (𝑢) and 𝑣 𝑓 (𝑖) are reconsidered, since we project the tag representations into a lower dimension using projection matrices 𝐸 and 𝐹 . Consequently, the tag-based vector representation is expressed as a user feature vector 𝑣 𝑓 (𝑢 ˜):</p><formula xml:id="formula_5">𝑣 𝑓 (𝑢 ˜) = 1 |𝑇 𝑢 | ∑︁ 𝑡 𝑘 ∈𝑇𝑢 𝐸(𝑡 𝑘 )<label>(6)</label></formula><p>where 𝑡 𝑘 ∈ R 𝑐 is the embedding vector associated with tag 𝑘, and 𝑐 denotes the embedding dimension. 
𝐸 denotes the projection matrix, with 𝐸 ∈ R 𝑑×𝑐 .</p><p>Similarly, if 𝐹 denotes the projection matrix with 𝐹 ∈ R 𝑑×𝑐 , then the item feature vector 𝑣 𝑓 (𝑖 ˜) is expressed as:</p><formula xml:id="formula_6">𝑣 𝑓 (𝑖 ˜) = 1 |𝑇 𝑖 | ∑︁ 𝑡 𝑘 ∈𝑇 𝑖 𝐹 (𝑡 𝑘 )<label>(7)</label></formula><p>We denote by 𝑇 𝑢 the set of tags of a user 𝑢 and by 𝑇 𝑖 the set of related tags describing a particular item. Moreover, we obtain the tag embeddings from the pre-trained Word2vec and BERT neural models, handled through the projection matrices 𝐸 and 𝐹 ∈ R 𝑑×𝑐 .</p></div>
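Equations (6) and (7) can be sketched as follows; the tag vocabulary, toy dimensions, and the name `tag_feature` are hypothetical, and random vectors stand in for the pre-trained Word2vec/BERT embeddings:

```python
import numpy as np

rng = np.random.default_rng(3)
c, d = 8, 4                                   # tag-embedding and projected dims
# Stand-ins for pre-trained tag embeddings t_k in R^c.
embeddings = {t: rng.normal(size=c) for t in ["sci-fi", "space", "drama"]}
E = rng.normal(size=(d, c))                   # projection matrix E (F is analogous)

def tag_feature(tag_set, proj):
    """v_f = (1/|T|) * sum over t in T of proj @ embedding(t)  (eqs. 6-7)."""
    vecs = [proj @ embeddings[t] for t in tag_set]
    return np.mean(vecs, axis=0)

v_u = tag_feature({"sci-fi", "space"}, E)     # user feature vector v_f(u~)
```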
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">CF-based MLP model</head><p>The extended tag-based NCF predictive model can be reformulated, relying on the NCF model described in section 2.1, equation (1), as:</p><formula xml:id="formula_7">𝑟 ^𝑢,𝑖 = MLP( 𝑣 𝑓 (𝑢 ˜), 𝑣 𝑓 (𝑖 ˜), 𝜃 MLP )<label>(8)</label></formula><p>The user and item embeddings can be fed into a multi-layer neural model, where 𝑟 ^𝑢,𝑖 is the rating score of a user for an item. Figure <ref type="figure" target="#fig_0">1</ref> (𝒞) details an instance of the model. The prediction pipeline exploits user and item vectors extracted from the dense representation space (Figure <ref type="figure" target="#fig_0">1</ref> (𝒜)); hidden layers are added to learn interactions between the user and item latent features, and a regressor at the last hidden layer produces the final rating. (Figure <ref type="figure" target="#fig_0">1</ref> (𝒜)) is a dynamic module in which dense representations are computed through the inner product of the user and item embedding representations. Tag-embedding representations are extracted from a pre-trained neural language model (Figure <ref type="figure" target="#fig_0">1</ref> (ℰ)).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">CF-based Autoencoder model</head><p>Following the autoencoder paradigm, instead of only encoding the user vectors containing the ratings to be predicted as in AutoRec <ref type="bibr" target="#b24">[25]</ref>, we extend a multilayered autoencoder architecture to integrate the element-wise product of pre-trained tag-based embeddings. Such embeddings are concatenated with the user rating representations and projected onto a low-dimensional latent (hidden) space. As such, the rating 𝑟(𝑢 𝑚 , 𝑖 𝑙 ) of a particular user is reconstructed by minimizing, over the model parameters 𝜃, the objective:</p><formula xml:id="formula_8">∑︁ ||𝑟(𝑢 𝑚 , 𝑖 𝑙 ) ⊕ (𝑣 𝑓 (𝑢 ˜) ⊗ 𝑣 𝑓 (𝑖 ˜)) − ℎ(𝑟(𝑢 𝑚 , 𝑖 𝑙 ) ⊕ (𝑣 𝑓 (𝑢 ˜) ⊗ 𝑣 𝑓 (𝑖 ˜)); 𝜃)|| 2<label>(9)</label></formula><p>where ℎ(· ; 𝜃) is the reconstruction of the input 𝑟(𝑢 𝑚 , 𝑖 𝑙 ) ∈ R 𝑑 . The operator ⊗ denotes the element-wise multiplication between the user and item feature vectors, and ⊕ denotes concatenation. 𝑡𝑎𝑛ℎ is the selected activation function. Figure <ref type="figure" target="#fig_0">1</ref> (ℬ) presents a detailed instance of the model. The prediction pipeline exploits user and item vectors extracted from the dense representation space. Such representations are concatenated with the user ratings and fed as input to the autoencoder model. Layers are added to learn interactions between the user and item latent features, compressed in a dense space. Reconstructing the user's ratings from this dense space produces the final rating. </p></div>
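The input construction of equation (9), concatenating the rating vector with the element-wise product of the tag-based feature vectors, can be sketched as follows (the toy vectors and the name `autoencoder_input` are illustrative):

```python
import numpy as np

def autoencoder_input(r_u, v_u, v_i):
    """Build r ⊕ (v_f(u~) ⊗ v_f(i~)): the user's rating vector concatenated
    with the element-wise product of the tag-based feature vectors."""
    return np.concatenate([r_u, v_u * v_i])

r_u = np.array([5.0, 0.0, 3.0])   # partially observed ratings (toy)
v_u = np.array([0.2, -0.1])       # tag-based user features (toy)
v_i = np.array([0.5, 0.4])        # tag-based item features (toy)
x = autoencoder_input(r_u, v_u, v_i)   # fed to the autoencoder of eq. (9)
```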
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Neural graph CF-based model</head><p>As part of collaborative filtering approaches, neural graph-based networks mostly consider <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b18">19,</ref><ref type="bibr" target="#b14">15]</ref> bipartite graphs of users and items in a recommendation context, where edges represent the rating interactions between the users and the items. We start from the bipartite graph 𝒢 defined in section 2.3, where the node classes are derived from the set of user nodes 𝒱 𝑢 and the set of item nodes 𝒱 𝑖 respectively. Each edge corresponds to a user rating an item, and each edge 𝑒 𝑢,𝑖 ∈ ℰ is associated with a value 𝑟 (𝑢,𝑖) ∈ {0, 1}. In order to learn the topological structure of each class of node neighborhoods, the idea is to aggregate feature information from each node's local neighborhood <ref type="bibr" target="#b14">[15]</ref>; in this paper, however, we take the node features from pre-trained static and contextual tag-embedding models. User node features are taken as the mean of the users' tag-embedding vectors; equivalently, item node features are represented through the mean of their tag-embedding vectors. We have previously explored a simple neighborhood aggregation process in section 2.3. By defining a neighborhood function 𝑁 (𝑣) of fixed size (in our experiments, K=2), the bipartite graph is sampled while the model learns a function that generates aggregates from the tag-based textual features of node neighbors. This method can be generalized by applying different aggregation methods to the nodes of 𝒢, concatenating the aggregated features with the node itself. For this purpose, we associate each node 𝑣 ∈ {𝒱 𝑢 ∪ 𝒱 𝑖 } with features from word vector representations by joining the tag-based vector representations 𝑣 𝑓 (𝑢 ˜) and 𝑣 𝑓 (𝑖 ˜) (Figure <ref type="figure" target="#fig_0">1</ref> (𝒢)). 
We have designed a mean aggregation function, which is commonly used since it implies an element-wise mean of the feature vectors in ℎ 𝑘−1 𝑢 . We have also designed a convolutional aggregator function, detailed next.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.1.">Mean aggregator function</head><p>Since the rating interactions between users and items are represented as a bipartite graph 𝐺 = (𝑈, 𝑉, 𝐸), 𝒱 𝑢 and 𝒱 𝑖 correspond respectively to the user and item sets. The aggregation of the mean tag-embedding features from the neighbors of a node 𝑣 ∈ {𝒱 𝑢 ∪ 𝒱 𝑖 } is processed given the following update rule (Figure 1 𝒟 (𝒜 ′ )):</p><formula xml:id="formula_9">ℎ 𝑘 𝑁 (𝑣) = 1 |𝑁 (𝑣)| ∑︁ 𝑢∈𝑁 (𝑣) 𝐷 𝑝 [ℎ 𝑘−1 𝑢 ]</formula><p>We give the forward pass through layer 𝑘 as follows:</p><formula xml:id="formula_10">ℎ 𝑘 𝑣 = 𝛿(𝑐𝑜𝑛𝑐𝑎𝑡[𝑊 𝑘 𝐼 𝐷 𝑝 [ℎ 𝑘−1 𝑣 ], 𝑊 𝑘 𝑁 𝑒𝑖𝑔ℎ𝑏𝑜𝑟 ℎ 𝑘 𝑁 (𝑣)] + 𝑏 𝑘 )</formula><p>Here, ℎ 𝑘 𝑣 is the output of node 𝑣 at layer 𝑘; 𝑊 𝑘 𝐼 and 𝑊 𝑘 𝑁 𝑒𝑖𝑔ℎ𝑏𝑜𝑟 are trainable parameters; 𝑏 𝑘 is an optional bias; 𝑑 𝑘 is the node feature dimensionality at layer 𝑘; 𝛿 is a non-linear activation function (tanh); and 𝐷 𝑝 is a random dropout with probability 𝑝 applied to its argument vector, used to reduce the model's over-fitting. 𝑁 (𝑣) represents the neighborhood of a node 𝑣 ∈ {𝒱 𝑢 ∪ 𝒱 𝑖 }. The number of trainable parameters in layer 𝑘 for the mean aggregator is 𝑑 𝑘 .𝑑 𝑘−1 + 𝑑 𝑘 .</p></div>
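A rough NumPy sketch of the mean aggregator above; the weight shapes, the dropout-mask handling, and the name `mean_aggregate` are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
d_prev, d_k = 4, 3                      # feature dims at layers k-1 and k
W_self = rng.normal(size=(d_k, d_prev))    # W_I: transform of the node itself
W_neigh = rng.normal(size=(d_k, d_prev))   # W_Neighbor: transform of h_N(v)
b = np.zeros(2 * d_k)                      # bias after concatenation

def mean_aggregate(h_self, h_neighbors, p=0.0):
    """Mean aggregator: dropout-masked mean of neighbor features, separate
    transforms for self and neighborhood, concatenation, then tanh."""
    keep_self = (rng.random(h_self.shape) >= p).astype(float)       # D_p mask
    keep_n = (rng.random(h_neighbors.shape) >= p).astype(float)     # D_p mask
    h_n = np.mean(keep_n * h_neighbors, axis=0)     # h^k_N(v)
    z = np.concatenate([W_self @ (keep_self * h_self), W_neigh @ h_n])
    return np.tanh(z + b)
```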
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.2.">Convolutional aggregator function</head><p>To generalize the collaborative filtering process from a graph convolutional network perspective, we adopted a GCN aggregator <ref type="bibr" target="#b14">[15]</ref> (Figure 1 𝒟 (𝒜 ′ )), which combines a node's previous-layer representation ℎ 𝑘−1 𝑣 with the aggregated neighborhood vectors ℎ 𝑘 𝑁 (𝑣) . Features are updated given the following equation:</p><formula xml:id="formula_11">ℎ 𝑘 𝑁 (𝑣) = 1 |𝑁 (𝑣)| + 1 (ℎ 𝑘−1 𝑣 + ∑︁ 𝑢∈𝑁 (𝑣) ℎ 𝑘−1 𝑢 )<label>(10)</label></formula><p>The forward pass through layer 𝑘 is defined as:</p><formula xml:id="formula_12">ℎ 𝑘 𝑣 = 𝛿(𝑊 𝑘 .ℎ 𝑘 𝑁 (𝑣) + 𝑏 𝑘 )<label>(11)</label></formula><p>Here, 𝑊 𝑘 is a trainable weight matrix shared between all nodes 𝑣 ∈ {𝒱 𝑢 ∪ 𝒱 𝑖 }. The size of 𝑊 𝑘 is 𝑑 𝑘 × 𝑑 𝑘−1 . The number of trainable parameters in layer 𝑘 for the GCN aggregator is 𝑑 𝑘 .𝑑 𝑘−1 + 𝑑 𝑘 .</p></div>
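The GCN aggregator of equations (10) and (11) can be sketched as follows, with illustrative toy dimensions and the hypothetical name `gcn_aggregate`:

```python
import numpy as np

rng = np.random.default_rng(5)
d_prev, d_k = 4, 3
W = rng.normal(size=(d_k, d_prev))   # shared trainable weight matrix W_k
b = np.zeros(d_k)                    # bias b_k

def gcn_aggregate(h_self, h_neighbors):
    """GCN aggregator: mean over the node itself and its neighbors (eq. 10),
    then one shared linear transform and a tanh non-linearity (eq. 11)."""
    h_n = (h_self + h_neighbors.sum(axis=0)) / (len(h_neighbors) + 1)
    return np.tanh(W @ h_n + b)

# Parameter count in layer k: d_k * d_{k-1} + d_k, as stated above.
assert W.size + b.size == d_k * d_prev + d_k
```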
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head><p>In this section, we conduct experiments intended to answer the following research questions:</p><p>RQ1: Are tag-based contextual embeddings efficient representations to be used in a neural CF model, compared to static tag-based embedding representations? RQ2: Which extended neural collaborative architecture yields significant improvements in prediction and ranking quality for a rating prediction task?</p><p>From there, an underlying research question can be derived concerning the various methods used for aggregating tag embeddings, assuming that these methods may affect the performance of the recommendation models.</p><p>RQ3: Are contextual neural graph embeddings more efficient representations to be used in a neural collaborative filtering architecture? Regarding this process, which aggregator function should lead to better recommendation performance: a mean aggregator function or a convolutional aggregator function?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Experimental Settings</head><p>1. Datasets: The data sets describe 5-star ratings and free-text tagging from MovieLens, a movie recommendation service. We extracted user annotations from the ML-10M, ML-20M, and ML-25M data sets. Only users that have annotated and rated at least 20 movies were selected. We observed from Table <ref type="table" target="#tab_1">1</ref> an unequal distribution of user rating classes, because users tend to score items with high rating values. This can reduce the models' capacity to generalize. To overcome this, we over-sample minority classes <ref type="bibr" target="#b32">[33]</ref> by duplicating samples from the minority classes and adding them to the training data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Hyper-parameters:</head><p>After randomly splitting the data of each dataset into 90% training and 10% testing sets, we held out 10% of the training set for hyper-parameter tuning. Then, we conducted a 5-fold cross-validation in each dataset and averaged the RMSE measure. We applied a grid search for hyper-parameter tuning: the learning rate was tuned among values ∈ {0.0001, 0.0005, 0.001, 0.005}, and the latent dimensions among ∈ {100, 200, 300, 400, 500, 1000} for both the autoencoder and MLP architectures. We handled the Neural Collaborative Autoencoder with a default rating of 2.5 for testing-set entries without training observations. The graph neural and convolutional models handled the same datasets, except that models derived from these approaches handle edge prediction through bipartite graph samples. We tuned the dropout ratio<ref type="foot" target="#foot_3">4</ref> among values ∈ {0.0, 0.1, …, 0.8}, and we fixed the neighborhood depth for aggregating node embedding features to 𝑘 = 2. The models were optimized with the well-known Adam optimizer.</p></div>
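The grid search over the hyper-parameter values listed above can be sketched as follows; `train_and_validate` is a placeholder of ours for training a model with a given configuration and returning its validation RMSE on the held-out 10% split:

```python
from itertools import product

LEARNING_RATES = [0.0001, 0.0005, 0.001, 0.005]
LATENT_DIMS = [100, 200, 300, 400, 500, 1000]

def grid_search(train_and_validate):
    """Try every (learning rate, latent dimension) pair and keep the
    configuration with the lowest validation RMSE."""
    best = None
    for lr, dim in product(LEARNING_RATES, LATENT_DIMS):
        rmse = train_and_validate(lr, dim)
        if best is None or rmse < best[0]:
            best = (rmse, lr, dim)
    return best
```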
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Evaluation Metrics:</head><p>We evaluated rating prediction using two metrics: Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). Both are widely used for rating prediction in recommender systems. Given a predicted rating 𝑟 ^𝑢,𝑖 and a ground-truth rating 𝑟 𝑢,𝑖 from the user 𝑢 for item 𝑖, the RMSE is computed as:</p><formula xml:id="formula_13">𝑅𝑀 𝑆𝐸 = √︃ 1 𝑁 ∑︁ 𝑢,𝑖 (𝑟 𝑢,𝑖 − 𝑟 ^𝑢,𝑖 ) 2<label>(12)</label></formula><p>where 𝑁 indicates the number of ratings between users and items.</p><p>MAE is computed as follows:</p><formula xml:id="formula_14">𝑀 𝐴𝐸 = 1 𝑁 ∑︁ 𝑢,𝑖 |𝑟 𝑢,𝑖 − 𝑟 ^𝑢,𝑖 |<label>(13)</label></formula><p>We also evaluated ranking accuracy using NDCG (Normalized Discounted Cumulative Gain <ref type="bibr" target="#b33">[34]</ref>) at 10. For this purpose, we considered a rating value of 5 as a strong appreciation of a user regarding a movie; in contrast, rating values under 3 are considered bad. Hence, the rating value of each movie is used as the gain value for its ranked position in the result. The gain is summed over the ranked positions from 1 to 𝑛. To compute 𝑁 𝐷𝐶𝐺, relevance scores are set on a five (5) point scale from 1 to 5, denoting relevance from low to strong. The Ideal DCG is obtained by ranking each user's movies in decreasing order of their ratings. The NDCG values presented further on are averaged over the user testing set.</p></div>
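The three metrics can be sketched directly from Eqs. (12)-(13); `ndcg_at_k` uses the common log2 position discount with the rating value as graded relevance (helper names are ours):

```python
import math

def rmse(truth, pred):
    """Root Mean Square Error over N rating pairs (Eq. 12)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))

def mae(truth, pred):
    """Mean Absolute Error over N rating pairs (Eq. 13)."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)

def ndcg_at_k(ranked_ratings, k=10):
    """NDCG@k: DCG of the predicted order divided by the DCG of the ideal
    (rating-sorted) order, with a log2 discount on the rank position."""
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(ranked_ratings[:k]))
    ideal = sorted(ranked_ratings, reverse=True)
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0
```

For a user whose movies are already ranked in decreasing order of their true ratings, DCG equals the Ideal DCG and NDCG@10 is 1.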
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Tag-based embedding representations</head><p>We considered tag-based embeddings built from word vector representations, which we extracted from pre-trained neural language models. Owing to users' writing discrepancies, the semantic meaning of users' tags is often ambiguous.</p><p>Tags can be composed of several words and may contain subjective expressions. They can also be single words, which can occasionally lead to a lack of context. This makes it difficult to integrate tags explicitly into an effective neural CF architecture. Our main objective is to map users, items and their tag interactions into the same latent space. Rather than directly exploiting the latent space representations of users and items, as in most neural collaborative approaches <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b34">35]</ref>, we propose to first project both users' and items' representations into a dense tag space representation. Both of these neural approaches are representative of our objective since they come from CF. We assume that users and items are represented by their corresponding tags; in particular, they are represented by the aggregate average of their tag embedding representations.</p><p>1. Static Word2vec tag-based embeddings: We handled static tag-based embedding vectors from Word2vec. We exploited pre-trained vectors trained on part of the Google News dataset (about 100 billion words) and extracted users' tag embeddings by associating each tag with a fixed-size vector. However, we found that some tags were out of the vocabulary; such tags represent respectively 8%, 5%, and 5% of the tags in our MovieLens 10M, 20M and 25M datasets. We fixed this issue by initializing those tags with random vector values. 
The inability to handle unknown or out-of-vocabulary words is one limitation encountered when using such a pre-trained model. Finally, each set of tags per user is represented by a multidimensional vector of 𝑑𝑖𝑚 = 300. 2. Contextualized BERT tag-based embeddings: We addressed extracting contextualized embeddings from the BERT neural language model. For this purpose, we assumed that the first token, '[CLS]', which captures the context, can be treated as the sentence embedding <ref type="bibr" target="#b35">[36]</ref>. The word embedding sequence corresponding to each set of tags is fed into the pre-trained model. We then handled the activations from the last layers of the BERT model, since the features associated with the activations in these layers are far more complex and include more contextual information. These contextual embeddings are used as input to our proposed models. Thus, each set of tags per user is represented by a multidimensional embedding vector of 𝑑𝑖𝑚 = 768. We implemented the pre-trained bert-base model<ref type="foot" target="#foot_4">5</ref> (12 blocks of hidden dimension 768, 12 attention heads), using '[CLS]' to indicate the beginning of a sequence and '[SEP]' as a separator between two tags of the same sequence. </p></div>
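The static-embedding pipeline of item 1 above, with its random fallback for out-of-vocabulary tags and the averaging that turns a user's tag set into one 300-dimensional representation, can be sketched as follows (function names and the ±0.25 init range are our assumptions, not from the paper):

```python
import numpy as np

def tag_vector(tag, w2v, dim=300, rng=None):
    """Look a tag up in the pre-trained Word2vec table; out-of-vocabulary
    tags fall back to a random vector (assumed init range: +/- 0.25)."""
    if tag in w2v:
        return np.asarray(w2v[tag])
    rng = rng or np.random.default_rng(0)
    return rng.uniform(-0.25, 0.25, size=dim)

def user_embedding(tags, w2v, dim=300):
    """Represent a user (or an item) as the average of its tag embeddings."""
    return np.mean([tag_vector(t, w2v, dim) for t in tags], axis=0)
```

The BERT variant is analogous, except that the whole tag sequence ('[CLS] tag1 [SEP] tag2 [SEP] …') is encoded at once and the 768-dimensional '[CLS]' vector of the last layer replaces the average.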
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Evaluation and Performance comparison</head><p>First, to address RQ1, we extended neural models <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b24">25]</ref> by handling static and contextual tag-based embedding representations. We compared those models with recent neural CF models that we set as baselines. We evaluated rating score accuracy using RMSE (Root Mean Square Error) and MAE (Mean Absolute Error). Then, to address RQ2, we implemented an MLP-based and an autoencoder-based CF architecture, and compared the performance of each neural model according to the tag-based embedding representations with which it was integrated. Moreover, a ranking accuracy comparison was carried out among the different neural models using NDCG (Normalized Discounted Cumulative Gain) at 10. Finally, to answer RQ3, we exploited user/item tag-based embeddings through an aggregation function learned from training samples of user-item graphs. Such a function operates either by performing element-wise multiplication between the tag embedding neighbor vectors of a given node, or by concatenating tag embedding vectors with their tag embedding neighbor vectors to obtain the embedding of that node.</p><p>We detail below all the models included in the comparative study of neural models.</p><p>• Neural GMF-MLP <ref type="bibr" target="#b29">[30]</ref>: Is a neural CF approach that exploits a multi-layer perceptron (MLP) to learn the user-item interaction function. The bottom input layer consists of two  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Effects on recommendation quality and ranking (RQ1)</head><p>The results of our experiments are synthesized in Table <ref type="table" target="#tab_2">2</ref>. Initially, regarding the ML-10M dataset, the best RMSE and MAE scores are achieved by the CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 model, with 𝑀 𝐴𝐸 = 0.715 and 𝑅𝑀 𝑆𝐸 = 0.791. Our proposed contextual tag embedding based NGCF model also achieved the top ranking quality, reaching 𝑁 𝐷𝐶𝐺@10 = 0.48. We noticed that the static tag-based embedding extension of this model, CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝑊 2𝑉 , also achieved good results, outperforming most of the baselines except the TRSDL model <ref type="bibr" target="#b8">[9]</ref>, which reached 𝑀 𝐴𝐸 = 0.73 and 𝑅𝑀 𝑆𝐸 = 0.810 with a ranking metric of 𝑁 𝐷𝐶𝐺@10 = 0.45. Given that the Hinsage model <ref type="bibr" target="#b14">[15]</ref> reached 𝑀 𝐴𝐸 = 0.75 and 𝑅𝑀 𝑆𝐸 = 0.85 with a ranking score of 𝑁 𝐷𝐶𝐺@10 = 0.48, and that the CF-GNN ++ 𝑀 𝑒𝑎𝑛 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 model reached 𝑀 𝐴𝐸 = 0.774 and 𝑅𝑀 𝑆𝐸 = 0.89 with a ranking quality of 𝑁 𝐷𝐶𝐺@10 = 0.451, we might be tempted at first sight to claim that NGCF approaches show strong performance compared with other neural collaborative approaches, no matter which tag embeddings we integrated into the models. 
However, considering the significant performance of the neural models that integrate contextualized tag embeddings, such as Neural CF-MLP ++ 𝐵𝑒𝑟𝑡 , which achieved 𝑀 𝐴𝐸 = 0.72 and 𝑅𝑀 𝑆𝐸 = 0.93, or the autoencoder model CF-Autoencoder ++ 𝐵𝑒𝑟𝑡 , which reached 𝑀 𝐴𝐸 = 0.76 and 𝑅𝑀 𝑆𝐸 = 0.96, we then focused on determining which architecture performs best among all the proposed neural architectures: those that integrate static/contextual tag embedding representations, or those that additionally aggregate tag-based neighborhood embeddings.</p><p>Furthermore, on the ML-20M dataset, the same NGCF model, CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 , showed the top RMSE and MAE scores with 𝑀 𝐴𝐸 = 0.723 and 𝑅𝑀 𝑆𝐸 = 0.802. This confirms the performance of NGCF approaches combined with contextualized tag embeddings. It also appeared that such models reach top ranking quality; additionally, the ranking metric scores showed that the most competitive baseline is Hinsage <ref type="bibr" target="#b14">[15]</ref>, with a ranking quality that does not exceed 𝑁 𝐷𝐶𝐺@10 = 0.448. The CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 and CF-GNN ++ 𝑀 𝑒𝑎𝑛 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 models have the highest ranking scores, with 𝑁 𝐷𝐶𝐺@10 = 0.47 and 𝑁 𝐷𝐶𝐺@10 = 0.441 respectively. This is the case even though those models use neither the same aggregation technique nor the same tag embedding process. 
In this regard, we found that the mean aggregator function operating on static tag embeddings in a NGCF process, named CF-GNN ++ 𝑀 𝑒𝑎𝑛 Agg (𝑘=2) 𝑊 2𝑉 , performed well and obtained 𝑀 𝐴𝐸 = 0.80 and 𝑅𝑀 𝑆𝐸 = 0.94 with a ranking quality of 𝑁 𝐷𝐶𝐺@10 = 0.464, a score that outperforms the autoencoder-based model extension CF-Autoencoder++ 𝐵𝑒𝑟𝑡 with 𝑁 𝐷𝐶𝐺@10 = 0.44, even though the latter achieved 𝑀 𝐴𝐸 = 0.811 and 𝑅𝑀 𝑆𝐸 = 0.89. This demonstrates the efficiency of such an aggregation function.</p><p>Finally, on the ML-25M dataset, the impact of contextualized tag embeddings on the models is clearly established, since both RMSE and MAE scores show significant improvements compared to the baselines. Such is the case for the Neural CF-MLP ++ 𝐵𝑒𝑟𝑡 model, which reached 𝑀 𝐴𝐸 = 0.791 and 𝑅𝑀 𝑆𝐸 = 0.83 for a ranking quality of 𝑁 𝐷𝐶𝐺@10 = 0.46. Likewise, the CF-Autoencoder++ 𝐵𝑒𝑟𝑡 model obtained 𝑀 𝐴𝐸 = 0.79 and 𝑅𝑀 𝑆𝐸 = 0.86 and a ranking quality of 𝑁 𝐷𝐶𝐺@10 = 0.445. On top of that, the impact of the aggregator functions is also distinguishable through the NGCF model scores, since we noticed that results were much improved using a convolutional aggregator function applied to contextualized tag embeddings. The CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 model achieved the best RMSE and MAE scores compared to the CF-GNN ++ 𝑀 𝑒𝑎𝑛 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 model, which exploits a mean aggregator function, even though the latter also integrates contextualized tag embeddings. We believe those results can be strengthened by increasing the training data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Effects on error distribution (RQ2)</head><p>In the following, we discuss the effectiveness of our approaches in predicting user ratings with an acceptable amount of error. We highlight the impact of exploiting contextualized tag-based embedding representations by studying the error distribution when predicting user ratings. Such impact is summarized at the top of Figure <ref type="figure" target="#fig_1">2</ref>. Error distribution values are presented over the testing sets of the ML-10M, ML-20M and ML-25M data sets. This gives an overview of the error distributions of the baselines compared with those of our predictive models, which integrate tag-based static or contextualized embedding representations and describe specific architectures for each model.</p><p>First, on the ML-10M dataset, we observe that the error distribution values of the models exploiting contextual tag embeddings, such as CF-MLP ++ 𝐵𝑒𝑟𝑡 and CF-Autoencoder ++ 𝐵𝑒𝑟𝑡 , are mostly located in the interval [−1, 1], compared to the error distribution values of the other baselines. We also observe that the NGCF models, CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 and CF-GNN ++ 𝑀 𝑒𝑎𝑛 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 , outperform all other models with 980 and 890 accurate predictions respectively. Secondly, on ML-20M, we notice that the CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 model leads to a large number of accurate predictions, estimated at 7220. Such performance is closely followed by CF-GNN ++ 𝑀 𝑒𝑎𝑛 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 , with 4250 accurate predictions. Lastly, on ML-25M, the same models reached 7980 and 7740 accurate predictions respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Impact of learning aggregated tag-based functions (RQ3)</head><p>We give for each model the validation scores after 20 epochs; this allows us to estimate the model's capacity to generalize beyond the data it was trained on. From the bottom of Figure <ref type="figure" target="#fig_1">2</ref>, we analyzed which models achieve the best convergence rate. It appears that, across the three collections ML-10M, ML-20M and ML-25M, the convergence rates are clearly better for the neural graph approaches, particularly CF-GNN ++ 𝑀 𝑒𝑎𝑛 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 and CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 , our NGCF models that exploit fine-tuned tag embedding representations. This leads us to believe that when contextualized tag embeddings are aggregated with neighborhood embeddings, they give more effective representations of users and items and enhance recommendation quality. We argue that our NGCF approaches capture the multiple semantic dimensions that tags can take, including the abstract formalization of tag neighborhood embeddings, which leads to fine-grained representations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>Following the experiments, we conclude that exploiting neural graph models to learn aggregation functions has enabled us to gain quality recommendations and improve ranking quality. We have shown that handling a convolutional aggregator function can generalize an efficient graph-based neural collaborative filtering process. It combines contextualized tag embedding representations of user/item nodes with previous-layer representations. This has enabled us to obtain more refined embedding features and to capture non-trivial tagging behavior.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Extended NCF based on an MLP (on the right) and an Autoencoder (on the left), Graph-based NCF architecture based on tag feature embeddings and aggregator functions</figDesc><graphic coords="8,148.20,84.19,298.90,195.52" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: At the top of this figure, we present each neural model's error distribution. At the bottom, we give the models' validation scores after 20 epochs</figDesc><graphic coords="14,99.21,84.19,396.85,186.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>• Evaluate the impact of static/contextual embedding representations and comparing model architecture. • Evaluate impact of multi-layer neighbor static/contextual embedding representations to be exploited in a neural graph CF model. • Extensive series of experiments on real data from several MovieLens data sets.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Statistical details of the 10M, 20M and 25M collections from MovieLens</figDesc><table><row><cell>Collection</cell><cell>10M</cell><cell>20M</cell><cell>25M</cell></row><row><cell>Number of users</cell><cell>71567</cell><cell>138000</cell><cell>162541</cell></row><row><cell>Number of movies</cell><cell>10681</cell><cell>27000</cell><cell>62423</cell></row><row><cell>TAS( Tag assignment)</cell><cell>95580</cell><cell>465000</cell><cell>1093360</cell></row><row><cell>Ratings</cell><cell cols="2">10000054 2000000</cell><cell>25000095</cell></row><row><cell>Nodes</cell><cell>7114</cell><cell>20555</cell><cell>35363</cell></row><row><cell>Edges</cell><cell>24564</cell><cell>126080</cell><cell>210725</cell></row><row><cell>Period</cell><cell cols="3">Dec-2015 Oct-2016 Nov-2019</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 A</head><label>2</label><figDesc>synthesis of RMSE and MAE values for each model including 𝑛𝑑𝑐𝑔@10 scores, the best scores are in bold.</figDesc><table><row><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="3">Evaluation measures</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Models</cell><cell></cell><cell>ML-10M</cell><cell></cell><cell></cell><cell>ML-20M</cell><cell></cell><cell></cell><cell>ML-25M</cell><cell></cell></row><row><cell></cell><cell>MAE</cell><cell>RMSE</cell><cell>ndcg@10</cell><cell>MAE</cell><cell>RMSE</cell><cell>ndcg@10</cell><cell>MAE</cell><cell>RMSE</cell><cell>ndcg@10</cell></row><row><cell>Neural CF-MLP ++ 𝑊 2𝑣 Neural CF-MLP ++ 𝐵𝑒𝑟𝑡 CF-Autoencoder ++ 𝑊 2𝑣 CF-Autoencoder ++ 𝐵𝑒𝑟𝑡</cell><cell>0.77 0.72 0.83 0.76</cell><cell>0.98 0.93 1.1 0.96</cell><cell>0.43 0.46 0.411 0.42</cell><cell>0.88 0.791 0.85 0.811</cell><cell>0.96 0.86 0.97 0.89</cell><cell>0.381 0.42 0.39 0.44</cell><cell cols="2">0.84 0.791 0.80 0.798 0.865 1.01 0.83 1.02</cell><cell>0.42 0.46 0.42 0.445</cell></row><row><cell>U-Autorec [25]</cell><cell>0.82</cell><cell>1.09</cell><cell>0.38</cell><cell>0.84</cell><cell>1.07</cell><cell>0.37</cell><cell>0.81</cell><cell>1.01</cell><cell>0.40</cell></row><row><cell>Neural CF-MLP[30]</cell><cell>0.73</cell><cell>0.98</cell><cell>0.44</cell><cell>0.89</cell><cell>1.025</cell><cell>0.39</cell><cell>0.87</cell><cell>0.92</cell><cell>0.43</cell></row><row><cell>CF-GNN ++ 𝑀 𝑒𝑎𝑛 𝐴𝑔𝑔 (𝑘=2) 𝑊 2𝑣</cell><cell>0.88</cell><cell>1.10</cell><cell>0.47</cell><cell>0.80</cell><cell>1.02</cell><cell>0.49</cell><cell>0.82</cell><cell>1.04</cell><cell>0.44</cell></row><row><cell>CF-GNN ++ 𝑀 𝑒𝑎𝑛 𝐴𝑔𝑔 (𝑘=2)𝐵 𝑒𝑟𝑡</cell><cell>0.774</cell><cell>0.89</cell><cell>0.451</cell><cell>0.78</cell><cell>0.85</cell><cell>0.441</cell><cell cols="3">0.772 0.799 0.471</cell></row><row><cell>CF-GCN ++ 𝑐𝑜𝑛𝑣 𝐴𝑔𝑔 (𝑘=2)𝑊 
2𝑣</cell><cell>0.798</cell><cell>0.821</cell><cell>0.47</cell><cell>0.74</cell><cell cols="2">0.838 0.464</cell><cell>0.79</cell><cell>0.801</cell><cell>0.465</cell></row><row><cell>CF-GCN ++ 𝑐𝑜𝑛𝑣 𝐴𝑔𝑔 (𝑘=2) 𝐵𝑒𝑟𝑡</cell><cell cols="2">0.715 0.791</cell><cell cols="3">0.48 0.723 0.782</cell><cell>0.47</cell><cell cols="2">0.712 0.787</cell><cell>0.48</cell></row><row><cell>HINSAGE [15]</cell><cell>0.75</cell><cell>0.85</cell><cell>0.48</cell><cell cols="3">0.771 0.801 0.448</cell><cell>0.74</cell><cell cols="2">0.791 0.475</cell></row><row><cell>TRSDL [9]</cell><cell>0.73</cell><cell>0.810</cell><cell>0.45</cell><cell>0.74</cell><cell cols="2">0.820 0.461</cell><cell>0.75</cell><cell>0.87</cell><cell>0.44</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.facebook.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://www.pinterest.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://www.pinterest.fr/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent over-fitting</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">BERT was pre-trained on a corpus composed of 11,038 unpublished books belonging to 16 different domains and 2,500 million words from English Wikipedia text passages</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>vectors that describe user u and item i as binarized sparse vectors (one-hot encoding); such a model employs only the identity of a user and an item as input features.</p><p>• Neural CF-MLP ++ : An extension of Neural CF-MLP; the model integrates into the bottom input layer two feature vectors described as tag embedding features of users and items. These features are extracted from word vector representations. Neural CF-MLP ++ 𝑊 𝑜𝑟𝑑2𝑣𝑒𝑐 uses 300-dimensional word vectors from the pre-trained Word2vec model, while Neural CF-MLP ++ 𝐵𝐸𝑅𝑇 exploits 768-dimensional word vectors from the pre-trained BERT model.</p><p>• U-Autorec <ref type="bibr" target="#b24">[25]</ref>: A neural CF framework for rating prediction that exploits an autoencoder architecture. It takes user vectors as input and reconstructs them in the output layer. The values in the reconstructed vectors are the predicted values of the corresponding positions. • CF-Autoencoder ++ : Our autoencoder-based neural collaborative approach, which takes as input tag embedding features obtained by performing element-wise multiplication on their word vector representations, and concatenates such representations with user/item rating vectors to get the reconstructed ratings. We term the autoencoder-based model using static tag vector representations CF-Autoencoder ++ 𝑊 𝑜𝑟𝑑2𝑣𝑒𝑐 , while CF-Autoencoder ++ 𝐵𝐸𝑅𝑇 stands for the autoencoder-based model using contextual tag vectors.</p><p>• CF-GNN ++  𝑀 𝑒𝑎𝑛 𝐴𝑔𝑔 (𝑘=2) : Our NGCF tag-based predictive model that generates node embeddings by sampling and aggregating features (tag embeddings) from nodes' local neighborhoods, using a mean aggregation function that operates on a neighborhood of depth 𝑘 = 2. 
We distinguish between the NGCF model that handles features extracted from tag-based embeddings using 300-dimensional tag vectors from the pre-trained Word2vec model, which we term CF-GNN 𝑀 𝑒𝑎𝑛 𝐴𝑔𝑔 (𝑘=2) 𝑊 𝑜𝑟𝑑2𝑣𝑒𝑐 , and CF-GNN 𝑀 𝑒𝑎𝑛 𝐴𝑔𝑔 (𝑘=2) 𝐵𝑒𝑟𝑡 , which exploits 768-dimensional tag vectors from the pre-trained BERT model.</p><p>• CF-GCN ++  𝑐𝑜𝑛𝑣 𝐴𝑔𝑔 (𝑘=2) : We consider this NGCF model convolutional since it learns a convolutional aggregator function that combines a node's previous-layer representations with the aggregated neighborhood vectors. We differentiate between the model that handles features extracted from static tag-based embeddings with 300-dimensional tag vectors from the pre-trained Word2vec model, which we term CF-GCN 𝑐𝑜𝑛𝑣 𝐴𝑔𝑔 (𝑘=2) 𝑊 𝑜𝑟𝑑2𝑣𝑒𝑐 , and CF-GCN 𝑐𝑜𝑛𝑣 𝐴𝑔𝑔 (𝑘=2) 𝐵𝑒𝑟𝑡 , which exploits 768-dimensional tag vectors from the pre-trained BERT model.</p><p>• Hinsage <ref type="bibr" target="#b14">[15]</ref>: A model that computes node representations in an inductive way. This method operates by sampling a fixed-size neighborhood of each user/item node and then applying a specific aggregator over all the sampled neighbors' feature vectors. This model learns general-purpose node embeddings that use the graph structure and, in particular, node features. It was evaluated for a rating prediction task using demographic user information (no tag information). • TRSDL <ref type="bibr" target="#b8">[9]</ref>: A tag-aware recommender system that uses deep neural networks (DNNs) and recurrent networks (RNNs) to extract latent features of both users and items. In their model, Liang et al. <ref type="bibr" target="#b8">[9]</ref> use Word2Vec for mapping user tags to k-dimensional dense</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Semantic-based tag recommendation in scientific bookmarking systems</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">A M</forename><surname>Hassan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sansonetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Gasparetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Micarelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th ACM Conference on Recommender Systems</title>
				<meeting>the 12th ACM Conference on Recommender Systems</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="465" to="469" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Manotumruksa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Macdonald</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ounis</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1606.07828</idno>
		<title level="m">Modelling user preferences using word embeddings for context-aware venue recommendation</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Rücklé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Eger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Peyrard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1803.01400</idno>
		<title level="m">Concatenated power mean word embeddings as universal cross-lingual sentence representations</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A context-aware user-item representation learning for item recommendation</title>
		<author>
			<persName><forename type="first">L</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Quan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Luo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Information Systems (TOIS)</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page" from="1" to="29" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Exploiting parallelism opportunities with deep learning frameworks</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">E</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hazelwood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Brooks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Architecture and Code Optimization (TACO)</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="1" to="23" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Hybrid neural recommendation with joint deep representation learning of ratings and reviews</title>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Gan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Jiao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">374</biblScope>
			<biblScope unit="page" from="77" to="85" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Word embedding techniques for contentbased recommender systems: An empirical evaluation</title>
		<author>
			<persName><forename type="first">C</forename><surname>Musto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Semeraro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Gemmis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lops</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Recsys posters</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Collaborative multi-level embedding learning from reviews for rating prediction</title>
		<author>
			<persName><forename type="first">W</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IJCAI</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="2986" to="2992" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">TRSDL: Tag-aware recommender system based on deep learning-intelligent computing systems</title>
		<author>
			<persName><forename type="first">N</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-T</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Sangaiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-Z</forename><surname>Zhao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page">799</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">TagEmbedSVD: Leveraging tag embeddings for cross-domain collaborative filtering</title>
		<author>
			<persName><forename type="first">M</forename><surname>Vijaikumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shevade</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">N</forename><surname>Murty</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Pattern Recognition and Machine Intelligence</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="240" to="248" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Exploiting pre-trained network embeddings for recommendations in social networks</title>
		<author>
			<persName><forename type="first">L</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-F</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X.-H</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Computer Science and Technology</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="682" to="696" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Distributed representations of sentences and documents</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1188" to="1196" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">GloVe: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</title>
				<meeting>the 2014 conference on empirical methods in natural language processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Inductive representation learning on large graphs</title>
		<author>
			<persName><forename type="first">W</forename><surname>Hamilton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Ying</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leskovec</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1024" to="1034" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">N</forename><surname>Kipf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Welling</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1609.02907</idno>
		<title level="m">Semi-supervised classification with graph convolutional networks</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Tag-aware recommender systems: a state-of-the-art survey</title>
		<author>
			<persName><forename type="first">Z.-K</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-C</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Computer Science and Technology</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="page">767</biblScope>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Discriminative embeddings of latent variable models for structured data</title>
		<author>
			<persName><forename type="first">H</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Song</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="2702" to="2711" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">node2vec: Scalable feature learning for networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Grover</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leskovec</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining</title>
				<meeting>the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="855" to="864" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">GraRep: Learning graph representations with global structural information</title>
		<author>
			<persName><forename type="first">S</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th ACM international on conference on information and knowledge management</title>
				<meeting>the 24th ACM international on conference on information and knowledge management</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="891" to="900" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Neural graph collaborative filtering</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-S</forename><surname>Chua</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 42nd international ACM SIGIR conference on Research and development in Information Retrieval</title>
				<meeting>the 42nd international ACM SIGIR conference on Research and development in Information Retrieval</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="165" to="174" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Learning fair representations for recommendation: A graph-based perspective</title>
		<author>
			<persName><forename type="first">L</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Web Conference 2021</title>
				<meeting>the Web Conference 2021</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="2198" to="2208" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Graph convolutional neural networks for web-scale recommender systems</title>
		<author>
			<persName><forename type="first">R</forename><surname>Ying</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Eksombatchai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">L</forename><surname>Hamilton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leskovec</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</title>
				<meeting>the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="974" to="983" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Autoencoder-based collaborative filtering</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Ouyang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Rong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Xiong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Neural Information Processing</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="284" to="291" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">AutoRec: Autoencoders meet collaborative filtering</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sedhain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Menon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sanner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Xie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th international conference on World Wide Web</title>
				<meeting>the 24th international conference on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="111" to="112" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Joint neural collaborative filtering for recommender systems</title>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Rijke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Information Systems (TOIS)</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page" from="1" to="30" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Dziugaite</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Roy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1511.06443</idno>
		<title level="m">Neural network matrix factorization</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Tag-aware recommender systems based on deep neural networks</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jiao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">204</biblScope>
			<biblScope unit="page" from="51" to="60" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Joint deep modeling of users and items using reviews for recommendation</title>
		<author>
			<persName><forename type="first">L</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Noroozi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Yu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the tenth ACM international conference on web search and data mining</title>
				<meeting>the tenth ACM international conference on web search and data mining</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="425" to="434" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Neural collaborative filtering</title>
		<author>
			<persName><forename type="first">X</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Liao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Nie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-S</forename><surname>Chua</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th international conference on World Wide Web</title>
				<meeting>the 26th international conference on world wide web</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="173" to="182" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Extracting deep semantic information for intelligent recommendation</title>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-T</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X.-X</forename><surname>Mao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Neural Information Processing</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="134" to="144" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">TNAM: A tag-aware neural attention model for top-N recommendation</title>
		<author>
			<persName><forename type="first">R</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cui</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">385</biblScope>
			<biblScope unit="page" from="1" to="12" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">SMOTE: Synthetic minority over-sampling technique</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">V</forename><surname>Chawla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">W</forename><surname>Bowyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">O</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">P</forename><surname>Kegelmeyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Artificial Intelligence Research</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="321" to="357" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Cumulated gain-based evaluation of IR techniques</title>
		<author>
			<persName><forename type="first">K</forename><surname>Järvelin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kekäläinen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Information Systems (TOIS)</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="422" to="446" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-S</forename><surname>Chua</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1808.03912</idno>
		<title level="m">Outer product-based neural collaborative filtering</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1908.10084</idno>
		<title level="m">Sentence-BERT: Sentence embeddings using siamese BERT-networks</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
