<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Tag-based embedding representations in neural collaborative filtering approaches</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Tahar-Rafik</forename><surname>Boudiba</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory" key="lab1">IRIS/IRIT</orgName>
								<orgName type="laboratory" key="lab2">UMR 5505 CNRS</orgName>
								<address>
									<addrLine>118 Route de Narbonne</addrLine>
									<postCode>F-31062</postCode>
									<settlement>TOULOUSE CEDEX 9</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
<orgName type="institution">ADBI Accelerator Data &amp; Business Intelligence</orgName>
								<address>
<addrLine>8 rue Rossini</addrLine>
									<postCode>75009</postCode>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Taoufiq</forename><surname>Dkaki</surname></persName>
							<email>taoufiq.dkaki@irit.fr</email>
							<affiliation key="aff0">
								<orgName type="laboratory" key="lab1">IRIS/IRIT</orgName>
								<orgName type="laboratory" key="lab2">UMR 5505 CNRS</orgName>
								<address>
									<addrLine>118 Route de Narbonne</addrLine>
									<postCode>F-31062</postCode>
									<settlement>TOULOUSE CEDEX 9</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Tag-based embedding representations in neural collaborative filtering approaches</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">2540C623459F5EF4DCE37A988312E232</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T06:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Learning representation</term>
					<term>folksonomies</term>
					<term>deep learning</term>
					<term>word embedding</term>
					<term>social tagging</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Learning user-item interactions in collaborative systems has become a promising way to improve the performance of collaborative filtering approaches. In such systems, the content surrounding users and items, particularly user tags, plays a key role since it is leveraged by collaborative filtering approaches. Tags are commonly represented using the bag-of-words paradigm, although this representation is subject to ambiguity, due principally to the poor semantic relations between tags. Recent methods suggest the use of deep neural architectures, as they attempt to learn semantic and contextual word representations. On this basis, we address how to integrate such content semantically into different neural collaborative filtering models for rating prediction. Building on effective models initially developed to learn user-item interactions, in this paper we extend different neural collaborative filtering models for rating prediction to evaluate the impact of using static or contextualized word embeddings within a neural collaborative filtering strategy. The presented models use dense tag-based user and item representations extracted from pre-trained static Word2vec and contextual BERT models. In addition, the paper emphasizes the impact of using contextualized tag-embedding neighbors in a neural graph collaborative filtering approach that learns an aggregation function. Finally, to determine whether the use of different neural architectures can influence recommendation quality, we adapt three popular end-to-end learning architectures: an MLP, an autoencoder, and a Graph Neural Network. We evaluated and compared all the models with recent baselines on several MovieLens datasets.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Deep learning (DL) techniques are the milestones of several recent recommendation engines. Platforms such as Facebook <ref type="foot" target="#foot_0">1</ref> and Pinterest 2 have already shared their experience in using DL for recommender systems (RS). In such platforms, Collaborative Filtering (CF) approaches are mainly exploited. Such methods enable users to get recommendations on favourite items. Putting them into practice in RS implies being able to predict how users will rate a particular item. Classical CF approaches are based either on Matrix Factorization (MF) techniques or on simple user-item vector similarity methods. However, these models share the property of being essentially linear, since they combine user and item latent factors linearly. In contrast, DL models for RS have the key property of learning multiple levels of representation and have hence enabled the deep integration of several types of content. As a result, recent neural collaborative filtering approaches capture more complex user-item interactions and enable high-level abstractions for content description. Such content often refers to users' tags, since they are commonly used to describe items and user profiles using the bag-of-words representation. Although such representations, commonly appearing as one-hot vectors, are efficient for computing user-item similarity, many problems such as ambiguity and vocabulary mismatch have been raised <ref type="bibr" target="#b0">[1]</ref>. In this sense, common NLP techniques suggest the use of dense representations, in the form of either user or item aggregated semantic embedding vectors extracted from the pre-trained Word2vec neural language model <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>. However, how can such embedding vectors be efficiently included at the top layer of a neural CF architecture? 
A design choice is to combine the two embedding vectors and then feed them through multiple fully connected layers to get the likelihood that a user interacts with an item. In that way, multiplying the embedding vectors element-wise with each other, or simply concatenating them, is a reasonable technique for integrating both user and item dense representations in a neural CF model. Some works have discussed text-embedding aggregation techniques <ref type="bibr" target="#b3">[4]</ref>; others have suggested the concatenation of mean word embeddings, since they compute average word-embedding representations <ref type="bibr" target="#b2">[3]</ref>. Recent neural approaches for recommendation additionally consider other relationships, such as neighborhood proximity in graph-based approaches. Such approaches have been proposed to explore multi-layer neighbor embedding representations. Since these embeddings are integrated into neural CF architectures, this has resulted in Neural Graph CF (NGCF) approaches <ref type="bibr" target="#b4">[5]</ref>. In this paper, we consider tag embeddings as the starting point for explicitly integrating a tag-based vocabulary within neural collaborative filtering models. However, such an initiative raises some research issues, such as determining the most efficient neural architecture to use or defining the best tag-embedding representations. To this end, we handle dense tag-based representations that we exploit within effective neural CF models for rating prediction. We have developed several neural models that combine neural CF with tagging information integrated into the training process. For this purpose, we handled word vector representations to include more valuable tag semantics and thus to enhance the neural CF models' ability to generalize. We compared different tag-embedding representations from pre-trained static (Word2vec) and contextual (BERT) models. 
Furthermore, we evaluated the impact of using such tag embeddings through several neural architectures, namely an MLP, an autoencoder, and a graph-based neural collaborative architecture. We provide empirical results on the MovieLens 10M, 20M, and 25M datasets. The main contributions of this paper are summarized as follows:</p><p>• Integrating tag-embedding representations efficiently into several neural CF models. The remainder of the paper is organized as follows. The next section presents some background and reviews recent research works related to content-based recommendation using neural networks and word vector representations. We gather works that describe neural approaches from a collaborative filtering point of view, specifying the most used neural architectures. Section 3 highlights the basis of our proposed models. Section 4 details the datasets, evaluation metrics, and experimental settings. Section 5 gives the evaluation results and discusses performance comparison with baselines. Following these sections, we draw our conclusion in the final section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background and related works</head><p>DL methods have made breakthroughs in learning data representations from various data sources. As a result, recent neural recommendation models have been able to learn representations of user preferences, item features and textual interactions <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b0">1]</ref>. Moreover, neural recommendation models attempt to additionally introduce semantic-aware tag representations based on distributional tag semantics used as features <ref type="bibr" target="#b5">[6]</ref>. In this area, Musto et al. <ref type="bibr" target="#b6">[7]</ref> exploit the Word2vec approach to learn a low-dimensional word vector space and use it to represent both items and user profiles in a recommendation scenario. Zhang et al. <ref type="bibr" target="#b7">[8]</ref> proposed to integrate traditional matrix factorization with Word2vec for user profiling and rating prediction. Liang et al. <ref type="bibr" target="#b8">[9]</ref> exploited pre-trained word embeddings from Word2vec to represent user tags and to construct item and user profiles based on the items' tag sets and users' tagging behaviors. They use deep neural networks (DNNs) and recurrent neural networks (RNNs) to extract the latent features of items and users to predict ratings. Moreover, TagEmbedSVD <ref type="bibr" target="#b9">[10]</ref> uses pre-trained Word2vec embeddings for tags, integrated into an SVD model, to enhance personalized recommendations in the context of cross-domain CF. Other works <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b0">1]</ref> take advantage of network embedding techniques to propose embedding-based recommendation models that exploit CF approaches. 
Along with learning content representations for recommendation, exploiting rating patterns often requires a neural network-based embedding model that is first pre-trained. Features are extracted and integrated into a CF model by fusing them with latent factors through non-linear transformations, which better leverage abstract content representations and thus produce higher-quality recommendations. Since pre-training word embeddings on large-scale corpora became widely used in different information retrieval tasks, it was also exploited to generate recommendations by ranking the user-item matrix from users' similar tag vocabularies. Models such as Word2vec <ref type="bibr" target="#b11">[12]</ref> or GloVe <ref type="bibr" target="#b12">[13]</ref>, for instance, learn meaningful user-tag representations by modeling tag co-occurrences. However, these methods do not consider the deep contextual information that single content words may lack, and they do not handle unknown words. In contrast, contextualized word representations such as BERT <ref type="bibr" target="#b13">[14]</ref> have been proposed to overcome the limitations of static word embeddings, since such contextual neural language models have been shown to improve the performance of many downstream tasks. Furthermore, graph-based neural approaches <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b16">17]</ref> have considered heterogeneous graphs, as they try to overcome the lack of relationship modeling in feature-based neural recommendation models. Such approaches have been proposed to explore multi-layer neighbor embedding representations <ref type="bibr" target="#b17">[18]</ref>. Neural graph network models consider content features either extracted from graph properties <ref type="bibr" target="#b18">[19]</ref> or learned from node embedding representations <ref type="bibr" target="#b19">[20]</ref>. 
In particular, Neural Graph Collaborative Filtering (NGCF) approaches exploit feature representations of the user-item graph structure by propagating either user-based or item-based content embeddings on it <ref type="bibr" target="#b20">[21]</ref>. This process is often the result of learning aggregation functions that allow deep relationship modeling among both user-item interactions and content features. In this way, Graph Convolutional Networks (GCNs) have also been exploited through learned aggregator functions, which require additional layers to obtain a convolutional neighborhood aggregation from the neighborhood's embeddings at these layers <ref type="bibr" target="#b21">[22]</ref>. As a result, deep semantic representations are extracted using embedding propagation on the user-item graph structure. An instance of such a method is used by Ying et al. <ref type="bibr" target="#b22">[23]</ref>, who employ multiple graph convolution layers on an item-item graph for image recommendation at Pinterest<ref type="foot" target="#foot_2">3</ref>.</p><p>In the following, we introduce some recommendation models from the literature that have handled neural CF approaches <ref type="bibr" target="#b23">[24,</ref><ref type="bibr" target="#b24">25,</ref><ref type="bibr" target="#b25">26]</ref>. Those models address user rating prediction. Some of them have been adapted to include tagging content <ref type="bibr" target="#b26">[27,</ref><ref type="bibr" target="#b27">28,</ref><ref type="bibr" target="#b28">29]</ref>; they are mostly composite, in that multiple neural building modules compose a single differentiable function that is trained end-to-end. Here, we introduce some summary definitions related to tagging that will allow us to address the most common architectures and topologies, giving recommendation strategies for each of them. 
A folksonomy 𝐹 can be defined as a 4-tuple 𝐹 = (𝑈, 𝑇, 𝐼, 𝐴), where 𝑈 = {𝑢 1 , 𝑢 2 , ..., 𝑢 𝑀 } is the set of users annotating the set of items 𝐼. 𝑇 is the set of tags, which constitutes the vocabulary expressed by the folksonomy. 𝐼 = {𝑖 1 , 𝑖 2 , ..., 𝑖 𝑁 } is the set of items tagged by the users. 𝐴 = {(𝑢 𝑚 , 𝑡 𝑘 , 𝑖 𝑗 )} ⊆ 𝑈 × 𝑇 × 𝐼 is the set of annotations, each assigning a tag 𝑡 𝑘 to an item 𝑖 𝑗 by a user 𝑢 𝑚 . We also consider 𝑅 as the set of user ratings 𝑟 𝑢,𝑖 .</p></div>
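As a minimal illustration, the folksonomy 4-tuple above can be sketched with plain Python structures; the annotation triples and identifiers below are hypothetical toy data:

```python
from collections import defaultdict

# Annotation triples (u, t, i): user u applied tag t to item i.
annotations = [("u1", "sci-fi", "i1"), ("u1", "space", "i2"), ("u2", "sci-fi", "i1")]
ratings = {("u1", "i1"): 5.0, ("u2", "i1"): 3.5}   # the set R of ratings r_{u,i}

users = {u for u, _, _ in annotations}             # U
tags = {t for _, t, _ in annotations}              # T
items = {i for _, _, i in annotations}             # I

# Per-user and per-item tag sets T_u and T_i, used later for feature vectors.
user_tags, item_tags = defaultdict(set), defaultdict(set)
for u, t, i in annotations:
    user_tags[u].add(t)
    item_tags[i].add(t)
```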
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">MLP-based neural collaborative filtering for Recommendation</head><p>Neural collaborative filtering (NCF) approaches for rating prediction often involve dealing with the binary property of implicit data. Some works <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31,</ref><ref type="bibr" target="#b25">26]</ref> have additionally discussed the choice of the neural architecture to be implemented. A possible instance of the neural CF approach can be formulated using a multi-layer perceptron (MLP). As addressed in <ref type="bibr" target="#b29">[30]</ref>, the input layer (the embedding layer) is a fully connected layer that maps sparse representations to dense feature vectors. It consists of two feature vectors 𝑣 𝑓 (𝑢) and 𝑣 𝑓 (𝑖) that describe the user 𝑣 𝑈 (𝑢) and the item 𝑣 𝐼 (𝑖), represented initially through one-hot encoding. The obtained user (item) embedding can be seen as the latent vector for the user (item). The user and item embeddings are then fed into neural CF layers to map the latent vectors to prediction scores. The final output layer produces the predicted score 𝑟 ^𝑢,𝑖 , and training is performed by minimizing the pointwise loss between 𝑟 ^𝑢,𝑖 and its target value 𝑟 𝑢,𝑖 . The NCF predictive model can be formulated as:</p><formula xml:id="formula_0">𝑟 ^𝑢,𝑖 = MLP(𝑃 𝑇 𝑢 . 𝑣 𝑓 (𝑢) , 𝑄 𝑇 𝑖 . 𝑣 𝑓 (𝑖) |𝑃 𝑢 , 𝑄 𝑖 , Γ MLP )<label>(1)</label></formula><p>𝑃 𝑢 ∈ R 𝑀 ×𝐾 and 𝑄 𝑖 ∈ R 𝑁 ×𝐾 are the latent factor matrices for users and items, respectively. Γ MLP denotes the parameters of the interaction function, which is defined as a multi-layer neural network.</p></div>
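The forward pass of equation (1) can be sketched with NumPy as follows; the toy dimensions, random initialization, and the name `predict_rating` are illustrative assumptions, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 4, 5, 8            # users, items, latent dimension (toy sizes)
P = rng.normal(size=(M, K))  # user latent factor matrix P_u
Q = rng.normal(size=(N, K))  # item latent factor matrix Q_i

# Two dense CF layers on the concatenated embeddings, then a scalar output.
W1, b1 = rng.normal(size=(2 * K, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def predict_rating(u: int, i: int) -> float:
    """NCF forward pass: one-hot index -> embedding lookup -> MLP -> score."""
    x = np.concatenate([P[u], Q[i]])   # embedding-layer output
    h = np.tanh(x @ W1 + b1)           # hidden neural CF layer
    return float(h @ W2 + b2)          # output layer: predicted r_hat(u, i)
```

Here the one-hot multiplication 𝑃ᵀ𝑣𝑓(𝑢) reduces to an embedding row lookup `P[u]`, which is how it is implemented in practice.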
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Autoencoder-Based collaborative filtering for Recommendation</head><p>Another way to consider neural CF is to treat the user-item ratings as a matrix 𝑋 ∈ R 𝑚×𝑛 with partially observed entries, where each user 𝑢 in the set of users 𝑈 = {1...𝑚} is given by a row vector of ratings 𝑟 (𝑢) = (𝑋 𝑢1 , ..., 𝑋 𝑢𝑛 ) and each item 𝑖 in the set of items 𝐼 = {1...𝑛} by a column vector of ratings 𝑟 (𝑖) = (𝑋 1𝑖 , ..., 𝑋 𝑚𝑖 ). An efficient neural method to encode each partially observed vector into a low-dimensional latent space is to use an autoencoder architecture, as suggested in <ref type="bibr" target="#b24">[25]</ref>, which reconstructs the output space to predict missing ratings for recommendation <ref type="bibr" target="#b24">[25,</ref><ref type="bibr" target="#b31">32,</ref><ref type="bibr" target="#b23">24]</ref>. Given a set of rating vectors 𝑟 (𝑢) and 𝑟 (𝑖) ∈ R 𝑑 , the autoencoder solves:</p><formula xml:id="formula_1">min 𝜃 ∑︁ 𝑟∈𝑅 ||𝑟 − ℎ(𝑟; 𝜃)|| 2<label>(2)</label></formula><p>where ℎ(𝑟; 𝜃) is the reconstruction of the input 𝑟 ∈ R 𝑑 , defined as:</p><formula xml:id="formula_2">ℎ(𝑟; 𝜃) = 𝑓 (𝑊.𝑔(𝑉 𝑟 + 𝜇) + 𝑏)<label>(3)</label></formula><p>𝑓 (.) and 𝑔(.) are the activation functions associated with the decoder and encoder respectively, and 𝜃 gathers the model parameters; 𝑊 ∈ R 𝑑×𝑘 and 𝑉 ∈ R 𝑘×𝑑 are weight matrices, and 𝜇 ∈ R 𝑘 , 𝑏 ∈ R 𝑑 are biases. In an item-based recommendation perspective, the autoencoder takes the vectors 𝑟 (𝑖) as input. The weights associated with those vectors are updated during backpropagation.</p></div>
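A minimal NumPy sketch of the reconstruction ℎ(𝑟; 𝜃) of equation (3), assuming tanh for the encoder activation 𝑔 and the identity for the decoder activation 𝑓 (both choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 6, 3                                    # rating-vector and latent dims
V, mu = rng.normal(size=(k, d)), np.zeros(k)   # encoder weights V, bias mu
W, b = rng.normal(size=(d, k)), np.zeros(d)    # decoder weights W, bias b

def reconstruct(r):
    """h(r; theta) = f(W . g(V r + mu) + b) with g = tanh, f = identity."""
    return W @ np.tanh(V @ r + mu) + b

r = rng.normal(size=d)                         # one partially observed vector
loss = np.sum((r - reconstruct(r)) ** 2)       # one term of objective (2)
```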
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Neural Graph Collaborative Filtering for Recommendation</head><p>NGCF approaches are particular in the sense that they exploit embeddings of users and items initially represented as a graph structure. Most of them adopt a user-item bipartite graph, as it naturally represents user-item interactions <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b15">16]</ref>. Promising recent methods suggest learning user and item representations from their associated bipartite graph by stacking multiple embedding propagation layers to capture high-order connectivity from user-item interactions <ref type="bibr" target="#b20">[21]</ref>. Other works <ref type="bibr" target="#b14">[15]</ref> learn aggregator functions that induce the embedding of a new node given its features and neighborhood. In the following, we formalize what can be associated with a neural graph-based collaborative filtering approach for user rating prediction based on multiple embedding aggregation layers. This neural graph-oriented approach is designed to exploit node embeddings from neighborhood aggregation. Given a bipartite weighted user-item graph 𝒢 = (𝒱, ℰ, 𝐴, 𝒳 ), with 𝒱 = {𝒱 𝑢 ∪ 𝒱 𝑖 }, ℰ denotes the set of undirected weighted edges representing user ratings, 𝐴 is the adjacency matrix, and 𝒳 ∈ R 𝑚×𝑛 is the node feature matrix. Let ℎ 0 𝑣 = 𝑥 𝑢 𝑣 with 𝑣 ∈ 𝒱 𝑢 be the user node feature at the 0-th layer. Then, at the 𝑘-th layer:</p><formula xml:id="formula_3">ℎ 𝑘 𝑣 = 𝛿(𝑊 𝑘 1 |𝑁 (𝑣)| ∑︁ 𝑢∈𝑁 (𝑣) ℎ 𝑘−1 𝑢 + 𝐴 𝑘 ℎ 𝑘−1 𝑣 )<label>(4)</label></formula><p>ℎ 𝑘−1 𝑣 is the embedding of node 𝑣 ∈ 𝒱 𝑢 from the previous layer. |𝑁 (𝑣)| is the number of neighbors of node 𝑣. The sum in the equation aggregates the neighboring features of node 𝑣 from the previous layer. 𝛿 is the activation function (tanh), which introduces non-linearity. 𝑊 𝑘 and 𝐴 𝑘 are trainable parameters. 
The final embedding after 𝐾 layers (𝑘 ∈ {1...𝐾}) is extracted from the output layer: 𝑧 𝑢 𝑣 = ℎ 𝐾 𝑣 . This can be expressed in matrix multiplication form for the whole graph as:</p><formula xml:id="formula_4">𝐻 𝑙+1 = 𝛿(𝐻 𝑙 𝑊 𝑙 0 + 𝐴 ˜𝐻 𝑙 𝑊 𝑙 1 )<label>(5)</label></formula><p>where 𝐴 ˜= 𝐷 −1/2 𝐴𝐷 −1/2 , with 𝐴 the adjacency matrix and 𝐷 the degree matrix. After applying a similar process to the item node embeddings to obtain 𝑧 𝑖 𝑣 with 𝑣 ∈ 𝒱 𝑖 , one way forward is to apply a concatenation operator ⊕ to both the user and item final embeddings to obtain 𝑧 𝑢⊕𝑖 𝑒 = 𝑧 𝑢 𝑣 ⊕ 𝑧 𝑖 𝑣 , which represents the embedding of the edge 𝑒 𝑢,𝑖 between a user node 𝑣 𝑢 and an item node 𝑣 𝑖 , with 𝑒 𝑢,𝑖 = [𝑣 𝑢 , 𝑣 𝑖 ]. These edge embeddings are passed through a link regression layer to obtain predicted user-item ratings. The model is trained end-to-end by minimizing a regression loss (the root mean square error, RMSE, between predicted and true ratings) using stochastic gradient descent (SGD) updates of the model parameters, with mini-batches of user-item training edges fed into the model.</p></div>
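One propagation layer of equation (5) can be sketched in NumPy as follows; the toy random graph, random weights, and tanh activation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5, 4                                   # number of nodes, feature dim
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                # symmetric adjacency, no self-loops
deg = np.maximum(A.sum(axis=1), 1.0)          # guard against isolated nodes
D_inv_sqrt = np.diag(deg ** -0.5)
A_norm = D_inv_sqrt @ A @ D_inv_sqrt          # normalized A~ = D^-1/2 A D^-1/2

H = rng.normal(size=(n, d))                   # layer-l node embeddings H^l
W0, W1 = rng.normal(size=(d, d)), rng.normal(size=(d, d))

# One propagation layer: H^{l+1} = tanh(H^l W0 + A~ H^l W1)
H_next = np.tanh(H @ W0 + A_norm @ H @ W1)
```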
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Overview of the proposed models</head><p>In this section, we introduce our tag-aware neural models for recommendation. More explicitly, we integrate tag-based embeddings into neural CF architectures, namely a multilayer perceptron, an autoencoder and a neural graph-based model. To integrate side information into predictive neural models, a naive approach consists of appending additional user/item biases to the rating prediction. We consider that computing those biases can be handled either by hand-crafted feature engineering or by implementing an appropriate CF strategy. A simple neural collaborative filtering framework architecture treats the input layer (embedding layer) as a fully connected layer that projects the sparse representations of users and items to dense vectors. To integrate the tag vocabulary explicitly in a neural model for rating prediction, we use feature vectors that we consider as tag vector representations sharing a common embedding space through projection matrices. The obtained user (item) embedding can be seen as the latent vector for the user (item) in the tag latent space. The feature vectors 𝑣 𝑓 (𝑢) and 𝑣 𝑓 (𝑖) are reconsidered, since we project the tag representations into a lower dimension using projection matrices 𝐸 and 𝐹 . Consequently, the tag-based vector representation is expressed as a user feature vector 𝑣 𝑓 (𝑢 ˜):</p><formula xml:id="formula_5">𝑣 𝑓 (𝑢 ˜) = 1 |𝑇 𝑢 | ∑︁ 𝑡 𝑘 ∈𝑇𝑢 𝐸(𝑡 𝑘 )<label>(6)</label></formula><p>where 𝑡 𝑘 ∈ R 𝑐 is the embedding vector associated with tag 𝑘, and 𝑐 denotes the embedding dimension. 
𝐸 denotes the projection matrix, with 𝐸 ∈ R 𝑑×𝑐 .</p><p>Similarly, if 𝐹 denotes the projection matrix with 𝐹 ∈ R 𝑑×𝑐 , then the item feature vector 𝑣 𝑓 (𝑖 ˜) is expressed as:</p><formula xml:id="formula_6">𝑣 𝑓 (𝑖 ˜) = 1 |𝑇 𝑖 | ∑︁ 𝑡 𝑘 ∈𝑇 𝑖 𝐹 (𝑡 𝑘 )<label>(7)</label></formula><p>We denote by 𝑇 𝑢 the set of tags of a user 𝑢 and by 𝑇 𝑖 the set of related tags describing a particular item. Moreover, we obtain the tag embeddings from the pre-trained Word2vec and BERT neural models, handled through the projection matrices 𝐸 and 𝐹 ∈ R 𝑑×𝑐 .</p></div>
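Equations (6) and (7) can be sketched as follows; the tag vocabulary, toy dimensions, and the name `tag_feature` are hypothetical, and random vectors stand in for the pre-trained Word2vec/BERT embeddings:

```python
import numpy as np

rng = np.random.default_rng(3)
c, d = 8, 4                                   # tag-embedding and projected dims
# Stand-ins for pre-trained tag embeddings t_k in R^c.
embeddings = {t: rng.normal(size=c) for t in ["sci-fi", "space", "drama"]}
E = rng.normal(size=(d, c))                   # projection matrix E (F is analogous)

def tag_feature(tag_set, proj):
    """v_f = (1/|T|) * sum over t in T of proj @ embedding(t)  (eqs. 6-7)."""
    vecs = [proj @ embeddings[t] for t in tag_set]
    return np.mean(vecs, axis=0)

v_u = tag_feature({"sci-fi", "space"}, E)     # user feature vector v_f(u~)
```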
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">CF-based MLP model</head><p>The extended tag-based NCF predictive model can be reformulated, relying on the NCF model described in section 2.1, equation (1), as:</p><formula xml:id="formula_7">𝑟 ^𝑢,𝑖 = MLP( 𝑣 𝑓 (𝑢 ˜), 𝑣 𝑓 (𝑖 ˜), 𝜃 MLP )<label>(8)</label></formula><p>The user and item embeddings can be fed into a multi-layer neural model, where 𝑟 ^𝑢,𝑖 is the rating score of a user for an item. Figure <ref type="figure" target="#fig_0">1</ref> (𝒞) details an instance of the model. The prediction pipeline exploits user and item vectors extracted from the dense representation space (Figure <ref type="figure" target="#fig_0">1</ref> (𝒜)); hidden layers are added to learn interactions between the user and item latent features, and a regressor at the last hidden layer produces the final rating. (Figure <ref type="figure" target="#fig_0">1</ref> (𝒜)) is a dynamic module in which dense representations are computed through the inner product of the user and item embedding representations. Tag-embedding representations are extracted from a pre-trained neural language model (Figure <ref type="figure" target="#fig_0">1</ref> (ℰ)).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">CF-based Autoencoder model</head><p>Following the autoencoder paradigm, instead of only encoding the user vectors containing the ratings to be predicted as in AutoRec <ref type="bibr" target="#b24">[25]</ref>, we extend a multilayered autoencoder architecture to integrate the element-wise product of pre-trained tag-based embeddings. Such embeddings are concatenated with the user rating representations and projected onto a low-dimensional latent (hidden) space. As such, the rating 𝑟(𝑢 𝑚 , 𝑖 𝑙 ) of a particular user is reconstructed by minimizing, over the model parameters 𝜃, the objective:</p><formula xml:id="formula_8">∑︁ ||𝑟(𝑢 𝑚 , 𝑖 𝑙 ) ⊕ (𝑣 𝑓 (𝑢 ˜) ⊗ 𝑣 𝑓 (𝑖 ˜)) − ℎ(𝑟(𝑢 𝑚 , 𝑖 𝑙 ) ⊕ (𝑣 𝑓 (𝑢 ˜) ⊗ 𝑣 𝑓 (𝑖 ˜)); 𝜃)|| 2<label>(9)</label></formula><p>where ℎ(· ; 𝜃) is the reconstruction of the input 𝑟(𝑢 𝑚 , 𝑖 𝑙 ) ∈ R 𝑑 . The operator ⊗ denotes the element-wise multiplication between the user and item feature vectors, and ⊕ denotes concatenation. 𝑡𝑎𝑛ℎ is the selected activation function. Figure <ref type="figure" target="#fig_0">1</ref> (ℬ) presents a detailed instance of the model. The prediction pipeline exploits user and item vectors extracted from the dense representation space. Such representations are concatenated with the user ratings and fed as input to the autoencoder model. Layers are added to learn interactions between the user and item latent features, compressed in a dense space. Reconstructing the user's ratings from this dense space produces the final rating. </p></div>
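The input construction of equation (9), concatenating the rating vector with the element-wise product of the tag-based feature vectors, can be sketched as follows (the toy vectors and the name `autoencoder_input` are illustrative):

```python
import numpy as np

def autoencoder_input(r_u, v_u, v_i):
    """Build r ⊕ (v_f(u~) ⊗ v_f(i~)): the user's rating vector concatenated
    with the element-wise product of the tag-based feature vectors."""
    return np.concatenate([r_u, v_u * v_i])

r_u = np.array([5.0, 0.0, 3.0])   # partially observed ratings (toy)
v_u = np.array([0.2, -0.1])       # tag-based user features (toy)
v_i = np.array([0.5, 0.4])        # tag-based item features (toy)
x = autoencoder_input(r_u, v_u, v_i)   # fed to the autoencoder of eq. (9)
```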
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Neural graph CF-based model</head><p>As part of collaborative filtering approaches, neural graph-based networks mostly consider <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b18">19,</ref><ref type="bibr" target="#b14">15]</ref> bipartite graphs of users and items in a recommendation context, where edges represent the rating interactions between the users and the items. We start from the bipartite graph 𝒢 defined in section 2.3, where the node classes are derived from the set of user nodes 𝒱 𝑢 and the set of item nodes 𝒱 𝑖 respectively. Each edge corresponds to a user rating an item, and each edge 𝑒 𝑢,𝑖 ∈ ℰ is associated with a value 𝑟 (𝑢,𝑖) ∈ {0, 1}. In order to learn the topological structure of each class of node neighborhoods, the idea is to aggregate feature information from each node's local neighborhood <ref type="bibr" target="#b14">[15]</ref>; in this paper, however, we take the node features from pre-trained static and contextual tag-embedding models. User node features are taken as the mean of the users' tag-embedding vectors; equivalently, item node features are represented through the mean of their tag-embedding vectors. We have previously explored a simple neighborhood aggregation process in section 2.3. By defining a neighborhood function 𝑁 (𝑣) of fixed size (in our experiments, K=2), the bipartite graph is sampled while the model learns a function that generates aggregates from the tag-based textual features of node neighbors. This method can be generalized by applying different aggregation methods to the nodes of 𝒢, concatenating the aggregated features with the node itself. For this purpose, we associate each node 𝑣 ∈ {𝒱 𝑢 ∪ 𝒱 𝑖 } with features from word vector representations by joining the tag-based vector representations 𝑣 𝑓 (𝑢 ˜) and 𝑣 𝑓 (𝑖 ˜) (Figure <ref type="figure" target="#fig_0">1</ref> (𝒢)). 
We have designed a mean aggregation function, which is commonly used since it implies an element-wise mean of the feature vectors in ℎ 𝑘−1 𝑢 . We have also designed a convolutional aggregator function, detailed next.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.1.">Mean aggregator function</head><p>Since the rating interactions between users and items are represented as a bipartite graph 𝐺 = (𝑈, 𝑉, 𝐸), 𝒱 𝑢 and 𝒱 𝑖 correspond respectively to the user and item sets. The aggregation of the mean tag-embedding features from the neighbors of a node 𝑣 ∈ {𝒱 𝑢 ∪ 𝒱 𝑖 } is processed given the following update rule (Figure 1 𝒟 (𝒜 ′ )):</p><formula xml:id="formula_9">ℎ 𝑘 𝑁 (𝑣) = 1 |𝑁 (𝑣)| ∑︁ 𝑢∈𝑁 (𝑣) 𝐷 𝑝 [ℎ 𝑘−1 𝑢 ]</formula><p>We give the forward pass through layer 𝑘 as follows:</p><formula xml:id="formula_10">ℎ 𝑘 𝑣 = 𝛿(𝑐𝑜𝑛𝑐𝑎𝑡[𝑊 𝑘 𝐼 𝐷 𝑝 [ℎ 𝑘−1 𝑣 ], 𝑊 𝑘 𝑁 𝑒𝑖𝑔ℎ𝑏𝑜𝑟 ℎ 𝑘 𝑁 (𝑣)] + 𝑏 𝑘 )</formula><p>Here, ℎ 𝑘 𝑣 is the output of node 𝑣 at layer 𝑘; 𝑊 𝑘 𝐼 and 𝑊 𝑘 𝑁 𝑒𝑖𝑔ℎ𝑏𝑜𝑟 are trainable parameters; 𝑏 𝑘 is an optional bias; 𝑑 𝑘 is the node feature dimensionality at layer 𝑘; 𝛿 is a non-linear activation function (tanh); and 𝐷 𝑝 is a random dropout with probability 𝑝 applied to its argument vector, used to reduce the model's over-fitting. 𝑁 (𝑣) represents the neighborhood of a node 𝑣 ∈ {𝒱 𝑢 ∪ 𝒱 𝑖 }. The number of trainable parameters in layer 𝑘 for the mean aggregator is 𝑑 𝑘 .𝑑 𝑘−1 + 𝑑 𝑘 .</p></div>
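A rough NumPy sketch of the mean aggregator above; the weight shapes, the dropout-mask handling, and the name `mean_aggregate` are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
d_prev, d_k = 4, 3                      # feature dims at layers k-1 and k
W_self = rng.normal(size=(d_k, d_prev))    # W_I: transform of the node itself
W_neigh = rng.normal(size=(d_k, d_prev))   # W_Neighbor: transform of h_N(v)
b = np.zeros(2 * d_k)                      # bias after concatenation

def mean_aggregate(h_self, h_neighbors, p=0.0):
    """Mean aggregator: dropout-masked mean of neighbor features, separate
    transforms for self and neighborhood, concatenation, then tanh."""
    keep_self = (rng.random(h_self.shape) >= p).astype(float)       # D_p mask
    keep_n = (rng.random(h_neighbors.shape) >= p).astype(float)     # D_p mask
    h_n = np.mean(keep_n * h_neighbors, axis=0)     # h^k_N(v)
    z = np.concatenate([W_self @ (keep_self * h_self), W_neigh @ h_n])
    return np.tanh(z + b)
```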
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.2.">Convolutional aggregator function</head><p>To generalize the collaborative filtering process from a graph convolutional network perspective, we adopted a GCN aggregator <ref type="bibr" target="#b14">[15]</ref> (Figure 1 𝒟 (𝒜 ′ )), which combines a node's previous-layer representation ℎ 𝑘−1 𝑣 with the aggregated neighborhood vectors ℎ 𝑘 𝑁 (𝑣) . Features are updated given the following equation:</p><formula xml:id="formula_11">ℎ 𝑘 𝑁 (𝑣) = 1 |𝑁 (𝑣)| + 1 (ℎ 𝑘−1 𝑣 + ∑︁ 𝑢∈𝑁 (𝑣) ℎ 𝑘−1 𝑢 )<label>(10)</label></formula><p>The forward pass through layer 𝑘 is defined as:</p><formula xml:id="formula_12">ℎ 𝑘 𝑣 = 𝛿(𝑊 𝑘 .ℎ 𝑘 𝑁 (𝑣) + 𝑏 𝑘 )<label>(11)</label></formula><p>Here, 𝑊 𝑘 is a trainable weight matrix shared between all nodes 𝑣 ∈ {𝒱 𝑢 ∪ 𝒱 𝑖 }. The size of 𝑊 𝑘 is 𝑑 𝑘 × 𝑑 𝑘−1 . The number of trainable parameters in layer 𝑘 for the GCN aggregator is 𝑑 𝑘 .𝑑 𝑘−1 + 𝑑 𝑘 .</p></div>
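The GCN aggregator of equations (10) and (11) can be sketched as follows, with illustrative toy dimensions and the hypothetical name `gcn_aggregate`:

```python
import numpy as np

rng = np.random.default_rng(5)
d_prev, d_k = 4, 3
W = rng.normal(size=(d_k, d_prev))   # shared trainable weight matrix W_k
b = np.zeros(d_k)                    # bias b_k

def gcn_aggregate(h_self, h_neighbors):
    """GCN aggregator: mean over the node itself and its neighbors (eq. 10),
    then one shared linear transform and a tanh non-linearity (eq. 11)."""
    h_n = (h_self + h_neighbors.sum(axis=0)) / (len(h_neighbors) + 1)
    return np.tanh(W @ h_n + b)

# Parameter count in layer k: d_k * d_{k-1} + d_k, as stated above.
assert W.size + b.size == d_k * d_prev + d_k
```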
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head><p>In this section, we conduct experiments intended to answer the following research questions:</p><p>RQ1: Are tag-based contextual embeddings efficient representations to be used in a neural CF model, compared to static tag-based embedding representations? RQ2: Which extended neural collaborative architecture yields significant improvements in prediction and ranking quality for a rating prediction task?</p><p>From there, an underlying research question can be derived concerning the various methods used for aggregating tag embeddings, assuming that these methods may affect the performance of the recommendation models.</p><p>RQ3: Are contextual neural graph embeddings more efficient representations to be used in a neural collaborative filtering architecture? Regarding this process, which aggregator function should lead to better recommendation performance: a mean aggregator function or a convolutional aggregator function?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Experimental Settings</head><p>1. Datasets: The data sets describe 5-star ratings and free-text tagging from MovieLens, a movie recommendation service. We extracted user annotations from the ML-10M, ML-20M, and ML-25M data sets. Only users that have annotated and rated at least 20 movies were selected. We observed from Table <ref type="table" target="#tab_1">1</ref> an unequal distribution of user rating classes, because users tend to score items with high rating values. This can reduce the models' capacity to generalize. To overcome this, we over-sample minority classes <ref type="bibr" target="#b32">[33]</ref> by duplicating samples from the minority classes and adding them to the training data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Hyper-parameters:</head><p>After randomly splitting the data of each dataset into 90% training and 10% testing sets, we held out 10% of the training set for hyper-parameter tuning. Then, we conducted a 5-fold cross-validation in each dataset and averaged the RMSE measure. We applied a grid search for hyper-parameter tuning: the learning rate was tuned among values ∈ {0.0001, 0.0005, 0.001, 0.005}, and the latent dimensions among ∈ {100, 200, 300, 400, 500, 1000} for both the autoencoder and MLP architectures. We handled the Neural Collaborative Autoencoder with a default rating of 2.5 for testing-set entries without training observations. The graph neural and convolutional models handled the same datasets, except that models derived from these approaches handle edge prediction through bipartite graph samples. We tuned the dropout ratio<ref type="foot" target="#foot_3">4</ref> among values ∈ {0.0, 0.1, …, 0.8}, and we fixed the neighborhood depth for aggregating node embedding features to 𝑘 = 2. The models were optimized with the well-known Adam optimizer.</p></div>
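The grid search over the hyper-parameter values listed above can be sketched as follows; `train_and_validate` is a placeholder of ours for training a model with a given configuration and returning its validation RMSE on the held-out 10% split:

```python
from itertools import product

LEARNING_RATES = [0.0001, 0.0005, 0.001, 0.005]
LATENT_DIMS = [100, 200, 300, 400, 500, 1000]

def grid_search(train_and_validate):
    """Try every (learning rate, latent dimension) pair and keep the
    configuration with the lowest validation RMSE."""
    best = None
    for lr, dim in product(LEARNING_RATES, LATENT_DIMS):
        rmse = train_and_validate(lr, dim)
        if best is None or rmse < best[0]:
            best = (rmse, lr, dim)
    return best
```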
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Evaluation Metrics:</head><p>We evaluated rating prediction using two metrics: Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). Both are widely used for rating prediction in recommender systems. Given a predicted rating 𝑟 ^𝑢,𝑖 and a ground-truth rating 𝑟 𝑢,𝑖 from the user 𝑢 for item 𝑖, the RMSE is computed as:</p><formula xml:id="formula_13">𝑅𝑀 𝑆𝐸 = √︃ 1 𝑁 ∑︁ 𝑢,𝑖 (𝑟 𝑢,𝑖 − 𝑟 ^𝑢,𝑖 ) 2<label>(12)</label></formula><p>where 𝑁 indicates the number of ratings between users and items.</p><p>MAE is computed as follows:</p><formula xml:id="formula_14">𝑀 𝐴𝐸 = 1 𝑁 ∑︁ 𝑢,𝑖 |𝑟 𝑢,𝑖 − 𝑟 ^𝑢,𝑖 |<label>(13)</label></formula><p>We also evaluated ranking accuracy using NDCG (Normalized Discounted Cumulative Gain <ref type="bibr" target="#b33">[34]</ref>) at 10. For this purpose, we considered a rating value of 5 as a strong appreciation of a user regarding a movie; in contrast, rating values under 3 are considered bad. Hence, the rating value of each movie is used as the gain value for its ranked position in the result. The gain is summed over the ranked positions from 1 to 𝑛. To compute 𝑁 𝐷𝐶𝐺, relevance scores are set on a five (5) point scale from 1 to 5, denoting relevance from low to strong. The Ideal DCG is obtained by ranking each user's movies in decreasing order of their ratings. The NDCG values presented further on are averaged over the user testing set.</p></div>
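The three metrics can be sketched directly from Eqs. (12)-(13); `ndcg_at_k` uses the common log2 position discount with the rating value as graded relevance (helper names are ours):

```python
import math

def rmse(truth, pred):
    """Root Mean Square Error over N rating pairs (Eq. 12)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))

def mae(truth, pred):
    """Mean Absolute Error over N rating pairs (Eq. 13)."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)

def ndcg_at_k(ranked_ratings, k=10):
    """NDCG@k: DCG of the predicted order divided by the DCG of the ideal
    (rating-sorted) order, with a log2 discount on the rank position."""
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(ranked_ratings[:k]))
    ideal = sorted(ranked_ratings, reverse=True)
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0
```

For a user whose movies are already ranked in decreasing order of their true ratings, DCG equals the Ideal DCG and NDCG@10 is 1.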
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Tag-based embedding representations</head><p>We considered tag-based embeddings built from word vector representations, which we extracted from pre-trained neural language models. Owing to users' writing discrepancies, the semantic meaning of users' tags is often ambiguous.</p><p>Tags can be composed of several words and may contain subjective expressions. They can also be single words, which can occasionally lead to a lack of context. This makes it difficult to integrate tags explicitly into an effective neural CF architecture. Our main objective is to map users, items and their tag interactions into the same latent space. Rather than directly exploiting the latent space representations of users and items, as in most neural collaborative approaches <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b34">35]</ref>, we propose to first project both users' and items' representations into a dense tag space representation. Both of these neural approaches are representative of our objective since they come from CF. We assume that users and items are represented by their corresponding tags; in particular, they are represented by the aggregate average of their tag embedding representations.</p><p>1. Static Word2vec tag-based embeddings: We handled static tag-based embedding vectors from Word2vec. We exploited pre-trained vectors trained on part of the Google News dataset (about 100 billion words) and extracted users' tag embeddings by associating each tag with a fixed-size vector. However, we found that some tags were out of the vocabulary; such tags represent respectively 8%, 5%, and 5% of the tags in our MovieLens 10M, 20M and 25M datasets. We fixed this issue by initializing those tags with random vector values. 
The inability to handle unknown or out-of-vocabulary words is one limitation encountered when using such a pre-trained model. Finally, each set of tags per user is represented by a multidimensional vector of 𝑑𝑖𝑚 = 300. 2. Contextualized BERT tag-based embeddings: We addressed extracting contextualized embeddings from the BERT neural language model. For this purpose, we assumed that the first token, '[CLS]', which captures the context, can be treated as the sentence embedding <ref type="bibr" target="#b35">[36]</ref>. The word embedding sequence corresponding to each set of tags is fed into the pre-trained model. We then handled the activations from the last layers of the BERT model, since the features associated with the activations in these layers are far more complex and include more contextual information. These contextual embeddings are used as input to our proposed models. Thus, each set of tags per user is represented by a multidimensional embedding vector of 𝑑𝑖𝑚 = 768. We implemented the pre-trained bert-base model<ref type="foot" target="#foot_4">5</ref> (12 blocks of hidden dimension 768, 12 attention heads), using '[CLS]' to indicate the beginning of a sequence and '[SEP]' as a separator between two tags of the same sequence. </p></div>
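The static-embedding pipeline of item 1 above, with its random fallback for out-of-vocabulary tags and the averaging that turns a user's tag set into one 300-dimensional representation, can be sketched as follows (function names and the ±0.25 init range are our assumptions, not from the paper):

```python
import numpy as np

def tag_vector(tag, w2v, dim=300, rng=None):
    """Look a tag up in the pre-trained Word2vec table; out-of-vocabulary
    tags fall back to a random vector (assumed init range: +/- 0.25)."""
    if tag in w2v:
        return np.asarray(w2v[tag])
    rng = rng or np.random.default_rng(0)
    return rng.uniform(-0.25, 0.25, size=dim)

def user_embedding(tags, w2v, dim=300):
    """Represent a user (or an item) as the average of its tag embeddings."""
    return np.mean([tag_vector(t, w2v, dim) for t in tags], axis=0)
```

The BERT variant is analogous, except that the whole tag sequence ('[CLS] tag1 [SEP] tag2 [SEP] …') is encoded at once and the 768-dimensional '[CLS]' vector of the last layer replaces the average.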
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Evaluation and Performance comparison</head><p>First, to address RQ1, we extended neural models <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b24">25]</ref> by handling static and contextual tag-based embedding representations. We compared those models with recent neural CF models that we set as baselines. We evaluated rating score accuracy using RMSE (Root Mean Square Error) and MAE (Mean Absolute Error). Then, to address RQ2, we implemented an MLP-based and an autoencoder-based CF architecture, and compared the performance of each neural model according to the tag-based embedding representations with which it was integrated. Moreover, a ranking accuracy comparison was carried out among the different neural models using NDCG (Normalized Discounted Cumulative Gain) at 10. Finally, to answer RQ3, we exploited user/item tag-based embeddings through an aggregation function learned from training samples of user-item graphs. Such a function operates either by performing element-wise multiplication between the tag embedding neighbor vectors of a given node, or by concatenating tag embedding vectors with their tag embedding neighbor vectors to obtain the embedding of that node.</p><p>We detail below all the models included in the comparative study of neural models.</p><p>• Neural GMF-MLP <ref type="bibr" target="#b29">[30]</ref>: Is a neural CF approach that exploits a multi-layer perceptron (MLP) to learn the user-item interaction function. The bottom input layer consists of two  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Effects on recommendation quality and ranking (RQ1)</head><p>The results of our experiments are synthesized in Table <ref type="table" target="#tab_2">2</ref>. Initially, regarding the ML-10M dataset, the best RMSE and MAE scores are achieved by the CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 model, with 𝑀 𝐴𝐸 = 0.715 and 𝑅𝑀 𝑆𝐸 = 0.791. Our proposed contextual tag embedding based NGCF model also achieved the top ranking quality, reaching 𝑁 𝐷𝐶𝐺@10 = 0.48. We noticed that the static tag-based embedding extension of this model, CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝑊 2𝑉 , also achieved good results, outperforming most of the baselines except the TRSDL model <ref type="bibr" target="#b8">[9]</ref>, which reached 𝑀 𝐴𝐸 = 0.73 and 𝑅𝑀 𝑆𝐸 = 0.810 with a ranking metric of 𝑁 𝐷𝐶𝐺@10 = 0.45. Given that the Hinsage model <ref type="bibr" target="#b14">[15]</ref> reached 𝑀 𝐴𝐸 = 0.75 and 𝑅𝑀 𝑆𝐸 = 0.85 with a ranking score of 𝑁 𝐷𝐶𝐺@10 = 0.48, and that the CF-GNN ++ 𝑀 𝑒𝑎𝑛 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 model reached 𝑀 𝐴𝐸 = 0.774 and 𝑅𝑀 𝑆𝐸 = 0.89 with a ranking quality of 𝑁 𝐷𝐶𝐺@10 = 0.451, we might be tempted at first sight to claim that NGCF approaches show strong performance compared with other neural collaborative approaches, no matter which tag embeddings we integrated into the models. 
However, considering the significant performance of the neural models that integrate contextualized tag embeddings, such as Neural CF-MLP ++ 𝐵𝑒𝑟𝑡 , which achieved 𝑀 𝐴𝐸 = 0.72 and 𝑅𝑀 𝑆𝐸 = 0.93, or the autoencoder model CF-Autoencoder ++ 𝐵𝑒𝑟𝑡 , which reached 𝑀 𝐴𝐸 = 0.76 and 𝑅𝑀 𝑆𝐸 = 0.96, we then focused on determining which architecture performs best among all the proposed neural architectures: those that integrate static/contextual tag embedding representations, or those that additionally aggregate tag-based neighborhood embeddings.</p><p>Furthermore, on the ML-20M dataset, the same NGCF model, CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 , showed the top RMSE and MAE scores with 𝑀 𝐴𝐸 = 0.723 and 𝑅𝑀 𝑆𝐸 = 0.802. This confirms the performance of NGCF approaches combined with contextualized tag embeddings. It also appeared that such models reach top ranking quality; additionally, the ranking metric scores showed that the most competitive baseline is Hinsage <ref type="bibr" target="#b14">[15]</ref>, with a ranking quality that does not exceed 𝑁 𝐷𝐶𝐺@10 = 0.448. The CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 and CF-GNN ++ 𝑀 𝑒𝑎𝑛 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 models have the highest ranking scores, with 𝑁 𝐷𝐶𝐺@10 = 0.47 and 𝑁 𝐷𝐶𝐺@10 = 0.441 respectively. This is the case even though those models use neither the same aggregation technique nor the same tag embedding process. 
In this regard, we found that the mean aggregator function operating on static tag embeddings in a NGCF process, named CF-GNN ++ 𝑀 𝑒𝑎𝑛 Agg (𝑘=2) 𝑊 2𝑉 , performed well and obtained 𝑀 𝐴𝐸 = 0.80 and 𝑅𝑀 𝑆𝐸 = 0.94 with a ranking quality of 𝑁 𝐷𝐶𝐺@10 = 0.464, a score that outperforms the autoencoder-based model extension CF-Autoencoder++ 𝐵𝑒𝑟𝑡 with 𝑁 𝐷𝐶𝐺@10 = 0.44, even though the latter achieved 𝑀 𝐴𝐸 = 0.811 and 𝑅𝑀 𝑆𝐸 = 0.89. This demonstrates the efficiency of such an aggregation function.</p><p>Finally, on the ML-25M dataset, the impact of contextualized tag embeddings on the models is clearly established, since both RMSE and MAE scores show significant improvements compared to the baselines. Such is the case for the Neural CF-MLP ++ 𝐵𝑒𝑟𝑡 model, which reached 𝑀 𝐴𝐸 = 0.791 and 𝑅𝑀 𝑆𝐸 = 0.83 for a ranking quality of 𝑁 𝐷𝐶𝐺@10 = 0.46. Likewise, the CF-Autoencoder++ 𝐵𝑒𝑟𝑡 model obtained 𝑀 𝐴𝐸 = 0.79 and 𝑅𝑀 𝑆𝐸 = 0.86 and a ranking quality of 𝑁 𝐷𝐶𝐺@10 = 0.445. On top of that, the impact of the aggregator functions is also distinguishable through the NGCF model scores, since we noticed that results were much improved using a convolutional aggregator function applied to contextualized tag embeddings. The CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 model achieved the best RMSE and MAE scores compared to the CF-GNN ++ 𝑀 𝑒𝑎𝑛 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 model, which exploits a mean aggregator function, even though the latter also integrates contextualized tag embeddings. We believe those results can be strengthened by increasing the training data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Effects on error distribution (RQ2)</head><p>In the following, we discuss the effectiveness of our approaches in predicting user ratings with an acceptable amount of error. We highlight the impact of exploiting contextualized tag-based embedding representations by studying the error distribution when predicting user ratings. Such impact is summarized at the top of Figure <ref type="figure" target="#fig_1">2</ref>. Error distribution values are presented over the testing sets of the ML-10M, ML-20M and ML-25M data sets. This gives an overview of the error distributions of the baselines compared with those of our predictive models, which integrate tag-based static or contextualized embedding representations and describe specific architectures for each model.</p><p>First, on the ML-10M dataset, we observe that the error distribution values of the models exploiting contextual tag embeddings, such as CF-MLP ++ 𝐵𝑒𝑟𝑡 and CF-Autoencoder ++ 𝐵𝑒𝑟𝑡 , are mostly located in the interval [−1, 1], compared to the error distribution values of the other baselines. We also observe that the NGCF models, CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 and CF-GNN ++ 𝑀 𝑒𝑎𝑛 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 , outperform all other models with 980 and 890 accurate predictions respectively. Secondly, on ML-20M, we notice that the CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 model leads to a large number of accurate predictions, estimated at 7220. Such performance is closely followed by CF-GNN ++ 𝑀 𝑒𝑎𝑛 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 , with 4250 accurate predictions. Lastly, on ML-25M, the same models reached 7980 and 7740 accurate predictions respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Impact of learning aggregated tag-based functions (RQ3)</head><p>We give for each model the validation scores after 20 epochs; this allows us to estimate the model's capacity to generalize beyond the data it was trained on. From the bottom of Figure <ref type="figure" target="#fig_1">2</ref>, we analyzed which models achieve the best convergence rate. It appears that, across the three collections ML-10M, ML-20M and ML-25M, the convergence rates are clearly better for the neural graph approaches, particularly CF-GNN ++ 𝑀 𝑒𝑎𝑛 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 and CF-GCN ++ 𝑐𝑜𝑛𝑣 Agg (𝑘=2) 𝐵𝑒𝑟𝑡 , our NGCF models that exploit fine-tuned tag embedding representations. This leads us to believe that when contextualized tag embeddings are aggregated with neighborhood embeddings, they give more effective representations of users and items and enhance recommendation quality. We argue that our NGCF approaches capture the multiple semantic dimensions that tags can take, including the abstract formalization of tag neighborhood embeddings, which leads to fine-grained representations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>Following the experiments, we conclude that exploiting neural graph models to learn aggregation functions has enabled us to gain quality recommendations and improve ranking quality. We have shown that handling a convolutional aggregator function can generalize an efficient graph-based neural collaborative filtering process. It combines contextualized tag embedding representations of user/item nodes with previous-layer representations. This has enabled us to obtain more refined embedding features and to capture non-trivial tagging behavior.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Extended NCF based on an MLP (on the right) and an Autoencoder (on the left), Graph-based NCF architecture based on tag feature embeddings and aggregator functions</figDesc><graphic coords="8,148.20,84.19,298.90,195.52" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: At the top of this figure, we present each neural model's error distribution. At the bottom, we give the models' validation scores after 20 epochs</figDesc><graphic coords="14,99.21,84.19,396.85,186.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>• Evaluate the impact of static/contextual embedding representations and comparing model architecture. • Evaluate impact of multi-layer neighbor static/contextual embedding representations to be exploited in a neural graph CF model. • Extensive series of experiments on real data from several MovieLens data sets.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Statistical details of the 10M, 20M and 25M collections from MovieLens</figDesc><table><row><cell>Collection</cell><cell>10M</cell><cell>20M</cell><cell>25M</cell></row><row><cell>Number of users</cell><cell>71567</cell><cell>138000</cell><cell>162541</cell></row><row><cell>Number of movies</cell><cell>10681</cell><cell>27000</cell><cell>62423</cell></row><row><cell>TAS( Tag assignment)</cell><cell>95580</cell><cell>465000</cell><cell>1093360</cell></row><row><cell>Ratings</cell><cell cols="2">10000054 2000000</cell><cell>25000095</cell></row><row><cell>Nodes</cell><cell>7114</cell><cell>20555</cell><cell>35363</cell></row><row><cell>Edges</cell><cell>24564</cell><cell>126080</cell><cell>210725</cell></row><row><cell>Period</cell><cell cols="3">Dec-2015 Oct-2016 Nov-2019</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 A</head><label>2</label><figDesc>synthesis of RMSE and MAE values for each model including 𝑛𝑑𝑐𝑔@10 scores, the best scores are in bold.</figDesc><table><row><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="3">Evaluation measures</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Models</cell><cell></cell><cell>ML-10M</cell><cell></cell><cell></cell><cell>ML-20M</cell><cell></cell><cell></cell><cell>ML-25M</cell><cell></cell></row><row><cell></cell><cell>MAE</cell><cell>RMSE</cell><cell>ndcg@10</cell><cell>MAE</cell><cell>RMSE</cell><cell>ndcg@10</cell><cell>MAE</cell><cell>RMSE</cell><cell>ndcg@10</cell></row><row><cell>Neural CF-MLP ++ 𝑊 2𝑣 Neural CF-MLP ++ 𝐵𝑒𝑟𝑡 CF-Autoencoder ++ 𝑊 2𝑣 CF-Autoencoder ++ 𝐵𝑒𝑟𝑡</cell><cell>0.77 0.72 0.83 0.76</cell><cell>0.98 0.93 1.1 0.96</cell><cell>0.43 0.46 0.411 0.42</cell><cell>0.88 0.791 0.85 0.811</cell><cell>0.96 0.86 0.97 0.89</cell><cell>0.381 0.42 0.39 0.44</cell><cell cols="2">0.84 0.791 0.80 0.798 0.865 1.01 0.83 1.02</cell><cell>0.42 0.46 0.42 0.445</cell></row><row><cell>U-Autorec [25]</cell><cell>0.82</cell><cell>1.09</cell><cell>0.38</cell><cell>0.84</cell><cell>1.07</cell><cell>0.37</cell><cell>0.81</cell><cell>1.01</cell><cell>0.40</cell></row><row><cell>Neural CF-MLP[30]</cell><cell>0.73</cell><cell>0.98</cell><cell>0.44</cell><cell>0.89</cell><cell>1.025</cell><cell>0.39</cell><cell>0.87</cell><cell>0.92</cell><cell>0.43</cell></row><row><cell>CF-GNN ++ 𝑀 𝑒𝑎𝑛 𝐴𝑔𝑔 (𝑘=2) 𝑊 2𝑣</cell><cell>0.88</cell><cell>1.10</cell><cell>0.47</cell><cell>0.80</cell><cell>1.02</cell><cell>0.49</cell><cell>0.82</cell><cell>1.04</cell><cell>0.44</cell></row><row><cell>CF-GNN ++ 𝑀 𝑒𝑎𝑛 𝐴𝑔𝑔 (𝑘=2)𝐵 𝑒𝑟𝑡</cell><cell>0.774</cell><cell>0.89</cell><cell>0.451</cell><cell>0.78</cell><cell>0.85</cell><cell>0.441</cell><cell cols="3">0.772 0.799 0.471</cell></row><row><cell>CF-GCN ++ 𝑐𝑜𝑛𝑣 𝐴𝑔𝑔 (𝑘=2)𝑊 
2𝑣</cell><cell>0.798</cell><cell>0.821</cell><cell>0.47</cell><cell>0.74</cell><cell cols="2">0.838 0.464</cell><cell>0.79</cell><cell>0.801</cell><cell>0.465</cell></row><row><cell>CF-GCN ++ 𝑐𝑜𝑛𝑣 𝐴𝑔𝑔 (𝑘=2) 𝐵𝑒𝑟𝑡</cell><cell cols="2">0.715 0.791</cell><cell cols="3">0.48 0.723 0.782</cell><cell>0.47</cell><cell cols="2">0.712 0.787</cell><cell>0.48</cell></row><row><cell>HINSAGE [15]</cell><cell>0.75</cell><cell>0.85</cell><cell>0.48</cell><cell cols="3">0.771 0.801 0.448</cell><cell>0.74</cell><cell cols="2">0.791 0.475</cell></row><row><cell>TRSDL [9]</cell><cell>0.73</cell><cell>0.810</cell><cell>0.45</cell><cell>0.74</cell><cell cols="2">0.820 0.461</cell><cell>0.75</cell><cell>0.87</cell><cell>0.44</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.facebook.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://www.pinterest.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://www.pinterest.fr/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent over-fitting</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">BERT was pre-trained on a corpus composed of 11,038 unpublished books belonging to 16 different domains and 2,500 million words from English Wikipedia text passages</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>vectors that describe user u and item i as binarized sparse vectors (one-hot encoding); such a model employs only the identity of a user and an item as input features.</p><p>• Neural CF-MLP ++ : An extension of Neural CF-MLP; the model integrates into the bottom input layer two feature vectors described as tag embedding features of users and items. These features are extracted from word vector representations. Neural CF-MLP ++ 𝑊 𝑜𝑟𝑑2𝑣𝑒𝑐 uses 300-dimensional word vectors from the pre-trained Word2vec model, while Neural CF-MLP ++ 𝐵𝐸𝑅𝑇 exploits 768-dimensional word vectors from the pre-trained BERT model.</p><p>• U-Autorec <ref type="bibr" target="#b24">[25]</ref>: A neural CF framework for rating prediction that exploits an autoencoder architecture. It takes user vectors as input and reconstructs them in the output layer. The values in the reconstructed vectors are the predicted values of the corresponding positions. • CF-Autoencoder ++ : Our autoencoder-based neural collaborative approach, which takes as input tag embedding features obtained by performing element-wise multiplication on their word vector representations, and concatenates such representations with user/item rating vectors to get the reconstructed ratings. We term the autoencoder-based model using static tag vector representations CF-Autoencoder ++ 𝑊 𝑜𝑟𝑑2𝑣𝑒𝑐 , while CF-Autoencoder ++ 𝐵𝐸𝑅𝑇 stands for the autoencoder-based model using contextual tag vectors.</p><p>• CF-GNN ++  𝑀 𝑒𝑎𝑛 𝐴𝑔𝑔 (𝑘=2) : Our NGCF tag-based predictive model that generates node embeddings by sampling and aggregating features (tag embeddings) from nodes' local neighborhoods, using a mean aggregation function that operates on a neighborhood of depth 𝑘 = 2. 
We distinguish between the NGCF model that handles features extracted from tag-based embeddings using 300-dimensional tag vectors from the pre-trained Word2vec model, which we term CF-GNN 𝑀 𝑒𝑎𝑛 𝐴𝑔𝑔 (𝑘=2) 𝑊 𝑜𝑟𝑑2𝑣𝑒𝑐 , and CF-GNN 𝑀 𝑒𝑎𝑛 𝐴𝑔𝑔 (𝑘=2) 𝐵𝑒𝑟𝑡 , which exploits 768-dimensional tag vectors from the pre-trained BERT model.</p><p>• CF-GCN ++  𝑐𝑜𝑛𝑣 𝐴𝑔𝑔 (𝑘=2) : We consider this NGCF model convolutional since it learns a convolutional aggregator function that combines a node's previous-layer representations with the aggregated neighborhood vectors. We differentiate between the model that handles features extracted from static tag-based embeddings with 300-dimensional tag vectors from the pre-trained Word2vec model, which we term CF-GCN 𝑐𝑜𝑛𝑣 𝐴𝑔𝑔 (𝑘=2) 𝑊 𝑜𝑟𝑑2𝑣𝑒𝑐 , and CF-GCN 𝑐𝑜𝑛𝑣 𝐴𝑔𝑔 (𝑘=2) 𝐵𝑒𝑟𝑡 , which exploits 768-dimensional tag vectors from the pre-trained BERT model.</p><p>• Hinsage <ref type="bibr" target="#b14">[15]</ref>: A model that computes node representations in an inductive way. This method operates by sampling a fixed-size neighborhood of each user/item node and then applying a specific aggregator over all the sampled neighbors' feature vectors. This model learns general-purpose node embeddings that use the graph structure and, in particular, node features. It was evaluated for a rating prediction task using demographic user information (no tag information). • TRSDL <ref type="bibr" target="#b8">[9]</ref>: A tag-aware recommender system that uses deep neural networks (DNNs) and recurrent networks (RNNs) to extract latent features of both users and items. In their model, Liang et al. <ref type="bibr" target="#b8">[9]</ref> use Word2Vec for mapping user tags to k-dimensional dense</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Semantic-based tag recommendation in scientific bookmarking systems</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">A M</forename><surname>Hassan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sansonetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Gasparetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Micarelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th ACM Conference on Recommender Systems</title>
				<meeting>the 12th ACM Conference on Recommender Systems</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="465" to="469" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Manotumruksa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Macdonald</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ounis</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1606.07828</idno>
		<title level="m">Modelling user preferences using word embeddings for context-aware venue recommendation</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Rücklé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Eger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Peyrard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1803.01400</idno>
		<title level="m">Concatenated power mean word embeddings as universal cross-lingual sentence representations</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A context-aware user-item representation learning for item recommendation</title>
		<author>
			<persName><forename type="first">L</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Quan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Luo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Information Systems (TOIS)</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page" from="1" to="29" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Exploiting parallelism opportunities with deep learning frameworks</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">E</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hazelwood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Brooks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Architecture and Code Optimization (TACO)</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="1" to="23" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Hybrid neural recommendation with joint deep representation learning of ratings and reviews</title>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Gan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Jiao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">374</biblScope>
			<biblScope unit="page" from="77" to="85" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Word embedding techniques for contentbased recommender systems: An empirical evaluation</title>
		<author>
			<persName><forename type="first">C</forename><surname>Musto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Semeraro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Gemmis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lops</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Recsys posters</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Collaborative multi-level embedding learning from reviews for rating prediction</title>
		<author>
			<persName><forename type="first">W</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IJCAI</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="2986" to="2992" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">TRSDL: Tag-aware recommender system based on deep learning-intelligent computing systems</title>
		<author>
			<persName><forename type="first">N</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-T</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Sangaiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-Z</forename><surname>Zhao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page">799</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">TagEmbedSVD: Leveraging tag embeddings for cross-domain collaborative filtering</title>
		<author>
			<persName><forename type="first">M</forename><surname>Vijaikumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shevade</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">N</forename><surname>Murty</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Pattern Recognition and Machine Intelligence</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="240" to="248" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Exploiting pre-trained network embeddings for recommendations in social networks</title>
		<author>
			<persName><forename type="first">L</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-F</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X.-H</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Computer Science and Technology</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="682" to="696" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Distributed representations of sentences and documents</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1188" to="1196" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">GloVe: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</title>
				<meeting>the 2014 conference on empirical methods in natural language processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Inductive representation learning on large graphs</title>
		<author>
			<persName><forename type="first">W</forename><surname>Hamilton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Ying</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leskovec</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1024" to="1034" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">N</forename><surname>Kipf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Welling</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1609.02907</idno>
		<title level="m">Semi-supervised classification with graph convolutional networks</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Tag-aware recommender systems: a state-of-the-art survey</title>
		<author>
			<persName><forename type="first">Z.-K</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-C</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Computer Science and Technology</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="page">767</biblScope>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Discriminative embeddings of latent variable models for structured data</title>
		<author>
			<persName><forename type="first">H</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Song</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="2702" to="2711" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">node2vec: Scalable feature learning for networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Grover</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leskovec</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining</title>
				<meeting>the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="855" to="864" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">GraRep: Learning graph representations with global structural information</title>
		<author>
			<persName><forename type="first">S</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th ACM international on conference on information and knowledge management</title>
				<meeting>the 24th ACM international on conference on information and knowledge management</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="891" to="900" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Neural graph collaborative filtering</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-S</forename><surname>Chua</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 42nd international ACM SIGIR conference on Research and development in Information Retrieval</title>
				<meeting>the 42nd international ACM SIGIR conference on Research and development in Information Retrieval</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="165" to="174" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Learning fair representations for recommendation: A graph-based perspective</title>
		<author>
			<persName><forename type="first">L</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Web Conference 2021</title>
				<meeting>the Web Conference 2021</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="2198" to="2208" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Graph convolutional neural networks for web-scale recommender systems</title>
		<author>
			<persName><forename type="first">R</forename><surname>Ying</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Eksombatchai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">L</forename><surname>Hamilton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leskovec</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</title>
				<meeting>the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="974" to="983" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Autoencoder-based collaborative filtering</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Ouyang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Rong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Xiong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Neural Information Processing</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="284" to="291" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">AutoRec: Autoencoders meet collaborative filtering</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sedhain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Menon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sanner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Xie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th international conference on World Wide Web</title>
				<meeting>the 24th international conference on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="111" to="112" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Joint neural collaborative filtering for recommender systems</title>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Rijke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Information Systems (TOIS)</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page" from="1" to="30" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Dziugaite</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Roy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1511.06443</idno>
		<title level="m">Neural network matrix factorization</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Tag-aware recommender systems based on deep neural networks</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jiao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">204</biblScope>
			<biblScope unit="page" from="51" to="60" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Joint deep modeling of users and items using reviews for recommendation</title>
		<author>
			<persName><forename type="first">L</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Noroozi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Yu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the tenth ACM international conference on web search and data mining</title>
				<meeting>the tenth ACM international conference on web search and data mining</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="425" to="434" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Neural collaborative filtering</title>
		<author>
			<persName><forename type="first">X</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Liao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Nie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-S</forename><surname>Chua</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th international conference on World Wide Web</title>
				<meeting>the 26th international conference on world wide web</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="173" to="182" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Extracting deep semantic information for intelligent recommendation</title>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-T</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X.-X</forename><surname>Mao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Neural Information Processing</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="134" to="144" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">TNAM: A tag-aware neural attention model for top-N recommendation</title>
		<author>
			<persName><forename type="first">R</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cui</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">385</biblScope>
			<biblScope unit="page" from="1" to="12" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">SMOTE: Synthetic minority over-sampling technique</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">V</forename><surname>Chawla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">W</forename><surname>Bowyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">O</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">P</forename><surname>Kegelmeyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Artificial Intelligence Research</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="321" to="357" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Cumulated gain-based evaluation of IR techniques</title>
		<author>
			<persName><forename type="first">K</forename><surname>Järvelin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kekäläinen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Information Systems (TOIS)</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="422" to="446" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-S</forename><surname>Chua</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1808.03912</idno>
		<title level="m">Outer product-based neural collaborative filtering</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1908.10084</idno>
		<title level="m">Sentence-BERT: Sentence embeddings using siamese BERT-networks</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
