<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Tag-based embedding representations in neural collaborative filtering approaches</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tahar-Rafik Boudiba</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Taoufiq Dkaki</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IRIS/IRIT, UMR 5505 CNRS</institution>
          ,
          <addr-line>118 Route de Narbonne, F-31062, TOULOUSE CEDEX 9</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>Learning user-item interactions has become a promising way to improve the performance of collaborative filtering approaches. In such systems, the content surrounding users and items, particularly user tags, plays a key role since it can be leveraged by collaborative filtering approaches. Tags are commonly represented using the bag-of-words paradigm, although this representation is subject to ambiguity, mainly because of the poor semantic relations between tags. Recent methods suggest the use of deep neural architectures, as they attempt to learn semantic and contextual word representations. On this basis, we address how to integrate such content semantically into different neural collaborative filtering models for rating prediction. Building on effective models initially developed to learn user-item interactions, we extend different neural collaborative filtering models for rating prediction to evaluate the impact of using static or contextualized word embeddings within a neural collaborative filtering strategy. The presented models use dense tag-based user and item representations extracted from pre-trained static Word2vec and contextual BERT models. In addition, the paper emphasizes the impact of using contextualized tag embedding neighbors in a neural graph collaborative filtering approach that learns an aggregation function. Finally, to determine whether the use of different neural architectures can influence recommendation quality, we adapt three popular end-to-end learning architectures: an MLP, an autoencoder, and a Graph Neural Network. We evaluated and compared all the models with recent baselines on several MovieLens datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>Learning representation</kwd>
        <kwd>folksonomies</kwd>
        <kwd>deep learning</kwd>
        <kwd>word embedding</kwd>
        <kwd>social tagging</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Deep learning (DL) techniques are the milestones of several recent recommendation engines.
Platforms such as Facebook and Pinterest have already shared their experience in using DL
for recommender systems (RS). In such platforms, Collaborative Filtering (CF) approaches are
the main tool: they enable users to get recommendations on favourite items. Putting such
methods into practice in an RS implies being able to predict how users will rate a particular
item. Classical CF approaches are based either on Matrix Factorization (MF) techniques or on
simple user-item vector similarity methods. However, these models share the property of being
essentially linear, since they combine user and item latent factors linearly. In contrast, DL
models for RS learn multiple levels of representation and have therefore enabled the deep
integration of several types of content. As a result, recent neural collaborative filtering
approaches capture more complex user-item interactions and enable high-level abstractions for
content description. Such content often refers to users' tags, since tags are commonly used to
describe items and user profiles through the bag-of-words representation. Although such
representations, commonly appearing as one-hot vectors, are efficient for computing user-item
similarity, many problems such as ambiguity and vocabulary mismatch have been raised [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In this sense, common NLP techniques suggest the use of dense representations in the
form of either user or item aggregated semantic embedding vectors extracted from the
pre-trained Word2vec neural language model [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. However, how can such embedding vectors be included efficiently at the top layer of a
neural CF architecture? A design choice is to combine the two embedding vectors and then feed
them through multiple fully connected layers to get the likelihood that a user interacts with
an item. In that spirit, multiplying the embedding vectors element-wise or simply concatenating
them is a reasonable technique for integrating both user and item dense representations in a
neural CF model. Some works have discussed text embedding aggregation techniques
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], while others have suggested concatenating mean word embeddings, since they compute
word-average embedding representations [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Recent neural approaches for recommendation additionally consider other relationships,
such as neighborhood proximity in graph-based approaches. Such approaches have been proposed
to explore multi-layer neighbor embedding representations. Since these embeddings are
integrated with neural CF architectures, this has resulted in Neural Graph CF (NGCF)
approaches [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In this paper, we consider tag embeddings as the starting point for explicitly
integrating a tag-based vocabulary within neural collaborative filtering models. However, such
an initiative raises research issues, such as determining the most efficient neural
architecture to use or defining the best tag embedding representations. To this end, we handle
dense tag-based representations that we exploit within effective neural CF models for rating
prediction. We have developed several neural models that combine neural CF with tagging
information integrated into the training process. For this purpose, we use word vector
representations to capture more valuable tag semantics and thus enhance the ability of neural
CF models to generalize. We compared different tag embedding representations from pre-trained
static (Word2vec) and contextual (BERT) models. Furthermore, we evaluated the impact of using
such tag embeddings through several neural model architectures, namely an MLP, an autoencoder,
and a graph-based neural collaborative architecture. We provide empirical results on the
MovieLens 10M, 20M, and 25M datasets. The main contributions of this paper are summarized as
follows:
• Efficiently integrate tag-embedding representations into several neural CF models.
• Evaluate the impact of static/contextual embedding representations and compare model
architectures.
• Evaluate the impact of multi-layer neighbor static/contextual embedding representations
exploited in a neural graph CF model.
• Run an extensive series of experiments on real data from several MovieLens datasets.
      </p>
      <p>The remainder of the paper is organized as follows. The next section presents some
background and reviews recent research related to content-based recommendation using neural
networks and word vector representations. We gather works that describe neural approaches from
a collaborative filtering point of view, specifying the most commonly used neural
architectures. Section 3 presents the basis of our proposed models. Section 4 details the
datasets, evaluation metrics, and experimental settings. Section 5 gives the evaluation
results and discusses the performance comparison with baselines. Finally, we draw our
conclusions in the last section.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and related works</title>
      <p>
        DL methods have made breakthroughs in representation learning from various data sources.
As a result, recent neural recommendation models have been able to learn representations of
user preferences, item features, and textual interactions [
        <xref ref-type="bibr" rid="ref1 ref6">6, 1</xref>
        ]. In addition, neural recommendation models attempt to introduce tag-semantic-aware
representations based on distributional tag semantics used as features [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In this area, Musto et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] exploit the Word2vec approach to learn a low-dimensional vector-space word
representation and use it to represent both items and user profiles in a recommendation
scenario. Zhang et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] proposed to integrate traditional matrix factorization with Word2vec for user profiling
and rating prediction. Liang et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] exploited pre-trained word embeddings from Word2vec to represent user tags and
constructed item and user profiles based on the items' tag sets and the users' tagging
behaviors. They use deep neural networks (DNNs) and recurrent neural networks (RNNs) to
extract the latent features of items and users to predict ratings. Moreover, TagEmbedSVD [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] uses pre-trained Word2vec word embeddings of tags to enhance personalized
recommendations; these embeddings are integrated into an SVD model in the context of
cross-domain CF. Other works [
        <xref ref-type="bibr" rid="ref1 ref11">11, 1</xref>
        ] take advantage of network embedding techniques to propose embedding-based
recommendation models that exploit CF approaches. Along with learning content representations
for recommendation, exploiting rating patterns often requires a neural-network-based embedding
model that is first pre-trained. Features are extracted and integrated into a CF model by
fusing them with latent factors through non-linear transformations that better leverage
abstract content representations and thus yield higher-quality recommendations. Since
pre-training word embeddings on large-scale corpora became widely used in different
information retrieval tasks, it was also exploited to generate recommendations by ranking the
user-item matrix from users' similar tag vocabularies. Models such as Word2vec [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] or GloVe [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], for instance, learn meaningful user tag representations by modeling tag co-occurrences.
However, these methods do not account for the deep contextual information that a single
content word may carry. Moreover, they do not handle unknown words. In contrast, contextualized
word representations such as BERT [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] have been proposed to overcome the limitations of static word embeddings, since such
contextual neural language models have been shown to improve the performance of many
downstream tasks. In addition, graph-based neural approaches [
        <xref ref-type="bibr" rid="ref15 ref16 ref17">15, 16, 17</xref>
        ] have considered heterogeneous graphs, as they try to overcome the lack of relationship
modeling in feature-based neural recommendation models. Such approaches have been proposed to
explore multi-layer neighbor embedding representations [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Neural graph network models consider content information features either extracted
from graph properties [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] or learned
from node embedding representations [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. In particular, Neural Graph Collaborative Filtering (NGCF) approaches exploit feature
representations of the user-item graph structure by propagating either user-based or
item-based content embeddings over it [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Such a process often results from learning aggregation functions that allow deep
relationship modeling over both user-item interactions and content features. In this way,
Graph Convolutional Networks (GCNs) have also been exploited through learned aggregator
functions, which require additional layers to obtain a convolutional aggregation of the
neighborhoods' embeddings at those layers [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. As a result, deep semantic representations are extracted by propagating embeddings
over the user-item graph structure. An instance of such a method is used by Ying et al.
[
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], who employ multiple graph convolution layers on an item-item graph for image
recommendation at Pinterest.
      </p>
      <p>
        In the following, we introduce some recommendation models from the literature that handle
neural CF approaches [
        <xref ref-type="bibr" rid="ref24 ref25 ref26">24, 25, 26</xref>
        ] to solve user rating prediction. Some of them have been adapted to include tagging
content [
        <xref ref-type="bibr" rid="ref27 ref28 ref29">27, 28, 29</xref>
        ]; they are mostly composite models in which multiple neural building modules form a
single differentiable function that is trained end-to-end. Here, we introduce some definitions
related to tagging that will later allow us to address the most common architectures and
topologies, giving a recommendation strategy for each of them. A folksonomy F can be defined
as a 4-tuple F = (U, T, I, A), where U = {u1, u2, ..., un} is the set of users annotating the
set of items I. T is the set of tags that constitutes the vocabulary expressed by the
folksonomy. I = {i1, i2, ..., im} is the set of items tagged by the users. A = {(u, t, i)}
⊆ U × T × I is the set of assignments of a tag t to an item i by a user u. We also consider
R as the set of user ratings r(u,i).
      </p>
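      <p>The 4-tuple definition above can be sketched as a small data structure. The following is an illustrative toy example (the variable names and data are ours, not the paper's): the annotation set A is stored as (user, tag, item) triples, from which each user's tag vocabulary T_u and tagged-item set I_u are derived.</p>

```python
# Toy folksonomy sketch: A as a set of (user, tag, item) triples.
from collections import defaultdict

annotations = {
    ("u1", "sci-fi", "Blade Runner"),
    ("u1", "noir", "Blade Runner"),
    ("u2", "sci-fi", "Dune"),
}

user_tags = defaultdict(set)   # T_u: the tags used by user u
user_items = defaultdict(set)  # I_u: the items tagged by user u
for user, tag, item in annotations:
    user_tags[user].add(tag)
    user_items[user].add(item)
```

      <p>From these per-user sets, tag-based user profiles can then be built, as the models below do.</p>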
      <sec id="sec-2-1">
        <title>2.1. MLP-based neural collaborative filtering for Recommendation</title>
        <p>
          Neural collaborative filtering (NCF) approaches for rating prediction often involve
dealing with the binary nature of implicit data. Some works [
          <xref ref-type="bibr" rid="ref26 ref30 ref31">30, 31, 26</xref>
          ] have additionally discussed the choice of the neural architecture to be implemented. A
possible instance of the neural CF approach can be formulated using a multi-layer perceptron
(MLP). As addressed in [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ], the input layer (the embedding layer) is a fully connected layer that maps the sparse
representations to dense feature vectors. It consists of two feature vectors v(u) and v(i)
that describe user u and item i, initially represented through one-hot encoding. The obtained
user (item) embedding can be seen as the latent vector for the user (item). The user and item
embeddings are then fed into neural CF layers that map the latent vectors to prediction
scores. The final output layer is the predicted score ŷ(u,i), and training is performed by
minimizing the point-wise loss between ŷ(u,i) and its target value r(u,i). The NCF predictive
model can be formulated as:
ŷ(u,i) = MLP(P · v(u), Q · v(i) | P, Q, Γ_MLP)
(1)
P ∈ R^(M×K) and Q ∈ R^(N×K) are the latent factor matrices for users and items respectively.
Γ_MLP denotes the parameters of the interaction function, which is defined as a multi-layer
neural network.</p>
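        <p>A minimal numeric sketch of the NCF prediction of equation (1), assuming toy random latent matrices P and Q, a single hidden CF layer, and a linear output; the layer sizes and activations here are our simplifications, not the exact architecture of the cited model.</p>

```python
# NCF forward pass sketch: fuse user/item latent vectors, pass through an MLP.
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 4, 5, 8            # number of users, items, latent dimension
P = rng.normal(size=(M, K))  # user latent factor matrix
Q = rng.normal(size=(N, K))  # item latent factor matrix
W1 = rng.normal(size=(2 * K, 16)); b1 = np.zeros(16)  # hidden CF layer
w2 = rng.normal(size=16)                              # output regressor

def predict(u, i):
    x = np.concatenate([P[u], Q[i]])  # combine user and item embeddings
    h = np.tanh(x @ W1 + b1)          # one neural CF layer
    return float(h @ w2)              # scalar rating score ŷ(u,i)

score = predict(0, 3)
```

        <p>Training would minimize a point-wise loss between such scores and the target ratings.</p>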
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Autoencoder-Based collaborative filtering for Recommendation</title>
        <p>
          Another way to consider neural CF is to approach user-item rating as a matrix  ∈ × 
with partially observable row vectors that form a user  ∈ the set of users  = {1...} given
by the set of user ratings () = {1...} ∈  and column vectors from the set of items
 ∈  = {1...} also given by their corresponding ratings () = {1...}. An eficient
neural method to encode each partially observed vector into law-dimensional latent space is to
handle an autoencoder architecture as suggested in [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] that will reconstruct the output space
to predict missing ratings for recommendation [
          <xref ref-type="bibr" rid="ref24 ref25 ref32">25, 32, 24</xref>
          ]. Given a set of rating vectors ()
and () ∈ R, the autoencoder solves:
ℎ =  (
        </p>
        <p>+ ℎ− 1)
∑︁</p>
        <p>ℎ− 1
∈() | ()|
Where ℎ(;  ) is the reconstruction of input  ∈ R that is defined as:
 (.) and (.) are activation functions associated to the encoder and decoder respectively and
 gather model parameters;  ∈ R×  and  ∈ R×  are weight matrices and  ∈ R,  ∈ R
biases. In an item-based recommendation perspective, the autoencoder applies () as the set of
input vectors. Weights associated to those vectors are updating during backpropagation.</p>
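          <p>The reconstruction h(r; θ) = f(W · g(V r + μ) + b) can be sketched as follows, assuming tanh activations for both encoder and decoder and random toy parameters in place of trained ones.</p>

```python
# Autoencoder reconstruction sketch: encode a partially observed rating
# vector into a low-dimensional space, then decode to predict all ratings.
import numpy as np

rng = np.random.default_rng(1)
d, k = 6, 3                                  # input size, hidden size
V = rng.normal(size=(k, d)); mu = np.zeros(k)   # encoder parameters
W = rng.normal(size=(d, k)); b = np.zeros(d)    # decoder parameters

def reconstruct(r):
    z = np.tanh(V @ r + mu)   # g: encode into the latent space
    return np.tanh(W @ z + b) # f: decode into a full rating vector

r = np.array([5.0, 0.0, 3.0, 0.0, 0.0, 4.0])  # zeros mark missing ratings
r_hat = reconstruct(r)
```

          <p>Training minimizes the squared error between r and its reconstruction on the observed entries only; the reconstructed entries at unobserved positions serve as the predicted ratings.</p>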
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Neural Graph Collaborative Filtering for Recommendation</title>
        <p>
          NGCF approaches are particular in the sens that they exploit embeddings of users and items
represented initially as a graph structure. Most of them adopt a user-item bipartite graph of as it
much represents user-item interactions [
          <xref ref-type="bibr" rid="ref15 ref16 ref20">15, 20, 16</xref>
          ]. Promising recent methods suggest learning
user and item representations from their bipartite associated graph by stacking multiple
embedding propagation layers to allow high-order connectivity from user-item interactions
[
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. Other works [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] learn aggregator functions that induce the embedding of a new node
given its features and neighborhood. In the following we formalized what can be associated
to a neural graph-based collaborative filtering approach for user rating prediction based on
multiple embedding aggregation layers. This neural graph-oriented approach is designed to
exploit node embeddings from neighborhood aggregation. Given a bipartite weighted graph of
user-item  = (, ℰ , ,  ), with  = { ∪ }, ℰ denotes the set of undirected weighted
edges representing user ratings,  is the adjacency matrix and  ∈ R×  is defined as the
node feature matrix.
        </p>
        <p>Let h_u^0 = x_u, with u ∈ U, be the user node feature at the 0-th layer. Then, at the
k-th layer:
h_N(u)^k = (1 / |N(u)|) Σ_{v ∈ N(u)} h_v^(k−1)
(2)
h_u^k = σ(W^k [h_u^(k−1), h_N(u)^k] + b^k)
(3)
z_u = h_u^K
(4)
h_u^(k−1) is the embedding of user node u ∈ U from the previous layer. |N(u)| is the number of
neighbors of node u. The sum in equation (2) aggregates the neighboring features of node u
from the previous layer. σ is the activation function (Tanh) that introduces non-linearity.
W^k and b^k are trainable parameters. The final embedding after K layers (k ∈ {1...K}) is
extracted from the output layer: z_u = h_u^K. This can be expressed in matrix multiplication
form for the whole graph as:</p>
        <p>H^(k+1) = σ(H^k W_0^k + Ã H^k W_1^k)
(5)
where Ã = D^(−1/2) A D^(−1/2), with A the adjacency matrix and D the degree matrix.
Thereafter, after applying a similar process to the item nodes to obtain the embeddings z_i
with i ∈ I, one way to proceed is to apply a concatenation operator ⊕ to the final user and
item embeddings to obtain z_u ⊕ z_i, which represents the embedding of the edge between a
user node u and an item node i:
e(u,i) = [z_u, z_i]
(6)
These edge embeddings are passed through a link-regression layer to obtain the predicted
user-item ratings. The model is trained end-to-end by minimizing a regression loss (RMSE, the
root mean square error between predicted and true ratings) using stochastic gradient descent
(SGD) updates of the model parameters, with mini-batches of user-item training edges fed to
the model.</p>
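        <p>The layer-wise aggregation of equations (2)-(4) can be sketched on a toy bipartite graph; the random matrices stand in for the trainable parameters W^k, and the graph and feature dimensions are illustrative assumptions.</p>

```python
# K layers of mean-based neighborhood aggregation on a toy user-item graph.
import numpy as np

rng = np.random.default_rng(2)
d = 4
features = {n: rng.normal(size=d) for n in ["u1", "u2", "i1", "i2"]}
neighbors = {"u1": ["i1", "i2"], "u2": ["i1"],
             "i1": ["u1", "u2"], "i2": ["u1"]}

K = 2
h = dict(features)                 # h^0 = the node feature matrix X
for k in range(K):
    Wk = rng.normal(size=(2 * d, d))   # stand-in for trainable W^k
    new_h = {}
    for v, nbrs in neighbors.items():
        agg = sum(h[u] for u in nbrs) / len(nbrs)  # Eq. (2): neighbor mean
        x = np.concatenate([h[v], agg])            # [h_v, h_N(v)]
        new_h[v] = np.tanh(x @ Wk)                 # Eq. (3), bias omitted
    h = new_h

z_u1 = h["u1"]   # Eq. (4): final embedding z_u after K layers
```

        <p>Concatenating z_u and z_i would then give the edge embedding fed to the link-regression layer.</p>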
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Overview of the proposed models</title>
      <p>In this section, we introduce our tag-aware neural models for recommendation. More
explicitly, we integrate tag-based embeddings into neural CF architectures, namely a
multilayer perceptron, an autoencoder, and a neural graph-based model. A naive approach to
integrating side information into predictive neural models consists of appending additional
user/item biases to the rating prediction. We estimate that computing those biases can be
handled either by hand-crafted engineering or by implementing an appropriate CF strategy. A
simple neural collaborative filtering framework considers the input layer (the embedding
layer) as a fully connected layer that projects the sparse representations of users and items
to dense vectors. To explicitly integrate the tag vocabulary into a neural model for rating
prediction, we use feature vectors built from tag vector representations sharing a common
embedding space through projection matrices. The obtained user (item) embedding can be seen
as the latent vector of the user (item) in the tag latent space. The feature vectors v(u) and
v(i) are reconsidered, since we project the tag representations into a lower dimension using
the projection matrices E and F. Consequently, the tag-based representation of a user is
expressed as a feature vector ṽ(u):</p>
      <p>ṽ(u) = (1 / |T_u|) Σ_{k ∈ T_u} E w_k
(7)
where w_k ∈ R^c is the embedding vector associated with tag k, and c denotes the embedding
dimension. E denotes the projection matrix, with E ∈ R^(d×c).</p>
      <p>Similarly, if F denotes the projection matrix with F ∈ R^(d×c), then the item feature
vector is expressed as:
ṽ(i) = (1 / |T_i|) Σ_{k ∈ T_i} F w_k</p>
      <p>We denote T_u the set of tags of a user u and T_i the set of tags describing a
particular item i. The tag embeddings are obtained from the pre-trained Word2vec and BERT
neural models and mapped through the projection matrices E and F ∈ R^(d×c).</p>
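      <p>Equation (7) can be sketched as follows; the toy embedding table stands in for vectors taken from pre-trained Word2vec or BERT, and the dimensions are illustrative assumptions.</p>

```python
# Tag-based user feature vector: the projected mean of the user's tag
# embeddings, v_u = (1/|T_u|) * sum over k in T_u of E @ w_k.
import numpy as np

rng = np.random.default_rng(3)
c, d = 5, 3                      # tag embedding dim c, latent dim d
tag_emb = {t: rng.normal(size=c) for t in ["sci-fi", "noir", "epic"]}
E = rng.normal(size=(d, c))      # projection matrix E

def user_feature(tags_of_user):
    mean_emb = sum(tag_emb[t] for t in tags_of_user) / len(tags_of_user)
    return E @ mean_emb          # project into the d-dimensional latent space

v_u = user_feature(["sci-fi", "noir"])
```

      <p>The item feature vector is obtained the same way with the projection matrix F over the item's tag set T_i.</p>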
      <sec id="sec-3-1">
        <title>3.1. CF-based MLP model</title>
        <p>The extended tag-based NCF predictive model can be reformulated from the NCF model
described in Section 2.1, equation (1), as:</p>
        <p>ŷ(u,i) = MLP(ṽ(u), ṽ(i) | Γ_MLP)
(8)
The user and item embeddings can then be fed into a multi-layer neural model.</p>
        <p>Here, ŷ(u,i) is the rating score of a user u for an item i. Figure 1 details an
instance of the model. The prediction pipeline exploits user and item vectors extracted from
the dense space representation; hidden layers are added to learn the interactions between user
and item latent features, and a regressor at the last hidden layer produces the final rating.
A dynamic module computes dense representations through the inner product of the user and item
embedding representations. The tag embedding representations are extracted from a pre-trained
neural language model (Figure 1).</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. CF-based Autoencoder model</title>
        <p>
          Following the autoencoder paradigm, instead of encoding user vectors containing user ratings
to be predicted like in Autorec [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], we have extended a multilayered autoencoder architecture
to integrate element wise product of pre-trained tag-based embeddings. Such embeddings are
concatenated with the user rating representations and are projected on a dimensional latent
(hidden) space. As such, user’ rating (, ) of a particular user is reconstructed using an
objective function  that minimizes :
∑︁ ||(, ) ⊕
        </p>
        <p>˜ ˜
((˜) ⊗ (˜)) − ℎ((, ) ⊕
((˜˜) ⊗ (˜˜));  )||2
(9)</p>
        <p>Where ((, ),  ) is the reconstruction of the input (, ) ∈ R. The operator ⊗
denotes element-wise multiplication between user and item feature vectors. The operator ⊕
denotes a concatenation operator. ℎ is the selected activation function. Figure 1(ℬ) presents
a detailed instance of the model. Prediction Pipeline exploits user and item vectors extracted
from dense space representation. Such representations are concatenated with user rating and
fed as input of the autoencoder model. Layers are added to learn interactions between user and
item latent features to be compressed in a dense space. User’s ratings reconstruction from the
dense space produce the final rating.</p>
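        <p>The autoencoder input of equation (9), i.e. the rating vector concatenated with the element-wise product of the tag-based feature vectors, can be sketched as follows; the vectors here are illustrative toy values.</p>

```python
# Build the tag-aware autoencoder input x = r_u ⊕ (ṽ_u ⊗ ṽ_i).
import numpy as np

r_u = np.array([5.0, 0.0, 3.0])    # partially observed user ratings
v_u = np.array([0.2, -0.1, 0.4])   # tag-based user feature vector ṽ(u)
v_i = np.array([0.3, 0.5, -0.2])   # tag-based item feature vector ṽ(i)

x = np.concatenate([r_u, v_u * v_i])  # ⊕ concatenation, ⊗ element-wise product
```

        <p>This vector x is what the extended autoencoder encodes and reconstructs during training.</p>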
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Neural graph CF-based model</title>
        <p>
          As part of collaborative filtering approaches, neural graph- based networks consider for the
most [
          <xref ref-type="bibr" rid="ref15 ref19 ref20">20, 19, 15</xref>
          ] bipartite graphs of users and items in a recommendation context, where edges
represent the rating interactions between the users and the items. From the bipartite graph 
defined in section 2.1.3 where nodes’ classes are derived from the set of user nodes  and the
set of item nodes  respectively. Each edge corresponds to whatever user’s rates an item. Each
edge , ∈ ℰ is associated to a value (,) ∈ {0, 1}.In order to learn the topological structure of
each class of node neighborhoods, the idea is to aggregate feature information from node’s local
neighborhood [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], however in this paper we handled node’s features from pre-trained static
and contextual tag embeddings model. Users’ nodes features are taken from mean average users’
tags embedding vectors, equivalently items’ nodes features are represented throws the mean
average of their tag embeddings vectors. We have previously explored a simple neighborhood
aggregation process in section 2.0.3. By defining a neighborhood function  (), that is set to a
ifxed-size (in our experiments K=2), the bipartite graph is sampled as the model learn a function
that generates aggregates from tag-based textual feature node neighbors. This method can
be generalized by applying diferent aggregation methods to nodes ∈  by concatenating the
features with the nodes itself. For this purpose, we have associated each node  ∈ { ∪ } to
˜
features from word vector representation by joining tag-based vector representation (˜) and
(˜˜) (Figure 1()). We have designed a mean aggregation function that is commonly used since
it imply element wise mean of the feature vectors in ℎ− 1. We have also designed a convolution
aggregator function that we have detailed next.
        </p>
        <sec id="sec-3-3-1">
          <title>3.3.1. Mean aggregator function</title>
          <p>The rating interactions between users and items are represented as a bipartite graph
G = (U, I, E), where U and I correspond to the user and item sets respectively. The
aggregation of the mean tag embedding features from the neighbors of a node v ∈ U ∪ I is
processed with the following update rule (Figure 1):</p>
          <p>h_N(v)^k = (1 / |N(v)|) Σ_{u ∈ N(v)} h_u^(k−1)
The forward pass through layer k is then given as:
h_v^k = σ(W^k [D_p(h_v^(k−1)), h_N(v)^k] + b)
where h_v^k is the output of node v at layer k, W^k and b are trainable parameters (b is an
optional bias), d_k is the node feature dimensionality at layer k, σ is a non-linear
activation function (Tanh), and D_p is a random dropout with probability p applied to its
argument vector, used to reduce the model's over-fitting. N(v) represents the neighborhood of
a node v ∈ U ∪ I. Since W^k acts on the concatenation [h_v^(k−1), h_N(v)^k], the number of
trainable parameters in layer k for the mean aggregator is 2 · d_k · d_(k−1) + d_k.</p>
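          <p>The mean aggregator update can be sketched as follows, including the dropout D_p applied to the node's own previous-layer embedding; the dimensions, dropout rate, and random parameters are illustrative assumptions.</p>

```python
# One mean-aggregator update: h_v^k = tanh(W [D_p(h_v), mean(h_N(v))] + b).
import numpy as np

rng = np.random.default_rng(4)
d_prev, d_k, p = 4, 3, 0.5
W = rng.normal(size=(d_k, 2 * d_prev)); b = np.zeros(d_k)

def dropout(x, p):
    mask = (rng.random(x.shape) >= p).astype(float)
    return x * mask / (1.0 - p)       # inverted dropout scaling

def mean_agg_update(h_v, neighbor_hs):
    h_nv = sum(neighbor_hs) / len(neighbor_hs)   # mean of neighbor embeddings
    x = np.concatenate([dropout(h_v, p), h_nv])  # [D_p(h_v), h_N(v)]
    return np.tanh(W @ x + b)

h_v = rng.normal(size=d_prev)
h_new = mean_agg_update(h_v, [rng.normal(size=d_prev) for _ in range(3)])
```
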
        </sec>
        <sec id="sec-3-3-2">
          <title>3.3.2. Convolutional aggregator function</title>
          <p>
            To generalize the collaborative filtering process from a graph convolutional network perspective,
we adopted a GCN aggregator [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ] (Figure1 (′)), that concatenates nodes from the previous
layer representation ℎ− 1 with the aggregated neighborhood vectors ℎ(). Features are
updated given the following equation:
          </p>
          <p>Forward pass through layer  is defined as:</p>
          <p>1
ℎ() = | ()| + 1
(ℎ− 1 +</p>
          <p>∑︁
∈()</p>
          <p>ℎ− 1)
ℎ =  ( .ℎ() + )
(10)
(11)</p>
          <p>Where,  , is a trainable weight matrix, shared between all nodes  ∈ { ∪ }. The
size of   is given as  × − 1. The number of trainable parameters in layer  for the GCN
aggregator is .− 1 + .</p>
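          <p>A sketch of the GCN aggregator of equations (10) and (11), in which the node's own embedding is averaged together with its neighbors' before a single shared linear transform; the dimensions and random parameters are illustrative assumptions.</p>

```python
# One GCN-aggregator update: average the node with its neighbors (Eq. 10),
# then apply the shared weight matrix W^k and Tanh (Eq. 11).
import numpy as np

rng = np.random.default_rng(5)
d_prev, d_k = 4, 3
W = rng.normal(size=(d_k, d_prev)); b = np.zeros(d_k)

def gcn_agg_update(h_v, neighbor_hs):
    total = h_v + sum(neighbor_hs)
    h_nv = total / (len(neighbor_hs) + 1)  # mean including the node itself
    return np.tanh(W @ h_nv + b)

h_new = gcn_agg_update(rng.normal(size=d_prev),
                       [rng.normal(size=d_prev) for _ in range(2)])
```

          <p>Unlike the mean aggregator, no concatenation occurs, which is why W^k has size d_k × d_(k−1) and the parameter count is lower.</p>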
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>In this section, we conduct experiments intended to answer the following research
questions:</p>
      <p>RQ1: Are tag-based contextual embeddings efficient representations to use in a neural
CF model, compared to static tag-based embedding representations?</p>
      <p>RQ2: Which extended neural collaborative architecture yields significant improvements in
prediction and ranking quality for a rating prediction task?</p>
      <p>From there, an underlying research question can be derived concerning the various
methods used for aggregating tag embeddings, assuming that these methods may affect the
performance of the recommendation models.</p>
      <p>RQ3: Are contextual neural graph embeddings more efficient representations to use in a
neural collaborative filtering architecture? In such a process, which aggregator function
leads to better recommendation performance: a mean aggregator function or a convolutional
aggregator function?</p>
      <sec id="sec-4-1">
        <title>4.1. Experimental Settings</title>
        <p>
          1. Datasets: The data sets describe 5-stars ratings and free-text tagging from MovieLens,
a movie recommendation service. We extracted user annotations from the ML-10M,
ML-20M, and ML-25M data sets. Only users that have annotated and rated at least 20
movies were selected. We observed from Table 1 an unequal distribution of user rating
classes, because of users trend scoring items with good rating values. This can lose
models capacity to generalize. To overcome, we over-sample minority classes [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] by
duplicating samples from the minority class and adding them to the training data.
2. Hyper-parameters: After randomly splitting each dataset into 90% training and 10%
testing sets, we held out 10% of the training set for hyper-parameter tuning.
We then conducted a 5-fold cross-validation strategy on each dataset and averaged the RMSE
measure. We applied a grid search for hyper-parameter tuning: the learning
rate was tuned among values ∈ {0.0001, 0.0005, 0.001, 0.005} and the latent dimension
∈ {100, 200, 300, 400, 500, 1000} for both the autoencoder and MLP architectures. We
handled the Neural Collaborative Autoencoder with a default rating of 2.5 for test-set entries
without training observations. The graph neural and convolutional models handled the same
datasets, except that models derived from these approaches perform edge prediction through
bipartite graph samples. We tuned the dropout ratio 4 among values ∈ {0.0, 0.1, . . . , 0.8}, and
aggregated neighbor node embedding features up to a depth of 2 layers. The
models were optimized with the well-known Adam optimizer.
3. Evaluation Metrics: We evaluated rating prediction using two metrics: Mean
Absolute Error (MAE) and Root Mean Square Error (RMSE). Both are widely
used for rating prediction in recommender systems. Given a predicted rating r̂_{u,i} and a
ground-truth rating r_{u,i} from user u for item i, the RMSE is computed as:
RMSE = √( (1/N) Σ_{u,i} (r_{u,i} − r̂_{u,i})² )
(12)
        </p>
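        <p>The class-balancing step described in (1) can be sketched as follows; this random-duplication oversampling is a minimal illustration under our reading of [33], not the authors' exact pipeline:

```python
import random
from collections import Counter

def oversample_minority(samples, labels, seed=0):
    """Balance rating classes by duplicating samples from the minority
    classes (random oversampling) until every class reaches the
    majority-class count, then returning the enlarged training data."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))  # duplicate a random sample
            out_labels.append(cls)
    return out_samples, out_labels
```

Duplicated samples only repeat existing observations, so the class balance improves without introducing synthetic ratings.</p>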
        <p>Where N indicates the number of ratings between users and items.
4 The Dropout layer randomly sets input units to 0 with a given rate at each step during training, which
helps prevent over-fitting.</p>
        <p>MAE is computed as follows:
MAE = (1/N) Σ_{u,i} |r_{u,i} − r̂_{u,i}|</p>
        <p>
          Indeed, we have also evaluated ranking accuracy using NDCG (Normalized Discounted
Cumulative Gain [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ]) at 10. For this purpose, we considered a rating value of 5 as
a good appreciation of a user regarding a movie, whereas rating values under 3 are
considered bad. Hence, the rating value of each movie is used as the gain value for
its ranked position in the result, and the gain is summed over the ranked positions from 1
to the cutoff rank (10). To compute the DCG, relevance scores are set on a five (5) point scale
from 1 to 5, denoting relevance from low to strong. The Ideal DCG ranks each user's
movies in decreasing order of their ratings. The NDCG values presented further are
averaged over the user testing set.
        </p>
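        <p>A minimal sketch of the NDCG@10 computation described above, using the raw 1-5 star ratings as gain values:

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain: sum each gain discounted by the
    log2 of its (1-based) rank + 1, over the top-k positions."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_ratings, k=10):
    """NDCG@k: DCG of the produced ranking divided by the ideal DCG,
    where the ideal ranking sorts the same ratings in decreasing order."""
    idcg = dcg_at_k(sorted(ranked_ratings, reverse=True), k)
    return dcg_at_k(ranked_ratings, k) / idcg if idcg > 0 else 0.0
```

A ranking already sorted by decreasing rating yields NDCG@10 = 1.0; any misordering lowers the score.</p>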
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Tag-based embedding representations</title>
        <p>
          We have obtained tag-based embeddings through word vector representations. We
extracted these tag-based embedding representations from pre-trained neural language models.
Owing to discrepancies in users' writing, the semantic meaning of users' tags is often ambiguous.
Tags can be composed of several words and may contain subjective expressions. They can also
be single words, which can occasionally lead to a lack of context. That makes it difficult to
integrate tags explicitly into an effective neural CF architecture. Our main objective is to map
users, items and their tags' interactions into the same latent space. Rather than directly exploiting
the latent space representations of users and items as in most neural collaborative
approaches [
          <xref ref-type="bibr" rid="ref30 ref35">30, 35</xref>
          ], we propose to first project both user and item representations into a
dense tag space representation. Both previous neural approaches are somehow representative
of our objective since they come from CF. We assume that users and items are represented by
their corresponding tags; more precisely, they are represented by the average of their
tag embedding representations.
        </p>
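        <p>The tag-averaging projection described above can be sketched as follows; `tag_embed` stands in for any pre-trained tag-to-vector lookup:

```python
import numpy as np

def profile_from_tags(tags, tag_embed):
    """Project a user (or item) into the dense tag space: its profile
    vector is the element-wise average of the embedding vectors of the
    tags it is associated with."""
    return np.mean([tag_embed[t] for t in tags], axis=0)
```

The same function serves both sides: a user is averaged over the tags they assigned, an item over the tags it received.</p>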
        <p>
          1. Static Word2vec tag-based embeddings: We handled static tag-based embedding
vectors from Word2vec. We exploited pre-trained vectors trained on part of the Google
News dataset (about 100 billion words) and extracted users' tag embeddings by
associating each tag with a fixed-size vector. However, we found
that some tags were out of the vocabulary; such user tags represent respectively 8%,
5% and 5% of the MovieLens 10M, 20M and 25M datasets. We fixed this issue
by initializing those samples with random vector values. The inability to handle unknown or
out-of-vocabulary words is one of the limitations encountered when using such a pre-trained
model. Finally, each set of tags per user is represented through a multidimensional vector
of d = 300.
2. Contextualized BERT tag-based embeddings: We addressed extracting contextualized
embeddings from the BERT neural language model. For this purpose, we assumed that
the first token, '[CLS]', which captures the context, is treated as the sentence embedding
[
          <xref ref-type="bibr" rid="ref36">36</xref>
          ]. The word embedding sequence corresponding to each set of tags is fed into
the pre-trained model. We then used the activations from the last layers of the
BERT model, since the features associated with the activations in these layers are far more
complex and include more contextual information. These contextual embeddings are
used as input to our proposed models. Thus, each set of tags per user is represented
through a multidimensional embedding vector of d = 768. We used the
pre-trained bert-base model 5 (12 blocks of hidden dimension 768, 12 attention heads),
with '[CLS]' indicating the beginning of a sequence and '[SEP]'
used as a separator between two tags of the same sequence.
        </p>
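        <p>The out-of-vocabulary handling described in (1) can be sketched as follows; `pretrained` is a stand-in for the Word2vec lookup table, and the random scale is an illustrative assumption:

```python
import numpy as np

def tag_vectors(tags, pretrained, dim=300, seed=0):
    """Look up each tag in a pre-trained word-vector table; tags that
    are out of vocabulary (8%, 5% and 5% of the three datasets here)
    are initialized with random vector values instead."""
    rng = np.random.default_rng(seed)
    return np.stack([
        np.asarray(pretrained[t]) if t in pretrained
        else rng.normal(scale=0.1, size=dim)  # random init for OOV tags
        for t in tags
    ])
```

A fixed seed keeps the random OOV vectors stable across runs, so a given unknown tag does not drift between epochs.</p>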
        <p>[Table 1: Dataset statistics — collection, number of users, number of movies, tag assignments (TAS), ratings, nodes, edges, and covered period.]</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Evaluation and Performance comparison</title>
      <p>
        First, to address RQ1, we extended neural models [
        <xref ref-type="bibr" rid="ref25 ref30">30, 25</xref>
        ] by handling static and contextual
tag-based embedding representations. We compared those models with recent neural CF models
that we set as baselines. We evaluated rating accuracy using RMSE (Root Mean
Square Error) and MAE (Mean Absolute Error). Then, to address RQ2, we implemented an
MLP and an autoencoder-based CF architecture, and compared the performance of each
neural model according to the tag-based embedding representations with which it was
integrated. Moreover, a ranking accuracy comparison was carried out among the different neural
models using NDCG (Normalized Discounted Cumulative Gain) at 10. Finally, to answer RQ3, we
exploited user/item tag-based embeddings through an aggregation function
learned from training samples of user-item graphs. Such a function operates either by performing
element-wise multiplication between the tag embedding neighbor vectors of a given node or
by concatenating tag embedding vectors with their tag embedding neighbor vectors to obtain the
embedding of that node.
      </p>
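      <p>The two aggregation schemes just described can be sketched as follows; this is a simplified numpy illustration (the learned weight matrices and nonlinearities of the actual models are omitted):

```python
import numpy as np

def mean_aggregate(neighbor_vecs):
    """Mean aggregator: average the tag-embedding features sampled
    from a node's local neighborhood."""
    return np.mean(neighbor_vecs, axis=0)

def conv_aggregate(node_vec, neighbor_vecs):
    """Convolutional aggregator: concatenate the node's previous-layer
    representation with the aggregated neighborhood vector, doubling
    the dimension before the next learned transformation."""
    return np.concatenate([node_vec, mean_aggregate(neighbor_vecs)])
```

In the full models, each aggregated vector would then pass through a trainable layer before the next neighborhood hop.</p>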
      <p>All the models included in the comparative study of neural models are detailed
below.</p>
      <p>
        • Neural GMF-MLP[
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]: A neural CF approach that exploits a multi-layer perceptron
(MLP) to learn the user-item interaction function. The bottom input layer consists of two
vectors that describe user u and item i as binarized sparse vectors (one-hot encoding);
the model employs only the identity of a user and an item as input features.
5 BERT was pre-trained on a corpus composed of 11,038 unpublished books belonging to 16 different domains and
2,500 million words from English Wikipedia text passages.
• Neural CF-MLP++: An extension of Neural CF-MLP that integrates in the
bottom input layer two feature vectors described as tag embedding features of
users and items. These features are extracted from word vector representations. We distinguish
Neural CF-MLP++W2V, whose user and item feature vectors are 300-dimensional
word vectors from the pre-trained Word2vec model, from Neural
CF-MLP++BERT, which exploits 768-dimensional word vectors from the pre-trained BERT
model.
• U-Autorec [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]: U-AutoRec is a neural CF framework for rating prediction that exploits
an autoencoder architecture. It takes user vectors as input and reconstructs them in
the output layer. The values in the reconstructed vectors are the predicted ratings at the
corresponding positions.
• CF-Autoencoder++: Our autoencoder-based neural collaborative approach that
integrates tag embedding features as input by performing element-wise multiplication on
their word vector representations and concatenating such representations with user/item
rating vectors to obtain the reconstructed ratings. We term the autoencoder-based
model using static tag vector representations CF-Autoencoder++W2V, while
CF-Autoencoder++BERT stands for the autoencoder-based model using contextual tag
vectors.
• CF-GNN++ (K=2): Our NGCF tag-based predictive model that generates node
embeddings by sampling and aggregating features (tag embeddings) from a node's local
neighborhood, using a mean aggregation function that operates on a neighborhood of K = 2.
We distinguish the NGCF model that handles features extracted from tag-based
embeddings using 300-dimensional tag vectors from the pre-trained Word2vec
model, which we term CF-GNN++W2V (K=2), from CF-GNN++BERT (K=2),
which exploits 768-dimensional tag vectors from the pre-trained BERT model.
• CF-GCN++ (K=2): We consider this NGCF model convolutional since it
learns a convolutional aggregator function that concatenates a node's previous-layer
representations with the aggregated neighborhood vectors. We differentiate the model
that handles features extracted from static tag-based embeddings with 300-dimensional tag
vectors from the pre-trained Word2vec model, which we term CF-GCN++W2V (K=2),
from CF-GCN++BERT (K=2), which exploits 768-dimensional tag vectors from the pre-trained
BERT model.
• Hinsage [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]: A model that employs a technique for computing node representations in
an inductive way. This method operates by sampling a fixed-size neighborhood of each
user/item node and then applying a specific aggregator over all the sampled neighbors'
feature vectors. The model learns general-purpose node embeddings that use the graph
structure and particularly node features. It was evaluated on a rating prediction task
using demographic user information (no tag information).
• TRSDL [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]: A tag-aware recommender system that uses deep neural networks (DNNs)
and recurrent neural networks (RNNs) to extract latent features of both users and items. In
their model, Liang et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] use Word2Vec to map user tags to k-dimensional dense
vectors in order to represent tags with word embeddings. Their model has the ability to
construct item and user profiles based on an item's tags and a user's tagging behaviors.
It then utilizes DNNs and RNNs
to extract the latent features of the item and the user, respectively.
      </p>
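      <p>As an illustration of the MLP-based models above, a minimal forward pass for a Neural CF-MLP++-style predictor might look as follows; the ReLU activations and layer shapes are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def ncf_mlp_forward(user_tag_vec, item_tag_vec, weights):
    """Sketch of a Neural CF-MLP++-style forward pass: the bottom layer
    concatenates the user and item tag-embedding vectors, hidden layers
    apply an affine map followed by ReLU, and the top layer emits a
    single predicted rating."""
    h = np.concatenate([user_tag_vec, item_tag_vec])  # bottom input layer
    for W, b in weights[:-1]:
        h = np.maximum(0.0, W @ h + b)  # ReLU hidden layers
    W, b = weights[-1]
    return float(W @ h + b)  # scalar rating prediction
```

In training, `weights` would be learned by minimizing the rating prediction error (e.g. squared error) with Adam, as in the experimental settings.</p>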
      <p>[Table 2: RMSE, MAE and NDCG@10 results of Neural CF-MLP++W2V, Neural CF-MLP++BERT, CF-Autoencoder++W2V, CF-Autoencoder++BERT, U-Autorec [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], Neural CF-MLP [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], CF-GNN++W2V (K=2), CF-GNN++BERT (K=2), CF-GCN++W2V (K=2), CF-GCN++BERT (K=2), HINSAGE [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and TRSDL [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].]</p>
      <sec id="sec-5-1">
        <title>5.1. Effects on recommendation quality and ranking (RQ1)</title>
        <p>
          The results of our experiments are synthesized in Table 2. First, on the ML-10M dataset,
the top RMSE and MAE scores come from the CF-GCN++BERT Agg(K=2) model, with MAE =
0.715 and RMSE = 0.791. Our proposed contextual tag-embedding-based NGCF model
also achieved the top ranking quality, reaching NDCG@10 = 0.48. We noticed that the
static tag-based embedding extension of this model, CF-GCN++W2V Agg(K=2), also
achieved good results, outperforming most of the baselines except the TRSDL model [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], which
reached MAE = 0.73 and RMSE = 0.810 with a ranking metric of NDCG@10 = 0.45.
Considering the Hinsage model [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], which reached MAE = 0.75 and RMSE = 0.85 with a ranking
score of NDCG@10 = 0.48, and the CF-GNN++BERT Agg(K=2) model, which reached MAE =
0.774 and RMSE = 0.89 with a ranking quality of NDCG@10 = 0.451, we
might at first sight be tempted to claim that NGCF approaches show strong performance
compared with other neural collaborative approaches, no matter which tag embeddings we
integrated into the models. However, considering the significant performance of the
neural models that integrate contextualized tag embeddings, such as Neural CF-MLP++BERT, which
achieved MAE = 0.72 and RMSE = 0.93, or the autoencoder
model CF-Autoencoder++BERT, which reached RMSE = 0.96 and MAE = 0.76, we then
focused on determining which architecture performs best among all the proposed neural
architectures that integrate static/contextual tag embedding representations or
that additionally aggregate tag-based neighborhood embeddings.
        </p>
        <p>
          Furthermore, on the ML-20M dataset, the same NGCF model, named
CF-GCN++BERT Agg(K=2), showed the top RMSE and MAE scores, with MAE = 0.723
and RMSE = 0.802. This confirms the performance of NGCF approaches combined
with contextualized tag embeddings. It also appeared that such models reach top ranking
quality; additionally, the ranking metric scores showed that the most competitive baseline is
Hinsage [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], with a ranking quality that does not exceed NDCG@10 = 0.448. Both
CF-GCN++BERT Agg(K=2) and CF-GNN++BERT Agg(K=2) have the highest
ranking scores, with NDCG@10 = 0.47 and NDCG@10 = 0.441 respectively. This is
the case even though these models use neither the same aggregation technique nor the same
tag embedding process. In this regard, we found that the mean aggregator function
operated with static tag embeddings in an NGCF process, named CF-GNN++W2V Agg(K=2),
performed well and obtained MAE = 0.80 and RMSE = 0.94 with a ranking quality
of NDCG@10 = 0.464, a score that outperforms the autoencoder-based model
extension named CF-Autoencoder++BERT with NDCG@10 = 0.44, even though this model
achieved MAE = 0.811 and RMSE = 0.89. This demonstrates the efficiency of such an
aggregation function.
        </p>
        <p>Finally, on the ML-25M dataset, the impact of contextualized tag embeddings on the models is
definitely established, since both RMSE and MAE scores showed significant improvements
over the baselines. Such is the case for the Neural CF-MLP++BERT model, which reached
MAE = 0.791 and RMSE = 0.83 for a ranking quality of NDCG@10 = 0.46. Likewise, the
CF-Autoencoder++BERT model obtained MAE = 0.79 and RMSE = 0.86
with a ranking quality of NDCG@10 = 0.445. On top of that, the impact of the aggregator
functions is also distinguishable through the NGCF model scores, since we noticed that results were
much improved using a convolutional aggregator function applied to contextualized tag
embeddings. The CF-GCN++BERT Agg(K=2) model achieved the best RMSE and MAE scores
compared to the CF-GNN++BERT Agg(K=2) model, which exploits a mean aggregator function
even though it also integrates contextualized tag embeddings. We believe that these results could
be strengthened by increasing the training data.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Effects on error distribution (RQ2)</title>
        <p>In the following, we discuss the effectiveness of our approaches in predicting user
ratings with an acceptable amount of error. We highlight the impact of exploiting contextualized
tag-based embedding representations by studying the error distribution when predicting
user ratings. This impact is summarized at the top of Figure 2. Error distribution values are
presented for the testing sets of the ML-10M, ML-20M and ML-25M datasets. This
provides an overview of the error distributions obtained by the baselines compared with those
of our predictive models, which integrate tag-based static or contextualized embedding
representations and describe specific architectures for each model.</p>
        <p>
          First, on the ML-10M dataset, we observe that the error distribution values of the models exploiting
contextual tag embeddings, such as CF-MLP++BERT and CF-Autoencoder++BERT, are mostly located
in the interval [−1, 1] compared to the error distribution values of the other baselines.
We also observe that the NGCF models CF-GCN++BERT Agg(K=2) and
CF-GNN++BERT Agg(K=2) outperform all other models, with 980 and 890
accurate predictions respectively. Secondly, on ML-20M, we notice that the CF-GCN++BERT Agg(K=2)
model leads to a large number of accurate predictions, estimated at 7220. This
performance is closely followed by CF-GNN++BERT Agg(K=2), with 4250
accurate predictions. Lastly, on ML-25M, the same models reached 7980 and 7740 accurate
predictions respectively.
        </p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Impact of learning aggregated tag-based functions (RQ3)</title>
        <p>We report for each model the validation scores after 20 epochs; this allows us to estimate
each model's capacity to generalize beyond the data it was trained on. From the bottom of
Figure 2, we analyzed which models achieve the best convergence rate. It appears that,
across the three collections ML-10M, ML-20M and ML-25M, the convergence rate of the
models is clearly better for the neural graph approaches, particularly
CF-GNN++BERT Agg(K=2) and CF-GCN++BERT Agg(K=2), our NGCF models
that exploit fine-tuned tag embedding representations. This leads us to believe that when
contextualized tag embeddings are aggregated through neighborhood embeddings, they give more
effective representations of users and items and enhance recommendation quality. We argue
that our NGCF approaches capture the multiple semantic dimensions that tags can take,
including the abstract formalization of tag neighborhood embeddings, which leads to
fine-grained representations.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Following the experiments, we conclude that exploiting neural graph models to
learn aggregation functions has enabled us to gain quality recommendations and improve
ranking quality. We have shown that handling a convolutional aggregator function can generalize
an efficient graph-based neural collaborative filtering process. It concatenates contextualized
tag embedding representations of user/item nodes with previous-layer representations. This has
enabled us to obtain more refined embedding features and to capture non-trivial tagging
behavior.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H. A. M.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sansonetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gasparetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Micarelli</surname>
          </string-name>
          ,
          <article-title>Semantic-based tag recommendation in scientific bookmarking systems</article-title>
          ,
          <source>in: Proceedings of the 12th ACM Conference on Recommender Systems</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>465</fpage>
          -
          <lpage>469</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Manotumruksa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          ,
          <article-title>Modelling user preferences using word embeddings for context-aware venue recommendation</article-title>
          ,
          <source>arXiv preprint arXiv:1606.07828</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rücklé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Eger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Peyrard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Concatenated power mean word embeddings as universal cross-lingual sentence representations</article-title>
          ,
          <source>arXiv preprint arXiv:1803.01400</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Quan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <article-title>A context-aware user-item representation learning for item recommendation</article-title>
          ,
          <source>ACM Transactions on Information Systems (TOIS) 37</source>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y. E.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hazelwood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brooks</surname>
          </string-name>
          ,
          <article-title>Exploiting parallelism opportunities with deep learning frameworks</article-title>
          ,
          <source>ACM Transactions on Architecture and Code Optimization (TACO) 18</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jiao</surname>
          </string-name>
          ,
          <article-title>Hybrid neural recommendation with joint deep representation learning of ratings and reviews</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>374</volume>
          (
          <year>2020</year>
          )
          <fpage>77</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Musto</surname>
          </string-name>
          , G. Semeraro,
          <string-name>
            <surname>M. De Gemmis</surname>
          </string-name>
          , P. Lops,
          <article-title>Word embedding techniques for contentbased recommender systems: An empirical evaluation</article-title>
          ., in: Recsys posters,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Collaborative multi-level embedding learning from reviews for rating prediction</article-title>
          .,
          <source>in: IJCAI</source>
          , volume
          <volume>16</volume>
          ,
          <year>2016</year>
          , pp.
          <fpage>2986</fpage>
          -
          <lpage>2992</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-T.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Sangaiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Trsdl: Tag-aware recommender system based on deep learning-intelligent computing systems</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>8</volume>
          (
          <year>2018</year>
          )
          <fpage>799</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Vijaikumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shevade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Murty</surname>
          </string-name>
          , Tagembedsvd:
          <article-title>Leveraging tag embeddings for cross-domain collaborative filtering</article-title>
          ,
          <source>in: International Conference on Pattern Recognition and Machine Intelligence</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>240</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-F.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Exploiting pre-trained network embeddings for recommendations in social networks</article-title>
          ,
          <source>Journal of Computer Science and Technology</source>
          <volume>33</volume>
          (
          <year>2018</year>
          )
          <fpage>682</fpage>
          -
          <lpage>696</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          , T. Mikolov,
          <article-title>Distributed representations of sentences and documents</article-title>
          ,
          <source>in: International conference on machine learning</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1188</fpage>
          -
          <lpage>1196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , Glove:
          <article-title>Global vectors for word representation</article-title>
          ,
          <source>in: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , Bert:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ying</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Inductive representation learning on large graphs</article-title>
          ,
          <source>in: Advances in neural information processing systems</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1024</fpage>
          -
          <lpage>1034</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Kipf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          ,
          <article-title>Semi-supervised classification with graph convolutional networks</article-title>
          ,
          <source>arXiv preprint arXiv:1609.02907</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Z.-K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Tag-aware recommender systems: a state-of-the-art survey</article-title>
          ,
          <source>Journal of computer science and technology 26</source>
          (
          <year>2011</year>
          )
          <fpage>767</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>Discriminative embeddings of latent variable models for structured data</article-title>
          ,
          <source>in: International conference on machine learning</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>2702</fpage>
          -
          <lpage>2711</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Grover</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>node2vec: Scalable feature learning for networks</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>855</fpage>
          -
          <lpage>864</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>GraRep: Learning graph representations with global structural information</article-title>
          ,
          <source>in: Proceedings of the 24th ACM international on conference on information and knowledge management</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>891</fpage>
          -
          <lpage>900</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-S.</given-names>
            <surname>Chua</surname>
          </string-name>
          ,
          <article-title>Neural graph collaborative filtering</article-title>
          ,
          <source>in: Proceedings of the 42nd international ACM SIGIR conference on Research and development in Information Retrieval</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>165</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Learning fair representations for recommendation: A graph-based perspective</article-title>
          ,
          <source>in: Proceedings of the Web Conference 2021</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>2198</fpage>
          -
          <lpage>2208</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ying</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Eksombatchai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Graph convolutional neural networks for web-scale recommender systems</article-title>
          ,
          <source>in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>974</fpage>
          -
          <lpage>983</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Rong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <article-title>Autoencoder-based collaborative filtering</article-title>
          ,
          <source>in: International Conference on Neural Information Processing</source>
          , Springer,
          <year>2014</year>
          , pp.
          <fpage>284</fpage>
          -
          <lpage>291</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sedhain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Menon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sanner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>AutoRec: Autoencoders meet collaborative filtering</article-title>
          ,
          <source>in: Proceedings of the 24th international conference on World Wide Web</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Rijke</surname>
          </string-name>
          ,
          <article-title>Joint neural collaborative filtering for recommender systems</article-title>
          ,
          <source>ACM Transactions on Information Systems (TOIS) 37</source>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Dziugaite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <article-title>Neural network matrix factorization</article-title>
          ,
          <source>arXiv preprint arXiv:1511.06443</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jiao</surname>
          </string-name>
          ,
          <article-title>Tag-aware recommender systems based on deep neural networks</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>204</volume>
          (
          <year>2016</year>
          )
          <fpage>51</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Noroozi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Joint deep modeling of users and items using reviews for recommendation</article-title>
          ,
          <source>in: Proceedings of the tenth ACM international conference on web search and data mining</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>425</fpage>
          -
          <lpage>434</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-S.</given-names>
            <surname>Chua</surname>
          </string-name>
          ,
          <article-title>Neural collaborative filtering</article-title>
          ,
          <source>in: Proceedings of the 26th international conference on world wide web</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>173</fpage>
          -
          <lpage>182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-T.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-X.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <article-title>Extracting deep semantic information for intelligent recommendation</article-title>
          ,
          <source>in: International Conference on Neural Information Processing</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>134</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>R.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <article-title>TNAM: A tag-aware neural attention model for top-N recommendation</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>385</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Bowyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. O.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. P.</given-names>
            <surname>Kegelmeyer</surname>
          </string-name>
          ,
          <article-title>SMOTE: Synthetic minority over-sampling technique</article-title>
          ,
          <source>Journal of artificial intelligence research 16</source>
          (
          <year>2002</year>
          )
          <fpage>321</fpage>
          -
          <lpage>357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>K.</given-names>
            <surname>Järvelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          ,
          <article-title>Cumulated gain-based evaluation of IR techniques</article-title>
          ,
          <source>ACM Transactions on Information Systems (TOIS) 20</source>
          (
          <year>2002</year>
          )
          <fpage>422</fpage>
          -
          <lpage>446</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-S.</given-names>
            <surname>Chua</surname>
          </string-name>
          ,
          <article-title>Outer product-based neural collaborative filtering</article-title>
          ,
          <source>arXiv preprint arXiv:1808.03912</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using siamese BERT-networks</article-title>
          ,
          <source>arXiv preprint arXiv:1908.10084</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>