<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GInRec: A Gated Architecture for Inductive Recommendation using Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Theis E. Jendal</string-name>
          <email>tjendal@cs.aau.dk</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Lissandrini</string-name>
          <email>matteo@cs.aau.dk</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Dolog</string-name>
          <email>dolog@cs.aau.dk</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katja Hose</string-name>
          <email>katja.hose@tuwien.ac.at</email>
          <email>khose@cs.aau.dk</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>We have witnessed increasing interest in exploiting KGs to integrate contextual knowledge in recommender systems in addition to user-item interactions, e.g., ratings. Yet, most methods are transductive, i.e., they represent instances seen during training as low-dimensionality vectors but cannot do so for unseen instances. Hence, they require heavy retraining every time new items or users are added. Conversely, inductive methods promise to solve these issues. KGs enhance inductive recommendation by offering information on item-entity relationships, whereas existing inductive methods rely purely on interactions, which makes recommendations for users with few interactions sub-optimal and even impossible for new items. In this work, we investigate the actual ability of inductive methods to exploit both the structure and the data represented by KGs. Hence, we propose GInRec, a state-of-the-art method that uses a graph neural network with relation-specific gates and a KG to provide better recommendations for new users and items than related inductive methods. As a result, we re-evaluate state-of-the-art methods, identify better evaluation protocols, highlight unwarranted conclusions from previous proposals, and showcase a novel, stronger architecture for this task. The source code is available at: https://github.com/theisjendal/kars2023-recommendation-framework.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In Recommender Systems (RSs), an item is recommended
to a user based on their preferences. Usually, these
preferences are extracted from a user’s historic interactions
with items, such as clicks or purchases. An RS can either
recommend based on user-item interactions, based on
descriptive features of the items, or both. In the first
case, for example in the movie domain, the system would
assume that users who watch the same movies are likely
to do so also in the future; this approach is commonly
referred to as Collaborative Filtering (CF) [
        <xref ref-type="bibr" rid="ref22">1, 2, 3, 4, 5</xref>
        ]. In
the second case, instead, the system would assume that
the user is likely to watch movies with genres and plots
similar to those of movies they watched in the past, i.e., a
content-based method. Challenges arise for the former approach
when, for a given user, only very few interactions are
known; a similar challenge arises when the information
describing items is scarce. The idea is then to combine both
kinds of information. In this regard, recently, RSs have
been proposed to model knowledge about items derived
from a KG [6, 7, 8, 9, 10, 11, 12]. A KG represents entities
and their attributes as nodes and edges within a graph
model, e.g., taxonomies, item descriptions, or categories
attached to items. These models further integrate
user-item interactions into the graph, obtaining in this way a
Collaborative KG (CKG), as in Figure 1.A, allowing for
recommendations for users that have only a few ratings
or with newly added items that have none at all.
      </p>
      <p>[Figure 1: A Collaborative Knowledge Graph connecting users (Alex, Aiden, Max) to movies (Inception, Don Jon, The Prestige, American Hustle) and to descriptive entities such as genres (Heist, Sci-Fi, Crime Fiction, Action, Fiction, Tragedy, Drama); GInRec applies a GNN to the user and item representations and predicts whether the user likes the item.]</p>
      <sec id="sec-1-1">
        <title>For example, in Figure 1.B, we are making predictions</title>
        <p>for a user for whom we do not have any information
(an empty embedding vector) except for a few rated
items. The KG connects directors, genres, and actors to
the rated movies, where some of these entities are
described by textual information, e.g., bios and synopses.</p>
        <p>We can use the connections and data to infer user
preferences beyond the collaborative signals.</p>
        <p>
          Many existing methods only work in a transductive
setting; that is, it is assumed that all users and items have
been seen during training [13, 14], meaning transductive
models require retraining whenever new users or items
are introduced. Instead, some models try to offer inductive
capabilities [13, 15, 14]. In an inductive setting, users
and items exist that are not in the training set; therefore,
inductive methods extract information from the data to
incorporate local structures and obtain an inductive bias.
Nonetheless, existing methods usually model only user-item
interactions, ignoring the KG [
          <xref ref-type="bibr" rid="ref30">16, 13, 14, 17</xref>
          ]. Our analysis of existing
works (see Section 3) identified four important limitations:
(i) the reliance on user metadata [
          <xref ref-type="bibr" rid="ref26 ref30">17, 18</xref>
          ], (ii) the
tendency to rely exclusively on collaborative information
and to bias preference over popular items [13, 15, 14],
(iii) the poor scalability of methods that create user-item
subgraphs for every rating [19, 15], and (iv) the missed
opportunity to exploit item metadata and KG structure.
        </p>
        <p>
          Hence, we first propose a new architecture for inductive
recommendation using KGs. In our design, we strive
for simplicity by adopting the efficiency and expressivity
of Graph Neural Networks (GNNs) to aggregate structural
information of each node’s neighborhood, but going
beyond trivial extensions of the GraphSAGE architecture
as well as any other existing inductive method [16, 19, 20,
17, 15, 14, 21], due to a gated architecture that more
effectively extrapolates inductive biases from the semantic
and structural information encoded in the CKG. Furthermore,
by reviewing the experimental evaluation of existing
works, we have identified problematic methodologies,
on which we report here together with our results. Thus,
we propose Gated Inductive Recommendation (GInRec),
a new architecture that fully exploits the semantic
information of real-world KGs for inductive predictions
in a scalable way.
        </p>
        <sec id="sec-1-2">
          <title>2. Problem Formulation</title>
          <p>Formally, we define a KG as a directed labeled multigraph
identified by the triple 𝒢 = ⟨ℰ, ℛ, ℒ⟩, where ℰ is the set of
entities (nodes) in the graph, ℒ, with ℒ∩ℰ = ∅, is the set of
labels for the relations, and the edges between entities
are represented as ℛ ⊆ ℰ×ℒ×ℰ. Consider, for instance,
the top-right portion of the example in Figure 1. Here,
nodes represent movies, actors, directors, and a
taxonomy of genres, while edges represent how nodes are
connected, e.g., in the triple (Inception, hasGenre, Heist).</p>
          <p>As common in the literature [22], we split entities into
two sets: the set of recommendable entities ℐ ⊂ ℰ, being
the entities that the system can recommend to a user (e.g.,
movies); and the set of descriptive entities ℰ_D ⊂ ℰ (e.g.,
actors, genres, classes), such that ℰ = ℐ ∪ ℰ_D.</p>
          <p>Furthermore, we adopt the concept of a CKG [6], i.e., a
KG augmented with users’ interactions with items, also
shown in Figure 1. Formally, given the set of users 𝒰,
the interaction matrix I ∈ {0, 1}^(|𝒰|×|ℐ|) is a matrix of size
|𝒰|×|ℐ|, having I_u,i = 1 if user u ∈ 𝒰 has liked the item
i ∈ ℐ, and I_u,i = 0 if we do not have any information about
the specific pair, e.g., if the user has never interacted
with the item. Then, given the matrix I and the KG 𝒢,
the CKG 𝒢_c = ⟨ℰ_c, ℛ_c, ℒ_c⟩ is an extension of 𝒢,
having ℰ_c = ℰ ∪ 𝒰, ℒ_c = ℒ ∪ {Likes}, and
ℛ_c = ℛ ∪ {(u, Likes, i) | ∀u ∈ 𝒰, i ∈ ℐ s.t. I_u,i = 1}.</p>
          <p>Finally, every node n ∈ ℰ_c is associated with a set of
node features, assuming a function f: ℰ_c ↦ ℝ^d exists,
called the feature function, assigning to each node a
feature vector of dimension d. Typically, this vector provides
a d-dimensional encoding of the node’s contents; e.g., in
this and other works [13], the word embeddings of the
textual descriptions obtained from literal values attached
to the nodes are used. However, since we do not want to
use, or have, any user information, with the exception of
their ratings, we have ∀u ∈ 𝒰. f(u) = 0⃗. Therefore, given
an interaction matrix I, a KG 𝒢 with feature function f,
a user u, and an item i such that I_u,i = 0, we model the
recommendation problem as the problem of predicting
the likelihood of I_u,i = 1 if we present the item i to the
user u. In practice, we model our task as a top-k
recommendation problem. Thus, we aim at learning a model Θ to
parametrize a transformation function ℱ: 𝒰×ℐ ↦ [0, 1],
such that ℱ_Θ(u, i) ≥ ℱ_Θ(u, j) imposes a partial order on
ℐ for every user in 𝒰 if it is more likely that the user u
would like i over j than vice versa.</p>
          <p>Finally, in the recommendation setting, we define two
types of users: those for which preferences across some
items in ℐ were known when learning Θ, i.e., at training
time, and those for which no item rating was known
during training, but for which some rating is known at
inference time. We refer to the former as the warm-start
users 𝒰_w and to the latter as the cold-start users 𝒰_c, with
𝒰 = 𝒰_w ∪ 𝒰_c. Transductive methods can only recommend
for users in the warm-start set, while inductive methods
can recommend for users in both sets. Typically, when
a new user joins a platform, it is common practice to
present them with an initial set of items to be rated. Thus,
we consider cold-start users for which, at inference time,
we have some ratings, even though those ratings are
usually few and sparse [
          <xref ref-type="bibr" rid="ref26">18, 23, 20</xref>
          ].</p>
        </sec>
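        <p>The CKG construction above can be illustrated directly: every observed interaction I_u,i = 1 becomes a (u, Likes, i) edge added to the KG triples. The following is our own minimal Python sketch (the function name build_ckg and the data layout are hypothetical, not taken from the paper's code):</p>
        <preformat>
```python
# Our own sketch of the CKG construction from Section 2 (hypothetical
# names, not the authors' implementation): add a "Likes" edge for every
# observed user-item interaction.

def build_ckg(kg_edges, interactions):
    """kg_edges: set of (head, label, tail) triples over entities.
    interactions: dict mapping each user to the set of items with I[u, i] = 1."""
    ckg_edges = set(kg_edges)
    for user, items in interactions.items():
        for item in items:
            ckg_edges.add((user, "Likes", item))
    return ckg_edges

kg = {("Inception", "hasGenre", "Heist"), ("Inception", "hasGenre", "Sci-Fi")}
ratings = {"Alex": {"Inception"}, "Max": {"Inception", "The Prestige"}}
ckg = build_ckg(kg, ratings)
print(len(ckg))  # 2 KG triples + 3 Likes triples = 5
```
        </preformat>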
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Related Work</title>
      <p>Most existing recommendation methods either only use
bipartite graphs of user interactions with items [3, 5, 16]
or are transductive [6, 7]. Instead, inductive learning
models generate predictions for unseen nodes by directly
reasoning over the features that describe them, but
existing methods do not exploit KG data. Here, we provide an
overview of inductive methods, detailing their limitations
compared to our proposal (as summarized in Table 1) and
describing the advantages of relational gates.</p>
      <sec id="sec-2-1">
        <title>Inductiveness. GraphSAGE [13] was the first inductive</title>
        <p>GNN capable of efficiently generating embeddings for
unseen nodes by leveraging pretrained node features for
node classification. It was later expanded to a scalable
item-item recommendation method, meaning no explicit
modeling of user-item ratings [16].</p>
        <p>Other methods have been proposed for inductive matrix
completion by extracting subgraphs around each
user-item pair to obtain the necessary representations [24,
25, 19, 15]. These approaches are designed for the single
rating prediction task and not for the ranking task.
Generating these subgraphs is prohibitively space- and
time-consuming. Thus, they cannot efficiently produce
user-item rankings, since a subgraph is generated for
all user-item pairs [20]. Furthermore, these methods
do not use KG information; thus, they cannot provide
predictions for new items with no interactions. Therefore,
instead of constructing subgraphs, GInRec employs
subsampling of neighboring nodes to obtain a scalable
prediction mechanism [16, 13] and uses KGs to gain
information about items with few user interactions. Several
methods exploit user metadata, e.g., gender and age
information [
          <xref ref-type="bibr" rid="ref26 ref30">17, 18</xref>
          ]. Yet, this information is rarely available,
making it impossible to use these methods in practice.</p>
        <table-wrap id="tbl-1">
          <label>Table 1</label>
          <caption><p>Related methods, whether they use User Metadata, whether they handle Relational information (i.e., KG), the Task they support among (C) Node Classification, (R) Ranking, and (P) Rating Prediction, and whether the method constructs a Subgraph from user-item pairs.</p></caption>
          <table>
            <thead>
              <tr><th>Model</th><th>Inductive User</th><th>Inductive Item</th><th>User Metadata</th><th>Relational</th><th>Task</th><th>Subgraph</th></tr>
            </thead>
            <tbody>
              <tr><td>NGCF [5]</td><td>✗</td><td>✗</td><td>✗</td><td>✗</td><td>R</td><td>✗</td></tr>
              <tr><td>KGAT [6]</td><td>✗</td><td>✗</td><td>✗</td><td>✔</td><td>R</td><td>✗</td></tr>
              <tr><td>KPRN [7]</td><td>✗</td><td>✗</td><td>✗</td><td>✔</td><td>R</td><td>✗</td></tr>
              <tr><td>KGCN-LS [8]</td><td>✗</td><td>✗</td><td>✗</td><td>✔</td><td>R</td><td>✗</td></tr>
              <tr><td>MeLU [<xref ref-type="bibr" rid="ref26">18</xref>]</td><td>✗</td><td>✗</td><td>✔</td><td>✗</td><td>R</td><td>✗</td></tr>
              <tr><td>RuleRec [28]</td><td>✗</td><td>✗</td><td>✗</td><td>✔</td><td>R</td><td>✗</td></tr>
              <tr><td>LGCN [3]</td><td>✗</td><td>✗</td><td>✗</td><td>✗</td><td>R</td><td>✗</td></tr>
              <tr><td>MGAT [9]</td><td>✗</td><td>✗</td><td>✗</td><td>✔</td><td>R</td><td>✗</td></tr>
              <tr><td>GraphSAGE [13]</td><td>(✔)</td><td>✔</td><td>✗</td><td>✗</td><td>C</td><td>✗</td></tr>
              <tr><td>PinSAGE [16]</td><td>(✔)</td><td>✔</td><td>✗</td><td>✗</td><td>R</td><td>✗</td></tr>
              <tr><td>BERT4Rec [26]</td><td>(✔)</td><td>✗</td><td>✗</td><td>✗</td><td>R</td><td>✗</td></tr>
              <tr><td>IGMC [19]</td><td>✔</td><td>✔</td><td>✗</td><td>✗</td><td>P</td><td>✔</td></tr>
              <tr><td>IDCF [20]</td><td>✔</td><td>✗</td><td>✗</td><td>✗</td><td>P</td><td>✗</td></tr>
              <tr><td>ICP [14]</td><td>✗</td><td>✔</td><td>✗</td><td>✗</td><td>R</td><td>✗</td></tr>
              <tr><td>PGD [17]</td><td>✔</td><td>✔</td><td>✔</td><td>✗</td><td>R</td><td>✗</td></tr>
              <tr><td>GIMC [15]</td><td>✔</td><td>✔</td><td>✗</td><td>✗</td><td>P</td><td>✔</td></tr>
              <tr><td>ReBKC [27]</td><td>✔</td><td>✗</td><td>✗</td><td>✔</td><td>P</td><td>✗</td></tr>
              <tr><td>GInRec</td><td>✔</td><td>✔</td><td>✗</td><td>✔</td><td>R</td><td>✗</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Hence, in our method, we assume no user metadata,
learning instead how to aggregate information. Some
methods are made for sequential recommendations [26],
or cannot recommend for new users or for new items [14,
27], and in general cannot capture high-order connectivities
between users, making them less relevant for our study.</p>
        <p>Additionally, some methods are quasi-inductive, since
they consist of two parts: (1) a transductive part to obtain
some initial embeddings and (2) an inductive part where
the method learns to generate embeddings for new users
or items [21, 20]. GInRec is fully inductive since it uses
the extracted node textual features instead and thus does
not need to learn the initial embeddings. Finally, many
methods target the prediction of a user rating, which
underperforms in the ranking task, even compared to
non-personalized methods [
          <xref ref-type="bibr" rid="ref22">2</xref>
          ]. Thus, existing works (Table 1) either: (i) create
subgraphs, which do not scale in the ranking task;
(ii) use personal user data, which is almost never
available; or (iii) solve a rating-prediction task that offers
sub-optimal performance in practice. Therefore, we select
GraphSAGE [13] and PinSAGE [16] as the only inductive
recommenders fitting our recommendation setting and
select IDCF [20] as a representative baseline for
quasi-inductive methods.</p>
        <p>Gates. Gates were originally used in Recurrent Neural
Networks (RNNs) to learn long-term dependencies
in time series [29, 30]. A gate limits the amount of
information passed by learning a scalar in [0, 1] for each
dimension in a vector. On the contrary, the attention
mechanism, which is often used in GNN aggregators,
learns a single scalar for the entire vector [6, 8, 16]. Hence,
the gates allow for differentiation at the dimension level
rather than the vector level. For KGs, gates have been
used to capture long-term path relations [7] and for
aggregating neighbors in multi-modal graphs [9]. Multi-modal
information and relations in KGs differ in both semantic
meaning and practical application, requiring different
aggregation techniques. Therefore, GInRec adopts new
relation-specific gates as an addition to the neighborhood
aggregation.</p>
        <p>Thus, GInRec proposes a scalable inductive method for
user-personalized recommendation that learns to extract
knowledge from a KG using relational gates without
requiring any user metadata.</p>
        <sec id="sec-2-1-2">
          <title>4. Methodology</title>
          <p>We now present our model, Gated Inductive
Recommendation (GInRec). The model consists of three components
(as shown in Figure 2): (i) the embedding layer, where we
compress node feature information to create node
embeddings (Figure 2 A); (ii) the gated propagation layer, which
chooses which information to propagate from the
embeddings of neighboring nodes in a CKG to produce a
high-order representation of each node and its neighbors
(Figure 2 C and E); and (iii) the prediction layer, creating a
user and an item embedding given all propagation layers
and outputting a ranking score (Figure 2 D). Hence, our
architecture learns to recommend for users based on their
interactions alone, introducing a gating mechanism to
adaptively select information during aggregation along
with an autoencoder regularization measure.</p>
        </sec>
        <sec id="sec-2-1-1">
          <title>4.1. Embedding Layer</title>
          <p>GInRec is designed as an inductive relational graph
neural network. Therefore, given a target node v ∈ ℰ_c, we
need an initial feature vector describing its content, e.g.,
the movie plot or the biography of the actor. Similar to
seminal works [31], we use Sentence-BERT [32] to process
the textual description of each entity and produce
sentence embeddings such that sentences of similar semantic
meaning are close to each other in a vector space. For
multi-sentence descriptions, the average sentence
embedding is used. When textual descriptions for descriptive
entities are not available, we use ComplEx [33] to train
entity embeddings for the descriptive entities in the KG,
since these are static or very slowly changing. The initial
vector is a concatenation of the textual embedding and
the structural data, e.g., node degree. We standardize
the features by removing the mean and scaling to unit
variance as in other works [13]. Our approach can be
extended to include additional features, such as item
pictures for multi-modal descriptions, but we leave their
study as future work. Thus, we define the initial feature
matrix as X ∈ ℝ^(|ℰ′|×d), where the i’th entity has its
embedding in the i’th row and users are initialized as zero
vectors. As such, we only require a few interactions
to represent users and textual descriptions for items.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>The size of the initial feature vectors is usually large</title>
        <p>(in our model, we have d &gt; 756), making subsequent
computations infeasible. Therefore, we introduce an
AutoEncoder (AE) layer to reduce the dimensionality [34]
(shown in Figure 2 A). The loss of the AE is defined as:
L_AE = MSE(X, AE_de(AE_en(X))),   (1)
where AE_en: ℝ^(|ℰ′|×d) ↦ ℝ^(|ℰ′|×d′), with d′ ≪ d, is the
encoding function mapping the initial feature vector of each
node to a lower-dimensionality vector through
multiple fully connected layers with the Leaky ReLU
activation [35]. Analogously, AE_de: ℝ^(|ℰ′|×d′) ↦ ℝ^(|ℰ′|×d) is a
decoding function mapping the lower-dimension
embeddings back to the original vectors. Therefore, we produce
a matrix X′ = AE_en(X) of low-dimensionality embeddings
for the initial nodes in ℰ′. Moreover, in our
architecture, the AE is jointly learned with the final ranking loss (as
described later in subsection 4.4). Thus, X′ provides a
fine-tuned, compressed representation of the extracted
features. Since the initial embedding has no range limits,
no activation is used for the final decode layer.</p>
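        <p>As a minimal single-layer sketch of the autoencoder described above (our own numpy illustration with assumed sizes; the paper's implementation uses multiple fully connected layers trained jointly with the ranking loss):</p>
        <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

d, d_prime, n = 768, 64, 10            # assumed sizes; d' is much smaller than d
W_en = rng.normal(0.0, 0.1, (d, d_prime))
W_de = rng.normal(0.0, 0.1, (d_prime, d))

def ae_encode(X):                       # AE_en: encode to the lower dimensionality
    return leaky_relu(X @ W_en)

def ae_decode(X_prime):                 # AE_de: no activation on the final layer
    return X_prime @ W_de

X = rng.normal(size=(n, d))             # extracted node features
X_prime = ae_encode(X)
L_ae = np.mean((X - ae_decode(X_prime)) ** 2)   # MSE reconstruction loss
print(X_prime.shape)  # (10, 64)
```
        </preformat>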
        <sec id="sec-2-2-1">
          <title>4.2. Gated Propagation Layer</title>
          <p>The core of GNNs is the ability to aggregate information
from a node's neighborhood. Relation types could allow the
model to differentiate between the relation interactions
and aggregate information dependent upon the
combinations of edges in the CKG. Thus, we explore the effect of
gates in our GNN’s architecture and extend these with
relation-specific weights [9, 36]. In the following, we
first describe the individual parts for a single step, i.e.,
relation-specific gating, information propagation, and
aggregation, and then how to generalize the process to
high-order propagations.</p>
          <p>Relation-specific gates: We design two relation-specific
gates that control the information flow during message
passing: (i) Inner Product and (ii) Concatenation. The
Inner Product gate uses the inner product of the h and
t entities as the gate, making the gate dependent on
the affinity between the two. We take into account
different relations (similar to TransR [37]) by first
transforming the entities’ embeddings into a relation-specific
vector space before finding the affinities:
g_IP(h, r, t) = σ((W_r e_h)^⊤ W_r e_t),
where σ is the sigmoid activation function, W_r is a
relation-specific transformation matrix, and
g_IP(h, r, t) ∈ [0, 1]. The Concatenation gate works as the
original reset and update gate mechanisms used by GRU [30].</p>
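          <p>The two gates can be sketched as follows (our own numpy illustration of the equations above; W_r denotes the relation-specific matrices and σ the sigmoid):</p>
          <preformat>
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
d = 8
W_r = rng.normal(0.0, 0.1, (d, d))          # relation-specific transform
W_r_cat = rng.normal(0.0, 0.1, (d, 2 * d))  # relation-specific concat transform

def gate_inner_product(e_h, e_t):
    # sigma((W_r e_h)^T (W_r e_t)): affinity in a relation-specific space
    return sigmoid((W_r @ e_h) @ (W_r @ e_t))

def gate_concatenation(e_h, e_t):
    # sigma(W_r [e_h || e_t]): a per-dimension, GRU-style gate
    return sigmoid(W_r_cat @ np.concatenate([e_h, e_t]))

e_h, e_t = rng.normal(size=d), rng.normal(size=d)
g_ip = gate_inner_product(e_h, e_t)   # scalar gate in (0, 1)
g_c = gate_concatenation(e_h, e_t)    # vector gate in (0, 1)^d
```
          </preformat>
          <p>The concatenation gate outputs one value per dimension, which is what enables the dimension-level differentiation discussed in Section 3.</p>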
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Concatenation gate</title>
        <p>Here, we utilize a relation-specific linear
transformation, which learns which parts of the tail entity’s
embedding are important in the aggregation step, given
both the head and the tail, with ‖ being concatenation,
as: g_C(h, r, t) = σ(W_r (e_h ‖ e_t)).</p>
      </sec>
      <sec id="sec-2-4">
        <title>Information Propagation</title>
        <p>Given the direct neighbors of entity h as
𝒩_h = {(h, r, t) | (h, r, t) ∈ ℛ_c}, also called its
ego-network [38], we can define the neighborhood
aggregation vector of h as:
e_𝒩h = (1/|𝒩_h|) ∑_((h,r,t)∈𝒩_h) g(h, r, t) e_t.
In contrast to other gated networks [36, 9], our model’s
gates are relation-specific, allowing it to propagate
different information from different parts of an entity’s
embedding based on the relation to it. This fine-grained
information propagation is vital for GInRec’s performance,
especially when not relying on the initial user features.</p>
      </sec>
      <sec id="sec-2-5">
        <title>Aggregation</title>
        <p>The final part combines an entity’s current
embedding e_h with the aggregated ego embedding e_𝒩h,
formally defined as e′_h = f_a(e_h, e_𝒩h), where f_a is an
aggregator function. We identify four common aggregators
used in other architectures, namely: the Bi-interaction
aggregator [6], the GCN aggregator [39], the GraphSAGE
aggregator [13], and the LightGCN aggregator [3], finding the
LightGCN aggregator to be the best performing through
hyperparameter tuning. The LightGCN aggregator can
be defined as f_a(e_h, e_𝒩h) = e_𝒩h, not having any
transformations or non-linear activations.</p>
      </sec>
      <sec id="sec-2-6">
        <title>High-order propagation</title>
        <p>To propagate information from n-hop neighbors and
utilize high-order connectivity information, we stack the
model in layers [6, 9, 13]. As illustrated by the arrow from
A to B in Figure 2, we use the output of the embedding
layer X′ ∈ ℝ^(|ℰ|×d′) as the initial embedding in the
propagation layers. We thus define the next representation
layer l+1, recursively using the previous layer l and the
neighborhood representation, as:
e_h^(l+1) = f_a(e_h^(l), e_𝒩h^(l)).
The weight matrices in the gates are
W^(l) ∈ ℝ^(d_(l−1)×d_(l−1)) or W^(l) ∈ ℝ^(d_(l−1)×2d_(l−1)),
depending on whether they refer to the Inner Product or
the Concatenation gate, respectively.</p>
      </sec>
      <sec id="sec-2-7">
        <title>4.3. Prediction</title>
        <p>At each layer, information from increasingly distant
entities is aggregated, and we, therefore, have multiple
representations of the entities after L layers of propagation
as they are passed to the prediction step. Similar to previous
approaches [39, 6, 9], we concatenate the output after
each layer for a user u and an item i as:
e*_u = e_u^(1) ‖ … ‖ e_u^(L),   e*_i = e_i^(1) ‖ … ‖ e_i^(L).
This approach is able to retain the information of the
different representations at all steps. For the final
prediction, learned, non-linear similarity measures are usually
outperformed by a simple inner product [40] that also
reduces complexity. Hence, our prediction is computed
as follows: ŷ_u,i = e*_u^⊤ e*_i.</p>
      </sec>
      <sec id="sec-2-8">
        <title>4.4. Optimization</title>
        <p>We use Bayesian Personalized Ranking (BPR) as the
collaborative loss function, assuming previously interacted
items should be ranked higher than others [9, 6], as:
L_BPR = ∑_((u,i,j)∈ℬ) −ln σ(ŷ_u,i − ŷ_u,j),   (5)
where ℬ = {(u, i, j) | I_u,i = 1 and I_u,j ≠ 1} is a set
of training triples with item i being rated higher than
item j, and σ is the sigmoid function. The final loss
function is a combination of the autoencoder loss in Equation 1
and the BPR loss in Equation 5, so as to learn an encoded
embedding suitable for recommendation while
containing enough information to reconstruct the features,
computed as:
L = L_BPR + λ_AE L_AE + λ ‖Θ‖₂²,
where Θ = {W^(l)_IP, W^(l)_C | ∀l ∈ {1, ..., L}} ∪
{W^(l′)_en, W^(l′)_de | ∀l′ ∈ {1, ..., L_AE}} is the set of learnable
parameters, λ is a parameter for tuning the L2 regularization,
and λ_AE is a parameter to tune the autoencoder loss. The
autoencoder loss also works as a regularizer while also
being recommender-specific. Generating embeddings for
all nodes, with MovieLens Subsampled (ML-S) shown in
Table 2, took 0.48s, and ranking items for all users took
0.078s, compared to PinSAGE’s 0.446s and 0.966s, with an
RTX 2070 Super and an Intel i9-9900, averaged over 5 runs.</p>
      </sec>
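      <p>The prediction and the BPR objective above can be sketched for a single (u, i, j) triple as follows (our own numpy illustration with toy embeddings, not the authors' training code):</p>
      <preformat>
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
L, d = 3, 4
# Per-layer representations are concatenated: e* = e^(1) || ... || e^(L)
e_user = rng.normal(size=L * d)
e_pos = rng.normal(size=L * d)    # item i the user interacted with
e_neg = rng.normal(size=L * d)    # item j without an interaction

y_pos = e_user @ e_pos            # prediction: a simple inner product
y_neg = e_user @ e_neg

# BPR term: -ln sigma(y_ui - y_uj); always positive, and it shrinks
# as the positive item is scored further above the negative one.
l_bpr = -np.log(sigmoid(y_pos - y_neg))
```
      </preformat>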
      <sec id="sec-2-9">
        <title>Training and Scalability</title>
        <p>Training: We use mini-batch training sampled from ℬ
with batch size 1024, limiting the computation graph by
using a fixed-size ego-network of 10 and starting
construction from the last layer [13]. The entities used in the
first layer of the gated propagation are used for the
autoencoder loss, such that we learn to represent not only
users and items but also entities like genres and actors.</p>
        <p>Scalability: In our embedding approach, both the
calculation of the aggregation (e_h^(l+1)) and of the prediction
(ŷ) are bounded by the number of nodes in the graph,
while the calculation of the ego-network (e_𝒩h^(l+1)) is
bounded by the number of edges. As these steps are applied
sequentially and the number of nodes is much smaller
than the number of edges, we know that the complexity of
our method is bounded by the ego-network aggregation
complexity, more specifically, the linear transformation
of the gate calculation. When naïvely applying the gates
over all edges, the complexity is 𝒪(|ℛ_c| d²), where d is the
largest dimension utilized during the graph convolutions
– we note that |ℛ_c| is bounded by 𝒪(|ℰ_c|²|ℒ_c|). Yet, as
W(e_h ‖ e_t) is equivalent to W₁e_h + W₂e_t, we only need to
compute the transformation for each unique (h, r) and
(r, t) pair instead of each unique (h, r, t) triple. Therefore,
we can apply a MapReduce computation [16] to have
at most 2|ℰ_c||ℒ_c| calculations, leading to the complexity
𝒪(|ℰ_c||ℒ_c| d²) ≪ 𝒪(|ℛ_c| d²). Finally, our prediction is a dot
product after the graph convolutions; hence, our method can
predict in 𝒪(|e*_u| ⋅ |ℐ|) for a single user, as the vector dot
product complexity is 𝒪(|e*_u|), which we compute |ℐ|
times, which is less than existing architectures with
comparable approaches, e.g., PinSAGE.</p>
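        <p>The factorization used in the scalability argument, W(e_h ‖ e_t) = W₁e_h + W₂e_t, can be checked numerically (our own numpy sketch):</p>
        <preformat>
```python
import numpy as np

rng = np.random.default_rng(3)
d = 6
W = rng.normal(size=(d, 2 * d))
W1, W2 = W[:, :d], W[:, d:]       # split the concatenation transform

e_h, e_t = rng.normal(size=d), rng.normal(size=d)
full = W @ np.concatenate([e_h, e_t])   # one transform per (h, r, t) triple
split = W1 @ e_h + W2 @ e_t             # reusable per (h, r) and (r, t) pair
print(np.allclose(full, split))  # True
```
        </preformat>
        <p>Precomputing W₁e_h and W₂e_t once per unique pair and summing them per edge is what reduces the number of matrix-vector transformations from one per triple to at most two per entity-relation pair.</p>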
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Experiments</title>
      <p>Inductive approaches are designed to provide
recommendations in a cold-start setting, where ratings for new
users are only known at inference time. Yet, as we will
show, these baselines do not perform well in this setting due
to poor selection of learning metrics, evaluation
methodologies, or other complexities. In the following, we aim at
answering the questions: RQ1) Which design decisions
affect the prediction performance compared to the state of
the art? RQ2) What is the effect of the negative sampling
strategy in the evaluation? RQ3) How do relational
gates affect performance? And finally, RQ4) How do the
structure and data of the KG affect performance?</p>
      <p>Datasets. We adopt two real-world datasets: (i) MovieLens-20m
(ML-20m) [41], a dataset with ratings on movies, and
(ii) Amazon-Book (2014) (AB) [42], a dataset with reviews
for books. Neither dataset has an associated KG. We
therefore use the MindReader KG [22] for the ML-20m dataset,
and for the AB dataset the KG constructed to evaluate
KGAT [6]. These two graphs link reviewed items to
nodes in popular open-domain KGs such as DBpedia and
WikiData. In both cases, we keep only items mapped to
the KG, leading to the statistics shown in Table 2. We
adopt splitting ratios 0.8:0.1:0.1 for train, validation, and
test sets, respectively. We note that different versions
of the AB dataset exist, and results cannot necessarily
be directly compared between related works and our
dataset [20, 6, 3, 5, 43]. In our cold-start experiments, as
defined in Section 4, we sampled 12,500 users from
ML-20m and 60,000 users from AB for training, named ML-S
and Amazon-Book Subsampled (AB-S), respectively. We
sample more users for AB-S due to few ratings per user.
We then created two cold-start scenarios on ML-S: one
adding 10% new users (i.e., 1,250); and one where we treat
all users not in ML-S as cold-start users, being ∼90% of
the users in the original ML-20m dataset, allowing us to
test the scalability of the inductive methods. For AB-S,
we create one scenario adding the remaining users from
the original dataset, corresponding to an additional ∼15%
of the total number of users.</p>
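      <p>The 0.8:0.1:0.1 splitting procedure above can be sketched as follows (our own illustration; the actual sampling scripts ship with the paper's source code):</p>
      <preformat>
```python
import random

def split_ratings(ratings, seed=42):
    """Shuffle and split a list of ratings 0.8:0.1:0.1 into train/val/test."""
    rng = random.Random(seed)
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_ratings(list(range(100)))
print(len(train), len(val), len(test))  # 80 10 10
```
      </preformat>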
      <p>
        Methods. We compare to five methods: TopPop [
        <xref ref-type="bibr" rid="ref22">2</xref>
        ],
a non-personalized common baseline [10] that
recommends the most popular items; GraphSAGE [13],
modified to recommend using cosine similarity between a
user’s rated items and new items; PinSAGE [16], with a
semi-supervised objective, i.e., items co-rated should be
similar, analogous to the pin/board setting; IDCF [20], a
two-step learning method, using key user embeddings to
initialize new users; and BPR-MF [4], which we report as
a reference transductive method retrained on the dataset
including also the cold-start users, since it is fast to train,
is competitive with state-of-the-art methods without
requiring sequential data, and has been shown to outperform
the standard kNN method.
      </p>
      <p>All models are implemented in PyTorch and optimized
using the Adam optimizer. We save the best-performing
state based on the validation set and stop after 50
successive epochs without improvement. For hyperparameter
tuning of all models, we employ Asynchronous
Successive Halving (ASHA) [44]; all hyperparameter options
and ASHA parameters are available in our source code. We
note that, compared to PinSAGE, GInRec has only one
extra hyperparameter, in the form of λ_AE, which we
tune as described in subsection 4.4.</p>
      <sec id="sec-3-1">
        <title>Evaluation metrics. Following other evaluation meth</title>
        <p>
          ods [6], for each user in the test set, we rank all items not
interacted with in the train and validation sets, only
treating ratings in the test set as positive items. We measure
Datasets. We adopt two real-world datasets: (i) MovieLens- NDCG, recall, precision, and coverage at 20 for each user,
20m (ML-20m) [41], a dataset with ratings on movies and reporting the average performance over all users. Let ℐ
(ii) Amazon-Book (2014) (AB) [42], a dataset with reviews to be the top-k items recommended to a user  , then we
foonrebouoskest.hNeeMitihnedrRdeaatdaesretKhGas[2a2n]afsosrotchiaeteMdLK-2G0.mWdeatthaesreet-, cHaenndceefin,ae ncoaviveerargeecoams: me@nd=er as T|⋃op∈Popℐis|/ex|ℐpe|.cted to
and for the AB dataset the KG constructed to evaluate perform poorly due to recommending the same set of
KGAT [6]. These two graphs link reviewed items to items each time [45]. We further include I-NDCG [20]
nodes in popular open-domain KGs such as DBpedia and which is the metric used to evaluate IDCF in the original
WikiData. In both cases, we keep only items mapped to work, where X negative items (X=5 in IDCF) are sampled
the KG leading to the statistics shown in Table 2. We per positive item in the test set instead of all possible
adopt splitting ratios 0.8∶0.1∶0.1 for train, validation, and negative items as for NDCG. For all metrics, we remove
test sets, respectively. We note that diferent versions items seen by the user during training from the set of
of the AB dataset exist, and results cannot necessarily negative samples. We use ‘*’ to represent a statistically
be directly compared between related works and our significant increase in performance using student t-test.
dataset [20, 6, 3, 5, 43]. In our cold-start experiments, as
</p>
        <p>RQ1. As Table 3 shows, GInRec is able to outperform all methods on all metrics with statistical significance. We also see contrasting results w.r.t. the original IDCF evaluation. IDCF was originally evaluated in a ranking setting; however, the learned embeddings of IDCF are learned towards Cross-Entropy, a non-ranking, pointwise learning objective. Such learning methodologies have been shown to perform poorly, with a similar or worse ranking than TopPop [<xref ref-type="bibr" rid="ref22">2, 46</xref>]. Yet, IDCF uses the Cross-Entropy loss [47]. In the original work, IDCF outperforms PinSAGE by a small margin, yet we observe the opposite in our evaluation. PinSAGE’s increased performance in our evaluation is due to (i) a more appropriate early stopping based on the evaluation metric instead of the loss function, and (ii) our evaluation adopting a better learning objective for PinSAGE.</p>
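        <p>To make the contrast between the two objectives concrete, here is a minimal sketch (hypothetical scores; this is not IDCF’s actual implementation) of a pointwise cross-entropy loss versus a pairwise ranking loss in the style of BPR:

```python
import math

def pointwise_ce(score_pos, score_neg):
    # Cross-entropy: each user-item pair is an independent binary
    # classification, so absolute score values matter.
    p_pos = 1 / (1 + math.exp(-score_pos))
    p_neg = 1 / (1 + math.exp(-score_neg))
    return -(math.log(p_pos) + math.log(1 - p_neg))

def pairwise_bpr(score_pos, score_neg):
    # Pairwise ranking loss: only the score *difference* matters,
    # i.e. the relative order of positive vs. negative.
    return -math.log(1 / (1 + math.exp(-(score_pos - score_neg))))

# Shifting both scores by a constant changes the pointwise loss
# but leaves the pairwise (ranking) loss untouched.
print(pairwise_bpr(2.0, 1.0) == pairwise_bpr(7.0, 6.0))  # True
print(pointwise_ce(2.0, 1.0) == pointwise_ce(7.0, 6.0))  # False
```

Only the pairwise loss is invariant to rescaling the score range, which is one reason pointwise-trained scores need not translate into good rankings.</p>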
        <p>Figure 3 also shows GInRec outperforms all models in different user splits; we leave out the result on AB-S for brevity, noting we get similar results. We demonstrate that using KG information and relational gates provides superior predictive power in all cases, given the improved performance over all popularity and sparsity groups.</p>
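        <p>The relational gating mentioned above can be pictured with a small sketch (illustrative only; the weights, dimensions, and function names are hypothetical, and this is not the exact GInRec architecture): each relation type owns a gate that decides, per embedding dimension, how much of a neighbor’s embedding may pass during aggregation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_relations = 4, 2

# One (hypothetical) gate weight matrix per relation type.
gate_w = {r: rng.normal(size=(dim, dim)) for r in range(n_relations)}

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gated_aggregate(neighbors):
    # neighbors: list of (relation_type, embedding) pairs.
    # The gate emits a value in (0, 1) per dimension, so the flow
    # from each neighbor can be throttled dimension-wise.
    out = np.zeros(dim)
    for rel, emb in neighbors:
        gate = sigmoid(gate_w[rel] @ emb)
        out += gate * emb          # element-wise: selects parts of emb
    return out / len(neighbors)

neighbors = [(0, rng.normal(size=dim)), (1, rng.normal(size=dim))]
print(gated_aggregate(neighbors).shape)  # (4,)
```

A scalar (inner-product-style) gate would multiply each neighbor embedding by a single number instead, which can throttle a neighbor but cannot select individual dimensions.</p>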
        <p>To study the method’s ability to make personalized recommendations, we utilize coverage [45]. We note that we are not able to perform statistical significance testing with coverage as we only generate a single score per dataset instead of one score per user. The metric is not useful by itself; a random model would have close to 1 in coverage. Having higher coverage but a far lower ranking indicates more random recommendations, while having low coverage but a high ranking score means low personalization and a popularity-biased dataset. In Table 4, we can see a clear improvement over all other methods. Only GraphSAGE achieves comparably high coverage; yet, it is unable to make high-quality recommendations, so its performance is more random. IDCF performs poorly on all datasets w.r.t. coverage; this is correlated with why it performs better on I-NDCG compared to NDCG, as we will discuss later. We also test with the Gini Coefficient on the ML-S+1250 dataset, where 0 would be an equal (uniform) distribution and 1 is unequal. Here GInRec gets 0.959, BPR-MF 0.989, PinSAGE 0.991, TopPop 0.993, and random 0.241. Thus, using this metric, GInRec gets a ≥3% more diverse distribution than PinSAGE and BPR-MF. While BPR-MF performs better than TopPop on both the ML-S dataset with 1250 new users and the AB-S dataset, its performance decreases when adding a large number of users to the ML-S dataset. We find our method to have a similar increase in performance over all k’s in the set {1, 5, 10, 20, 50}, but these results are not included here due to space constraints.</p>
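        <p>The Gini Coefficient used above can be computed from the number of times each item is recommended; a small sketch with hypothetical exposure counts:

```python
def gini(counts):
    # Gini coefficient of a non-negative distribution:
    # 0 = perfectly uniform exposure, values near 1 = highly unequal.
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    # Standard formula via the rank-weighted sum of the sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

print(gini([5, 5, 5, 5]))             # 0.0  (every item shown equally often)
print(round(gini([0, 0, 0, 20]), 3))  # 0.75 (one item gets all exposure)
```
</p>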
        <p>Table 4: Coverage at 20 for all datasets.
                   ML-S + 1250   ML-S + 90%   AB-S + 10%
GraphSAGE          0.06307       0.31471      0.14243
PinSAGE            0.02857       0.04169      0.12439
IDCF               0.01566       0.02201      0.04348
GInRec             0.18540       0.42646      0.21460
(The TopPop and BPR-MF rows are not cleanly recoverable.)</p>
        <p>RQ2. When evaluating ranking performance, NDCG is the metric commonly adopted, but there are two alternatives on which set of items to rank: either rank all items in the dataset or just a subset. In other evaluations [<xref ref-type="bibr" rid="ref22">20, 1, 2</xref>], instead of ranking all items, only a few negative items are randomly sampled per each positive. While this would aim at making it equally hard to rank positive items for all test users, it has been proven to produce unreliable comparisons of performances across methods [47]. This has also been witnessed when other works re-evaluated BERT4Rec [26], which also utilized negative subsampling, though using popularity-biased sampling instead of uniform sampling. Also in this version, negative sampling leads to unreliable results [48], finding even simple baselines outperforming this state-of-the-art method. Yet, in the original IDCF evaluation, this faulty method is adopted (here labeled I-NDCG). Thus, hard-to-rank items are often missing from the evaluation when subsampling negative items, and thus I-NDCG does not test the actual performance of the method as if all possible negative items were available. This presents an issue when considering the experimental evaluation of previous works. Therefore, here we once more compare the two evaluation techniques: (a) ranking all items, and (b) subsampling negative items, and verify once more that the latter methodology should be avoided since it produces biased results. In Table 3, we see GraphSAGE outperforms IDCF on I-NDCG in ML-S+1250, though clearly performing worse in the appropriate NDCG@20. Hence, this negative subsampling (I-NDCG) unfairly favors TopPop even above IDCF, and IDCF over other methods. Instead, when appropriately considering all items (NDCG) as recommended [47], GInRec and other methods perform up to 3x better than TopPop.
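</p>
        <p>The difference between the two protocols can be sketched as follows (toy scores; the helper name is hypothetical). Because the sampled candidate set is a subset of all items, the positive’s rank can only improve, so sampled scores are optimistic:

```python
import math, random

def ndcg_single_pos(scores, pos, candidates):
    # Rank the positive among the given candidates; the DCG of a single
    # relevant item is 1 / log2(rank + 1), and the ideal DCG is 1.
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    rank = ranked.index(pos) + 1
    return 1 / math.log2(rank + 1)

random.seed(0)
scores = {i: i / 100 for i in range(100)}  # item 99 scores highest
pos = 50                                   # the positive ranks 50th overall

full = ndcg_single_pos(scores, pos, list(range(100)))
sampled_cands = [pos] + random.sample([i for i in range(100) if i != pos], 5)
sampled = ndcg_single_pos(scores, pos, sampled_cands)
print(sampled >= full)  # True: subsampling never hurts the positive's rank
```

Hard negatives (the 49 items scoring above the positive here) are mostly absent from the sample, which is exactly the bias discussed above.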
</p>
        <p>RQ3 &amp; RQ4. The method’s results with different gating mechanisms can be seen in Table 5. In the table, ‘w/o relation’ is the gating mechanism without relation type, i.e., effectively ignoring edge types, and ‘w/o gates’ is the method without the gating mechanism. Overall, the gating mechanism improves performance, as it adaptively selects information from neighboring nodes, and it outperforms the two other models in all metrics. Disregarding relation types leads to worse performance on all datasets, and completely removing the gates leads to dramatically lower performance. Thus, it is vital to design models that can exploit the semantic information modeled by KGs. The Inner Product gate scales its neighbors’ embeddings instead of selecting different parts as the Concatenate gate does, and thus limits the flow from certain nodes. Yet, it is not able to select which part of the neighbor’s embeddings to propagate, thereby achieving a similar performance to the ‘w/o gates’ method. Having the ability to limit flow for each dimension of the neighbor’s embedding is therefore crucial for our method. GInRec without gates is worse than PinSAGE, though still better than IDCF. Hence, only using the user’s interactions without a gating mechanism is still better than the reconstruction used in IDCF. Even without relation types, we see a large and statistically significant increase in performance. When looking at Figure 3, we see in all cases that GInRec performs better than the bipartite version. In the first bin of all plots (i.e., the bins with fewer or less popular ratings), we see a large performance increase when using the semantic information carried by the KG, both over the bipartite model and over related works. Summarized, our gated aggregators can exploit the relational information, as either removing the KG or the relational information leads to a decrease in performance. When adding many users, we even see a large decrease in performance for the bipartite method, illustrating the scalability of our method and gated aggregation.</p>
      </sec>
      <sec>
        <title>6. Conclusion and future work</title>
        <p>In this work, we devise a scalable gated GNN architecture to perform inductive recommendation, with the ability to utilize high-order connectivities in CKGs. We show that our method outperforms existing approaches. Further, we showcase methodological limitations in previous evaluations. We conclude that this kind of architecture deserves further study, especially given its ability to: (1) scale to large graphs and large numbers of users (easily extensible to distributed frameworks), and (2) maintain good prediction with new users and items despite its lightweight inference methodology.</p>
      </sec>
      <sec>
        <title>Acknowledgments</title>
        <p>Matteo Lissandrini is supported by the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement no. 838216. Katja Hose and Theis Jendal are supported by the Poul Due Jensen Foundation and the Independent Research Fund Denmark (DFF) under grant agreement no. DFF-8048-00051B.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[16] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, J. Leskovec, Graph convolutional neural networks for web-scale recommender systems, in: SIGKDD'18, 2018.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[1] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, T.-S. Chua, Neural collaborative filtering, in: TheWebConf'17, 2017.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[17] S. Wang, K. Zhang, L. Wu, H. Ma, R. Hong, M. Wang, Privileged graph distillation for cold start recommendation, in: SIGIR'21, 2021.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[2] P. Cremonesi, Y. Koren, R. Turrin, Performance of recommender algorithms on top-n recommendation tasks, in: RecSys'10, 2010.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[18] H. Lee, J. Im, S. Jang, H. Cho, S. Chung, MeLU: Meta-learned user preference estimator for cold-start recommendation, in: SIGKDD'19, 2019.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[19] M. Zhang, Y. Chen, Inductive matrix completion based on graph neural networks, in: ICLR'19, 2019.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[3] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, M. Wang, LightGCN: Simplifying and powering graph convolution network for recommendation, in: SIGIR'20, 2020.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[20] Q. Wu, H. Zhang, X. Gao, J. Yan, H. Zha, Towards open-world recommendation: An inductive model-based collaborative filtering approach, in: ICML'21, 2021.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[4] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, in: UAI'09, 2009.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[5] X. Wang, X. He, M. Wang, F. Feng, T.-S. Chua, Neural graph collaborative filtering, in: SIGIR'19, 2019.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[21] Y. Wu, Q. Cao, H. Shen, S. Tao, X. Cheng, INMO: A model-agnostic and scalable module for inductive collaborative filtering, in: SIGIR'22, 2022.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[6] X. Wang, X. He, Y. Cao, M. Liu, T.-S. Chua, KGAT: Knowledge graph attention network for recommendation, in: SIGKDD'19, 2019.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[22] A. H. Brams, A. L. Jakobsen, T. E. Jendal, M. Lissandrini, P. Dolog, K. Hose, MindReader: Recommendation over knowledge graph entities with explicit user ratings, in: CIKM'20, 2020.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[7] X. Wang, D. Wang, C. Xu, X. He, Y. Cao, T.-S. Chua, Explainable reasoning over knowledge graphs for recommendation, in: AAAI'19, 2019.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[23] K. Zhou, S.-H. Yang, H. Zha, Functional matrix factorizations for cold-start recommendation, in: SIGIR'11, 2011.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[8] H. Wang, F. Zhang, M. Zhang, J. Leskovec, M. Zhao, W. Li, Z. Wang, Knowledge-aware graph neural networks with label smoothness regularization for recommender systems, in: SIGKDD'19, 2019.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[24] M. Xu, R. Jin, Z.-H. Zhou, Speedup matrix completion with side information: Application to multi-label learning, in: NeurIPS'13, 2013.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[9] Z. Tao, Y. Wei, X. Wang, X. He, X. Huang, T.-S. Chua, MGAT: Multimodal graph attention network for recommendation, Information Processing &amp; Management (2020).</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[25] P. Jain, I. S. Dhillon, Provable inductive matrix completion, arXiv preprint arXiv:1306.0626 (2013).</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[26] F. Sun, J. Liu, J. Wu, C. Pei, X. Lin, W. Ou, P. Jiang, BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer, in: CIKM'19, 2019.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[10] E. Palumbo, D. Monti, G. Rizzo, R. Troncy, E. Baralis, entity2rec: Property-specific knowledge graph embeddings for item recommendation, Expert Systems with Applications (2020).</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[27] B. Hui, L. Zhang, X. Zhou, X. Wen, Y. Nian, Personalized recommendation system based on knowledge … (2022).</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[11] H. Wang, F. Zhang, J. Wang, M. Zhao, W. Li, X. Xie, M. Guo, RippleNet: Propagating user preferences on the knowledge graph for recommender systems, in: CIKM'18, 2018.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[28] W. Ma, M. Zhang, Y. Cao, W. Jin, C. Wang, Y. Liu, S. Ma, X. Ren, Jointly learning explainable rules for recommendation with knowledge graph, in: TheWebConf'19, 2019.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[12] Z. Yang, S. Dong, HAGERec: Hierarchical attention graph convolutional network incorporating knowledge graph for explainable recommendation, Knowledge-Based Systems (2020).</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[29] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation (1997).</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[13] W. L. Hamilton, R. Ying, J. Leskovec, Inductive representation learning on large graphs, in: NeurIPS'17, 2017.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[30] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, in: EMNLP'14, 2014.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[14] C. Zhang, H. Yao, L. Yu, C. Huang, D. Song, H. Chen, Inductive contextual relation learning for personalization, ACM Transactions on Information Systems (2021).</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[31] S. Liu, I. Ounis, C. Macdonald, Z. Meng, A heterogeneous graph neural model for cold-start recommendation, in: SIGIR'20, 2020.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name><given-names>C.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Yao</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Yu</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Huang</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Song</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Chen</surname></string-name>,
          <article-title>Inductive contextual relation learning for personalization</article-title>,
          <source>ACM Transactions on Information Systems</source>
          (<year>2021</year>).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name><given-names>S.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Ounis</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Macdonald</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Meng</surname></string-name>,
          <article-title>A heterogeneous graph neural model for cold-start recommendation</article-title>, in:
          <source>SIGIR'20</source>,
          <year>2020</year>.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name><given-names>C.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Xu</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Gao</surname></string-name>,
          <article-title>Geometric inductive matrix completion: A hyperbolic approach with unified message passing</article-title>, in:
          <source>WSDM'22</source>,
          <year>2022</year>.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name><given-names>N.</given-names> <surname>Reimers</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Gurevych</surname></string-name>,
          <article-title>Sentence-BERT: Sentence embeddings using siamese BERT-networks</article-title>, in:
          <source>EMNLP-IJCNLP'19</source>,
          <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name><given-names>R.</given-names> <surname>Ying</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Eksombatchai</surname></string-name>,
          <string-name><given-names>W. L.</given-names> <surname>Hamilton</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Leskovec</surname></string-name>,
          <article-title>Graph convolutional neural networks for web-scale recommender systems</article-title>, in:
          <source>SIGKDD'18</source>,
          <year>2018</year>.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name><given-names>T.</given-names> <surname>Trouillon</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Welbl</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Riedel</surname></string-name>,
          <string-name><given-names>É.</given-names> <surname>Gaussier</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Bouchard</surname></string-name>,
          <article-title>Complex embeddings for simple link prediction</article-title>, in:
          <source>ICML'16</source>,
          <year>2016</year>.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name><given-names>M. A.</given-names> <surname>Kramer</surname></string-name>,
          <article-title>Nonlinear principal component analysis using autoassociative neural networks</article-title>,
          <source>AIChE Journal</source>
          (<year>1991</year>).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <string-name><given-names>R. H.</given-names> <surname>Hahnloser</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Sarpeshkar</surname></string-name>,
          <string-name><given-names>M. A.</given-names> <surname>Mahowald</surname></string-name>,
          <string-name><given-names>R. J.</given-names> <surname>Douglas</surname></string-name>,
          <string-name><given-names>H. S.</given-names> <surname>Seung</surname></string-name>,
          <article-title>Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit</article-title>,
          <source>Nature</source>
          <volume>405</volume>
          (<year>2000</year>).
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Tarlow</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Brockschmidt</surname></string-name>,
          <string-name><given-names>R. S.</given-names> <surname>Zemel</surname></string-name>,
          <article-title>Gated graph sequence neural networks</article-title>, in:
          <source>ICLR'16</source>,
          <year>2016</year>.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <string-name><given-names>Y.</given-names> <surname>Lin</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Sun</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Zhu</surname></string-name>,
          <article-title>Learning entity and relation embeddings for knowledge graph completion</article-title>, in:
          <source>AAAI'15</source>,
          <year>2015</year>.
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <string-name><given-names>J.</given-names> <surname>Qiu</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Tang</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Ma</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Dong</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Tang</surname></string-name>,
          <article-title>DeepInf: Social influence prediction with deep learning</article-title>, in:
          <source>SIGKDD'18</source>,
          <year>2018</year>.
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <string-name><given-names>T. N.</given-names> <surname>Kipf</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Welling</surname></string-name>,
          <article-title>Semi-supervised classification with graph convolutional networks</article-title>
          (<year>2017</year>).
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <string-name><given-names>S.</given-names> <surname>Rendle</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Krichene</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Anderson</surname></string-name>,
          <article-title>Neural collaborative filtering vs. matrix factorization revisited</article-title>, in:
          <source>RecSys'20</source>,
          <year>2020</year>.
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <string-name><given-names>F. M.</given-names> <surname>Harper</surname></string-name>,
          <string-name><given-names>J. A.</given-names> <surname>Konstan</surname></string-name>,
          <article-title>The MovieLens datasets: History and context</article-title>,
          <source>ACM Transactions on Interactive Intelligent Systems (TiiS)</source>
          (<year>2015</year>).
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <string-name><given-names>J.</given-names> <surname>Ni</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>McAuley</surname></string-name>,
          <article-title>Justifying recommendations using distantly-labeled reviews and fine-grained aspects</article-title>, in:
          <source>EMNLP-IJCNLP'19</source>,
          <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          <string-name><given-names>J.</given-names> <surname>Zhu</surname></string-name>,
          <string-name><given-names>Q.</given-names> <surname>Dai</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Su</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Ma</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Cai</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Xiao</surname></string-name>,
          <article-title>BARS: Towards open benchmarking for recommender systems</article-title>, in:
          <source>SIGIR'22</source>,
          <year>2022</year>.
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          <string-name><given-names>L.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Jamieson</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Rostamizadeh</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Gonina</surname></string-name>,
          <article-title>A system for massively parallel hyperparameter tuning</article-title>
          (<year>2018</year>).
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          <string-name><given-names>G.</given-names> <surname>Adomavicius</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Kwon</surname></string-name>,
          <article-title>Improving aggregate recommendation diversity using ranking-based techniques</article-title>,
          <source>IEEE Trans. Knowl. Data Eng.</source>
          (<year>2012</year>).
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          <string-name><given-names>J.</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Gao</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Fu</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Qiu</surname></string-name>,
          <article-title>On the effectiveness of sampled softmax loss for item recommendation</article-title>,
          <source>arXiv:2201.02327</source>
          (<year>2022</year>).
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          <string-name><given-names>W.</given-names> <surname>Krichene</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Rendle</surname></string-name>,
          <article-title>On sampled metrics for item recommendation</article-title>, in:
          <source>SIGKDD'20</source>,
          <year>2020</year>.
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          <string-name><given-names>S.</given-names> <surname>Latifi</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Jannach</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Ferraro</surname></string-name>,
          <article-title>Sequential recommendation: A study on transformers, nearest neighbors and sampled metrics</article-title>,
          <source>Inf. Sci.</source>
          <volume>609</volume>
          (<year>2022</year>).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>