Reshaping Graph Recommendation with Edge Graph Collaborative Filtering and Customer Reviews

Vito Walter Anelli1, Yashar Deldjoo1, Tommaso Di Noia1, Eugenio Di Sciascio1, Antonio Ferrara1, Daniele Malitesta1,*, and Claudio Pomo1,*
1 Politecnico di Bari, via Orabona, 4, 70126 Bari, Italy

Abstract
Graph collaborative filtering approaches learn refined users' and items' node representations by iteratively aggregating the informative content (called messages) coming from neighbor nodes into each ego node. Unfortunately, not all interactions (i.e., graph edges) may be equally important to the users and items involved. As this indiscriminate message aggregation leads to multi-hop representation errors, recent strategies have used attention mechanisms to weight neighbors' importance to the ego node. Despite their success, such solutions seem to disregard the potentially critical role users' reviews may play in this weighting process. Reviews convey the multi-faceted user's opinion about items and provide a fundamental tool to group like-minded customers. In this work, we first formally show the causes of the node representation error in graph collaborative filtering and demonstrate how existing neighborhood weighting procedures (e.g., attention mechanisms) may alleviate the issue at the expense of limited hop exploration. Second, we correct the representation error through an additional graph network where we enrich graph edge embeddings with opinion-aware review embeddings to smooth each neighbor node's importance on its ego node. We call our solution Edge Graph Collaborative Filtering (EGCF). Extensive experiments on three e-commerce datasets show that EGCF competes successfully with traditional, graph-based, and review-based approaches on accuracy and beyond-accuracy objectives, while a study on the number of explored hops justifies the adopted configuration for EGCF. Code and datasets are available at: https://github.com/sisinflab/Edge-Graph-Collaborative-Filtering.
Keywords
Collaborative Filtering, Recommendation, Graph Convolutional Networks, Reviews

DL4SR'22: Workshop on Deep Learning for Search and Recommendation, co-located with the 31st ACM International Conference on Information and Knowledge Management (CIKM), October 17-21, 2022, Atlanta, USA.
* Authors are listed in alphabetical order. Corresponding authors: Daniele Malitesta (daniele.malitesta@poliba.it) and Claudio Pomo (claudio.pomo@poliba.it).
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Recommender systems constitute the backbone of several online platforms (e.g., Amazon), offering consumers lists of products that might meet their needs and tastes. Recommendation algorithms are traditionally designed and trained to find preference patterns in recorded user-item interactions. Optionally, this learning process may be enriched with additional informative data constantly updated on those platforms, which may draw the customer's attention towards items' characteristics (e.g., product images) or provide a tool to share opinions about purchased items and guide other customers in their decision-making process (e.g., reviews).

[Figure 1: A subset of users, items, and reviews users wrote about items, along with the expressed ratings (in the range 1-5). Despite being connected to the same items, users u1-u2 and users u1-u3 do not share similar opinions about the interacted items. Sample reviews: "Very comfortable. They also wear well for an active lifestyle. Love them." / "They were too narrow and hurt my feet so I returned them." / "Nothing really wrong with the belt just wider and thicker than I like. Good quality." / "Great belt, nice color and holding up very well."]

Collaborative filtering (CF) [1], one of the most prominent recommendation paradigms in recent years, promotes the intuition of similar users interacting with similar items. CF-based models usually map users and items to embeddings in the latent space, and learn to predict user interactions by optimizing an objective function that combines these embeddings linearly (e.g., inner product [2]) or non-linearly (e.g., neural networks [3] and probabilistic models [4]). While focusing on improving the user-item prediction step, such techniques have long underestimated the importance of deriving informative features to describe users and items suitably.

Recently, graph convolutional networks (GCNs) [5] have taken over CF-based recommendation, thanks to their capability of mining high-order user-item relationships. Unlike prior techniques, these models explicitly incorporate user and item relationships into their embedding representations. Concretely, the embedding of each node (defined as the ego node) is refined by aggregating its neighbors' node embeddings (whose contributions are called messages). This step is repeated iteratively to propagate the collaborative signal over multiple hops. These models are becoming the de facto standard in personalized recommendation, reaching remarkable performance as in the pioneering works [6, 7] and, more recently, in the solutions [8, 9, 10].

Despite its success, the message-passing pattern may, by design, still present some limitations. An argument could be made that not all user-item interactions (i.e., graph edges) have the same relative importance. To clarify this, consider the motivating scenario in Figure 1, where we depict a subset of users and items from a real-world e-commerce platform (i.e., the Amazon catalog) and enrich their interactions with ratings and reviews. Both users u1 and u2 interacted with item i1, suggesting that they might share similar interests and preferences. However, a careful analysis of the corresponding reviews reveals that their opinions about item i1 are opposite (the expressed ratings are 5 and 2, respectively). Following a similar reasoning schema, users u1 and u3 have both interacted with item i2, but their comments, while generally similar (the item is rated 3 and 5, respectively), show slight shades of disagreement (i.e., u1 is not completely satisfied with the belt size). As the message-passing pattern indiscriminately aggregates the neighbor nodes at multiple hops, the node representation of u1 is ultimately influenced by the representations of both u2 and u3 after two propagation hops. In the long term, such behavior may lead to what we define as a node representation error.

Weighting the importance of the neighborhood while aggregating the incoming messages into the ego node is among the most prominent solutions to the abovementioned issue. Following the direction of [11], other popular and recent works in recommendation such as [12, 13, 14, 15] leverage attention mechanisms (i.e., a neural network) to perform the weighting procedure. Even if these models have widely demonstrated superior recommendation accuracy, they are still affected by oversmoothing, the phenomenon by which node embedded representations tend to get closer and closer in the latent space after multiple propagation hops, thus flattening the existing differences in the neighborhood [16, 17]. For this reason, attention-based approaches usually propagate messages for only one or two hops, but this does not help access wider portions of the user-item graph.

In this respect, we believe attention-based techniques generally disregard other potential sources of information (e.g., user-generated reviews) whose contribution may positively impact the neighborhood weighting process. Opinions and comments about interacted items constitute the basis on which like-minded users gather on online platforms, as they promote the discovery of novel and diverse items from the catalog.

In this work, we first formally define the problem of the node representation error in graph collaborative filtering. After that, we show how existing weighting techniques (such as attention mechanisms) may alleviate the described issue at the expense of limiting the hop exploration depth to reduce the effect of oversmoothing. Thus, to address such a drawback, we propose a lighter weighting procedure that exploits the informative content extracted from reviews (i.e., opinions and comments about interacted items) to enhance graph edge representation. Such edge-enriched features are eventually used to derive the similarity between the ego node and its neighbors, which we re-interpret as the importance of the neighbor node on the ego node. Our proposed weighting procedure is applied to a GCN acting as the correction to another traditional (but error-affected) GCN. We call our solution Edge Graph Collaborative Filtering (EGCF).

After formalizing the theoretical basis for EGCF and its rationale, we assess its efficacy on three popular product categories from the Amazon catalog [18]. Given their similar intuitions and rationale to EGCF, we compare the method with four families of CF-based recommendation, i.e., traditional, review-based [19, 20], and graph-based approaches (both leveraging attention mechanisms and not). We seek to answer the following research questions about our proposed approach:

• RQ1. Can the correction to the node representation error help EGCF produce more accurate recommendations than state-of-the-art baselines?
• RQ2. Considering the high impact that novel and diverse recommendation lists may have on both users and companies, how effective is EGCF when evaluated on beyond-accuracy metrics, given its strategy for neighborhood exploration?
• RQ3. What is the effect of changing the number of exploration hops on recommendation performance, and how can we justify such behaviors for the adopted architecture?

The extensive experimental evaluation shows that the correction to the node representation error and the possibility of propagating messages across multiple hops permit EGCF to outperform state-of-the-art baselines on accuracy and beyond-accuracy metrics. Finally, the study on the number of propagation hops proves the soundness of our proposed architectural configuration while suggesting interesting directions for future work.

2. Related Work

Graph-based recommendation. The approach proposed in [21] is the first attempt to address the recommendation task through a graph-based architecture. The authors implement a graph autoencoder that labels its edges with users' ratings to perform link prediction. Ying et al. [6] design a graph convolutional network for web-scale recommendation to produce high-quality image recommendations for the Pinterest platform, efficiently exploiting random walks and items' multimodal side information.
Wang et al. [7] present neural graph collaborative filtering (NGCF), whose propagation layer aggregates the messages from the neighborhood considering the similarity between each neighbor node and its ego node. While outperforming previous state-of-the-art solutions, NGCF (and GCNs more generally) shows limitations later addressed by He et al. [8]. Their idea is to lighten the GCN's traditional layer structure and reach superior accuracy by removing non-linearities and node embedding transformations from the propagation layer (LightGCN). The latest approaches try to take a step beyond the LightGCN strategy by allowing theoretically unlimited propagation layers [9] and by revisiting the concepts of graph convolution for recommendation and node embedding smoothness under the lens of graph signal processing [10].

While aggregating messages from neighbor nodes into the ego node, not all received contributions have the same importance. The pioneering work by Velickovic et al. [11], called graph attention network (GAT), takes advantage of attention mechanisms to weight the different influences of neighbor nodes on the ego node. Inspired by this rationale, several recent works in recommendation seek to assess the relative importance of interacted items on the users involved in those interactions. In the last few years, recommendation tasks such as session-based recommendation [22, 23, 12] and sequential recommendation [13, 24] have been widely addressed by using attention mechanisms on graphs. Attention mechanisms may also be beneficial when the informative content conveyed by the bipartite user-item graph is enhanced with additional side information, like knowledge graphs [25], heterogeneous information networks [14], or multimodal items' content [26]. Exploiting attention to disentangle the aspects underlying node interactions may represent a fundamental step toward explainability [27]. Following this direction, the work by Wang et al. [15], named disentangled graph collaborative filtering (DGCF), and the method presented by Wu et al. [28] propose to disentangle user-item connections into possible user intents.

State-of-the-art attention-based approaches provide an efficient neighborhood weighting strategy. However, their multi-hop exploration is usually limited to prevent nodes in the neighborhood from getting too similar in the latent space (see Section 3.2). Conversely, EGCF leverages additional information (i.e., reviews) whose extracted opinion-aware features do not flatten differences among nodes while easing the weighting process. Moreover, in contrast to prior works, EGCF enriches edges by representing them through the extracted embeddings.

Review-based recommendation. Reviews convey a rich source of information to access users' multi-faceted opinions about interacted items. For this reason, several existing works propose to extract valuable knowledge from them to produce better-tailored recommendations [19, 20]. Among the pioneering works, Wang et al. [29] adopt a stacked denoising autoencoder to approximate the user-item rating matrix starting from textual reviews, Almahairi et al. [30] introduce two neural network-based approaches built upon bag-of-words and recurrent neural networks, and Kim et al. [31] present convolutional matrix factorization (ConvMF), where a convolutional neural network is merged with probabilistic matrix factorization to learn the context of review documents.

Reviews are textual documents composed of words, which may further be grouped into sentences. To exploit such a hierarchical structure, Zheng et al. [32] design a convolutional neural network on top of a factorization machine prediction model to extract a unique embedded representation for users and items from review words. The adoption of attention mechanisms may help refine each review component's importance on the recommendation profile of users and items. In this respect, Liu et al. [33] improve the previous approach by weighting the importance of convolutionally-embedded reviews for both users and items for the sake of explanation. Similarly, Lu et al. [34] learn users' and items' attention features by exploring different review components such as words, sentences, and topics via a GRU-based network, while Liu et al. [33] (building upon the solution described in [35]) augment users' and items' collaborative latent factors with features extracted from their generated ratings and reviews. Wang et al. [36] leverage common review properties (e.g., how helpful the reviews were for other users) to assess their importance on users and items.

Only recently have a few works injected the informative content of reviews into graph-based networks for recommendation. Wu et al. [37] propose a model named reviews meet graphs (RMG), a multi-view framework that learns users' and items' representations by considering the word and sentence levels of reviews and by exploring two hops of the user-item graph to also access user-user and item-item relations. Gao et al. [38] present a three-structured architecture that captures short- and long-term user preferences and item features, along with the collaborative information encoded in the bipartite user-item graph. Shi et al. [39] introduce a dual GCN model, where one network extracts and propagates review aspects, and the other reuses the aspects for the graph.

Despite addressing recommendation through different strategies, the presented algorithms generally work by grouping reviews on both user and item profiles but, in fact, limit the exploration of user and item neighbors to one hop (i.e., the nearest neighborhood). Conversely, our proposed approach exploits reviews as edge side information to describe user-item interactions and propagates their informative content over multiple hops to overcome theoretical issues in the way graph-based recommender systems are usually designed (see later).
3. Methodology

This section presents and motivates our proposed method, Edge Graph Collaborative Filtering (EGCF). We first introduce some notation and preliminaries on graph models for collaborative filtering. Then, we highlight a potentially critical issue in the message-passing schema. Even if weighting the importance of each neighbor node may alleviate the problem, we discuss its limitations and propose an enhanced application of the importance weighting.

3.1. Notation and preliminaries

Let $\mathcal{U} = \{u_1, u_2, \ldots, u_N\}$ and $\mathcal{I} = \{i_1, i_2, \ldots, i_M\}$ be the sets of $N$ users and $M$ items in the system, respectively. Then, let us consider a bipartite and undirected user-item graph that connects pairs of nodes when there exists a recorded interaction between them. User and item nodes are represented through embeddings in the latent space, i.e., $\mathbf{e}_u \in \mathbb{R}^d, \forall u \in \mathcal{U}$ and $\mathbf{e}_i \in \mathbb{R}^d, \forall i \in \mathcal{I}$.

Inspired by popular approaches [5], current graph-based recommender systems refine users' and items' node embeddings by exploring their multi-hop interconnections represented in the graph. Let $u$ and $i$ be the nodes for a user and an item to be updated (i.e., the ego nodes), and let $\mathcal{N}(u)$ and $\mathcal{N}(i)$ be the sets of nodes at one hop from $u$ and $i$, respectively (i.e., their neighborhoods). The ego node embeddings $\mathbf{e}_u$ and $\mathbf{e}_i$ are updated by aggregating their neighborhoods (i.e., messages):

$$\mathbf{e}_u^{(1)} = \omega(\{\mathbf{e}_i, \forall i \in \mathcal{N}(u)\}), \quad \mathbf{e}_i^{(1)} = \omega(\{\mathbf{e}_u, \forall u \in \mathcal{N}(i)\}) \quad (1)$$

where $\mathbf{e}_u^{(1)}$ and $\mathbf{e}_i^{(1)}$ are the refined embedding versions of user $u$ and item $i$ after one hop, while $\omega(\cdot)$ indicates the aggregation function. This message-passing pattern may be iterated $L$ times, thus exploring wider and wider neighborhoods of the ego nodes. After two hops, the refined embeddings of user $u$ and item $i$ are:

$$\mathbf{e}_u^{(2)} = \omega(\{\mathbf{e}_i^{(1)}, \forall i \in \mathcal{N}(u)\}), \quad \mathbf{e}_i^{(2)} = \omega(\{\mathbf{e}_u^{(1)}, \forall u \in \mathcal{N}(i)\}) \quad (2)$$

3.2. A limitation in the message-passing schema

The user formulation in Equation (2) can be expanded through Equation (1):

$$\mathbf{e}_u^{(2)} = \omega\big(\big\{\omega(\{\mathbf{e}_{u'}, \forall u' \in \mathcal{N}(i) \setminus \{u\}\}), \forall i \in \mathcal{N}(u)\big\}\big) \quad (3)$$

What emerges is that, by propagating messages over two hops, the node embedding of user $u$ is eventually refined through the contributions of other users who interacted with the same items as $u$. In other words, after two hops, each user profile is influenced by the profiles of other users who rated the same items.

Indeed, this assumption is aligned with the rationale behind collaborative filtering, i.e., similar users are likely to interact with the same items. However, not all user-item interactions (i.e., graph edges) may be equally important to the users and items involved. Thus, indiscriminately aggregating neighbor node embeddings into the ego node could, after multiple hops, harm the node updating process by bringing in all contributions from the neighborhood, even the noisy ones. We interpret this as a node representation error, propagating with the exploration hops in the graph.

For this reason, the contributions coming from each neighbor node are usually weighted before being aggregated into the ego nodes, modifying the presented formula:

$$\mathbf{e}_u^{(2)} = \omega\big(\big\{\alpha_{i \to u}^{(2)} \, \omega(\{\alpha_{u' \to i}^{(1)} \mathbf{e}_{u'}, \forall u' \in \mathcal{N}(i) \setminus \{u\}\}), \forall i \in \mathcal{N}(u)\big\}\big) \quad (4)$$

where $\alpha_{j \to k}^{(l)}$ stands for the importance that the neighbor node $j$ has on the ego node $k$ after $l$ hops. These weights are generally calculated by means of attention mechanisms and depend on the embeddings of the neighbor and ego nodes they refer to, e.g., $\alpha_{j \to k}^{(l)} = \phi(\mathbf{e}_j^{(l-1)}, \mathbf{e}_k^{(l-1)})$, where $\phi(\cdot, \cdot)$ is a neural network:

$$\mathbf{e}_u^{(2)} = \omega\Big(\Big\{\underbrace{\phi\big(\mathbf{e}_i^{(1)}, \mathbf{e}_u^{(1)}\big)}_{(\square)}\,\omega\big(\big\{\underbrace{\phi(\mathbf{e}_{u'}, \mathbf{e}_i)}_{(\triangle)}\,\mathbf{e}_{u'}, \forall u' \in \mathcal{N}(i)\setminus\{u\}\big\}\big), \forall i \in \mathcal{N}(u)\Big\}\Big) \quad (5)$$

that is, $\mathbf{e}_u^{(2)}$ depends on $(\square)$ the importance each neighbor item node $i$ has on the ego user node $u$ after one hop, and $(\triangle)$ the importance all users interacting with the same items as $u$ have on those items. Note that $(\square)$ may be further expanded:

$$\phi\big(\mathbf{e}_i^{(1)}, \mathbf{e}_u^{(1)}\big) = \phi\big(\omega(\{\alpha_{u' \to i}^{(1)} \mathbf{e}_{u'}, \forall u' \in \mathcal{N}(i) \setminus \{u\}\}), \, \omega(\{\alpha_{i' \to u}^{(1)} \mathbf{e}_{i'}, \forall i' \in \mathcal{N}(u) \setminus \{i\}\})\big)$$
$$= \phi\big(\omega(\{\phi(\mathbf{e}_{u'}, \mathbf{e}_i)\,\mathbf{e}_{u'}, \forall u' \in \mathcal{N}(i) \setminus \{u\}\}), \, \omega(\{\phi(\mathbf{e}_{i'}, \mathbf{e}_u)\,\mathbf{e}_{i'}, \forall i' \in \mathcal{N}(u) \setminus \{i\}\})\big) \quad (6)$$

When merging Equation (5) and Equation (6):

$$\mathbf{e}_u^{(2)} = \omega\Big(\Big\{\phi\Big(\omega\big(\big\{\underbrace{\phi(\mathbf{e}_{u'}, \mathbf{e}_i)}_{(\square)}\,\mathbf{e}_{u'}, \forall u' \in \mathcal{N}(i)\setminus\{u\}\big\}\big), \, \omega\big(\big\{\underbrace{\phi(\mathbf{e}_{i'}, \mathbf{e}_u)}_{(\triangle)}\,\mathbf{e}_{i'}, \forall i' \in \mathcal{N}(u)\setminus\{i\}\big\}\big)\Big)\,\omega\big(\big\{\phi(\mathbf{e}_{u'}, \mathbf{e}_i)\,\mathbf{e}_{u'}, \forall u' \in \mathcal{N}(i)\setminus\{u\}\big\}\big), \forall i \in \mathcal{N}(u)\Big\}\Big) \quad (7)$$

The node embedding for user $u$ after two hops depends on $(\square)$ the importance of all users interacting with the same items as $u$ on those items, and $(\triangle)$ the importance of all items interacted with by $u$ on user $u$. In other words, weighting the importance of each neighbor node on the ego node before the aggregation makes it possible, after two propagation hops, to calculate to what extent each user profile is influenced by the profiles of the other users who rated the same items. Without loss of generality, a similar consideration can be made for any number of hops greater than two.
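As a toy illustration of Equations (1)-(3), the following sketch (a hypothetical three-user, two-item graph, with the aggregation function $\omega$ taken as a plain mean) shows how, after two hops, a user's embedding absorbs the raw embeddings of every co-rating user:

```python
import numpy as np

# Toy instance of Equations (1)-(3): three users and two items
# (hypothetical data), with omega implemented as a mean aggregator.
rng = np.random.default_rng(0)
d = 4                                                        # latent dimension
e_user = {u: rng.normal(size=d) for u in ["u1", "u2", "u3"]}
e_item = {i: rng.normal(size=d) for i in ["i1", "i2"]}
N_user = {"u1": ["i1", "i2"], "u2": ["i1"], "u3": ["i2"]}    # N(u)
N_item = {"i1": ["u1", "u2"], "i2": ["u1", "u3"]}            # N(i)

def omega(messages):
    """Aggregation function omega(.): here, a mean over the messages."""
    return np.mean(messages, axis=0)

# One hop, Equation (1): each ego node aggregates its neighbors.
e_user_1 = {u: omega([e_item[i] for i in N_user[u]]) for u in N_user}
e_item_1 = {i: omega([e_user[u] for u in N_item[i]]) for i in N_item}

# Two hops, Equation (2): aggregate the refined one-hop embeddings.
e_user_2 = {u: omega([e_item_1[i] for i in N_user[u]]) for u in N_user}

# Expanding as in Equation (3): e_user_2["u1"] now mixes the raw
# embeddings of u2 and u3 (who co-rated i1 and i2 with u1), whether or
# not their opinions agree -- the source of the representation error.
```

Under mean aggregation, each one-hop item embedding already contains the co-rating users' raw embeddings, so the two-hop user embedding inherits them indiscriminately, as the derivation above formalizes.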
ego node before the aggregation allows, after two propa- Let r𝑢𝑖 ∈ R𝑓 be the textual embedding extracted from gation hops, to calculate to what extent each user pro- the review of user 𝑢 about item 𝑖 through the pretrained file is influenced by the profiles of the other users opinion-based model. First, we project r𝑢𝑖 ∈ R𝑓 to the who rated the same items. Without loss of generality, same latent space as e𝑢 ∈ R𝑑 and e𝑖 ∈ R𝑑 with a one- a similar consideration could be made after a number of layer neural network: hops greater than two. p𝑢𝑖 = LeakyReLU (Wr𝑢𝑖 + b) (8) 3.3. Enhancing neighborhood weighting where p𝑢𝑖 ∈ R is the projected review embedding, 𝑑 while W ∈ R𝑓 ×𝑑 and b ∈ R𝑑 are the projection matrix through reviews and the bias, respectively. We seek to retain only those As known, graph-based models in machine learning are textual features of review r𝑢𝑖 which can be significant affected by oversmoothing [16, 17]. This phenomenon to later calculate the interdependence between this leads node embeddings, after multiple propagation hops, embedding and user/item ones. to become closer and closer in their representation in Then, we propose to enhance the neighborhood the latent space, eventually flattening their existing dif- weighting procedure at hop 𝑙 by conditioning the im- ferences. As this behavior would profoundly weaken portance weights also on the projected embedding of the models’ performance, exploration of the neighborhood review connecting user 𝑢 and item 𝑖. For instance, the generally tends to be constrained to very few hops (e.g., importance of the neighbor item node 𝑖 on the ego user a maximum of two hops in attention-based weighting). 
node 𝑢 after 𝑙 hops is calculated as: However, in recommendation scenarios, limiting (𝑙) (︁ (𝑙−1) (𝑙−1) )︁ the exploration of the user-item bipartite graph 𝛼𝑖→𝑢 = 𝜙 e𝑖 , e𝑢 , p𝑢𝑖 (9) may represent an inconsistency to the idea of col- Note that, since p𝑢𝑖 cannot increase the impact of the laborative filtering, where users are connected to share oversmoothing effect (because it is not dependent preferences and tastes for similar items. on the hop 𝑙), its usage in the importance weight Under this assumption, we believe the neighborhood formula becomes even more beneficial. Let us focus weighting process could be further enhanced by exploit- on the weighting function 𝜙(·, ·, ·). Many approaches ing other sources of information that are not usually from the literature propose to leverage attention mecha- taken into account. In the majority of popular online nisms, usually implemented as a neural network trained platforms for e-commerce (e.g., Amazon), reviews are in the downstream task to predict the importance of the fundamental tools to share opinions and comments neighbor node on the ego node. In our solution, we opt about interacted items, as they convey the multi-faceted for a simplified and lightweight formulation that seeks aspects that drove a user to interact with an item. Lever- to calculate the similarity between the neighbor and the aging such side information on the connections exist- ego nodes, conditioned on the opinion embedding ing among users and items in the bipartite graph (i.e., of the review connecting them. Specifically: graph edges) can improve the learning of the importance weights by reducing the oversmoothing effect because (︁ )︁ (𝑙) (𝑙−1) (𝑙−1) 𝛼𝑖→𝑢 = cos e𝑖 ⊙ p𝑢𝑖 , e𝑢 ⊙ p𝑢𝑖 (10) each user/item node embedding is conditioned on the opinion conveyed by the review. where ⊙ is the element-wise multiplication, and cos(·, ·) Let 𝒲𝑢𝑖 = {𝑤1 , 𝑤2 , . . . , 𝑤𝑅 } be the set of 𝑅 words is the cosine similarity. 
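The projection and weighting of Equations (8)-(10) can be sketched as follows; the projection parameters W and b and all embeddings below are random placeholders rather than learned values, and LeakyReLU is written by hand:

```python
import numpy as np

# Sketch of the review-aware weighting in Equations (8)-(10); all
# parameters are random placeholders, and d, f are arbitrary sizes.
rng = np.random.default_rng(0)
d, f = 8, 16                                   # node / review embedding sizes
W, b = rng.normal(size=(d, f)), rng.normal(size=d)

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def project_review(r_ui):
    """Equation (8): project the review embedding into the node space."""
    return leaky_relu(W @ r_ui + b)

def alpha(e_i, e_u, p_ui):
    """Equation (10): cosine similarity between the opinion-modulated
    node embeddings, with negative similarities suppressed to zero."""
    a, c = e_i * p_ui, e_u * p_ui              # element-wise modulation
    sim = (a @ c) / (np.linalg.norm(a) * np.linalg.norm(c))
    return max(sim, 0.0)

r_ui = rng.normal(size=f)                      # pretrained opinion embedding
p_ui = project_review(r_ui)
weight = alpha(rng.normal(size=d), rng.normal(size=d), p_ui)
```

Because the cosine is bounded by 1 and negative values are clipped, the resulting weight always lies in [0, 1].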
Multiplying both node embeddings by the review opinion embedding provides the interplay between each node feature and the opinion features, thus producing a modified version of the node representation that conveys a richer source of information. No trainable projection weight is learned in the presented formulation, since the contribution of the review embedding is meaningful enough.

3.4. A double message-passing schema

The proposed neighborhood weighting procedure can help correct the representation error generated in the traditional message-passing schema. However, the idea is not to completely replace it, as several recent works from the literature have demonstrated its efficacy, especially in producing accurate recommendations [8]. The proposed approach involves a double message-passing schema, where two graph models are trained to refine their own user/item node representations. While the first one aggregates the contributions coming from the neighbor nodes into the ego nodes by weighting the neighborhood importance on the ego node statically, the second one aggregates the neighborhood's messages weighted also through the opinion embeddings from reviews.

[Figure 2: Overview of the node refining algorithm proposed for EGCF. A statically-weighted GCN network affected by node representation error (a) is corrected through another GCN network (b), where an opinion-based embedding is extracted from each review as edge side information to weight the importance of the neighbor nodes on their ego nodes.]

[Table 1: Statistics of the tested datasets.]
Datasets       #Users   #Items   #Interactions   Density   Avg. interactions per user
Baby            4,669    5,435          29,214   0.00115    6.3
Boys & Girls    8,806    4,165          57,928   0.00158    6.6
Men             3,218    7,605          60,299   0.00246   18.7

We define the two graph convolutional networks as GCN$_e$ (error-affected) and GCN$_c$ (correction), and assign the node embeddings $\mathbf{e}_*$ to GCN$_e$ and the node embeddings $\mathbf{c}_*$ to GCN$_c$. As for the aggregation function, in both cases we sum the weighted messages coming from the neighbor nodes. As such, the update of the user node embedding $u$ after $l$ hops is calculated as:

$$\mathbf{e}_u^{(l)} = \sum_{i \in \mathcal{N}(u)} \alpha_{i \to u}\, \mathbf{e}_i^{(l-1)} = \sum_{i \in \mathcal{N}(u)} \frac{\mathbf{e}_i^{(l-1)}}{\sqrt{|\mathcal{N}(u)|}\sqrt{|\mathcal{N}(i)|}}$$
$$\mathbf{c}_u^{(l)} = \sum_{i \in \mathcal{N}(u)} \alpha_{i \to u}^{(l)}\, \mathbf{c}_i^{(l-1)} = \sum_{i \in \mathcal{N}(u)} \frac{\cos\big(\mathbf{e}_i^{(l-1)} \odot \mathbf{p}_{ui}, \, \mathbf{e}_u^{(l-1)} \odot \mathbf{p}_{ui}\big)}{\sqrt{|\mathcal{N}(u)|}\sqrt{|\mathcal{N}(i)|}}\, \mathbf{c}_i^{(l-1)} \quad (11)$$

Note that $\alpha_{i \to u}$ is static and only depends on the topology of the bipartite graph, while $\alpha_{i \to u}^{(l)}$ varies along with the exploration hop and depends on the embeddings of the ego/neighbor nodes and the opinion review embedding. After $L$ propagation hops, the final embedding representations are obtained as:

$$\mathbf{e}_u = \sum_{l=0}^{L} \frac{1}{1+l}\mathbf{e}_u^{(l)}, \quad \mathbf{e}_i = \sum_{l=0}^{L} \frac{1}{1+l}\mathbf{e}_i^{(l)}, \quad \mathbf{c}_u = \sum_{l=0}^{L} \frac{1}{1+l}\mathbf{c}_u^{(l)}, \quad \mathbf{c}_i = \sum_{l=0}^{L} \frac{1}{1+l}\mathbf{c}_i^{(l)} \quad (12)$$

where we apply the scaling factor $1/(1+l)$ to further alleviate the oversmoothing problem. A schematic overview of the node refining algorithm proposed for EGCF is displayed in Figure 2.

Given the learned error-affected and correction embeddings from above, EGCF predicts whether a user $u$ may interact with item $i$ through the following formulation:

$$\hat{y}_{ui} = \underbrace{\mathbf{e}_u^\top \mathbf{e}_i}_{\text{error-affected}} + \underbrace{\mathbf{c}_u^\top \mathbf{c}_i}_{\text{correction}} \quad (13)$$

Thus, we apply the error correction to the user/item embedding representations only when predicting the user/item interaction. We optimize EGCF with the state-of-the-art Bayesian Personalized Ranking (BPR) [42] loss.
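Under the same assumptions as the earlier sketches (a toy graph with synthetic embeddings and random review projections), the double propagation of Equation (11), the layer combination of Equation (12), and the prediction of Equation (13) can be sketched for the user side as:

```python
import numpy as np

# Sketch of EGCF's double propagation (Eq. 11), layer combination (Eq. 12),
# and prediction (Eq. 13); all embeddings and p_ui are synthetic data.
rng = np.random.default_rng(0)
d, L = 8, 2
users, items = ["u1", "u2"], ["i1", "i2"]
N_u = {"u1": ["i1", "i2"], "u2": ["i1"]}             # user neighborhoods
N_i = {"i1": ["u1", "u2"], "i2": ["u1"]}             # item neighborhoods
e = {n: rng.normal(size=d) for n in users + items}   # GCN_e embeddings
c = {n: rng.normal(size=d) for n in users + items}   # GCN_c embeddings
p = {(u, i): rng.normal(size=d) for u in users for i in N_u[u]}  # p_ui

def cos_pos(a, b):
    """Cosine similarity with negatives suppressed (Equation (10))."""
    return max((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)), 0.0)

def propagate(e_prev, c_prev):
    """One hop of Equation (11), user side only (the item side is
    symmetric and kept fixed here for brevity)."""
    e_next, c_next = dict(e_prev), dict(c_prev)
    for u in users:
        norm = {i: np.sqrt(len(N_u[u]) * len(N_i[i])) for i in N_u[u]}
        e_next[u] = sum(e_prev[i] / norm[i] for i in N_u[u])
        c_next[u] = sum(
            cos_pos(e_prev[i] * p[u, i], e_prev[u] * p[u, i]) / norm[i]
            * c_prev[i] for i in N_u[u])
    return e_next, c_next

e_layers, c_layers = [e], [c]
for _ in range(L):                    # keep every hop for Equation (12)
    e_nxt, c_nxt = propagate(e_layers[-1], c_layers[-1])
    e_layers.append(e_nxt)
    c_layers.append(c_nxt)

def final(layers, n):
    """Equation (12): per-hop embeddings scaled by 1/(1+l), then summed."""
    return sum(layer[n] / (1 + l) for l, layer in enumerate(layers))

# Equation (13): error-affected score plus correction score.
y_u1_i1 = (final(e_layers, "u1") @ final(e_layers, "i1")
           + final(c_layers, "u1") @ final(c_layers, "i1"))
```

Note that, as in Equation (11), the correction weights are computed from the GCN$_e$ embeddings and the review projection, while the aggregated messages are the GCN$_c$ embeddings.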
4. Experiments and Discussion

4.1. Experimental Setup

Datasets. We use three popular [43, 44] datasets from Amazon's Baby, Boys & Girls, and Men categories [18], which contain historical user-item interactions and reviews. We retain only the interactions with non-empty reviews, then keep the 20k and 10k most popular items for Baby and Boys & Girls/Men, respectively. Finally, we apply the 5- and 15-core filtering on items and users for Baby/Boys & Girls and Men, respectively. Statistics are in Table 1.

Baselines. We compare our approach with eight state-of-the-art models spanning several families: (i) traditional CF (BPRMF [42] and MultiVAE [4]); (ii) review-based CF (ConvMF [31] and RMG [37]); (iii) graph-based CF (NGCF [7] and LightGCN [8]); (iv) graph-based CF with attention (GAT [11] and DGCF [15]).

Reproducibility. We adopt the temporal leave-one-out strategy to split the datasets, where the last two recorded interactions are included in the validation and test sets. We tune hyper-parameters with [45] following the baselines' papers, and fix the batch size to 256 and the number of epochs to 400. As for EGCF, we extract review embeddings through a popular pre-trained model¹. Datasets and code are publicly available². All models are implemented in Elliot [46].

¹ Please refer to our GitHub repository.
² https://github.com/sisinflab/Edge-Graph-Collaborative-Filtering.

Evaluation protocol. We measure model accuracy by adopting the recall (Recall@k), the normalized discounted cumulative gain (nDCG@k), and the mean average recall (MAR@k) [8, 15]. Additionally, considering the influence of novel and diverse recommendation lists [47, 48] on both users' and businesses' interests, we also assess beyond-accuracy metrics such as the expected popularity complement (EPC@k) and the expected free discovery (EFD@k), along with indices measuring concentration and coverage, i.e., the 1's complement of the Gini index (Gini@k), the Shannon entropy (SE@k), and the item coverage (iCov@k). Specifically, the EPC@k and the EFD@k refer to long-tail items and stand for the expected number of recommended unknown items which are also relevant, and the expected number of recommended known items which are also relevant, respectively. Furthermore, the Gini@k and the SE@k are used to assess items' distributional inequality, i.e., how unequally a recommender system shows different items to users, while the iCov@k quantifies the number of items that the model recommends. For all metrics, higher values mean better performance. We leave the assessment of complexity measures for the proposed model to future extensions of this work.

[Table 2: Accuracy metrics, i.e., Recall, nDCG, and MAR, for top-10 lists. Best value is in bold, while second-to-best is underlined.]
                    Baby                       Boys & Girls               Men
Models      Recall   nDCG     MAR      Recall   nDCG     MAR      Recall   nDCG     MAR
MostPop     0.0940   0.0520   0.0627   0.1195   0.0647   0.0776   0.0702   0.0590   0.0672
BPRMF       0.1377   0.0785   0.0980   0.1821   0.1446   0.1666   0.1662   0.1314   0.1527
MultiVAE    0.1768   0.1262   0.1455   0.2224   0.1695   0.1990   0.2091   0.1656   0.1898
ConvMF      0.1230   0.0647   0.0800   0.1146   0.0831   0.0972   0.0838   0.0524   0.0584
RMG         0.1272   0.0911   0.1059   0.1512   0.1065   0.1325   0.1067   0.0727   0.0867
NGCF        0.1411   0.0916   0.1092   0.2006   0.1523   0.1783   0.1969   0.1461   0.1722
LightGCN    0.1892   0.1362   0.1590   0.2305   0.1743   0.2054   0.2124   0.1605   0.1882
GAT         0.1595   0.1051   0.1233   0.2069   0.1573   0.1846   0.1695   0.1254   0.1476
DGCF        0.1874   0.1352   0.1558   0.2249   0.1716   0.2023   0.2070   0.1554   0.1823
EGCF        0.1944*  0.1402*  0.1623*  0.2325   0.1792*  0.2089*  0.2195*  0.1703*  0.1988*
*statistically significant differences (p-value ≤ 0.05).

4.2. Results and Discussion

Recommendation accuracy (RQ1). Table 2 reports the results for accuracy measures on the top-10 recommendation lists. Surprisingly, the sole introduction of reviews does not seem to produce a consistent accuracy boost. For instance, the strongest review-based method (i.e., RMG) surpasses BPRMF only for the nDCG and the MAR on Baby (i.e., 0.0911 vs. 0.0785 and 0.1059 vs. 0.0980, respectively). Conversely, adopting a graph model can increase the accuracy over traditional CF. When comparing LightGCN with MultiVAE, which obtain the best performance in their respective recommendation families, we observe that the former improves, on Baby, the Recall by 7% and the MAR by 9%. However, the observed difference even reverts on Men for the nDCG and the MAR. The application of attention mechanisms to weight the importance of neighbor nodes is rewarded on Baby and Boys & Girls, where GAT always outperforms NGCF, reaching remarkable results such as the Recall on Baby (i.e., 0.1595 vs. 0.1411) and the MAR on Boys & Girls (i.e., 0.1846 vs. 0.1783). Disentangling users' intents on interacted items (i.e., DGCF) produces even more accurate recommendations than NGCF on all datasets. Nevertheless, LightGCN always performs better than DGCF apart from very few cases (i.e., nDCG and MAR on Men), even though DGCF's accuracy values do not substantially differ from LightGCN's (e.g., see the MAR on Baby). Noticeably, the proposed model (i.e., EGCF) outperforms the other baselines under all settings and datasets, with nearly 100% of the statistical hypothesis tests (i.e., paired t-test) showing that the results significantly differ. This finding further confirms the soundness of the solution. While we observe a substantial accuracy improvement over traditional and review-based approaches (e.g., +12% over MultiVAE for the MAR on Boys & Girls and +53% over RMG for the Recall on Baby), introducing an additional GCN-like network guided by users' reviews is even more beneficial to correct the representation error observable in unweighted graph approaches. In particular, the results show that such a correction may lead to small accuracy improvements in some cases (e.g., see the Recall on Boys & Girls when correcting LightGCN) but also larger ones in other cases (e.g., see the nDCG on Men when correcting LightGCN).
Such outcomes suggest that while reviews does not seem to produce a consistent accuracy keeping the error-affected contribution in the final predic- boost. For instance, the strongest review-based method tion formula is useful to preserve the superior performance (i.e., RMG) surpasses BPRMF only for the 𝑛𝐷𝐶𝐺 and of graph-based models to traditional and review-based ap- the 𝑀 𝐴𝑅 on Baby (i.e., 0.0911 vs. 0.0785 and 0.1059 proaches, the introduced correction term is useful to gain vs. 0.0980, respectively). Contrarily, adopting a graph even more accurate preference predictions than unweighted model can increase the accuracy to traditional CF. When graph architectures. comparing LightGCN with MultiVAE, which obtain the Recommendation novelty and diversity (RQ2). We best performance in their respective recommendation also assess how novel and diverse recommendation lists families, we observe that the former improves, on Baby, are. The two novelty metrics in Table 3 (i.e., the 𝐸𝑃 𝐶@𝑘 the 𝑅𝑒𝑐𝑎𝑙𝑙 of 7% and the 𝑀 𝐴𝑅 of 9%. However, the and the 𝐸𝐹 𝐷@𝑘, left side) are discussed with concentra- observed difference even reverts on Men for the 𝑛𝐷𝐶𝐺 tion and coverage indices (i.e., the 𝐺𝑖𝑛𝑖@𝑘, the 𝑆𝐸@𝑘, and the 𝑀 𝐴𝑅. The application of attention mechanisms and the 𝑖𝐶𝑜𝑣@𝑘, right side) as in an ideal recommender 1 Please refer to our GitHub repository. system, a loosely concentrated and large set of recom- 2 https://github.com/sisinflab/Edge-Graph-Collaborative-Filtering. Table 3 Calculated novelty metrics, i.e., 𝐸𝑃 𝐶 and 𝐸𝐹 𝐷, on the left side, and diversity indices, i.e., 𝐺𝑖𝑛𝑖, 𝑆𝐸 , and 𝑖𝐶𝑜𝑣 , on the right side, for top-10 lists. Best value is in bold, while second-to-best is underlined. 
Models Baby Boys & Girls Men Models Baby Boys & Girls Men 𝐸𝑃 𝐶 𝐸𝐹 𝐷 𝐸𝑃 𝐶 𝐸𝐹 𝐷 𝐸𝑃 𝐶 𝐸𝐹 𝐷 𝐺𝑖𝑛𝑖 𝑆𝐸 𝑖𝐶𝑜𝑣 𝐺𝑖𝑛𝑖 𝑆𝐸 𝑖𝐶𝑜𝑣 𝐺𝑖𝑛𝑖 𝑆𝐸 𝑖𝐶𝑜𝑣 MostPop 0.0108 0.0728 0.0135 0.0913 0.0112 0.0904 MostPop 0.0018 3.5313 18 0.0023 3.5724 18 0.0015 3.9332 32 BPRMF 0.0164 0.1153 0.0306 0.2282 0.0259 0.2167 BPRMF 0.0019 3.7819 40 0.0031 4.0921 203 0.0037 5.2991 192 MultiVAE 0.0268 0.2088 0.0360 0.2874 0.0333 0.2912 MultiVAE 0.2139 9.9160 4,143 0.2671 10.2463 3,824 0.1085 9.8988 3,014 ConvMF 0.0135 0.0930 0.0174 0.1219 0.0102 0.0857 ConvMF 0.0018 3.5933 18 0.0030 3.9745 220 0.0029 4.6783 265 RMG 0.0193 0.1488 0.0226 0.1787 0.0144 0.1226 RMG 0.1059 9.4892 2,130 0.1567 9.7193 2,538 0.1146 10.0344 2,549 NGCF 0.0194 0.1463 0.0323 0.2510 0.0292 0.2531 NGCF 0.0948 8.8700 2,641 0.3031 10.5595 3,668 0.1749 10.7116 3,651 LightGCN 0.0289 0.2271 0.0371 0.3012 0.0323 0.2856 LightGCN 0.1405 9.3105 3,417 0.2398 10.1586 3,647 0.2051 10.8815 4,384 GAT 0.0223 0.1708 0.0334 0.2616 0.0248 0.2106 GAT 0.1370 9.2024 3,102 0.2496 10.2821 3,449 0.1235 9.7802 3,530 DGCF 0.0287 0.2228 0.0365 0.2945 0.0311 0.2734 DGCF 0.0673 8.3193 2,325 0.1800 9.7617 3,208 0.1304 10.2011 3,378 EGCF 0.0298* 0.2359* 0.0382* 0.3120* 0.0343* 0.3066* EGCF 0.2294 9.8535 4,490 0.3037 10.4545 4,030 0.2208 10.8876 4,920 *statistically significant differences (p-value ≤ 0.05) Statistical significance is not reported since it is calculated only on user level. 0.312 0.194 0.350 0.232 0.306 0.219 0.310 Recall Recall Recall 0.193 0.230 EFD EFD EFD 0.300 0.308 0.218 0.304 0.192 0.228 0.306 0.250 0.217 0.191 0.304 0.302 0.226 1 2 3 4 1 2 3 4 1 2 3 4 (a) Baby (b) Boys & Girls (c) Men Figure 3: Recommendation performance of EGCF, i.e., 𝑅𝑒𝑐𝑎𝑙𝑙@𝑘 (histogram bars in teal blue) and 𝐸𝐹 𝐷@𝑘 (histogram bars in lime green), on top-10 recommendation lists, when varying the number of explored hops from 1 to 4. mended items should equally span different ranges of without retaining less popular items from the long-tail popularity. 
As previously observed, EGCF is again the (observing the same models, +3% for the 𝐸𝑃 𝐶 on Baby best or second-to-best technique. While NGCF is not and +6% for the 𝐸𝐹 𝐷 on Boys & Girls). Such outcomes as capable as LightGCN of proposing long-tail items on demonstrate that the content enrichment brought by the Boys & Girls (e.g., 0.2510 vs. 0.3012 for the 𝐸𝐹 𝐷), the extracted review features (injected into the representation former surpasses the latter for the concentration indices error correction) allows to explore user-item interactions at on the same dataset (e.g., 10.5595 vs. 10.1586 for the 𝑆𝐸). multiple hops, leading to more heterogeneous recommen- Since NGCF adopts an ego-neighbor interaction compo- dation lists which also include items from the long-tail. nent, the concentration of explored and recommended Effect of hop exploration number (RQ3). Figure 3 near items gets loose. Moreover, neighborhood weight- displays, for EGCF, the 𝑅𝑒𝑐𝑎𝑙𝑙@𝑘 and 𝐸𝐹 𝐷@𝑘 perfor- ing leads to recommend items from the long tail (e.g., mance variation on top-10 recommendation lists when comparing GAT with NGCF, we observe a +17% for the exploring a number of hops in the range 1-4, where even 𝐸𝐹 𝐷 on Baby). However, such a finding is not consis- numbers stand for same node type connections (e.g., user- tent with the trend recognized for the concentration and user), while odd numbers refer to opposite node type con- coverage indices (e.g., when comparing LightGCN with nections (i.e., user-item). As evident from the histograms DGCF, we notice 0.1304 vs. 0.2051 for the 𝐺𝑖𝑛𝑖 on Men), of Baby and Boys & Girls, the 𝑅𝑒𝑐𝑎𝑙𝑙@𝑘 consistently as the neighborhood weighting procedure comes at the increases from 1 to 4 hops (this is why we adopt four expense of a limited hop exploration, not allowing such hop explorations for EGCF on those datasets). The same models to explore wider catalog portions. 
Conversely, trend is not observable for Men, where two explored hops injecting user-generated reviews brings new informative seem to provide the highest accuracy boost, motivating content (e.g., RMG recommends a broader and less con- the adoption of 2 hop explorations for EGCF on the same centrated range of items from the catalog than DGCF on dataset. Such behavior could be due to the average num- the Baby dataset). Finally, weighting the neighborhood ber of users’ interacted items in Men (approximately 19, importance and exploring long-distant user-item inter- see Table 1). The node refining probably does not re- actions through reviews-enriched content (i.e., EGCF) quire a broad exploration of its neighborhood. As for the allows to retrieve larger portions of heterogeneous items 𝐸𝐹 𝐷@𝑘, the Baby and the Men datasets seem to agree (e.g., EGCF outperforms LightGCN for the 𝐺𝑖𝑛𝑖 by +63% on two exploration hops to produce the most diverse on Baby and DGCF for the 𝑆𝐸 by +7% on Boys & Girls), item lists of recommendations because they leverage (as previously recalled) user-user and item-item intercon- [6] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. nections (and similarities). The trend is also aligned with Hamilton, J. Leskovec, Graph convolutional neural the Boys & Girls dataset, where user-user and item-item networks for web-scale recommender systems, in: links are exploited even at a higher depth (i.e., four ex- KDD, ACM, 2018, pp. 974–983. ploration hops). The emerged insights shed light on two [7] X. Wang, X. He, M. Wang, F. Feng, T. Chua, Neural main contributions: (i) with the modified neighborhood graph collaborative filtering, in: SIGIR, ACM, 2019, weighting process, which makes use of reviews to enhance pp. 165–174. the informative content carried by user-item interactions, [8] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, M. 
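To make the hop-wise aggregation discussed above concrete, the following minimal NumPy sketch reproduces the 1/(1+l)-scaled layer combination of Eq. (12) for a single embedding table. Function and variable names (`aggregate_embeddings`, `a_hat`, `e0`) are ours for illustration only; the official implementation is in the linked GitHub repository.

```python
import numpy as np

def aggregate_embeddings(a_hat, e0, num_hops):
    """LightGCN-style propagation with the 1/(1+l) scaling of Eq. (12).

    a_hat:    (n, n) normalized adjacency of the user-item graph.
    e0:       (n, d) initial node embeddings (users and items stacked).
    num_hops: number of propagation hops L.
    Returns sum_{l=0..L} 1/(1+l) * e^{(l)}, where e^{(l)} = a_hat @ e^{(l-1)}.
    """
    e_l = e0
    e_final = e0  # l = 0 term, scaled by 1/(1+0) = 1
    for l in range(1, num_hops + 1):
        e_l = a_hat @ e_l                  # one message-passing hop
        e_final = e_final + e_l / (1 + l)  # down-weight deeper hops
    return e_final
```

Varying `num_hops` mirrors the 1-4 hop study of Figure 3: odd hops reach opposite-type neighbors (user-item), while even hops reach same-type neighbors (user-user, item-item).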
5. Conclusion and Future Work

This work proposes Edge Graph Collaborative Filtering (EGCF), which incorporates users' opinions extracted from reviews into the edges of a GCN to weight the neighborhood importance on the ego node. Extensive experimental evaluation shows that EGCF outperforms traditional, review-, and graph-based models. The work is complemented by an analysis of beyond-accuracy performance and an extensive study on the number of layers. Leveraging the importance of graph edges through node-node side information (e.g., users' reviews) opens up future directions, namely: (i) studying the impact of this re-weighting by making it a hyper-parameter, and (ii) analyzing the possible application of the proposed technique to tasks other than recommendation.

Acknowledgments

The authors acknowledge partial support from the projects PASSEPARTOUT, ServiziLocali2.0, Smart Rights Management Platform, BIO-D, and ERP4.0.

References

[1] M. D. Ekstrand, J. Riedl, J. A. Konstan, Collaborative filtering recommender systems, Found. Trends Hum. Comput. Interact. 4 (2011) 175–243.
[2] Y. Koren, R. M. Bell, C. Volinsky, Matrix factorization techniques for recommender systems, Computer 42 (2009) 30–37.
[3] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, T. Chua, Neural collaborative filtering, in: WWW, ACM, 2017, pp. 173–182.
[4] D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, Variational autoencoders for collaborative filtering, in: WWW, ACM, 2018, pp. 689–698.
[5] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: ICLR (Poster), OpenReview.net, 2017.
[6] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, J. Leskovec, Graph convolutional neural networks for web-scale recommender systems, in: KDD, ACM, 2018, pp. 974–983.
[7] X. Wang, X. He, M. Wang, F. Feng, T. Chua, Neural graph collaborative filtering, in: SIGIR, ACM, 2019, pp. 165–174.
[8] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, M. Wang, LightGCN: Simplifying and powering graph convolution network for recommendation, in: SIGIR, ACM, 2020, pp. 639–648.
[9] K. Mao, J. Zhu, X. Xiao, B. Lu, Z. Wang, X. He, UltraGCN: Ultra simplification of graph convolutional networks for recommendation, in: CIKM, ACM, 2021, pp. 1253–1262.
[10] Y. Shen, Y. Wu, Y. Zhang, C. Shan, J. Zhang, K. B. Letaief, D. Li, How powerful is graph convolution for recommendation?, in: CIKM, ACM, 2021, pp. 1619–1629.
[11] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, Y. Bengio, Graph attention networks, in: ICLR (Poster), OpenReview.net, 2018.
[12] Y. Xie, Z. Li, T. Qin, F. Tseng, J. Kristinsson, S. Qiu, Y. L. Murphey, Personalized session-based recommendation using graph attention networks, in: IJCNN, IEEE, 2021, pp. 1–8.
[13] M. Zhang, C. Guo, J. Jin, M. Pan, J. Fang, Sequential recommendation with context-aware collaborative graph attention networks, in: IJCNN, IEEE, 2021, pp. 1–8.
[14] Y. Wang, S. Tang, Y. Lei, W. Song, S. Wang, M. Zhang, DisenHAN: Disentangled heterogeneous graph attention network for recommendation, in: CIKM, ACM, 2020, pp. 1605–1614.
[15] X. Wang, H. Jin, A. Zhang, X. He, T. Xu, T. Chua, Disentangled graph collaborative filtering, in: SIGIR, ACM, 2020, pp. 1001–1010.
[16] D. Chen, Y. Lin, W. Li, P. Li, J. Zhou, X. Sun, Measuring and relieving the over-smoothing problem for graph neural networks from the topological view, in: AAAI, AAAI Press, 2020, pp. 3438–3445.
[17] K. Zhou, X. Huang, Y. Li, D. Zha, R. Chen, X. Hu, Towards deeper graph neural networks with differentiable group normalization, in: NeurIPS, 2020.
[18] J. Ni, J. Li, J. J. McAuley, Justifying recommendations using distantly-labeled reviews and fine-grained aspects, in: EMNLP/IJCNLP (1), Association for Computational Linguistics, 2019, pp. 188–197.
[19] L. Chen, G. Chen, F. Wang, Recommender systems based on user reviews: the state of the art, User Model. User Adapt. Interact. 25 (2015) 99–154.
[20] M. Srifi, A. Oussous, A. A. Lahcen, S. Mouline, Recommender systems based on collaborative filtering using review texts - A survey, Inf. 11 (2020) 317.
[21] R. van den Berg, T. N. Kipf, M. Welling, Graph convolutional matrix completion, CoRR abs/1706.02263 (2017).
[22] W. Song, Z. Xiao, Y. Wang, L. Charlin, M. Zhang, J. Tang, Session-based social recommendation via dynamic graph attention networks, in: WSDM, ACM, 2019, pp. 555–563.
[23] C. Xu, P. Zhao, Y. Liu, V. S. Sheng, J. Xu, F. Zhuang, J. Fang, X. Zhou, Graph contextualized self-attention network for session-based recommendation, in: IJCAI, ijcai.org, 2019, pp. 3940–3946.
[24] Y. Wu, J. Yang, Dual sequential recommendation integrating high-order collaborative relations via graph attention networks, in: IJCNN, IEEE, 2021, pp. 1–8.
[25] X. Wang, X. He, Y. Cao, M. Liu, T. Chua, KGAT: Knowledge graph attention network for recommendation, in: KDD, ACM, 2019, pp. 950–958.
[26] Z. Tao, Y. Wei, X. Wang, X. He, X. Huang, T. Chua, MGAT: Multimodal graph attention network for recommendation, Inf. Process. Manag. 57 (2020) 102277.
[27] J. Ma, P. Cui, K. Kuang, X. Wang, W. Zhu, Disentangled graph convolutional networks, in: ICML, volume 97 of Proceedings of Machine Learning Research, PMLR, 2019, pp. 4212–4221.
[28] J. Wu, W. Shi, X. Cao, J. Chen, W. Lei, F. Zhang, W. Wu, X. He, DisenKGAT: Knowledge graph embedding with disentangled graph attention network, in: CIKM, ACM, 2021, pp. 2140–2149.
[29] H. Wang, N. Wang, D. Yeung, Collaborative deep learning for recommender systems, in: KDD, ACM, 2015, pp. 1235–1244.
[30] A. Almahairi, K. Kastner, K. Cho, A. C. Courville, Learning distributed representations from reviews for collaborative filtering, in: RecSys, ACM, 2015, pp. 147–154.
[31] D. H. Kim, C. Park, J. Oh, S. Lee, H. Yu, Convolutional matrix factorization for document context-aware recommendation, in: RecSys, ACM, 2016, pp. 233–240.
[32] L. Zheng, V. Noroozi, P. S. Yu, Joint deep modeling of users and items using reviews for recommendation, in: WSDM, ACM, 2017, pp. 425–434.
[33] H. Liu, Y. Wang, Q. Peng, F. Wu, L. Gan, L. Pan, P. Jiao, Hybrid neural recommendation with joint deep representation learning of ratings and reviews, Neurocomputing 374 (2020) 77–85.
[34] Y. Lu, R. Dong, B. Smyth, Coevolutionary recommendation model: Mutual learning between ratings and reviews, in: WWW, ACM, 2018, pp. 773–782.
[35] H. Liu, F. Wu, W. Wang, X. Wang, P. Jiao, C. Wu, X. Xie, NRPA: Neural recommendation with personalized attention, in: SIGIR, ACM, 2019, pp. 1233–1236.
[36] X. Wang, I. Ounis, C. Macdonald, Leveraging review properties for effective recommendation, in: WWW, ACM / IW3C2, 2021, pp. 2209–2219.
[37] C. Wu, F. Wu, T. Qi, S. Ge, Y. Huang, X. Xie, Reviews meet graphs: Enhancing user and item representations for recommendation with hierarchical attentive graph neural network, in: EMNLP/IJCNLP (1), Association for Computational Linguistics, 2019, pp. 4883–4892.
[38] J. Gao, Y. Lin, Y. Wang, X. Wang, Z. Yang, Y. He, X. Chu, Set-sequence-graph: A multi-view approach towards exploiting reviews for recommendation, in: CIKM, ACM, 2020, pp. 395–404.
[39] L. Shi, W. Wu, W. Hu, J. Zhou, J. Chen, W. Zheng, L. He, DualGCN: An aspect-aware dual graph convolutional network for review-based recommender, Knowl. Based Syst. 242 (2022) 108359.
[40] R. He, J. J. McAuley, VBPR: Visual Bayesian personalized ranking from implicit feedback, in: AAAI, AAAI Press, 2016, pp. 144–150.
[41] Y. Deldjoo, T. D. Noia, D. Malitesta, F. A. Merra, Leveraging content-style item representation for visual recommendation, in: ECIR (2), volume 13186 of Lecture Notes in Computer Science, Springer, 2022, pp. 84–92.
[42] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, in: UAI, AUAI Press, 2009, pp. 452–461.
[43] X. Chen, H. Chen, H. Xu, Y. Zhang, Y. Cao, Z. Qin, H. Zha, Personalized fashion recommendation with visual explanations based on multimodal attention network: Towards visually explainable recommendation, in: SIGIR, ACM, 2019, pp. 765–774.
[44] Z. Wang, W. Ye, X. Chen, W. Zhang, Z. Wang, L. Zou, W. Liu, Generative session-based recommendation, in: WWW, ACM, 2022, pp. 2227–2235.
[45] J. Bergstra, R. Bardenet, Y. Bengio, B. Kégl, Algorithms for hyper-parameter optimization, in: NIPS, 2011, pp. 2546–2554.
[46] V. W. Anelli, A. Bellogín, A. Ferrara, D. Malitesta, F. A. Merra, C. Pomo, F. M. Donini, T. D. Noia, Elliot: A comprehensive and rigorous framework for reproducible recommender systems evaluation, in: SIGIR, ACM, 2021, pp. 2405–2414.
[47] S. Vargas, Novelty and diversity enhancement and evaluation in recommender systems and information retrieval, in: SIGIR, ACM, 2014, p. 1281.
[48] S. Vargas, P. Castells, Rank and relevance in novelty and diversity metrics for recommender systems, in: RecSys, ACM, 2011, pp. 109–116.
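As a companion to the evaluation protocol described above, the two headline accuracy metrics admit compact per-user definitions. The sketch below is illustrative only (binary relevance; it is not the Elliot implementation used in the experiments, and the function names are ours):

```python
import math

def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the user's held-out relevant items retrieved in the top-k list."""
    if not relevant_items:
        return 0.0
    hits = len(set(ranked_items[:k]) & set(relevant_items))
    return hits / len(relevant_items)

def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance nDCG: discounted gain of hits, normalized by the ideal ranking."""
    relevant = set(relevant_items)
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked_items[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(pos + 2)
               for pos in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0
```

Per-user scores are then averaged over all test users; computing significance at the user level, as reported for Tables 2 and 3, amounts to a paired test between these per-user score vectors for EGCF and each baseline.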