<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reshaping Graph Recommendation with Edge Graph Collaborative Filtering and Customer Reviews</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vito Walter Anelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yashar Deldjoo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tommaso Di Noia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eugenio Di Sciascio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Ferrara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniele Malitesta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Pomo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Bari</institution>
          ,
          <addr-line>via Orabona, 4, 70126 Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Graph collaborative filtering approaches learn refined users' and items' node representations by iteratively aggregating the informative content (called messages) coming from neighbor nodes into each ego node. Unfortunately, not all interactions (i.e., graph edges) may be equally important to the users and items involved. As this indiscriminate message aggregation leads to multi-hop representation errors, recent strategies have used attention mechanisms to weight neighbors' importance to the ego node. Despite their success, such solutions seem to disregard the potentially critical impact users' reviews may play on this weighting process. Reviews convey the multi-faceted user's opinion about items and provide a fundamental tool to group like-minded customers. In this work, we first formally show the causes of the node representation error in graph collaborative filtering and demonstrate how existing neighborhood weighting procedures (e.g., attention mechanisms) may alleviate the issue at the expense of limited hop exploration. Second, we correct the representation error through an additional graph network where we enrich graph edge embeddings through opinion-aware review embeddings to smooth each neighbor node's importance on its ego node. We call our solution Edge Graph Collaborative Filtering (EGCF). Extensive experiments on three e-commerce datasets show that EGCF competes successfully with traditional, graph- and review-based approaches on accuracy and beyond-accuracy objectives, while a study on the number of explored hops justifies the adopted configuration for EGCF. Code and datasets are available at: https://github.com/sisinflab/Edge-Graph-Collaborative-Filtering.</p>
      </abstract>
      <kwd-group>
        <kwd>Collaborative Filtering</kwd>
        <kwd>Recommendation</kwd>
        <kwd>Graph Convolutional Networks</kwd>
        <kwd>Reviews</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>"Very comfortable. They
also wear well for an
active lifestyle. Love
them."
Recommender systems constitute the backbone of several !
online platforms (e.g., Amazon), ofering consumers lists
of products that might meet their needs and tastes. Rec- "Nothing really wrong
ommendation algorithms are traditionally designed and awnitdhtthhicekbeerltthjuasntIwliikdee.r " #
trained to find preference patterns in user-item recorded Good quality." "aGnrdehaotlbdeinltg, nuipcevecroylor
interactions. Optionally, this learning process may be well"
enriched through additional informative data constantly
Figure 1: A subset of users, items, and reviews users wrote
updated on those platforms, which may captivate cus- about items, along with the expressed ratings (in the range
tomer’s attention towards items’ characteristics (e.g., 1-5). Despite being connected to the same items, users
1product images) or provide a tool to share opinions about 2, and users 1-3 do not share similar opinions about the
purchased items to guide other customers during their interacted items.
decision-making process (e.g., reviews).</p>
      <p>
        Collaborative filtering (CF) [<xref ref-type="bibr" rid="ref1">1</xref>], one of the most prominent recommendation paradigms in recent years, promotes the intuition of similar users interacting with similar items. CF-based models usually map users and items to embeddings in the latent space, and learn to predict user interactions by optimizing an objective function that combines these embeddings linearly (e.g., inner product [<xref ref-type="bibr" rid="ref2">2</xref>]) or non-linearly (e.g., neural networks [<xref ref-type="bibr" rid="ref3">3</xref>] and probabilistic models [<xref ref-type="bibr" rid="ref4">4</xref>]). While focusing on improving the user-item prediction step, such techniques have long underestimated the importance of deriving informative features to describe users and items suitably.
      </p>
      <p>
        Recently, graph convolutional networks (GCNs) [<xref ref-type="bibr" rid="ref5">5</xref>] have taken over CF-based recommendation, thanks to their capability of mining user-item high-order relationships. Unlike prior techniques, these models explicitly incorporate user and item relationships into their embedding representations. Concretely, the embedding of each node (defined as the ego node) is refined by aggregating its neighbors' node embeddings (whose contributions are called messages).
      </p>
      <p>
        DL4SR'22: Workshop on Deep Learning for Search and Recommendation, co-located with the 31st ACM International Conference on Information and Knowledge Management (CIKM), October 17-21, 2022, Atlanta, USA. Authors are listed in alphabetical order; corresponding authors: Daniele Malitesta (daniele.malitesta@poliba.it) and Claudio Pomo (claudio.pomo@poliba.it). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.
      </p>
      <p>
        This message-passing step is repeated iteratively to propagate the collaborative signal over multiple hops. Such models are becoming the de facto standard in personalized recommendation, reaching remarkable performance as in the pioneer works presented in [<xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>] and, more recently, in the solutions [<xref ref-type="bibr" rid="ref10 ref8 ref9">8, 9, 10</xref>].
      </p>
      <p>
        The message-passing pattern, by design, may still present some limitations despite being successful. An argument could be made that not all user-item interactions (i.e., graph edges) have the same relative importance. To clarify this, consider the motivating scenario in Figure 1, where we depict a subset of users and items from a real-world e-commerce platform (i.e., the Amazon catalog) and enrich their interactions with ratings and reviews. Users 1 and 2 both interacted with item 1, thus suggesting that they might share similar interests and preferences. However, a careful analysis of the corresponding reviews reveals that their opinions about item 1 are opposite (the expressed ratings are 5 and 2, respectively). Following a similar reasoning schema, users 1 and 3 have both interacted with item 2, but their comments, while being generally similar (the item is rated 3 and 5, respectively), show slight shades of disagreement (i.e., user 1 is not completely satisfied with the belt size). As the message-passing pattern works by indiscriminately aggregating the neighbor nodes at multiple hops, the node representation of user 1 is ultimately influenced by the representations of both users 2 and 3 after two propagation hops. In the long term, such behavior may lead to what we could define as a node representation error.
      </p>
      <p>
        Weighting the importance of the neighborhood while aggregating the incoming messages into the ego node is among the prominent solutions to the abovementioned issue. Following the direction paved in [<xref ref-type="bibr" rid="ref11">11</xref>], other popular and recent works in recommendation such as [<xref ref-type="bibr" rid="ref12 ref13 ref14 ref15">12, 13, 14, 15</xref>] leverage attention mechanisms (i.e., a neural network) to perform the weighting procedure. Even if these models have widely demonstrated superior accuracy in recommendation, they are still affected by oversmoothing, the phenomenon according to which node embedded representations tend to get closer and closer in the latent space after multiple propagation hops, thus flattening the existing differences in the neighborhood [<xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>]. For this reason, attention-based approaches usually propagate messages for only one or two hops, but this does not help access wider portions of the user-item graph.
      </p>
      <p>
        In this respect, we believe attention-based techniques generally disregard other potential sources of information (e.g., users' generated reviews) whose contribution may positively impact the neighborhood weighting process. Opinions and comments about interacted items constitute the basis on which like-minded users gather on online platforms, as they promote the discovery of novel and diverse items from the catalog. In this work, we first formally define the problem of nodes' representation error in graph collaborative filtering. After that, we show how existing weighting techniques (such as attention mechanisms) may alleviate the described issue at the expense of limiting the hop exploration depth to reduce the effect of oversmoothing. Thus, to address such a drawback, we propose a lighter weighting procedure that exploits the informative content extracted from reviews (i.e., opinions and comments about interacted items) to enhance the graph edge representation. Such edge-enriched features are eventually used to derive the similarity between the ego node and its neighbors, which we re-interpret as the importance of the neighbor node on the ego node. Our proposed weighting procedure is applied to a GCN acting as the correction to another traditional (but error-affected) GCN. We call our solution Edge Graph Collaborative Filtering (EGCF).
      </p>
      <p>
        After formalizing the theoretical basis for EGCF and its rationale, we assess its efficacy on three popular product categories from the Amazon catalog [<xref ref-type="bibr" rid="ref18">18</xref>]. Given their similar intuitions and rationale to EGCF, we compare the method with four families of CF-based recommendation approaches, i.e., traditional, review-based [<xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>], and graph-based (both leveraging attention mechanisms and not). We seek to answer the following research questions about our proposed approach: • RQ1. Can the correction to the node error representation help EGCF produce more accurate recommendations than state-of-the-art baselines? • RQ2. Considering the high impact that novel and diverse recommendation lists may have on both users and companies, how effective is EGCF when evaluated on beyond-accuracy metrics, given its strategy for neighborhood exploration? • RQ3. What is the effect of changing the hop exploration number on recommendation performance, and how can we justify such behaviors for the adopted architecture?
      </p>
      <p>
        The extensive experimental evaluation shows that the correction of the node representation error and the possibility of propagating messages across multiple hops permit EGCF to outperform state-of-the-art baselines on accuracy and beyond-accuracy metrics. Finally, the study on the hop propagation number proves the soundness of our proposed architectural configuration while shedding light on interesting directions for future work.
      </p>
    </sec>
    <sec id="sec-rw">
      <title>2. Related Work</title>
      <p>
        Graph-based recommendation. The approach proposed in [21] is the first attempt to address the recommendation task through a graph-based architecture; the authors implement a graph autoencoder that labels its edges with users' ratings to perform link prediction. Ying et al. [<xref ref-type="bibr" rid="ref6">6</xref>] design a graph convolutional network for web-scale recommendation, producing high-quality image recommendations for the Pinterest platform by efficiently exploiting random walks and items' multimodal side information. Wang et al. [<xref ref-type="bibr" rid="ref7">7</xref>] present neural graph collaborative filtering (NGCF), whose propagation layer aggregates the messages from the neighborhood considering the similarity between each neighbor node and its ego node. While outperforming previous state-of-the-art solutions, NGCF (and GCNs more generally) show limitations later addressed by He et al. [<xref ref-type="bibr" rid="ref8">8</xref>], whose idea is to lighten the GCN's traditional layer structure and reach superior accuracy by removing non-linearities and node embedding transformations in the propagation layer (LightGCN). The latest approaches take a step beyond the LightGCN strategy by allowing theoretically unlimited propagation layers [<xref ref-type="bibr" rid="ref9">9</xref>] and by revisiting the concept of graph convolution for recommendation and node embedding smoothness under the lens of graph signal processing [<xref ref-type="bibr" rid="ref10">10</xref>].
      </p>
      <p>
        While aggregating messages from neighbor nodes into the ego node, not all received contributions have the same importance. The pioneering work by Velickovic et al. [<xref ref-type="bibr" rid="ref11">11</xref>], called graph attention network (GAT), takes advantage of attention mechanisms to weight the different influences of neighbor nodes on the ego node. Inspired by this rationale, several recent works in recommendation seek to assess the relative importance of interacted items on the users involved in those interactions. In the last few years, recommendation tasks such as session-based recommendation [<xref ref-type="bibr" rid="ref12">22, 23, 12</xref>] and sequential recommendation [<xref ref-type="bibr" rid="ref13">13, 24</xref>] have been widely addressed by using attention mechanisms on graphs. Attention mechanisms may also be beneficial when the informative content conveyed by the bipartite user-item graph is enhanced with additional side information, like knowledge graphs [25], heterogeneous information networks [<xref ref-type="bibr" rid="ref14">14</xref>], or multimodal items' content [26]. Exploiting attention to disentangle the aspects underlying node interactions may also represent a fundamental step toward explainability [27]. Following this direction, the work by Wang et al. [<xref ref-type="bibr" rid="ref15">15</xref>], named disentangled graph collaborative filtering (DGCF), and the method presented in Wu et al. [28] propose to disentangle user-item connections into possible user intents.
      </p>
      <p>
        State-of-the-art attention-based approaches provide an efficient neighborhood weighting strategy. However, their multi-hop exploration is usually limited to prevent nodes in the neighborhood from getting too similar in the latent space (see Section 3.2). Conversely, EGCF leverages additional information (i.e., reviews) whose extracted opinion-aware features do not flatten differences among nodes while easing the weighting process. Moreover, in contrast to prior works, EGCF enriches edges with review-based side information and propagates it at multiple hops.
      </p>
      <p>
        Review-based recommendation. Reviews convey a rich source of information to access users' multi-faceted opinions about interacted items. For this reason, several existing works propose to extract valuable knowledge from them to produce better-tailored recommendations [19, 20]. Among the pioneer works, Wang et al. [29] adopt a stacked denoising autoencoder to approximate the user-item rating matrix starting from textual reviews, Almahairi et al. [30] introduce two neural-network-based approaches built upon bag-of-words and recurrent neural networks, and Kim et al. [31] present convolutional matrix factorization (ConvMF), where a convolutional neural network is merged with probabilistic matrix factorization to learn the context of review documents. Reviews are textual documents composed of words, which may further be grouped into sentences. To exploit such a hierarchical structure, Zheng et al. [32] design a convolutional neural network on top of a factorization machine prediction model to extract from reviews' words a unique embedded representation for users and items. The adoption of attention mechanisms may help refine each review component's importance on the recommendation profile of users and items. In this respect, Liu et al. [33] improve the previous approach by weighting the importance of convolutionally-embedded reviews for both users and items for the sake of explanation. Similarly, Lu et al. [34] learn users' and items' attention features by exploring different review components such as words, sentences, and topics via a GRU-based network, while Liu et al. [33] (based upon the solution described in [35]) augment users' and items' collaborative latent factors through features extracted from their generated ratings and reviews. Wang et al. [36] leverage common review properties (e.g., how helpful the reviews were for other users) to assess their importance on users and items.
      </p>
      <p>
        Only recently, very few works have injected the informative content of reviews into graph-based networks for recommendation. Wu et al. [37] propose a model named reviews meet graphs (RMG), a multi-view framework that learns users' and items' representations by considering the word- and sentence-level structure of reviews and by exploring two hops of the user-item graph to also access user-user and item-item relations. Gao et al. [38] present a three-structured architecture that catches the short- and long-term user preferences and item features, along with the collaborative information encoded in the bipartite user-item graph. Shi et al. [39] introduce a dual GCN model, where one network extracts and propagates review aspects, and the other reuses the aspects for the graph. Despite addressing recommendation through different strategies, the presented algorithms generally work by grouping reviews on both user and item profiles, in fact limiting the exploration of users' and items' neighbors to one hop (i.e., the nearest neighborhood). Conversely, our proposed approach exploits reviews as edge side information and propagates them at multiple hops.
      </p>
    </sec>
    <sec id="sec-lim">
      <title>3.2. A limitation in the message-passing schema</title>
      <p>After two hops, the refined embeddings of user u and item i are:
e_u^(2) = agg({e_i^(1), ∀ i ∈ N(u)}),   e_i^(2) = agg({e_u^(1), ∀ u ∈ N(i)})   (2)</p>
      <sec id="sec-1-1">
        <title>The user formulation in Equation (2) can be expanded through Equation (1):</title>
        <p>e_u^(2) = agg({ agg({ e_{u'}, ∀ u' ∈ N(i) ∖ {u} }), ∀ i ∈ N(u) })</p>
      </sec>
      <sec id="sec-1-2">
        <title>Two-hop expansion</title>
        <p>What emerges is that, by propagating messages at two hops, the node embedding of user u is eventually refined through the contributions from other users who interacted with the same items as u. In other words, after two hops, each user profile is influenced by the profiles of other users who rated the same items.</p>
        <p>Indeed, this assumption is aligned with the rationale behind collaborative filtering, i.e., similar users are likely to interact with the same items. However, not all user-item interactions (i.e., graph edges) may be equally important to the users and items involved. Thus, indiscriminately aggregating neighbor node embeddings into the ego node could, after multiple hops, harm the node updating process by bringing in all contributions from the neighborhood, even the noisy ones. We interpret this as a node representation error, which propagates with the exploration hops in the graph.</p>
        <p>For this reason, contributions coming from each neighbor node are usually weighted before aggregating them into the ego nodes, modifying the presented formula:
e_u^(2) = agg({ α_{i→u}^(2) agg({ α_{u'→i}^(1) e_{u'}, ∀ u' ∈ N(i) ∖ {u} }), ∀ i ∈ N(u) })
where α_{i→u}^(k) stands for the importance that the neighbor node i has on the ego node u after k hops. These weights are generally calculated by means of attention mechanisms and depend on the embeddings of the neighbor and ego nodes they refer to, e.g., α_{i→u}^(k) = ω(e_u^(k−1), e_i^(k−1)), where ω(·, ·) is a neural network.</p>
        <p>Conversely, our proposed approach exploits reviews as edge side information to describe user-item interactions and propagates their informative content at multiple hops to overcome theoretical issues in the way graph-based recommender systems are usually designed (see later).</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Methodology</title>
      <sec id="sec-2-1">
        <title>Overview</title>
        <p>This section presents and motivates our proposed method, Edge Graph Collaborative Filtering (EGCF). We first introduce some notation and preliminaries on graph models for collaborative filtering. Then, we highlight a potentially critical issue in the message-passing schema. Even if weighting the importance of each neighbor node may alleviate the problem, we discuss its limitations and propose an enhanced application of importance weighting.</p>
        <sec id="sec-2-1-1">
          <title>3.1. Notation and preliminaries</title>
          <p>Let U = {u_1, u_2, . . . , u_N} and I = {i_1, i_2, . . . , i_M} be the sets of N users and M items in the system, respectively. Then, let us consider a bipartite and undirected user-item graph that connects pairs of nodes when there exists a recorded interaction between them. User and item nodes are represented through embeddings in the latent space, i.e., e_u ∈ R^d, ∀ u ∈ U, and e_i ∈ R^d, ∀ i ∈ I.</p>
          <p>
            Inspired by popular approaches [<xref ref-type="bibr" rid="ref5">5</xref>], current graph-based recommender systems refine users' and items' node embeddings by exploring their multi-hop interconnections represented in the graph. Let u and i be the nodes for a user and an item to be updated (i.e., the ego nodes), and let N(u) and N(i) be the sets of nodes at one hop from u and i, respectively (i.e., their neighborhoods). The ego node embeddings e_u and e_i are updated by aggregating the messages from their neighborhoods:
e_u^(1) = agg({e_i, ∀ i ∈ N(u)}),   e_i^(1) = agg({e_u, ∀ u ∈ N(i)})   (1)
          </p>
          <p>
where e_u^(1) and e_i^(1) are the refined embedding versions of user u and item i after one hop, while agg(·) indicates the aggregation function. This message-passing pattern may be iterated L times, thus exploring wider and wider neighborhoods of the ego nodes. After two hops, the refined embeddings of user u and item i are those reported in Equation (2).
          </p>
          <p>
In the weighted formulation, e_u^(2) depends on (□) the importance each neighbor item node i has on the ego user node u after one hop, and (△) the importance all users interacting with the same items as u have on those items. Note that (□) may be further expanded:
α_{i→u}^(2) = ω(e_u^(1), e_i^(1)) = ω( agg({α_{u'→i}^(1) e_{u'}, ∀ u' ∈ N(i) ∖ {u}}), agg({α_{i'→u}^(1) e_{i'}, ∀ i' ∈ N(u) ∖ {i}}) ) = ω( agg({ω(e_{u'}, e_i) e_{u'}, ∀ u' ∈ N(i) ∖ {u}}), agg({ω(e_{i'}, e_u) e_{i'}, ∀ i' ∈ N(u) ∖ {i}}) )
In other words, weighting the importance of each neighbor node on the ego node before the aggregation allows calculating, after two propagation hops, to what extent each user profile is influenced by the profiles of the other users who rated the same items. Without loss of generality, a similar consideration could be made for a number of hops greater than two.
          </p>
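          <p>As an illustration of the unweighted aggregation in Equations (1)-(2), here is a minimal NumPy sketch (toy graph and random embeddings; sum aggregation is one possible choice of agg(·), and all variable names are illustrative):</p>
          <preformat>
```python
import numpy as np

# Toy bipartite graph: 3 users x 2 items, A[u, i] = 1 iff user u interacted with item i.
A = np.array([[1, 1],
              [1, 0],
              [0, 1]], dtype=float)

d = 4  # latent dimensionality
rng = np.random.default_rng(0)
e_users = rng.normal(size=(3, d))  # user embeddings at layer 0
e_items = rng.normal(size=(2, d))  # item embeddings at layer 0

# One hop (Equation 1): each ego node sums the messages from its neighbors.
e_users_1 = A @ e_items    # users aggregate their interacted items
e_items_1 = A.T @ e_users  # items aggregate their interacting users

# Two hops (Equation 2): aggregate the refined one-hop embeddings.
e_users_2 = A @ e_items_1

# After two hops, user 0 is influenced by every user sharing an item with it.
expected = 2 * e_users[0] + e_users[1] + e_users[2]
print(np.allclose(e_users_2[0], expected))  # True
```
          </preformat>
          <p>The check at the end makes the two-hop effect concrete: the refined profile of user 0 already contains the profiles of users 1 and 2, who only share items with it.</p>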
          <p>Then, we propose to enhance the neighborhood weighting procedure at hop k by conditioning the importance weights also on the projected embedding of the review connecting user u and item i. For instance, the importance of the neighbor item node i on the ego user node u after k hops is calculated as in Equation (9). As for the review content, tokens are mapped to word embeddings, which are injected into an opinion-based model pretrained to predict the rating expressed by the user through specific terms in the review.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Opinion-based features</title>
        <p>While the output of the opinion-based model carries a single piece of information, i.e., the predicted review score, the activation of a hidden layer unveils the richer source of textual features (i.e., an embedding) which drove the model to predict that score. High-level features extracted from pretrained deep learning models can boost the recommendation performance of recommender systems leveraging items' side information (e.g., visual-based recommender systems [40, 41]). We deem these textual features to deserve a pivotal role in the weighting process.</p>
        <p>Let r_{ui} ∈ R^f be the textual embedding extracted from a hidden layer of the opinion-based model. First, we project r_{ui} to the same latent space as e_u ∈ R^d and e_i ∈ R^d with a one-layer neural network:
p_{ui} = LeakyReLU(W r_{ui} + b)   (8)
where p_{ui} ∈ R^d is the projected review embedding, while W ∈ R^{d×f} and b ∈ R^d are the projection matrix and the bias, respectively. We seek to retain only those textual features of review r_{ui} which can be significant to later calculate the interdependence between the review embedding and the user/item ones.</p>
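        <p>The projection of Equation (8) can be sketched as follows (a NumPy sketch with a hand-rolled LeakyReLU; the pretrained opinion-based model that yields r_ui is out of scope here, so its output is simulated with random values):</p>
        <preformat>
```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # LeakyReLU keeps positive values and scales negative ones by alpha.
    return np.where(x > 0, x, alpha * x)

f, d = 16, 4  # review-embedding and latent dimensionalities
rng = np.random.default_rng(1)

r_ui = rng.normal(size=f)    # stand-in for the textual embedding of the review
W = rng.normal(size=(d, f))  # trainable projection matrix
b = np.zeros(d)              # trainable bias

# Equation (8): project the review embedding into the user/item latent space.
p_ui = leaky_relu(W @ r_ui + b)
print(p_ui.shape)  # (4,)
```
        </preformat>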
        <sec id="sec-2-2-1">
          <title>3.3. Enhancing neighborhood weighting through reviews</title>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Oversmoothing and hop exploration</title>
        <p>As known, graph-based models in machine learning are affected by oversmoothing [16, 17]. This phenomenon leads node embeddings, after multiple propagation hops, to become closer and closer in the latent space, eventually flattening their existing differences. As this behavior would profoundly weaken models' performance, the exploration of the neighborhood generally tends to be constrained to very few hops (e.g., a maximum of two hops in attention-based weighting). However, in recommendation scenarios, limiting the exploration of the user-item bipartite graph may represent an inconsistency with the idea of collaborative filtering, where users are connected to share preferences and tastes for similar items.</p>
        <p>Under this assumption, we believe the neighborhood weighting process could be further enhanced by exploiting other sources of information that are not usually taken into account. In the majority of popular online platforms for e-commerce (e.g., Amazon), reviews are fundamental tools to share opinions and comments about interacted items, as they convey the multi-faceted aspects that drove a user to interact with an item. Leveraging such side information on the connections existing among users and items in the bipartite graph (i.e., graph edges) can improve the learning of the importance weights by reducing the oversmoothing effect, because each user/item node embedding is conditioned on the opinion conveyed by the review. Let T_{ui} = {t_1, t_2, . . . , t_T} be the set of tokens that compose the review written by user u about item i, obtained after an initial tokenization step. The importance of the neighbor item node i on the ego user node u after k hops is then calculated as:
α_{i→u}^(k) = ω(e_u^(k−1), e_i^(k−1), p_{ui})</p>
        <p>
(9)
Note that, since p_{ui} cannot increase the impact of the oversmoothing effect (because it is not dependent on the hop k), its usage in the importance weight formula becomes even more beneficial. Let us focus on the weighting function ω(·, ·, ·). Many approaches from the literature propose to leverage attention mechanisms, usually implemented as a neural network trained in the downstream task to predict the importance of the neighbor node on the ego node. In our solution, we opt for a simplified and lightweight formulation that seeks to calculate the similarity between the neighbor and the ego nodes, conditioned on the opinion embedding of the review connecting them. Specifically:
α_{i→u}^(k) = cos(e_u^(k−1) ⊙ p_{ui}, e_i^(k−1) ⊙ p_{ui})   (10)
where the element-wise multiplication (⊙) with the review opinion embedding provides the interplay between each node feature and the opinion features, thus producing a modified version of the node representation that conveys a richer source of information. No trainable projection weight is learned in the presented formulation, since the contribution of the review embedding is meaningful enough.
        </p>
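        <p>A minimal sketch of the opinion-conditioned importance weight (assuming, as described above, that the projected review embedding modulates both node embeddings element-wise before a cosine similarity is taken; all names are illustrative):</p>
        <preformat>
```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def importance(e_u, e_i, p_ui):
    # Similarity between ego and neighbor embeddings, both modulated
    # element-wise by the projected review embedding p_ui.
    return cosine(e_u * p_ui, e_i * p_ui)

rng = np.random.default_rng(2)
e_u, e_i, p_ui = rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)

alpha = importance(e_u, e_i, p_ui)
print(alpha)  # a value in [-1, 1], by definition of cosine similarity
```
        </preformat>
        <p>Note that no trainable parameter appears here, consistent with the lightweight formulation discussed above.</p>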
        <sec id="sec-2-3-1">
          <title>3.4. A double message-passing schema</title>
          <p>
            The proposed neighborhood weighting procedure can help correct the representation error generated in the traditional message-passing schema. However, the idea is not to completely replace it, as several recent works from the literature have demonstrated its efficacy, especially in producing accurate recommendations [<xref ref-type="bibr" rid="ref8">8</xref>]. The proposed approach involves a double message-passing schema, where two graph models are trained to refine their own user/item node representations. While the first one aggregates the contributions coming from the neighbor nodes into the ego nodes by weighting the neighborhood importance on the ego node statically, the second one aggregates the neighborhood's messages, which are also weighted through the opinion embeddings from reviews.
          </p>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>We define the two graph convolutional networks as</title>
<p>GCN (error-affected) and GCN (correction), and assign
the node embeddings $\mathbf{e}$ to the former and the node
embeddings $\mathbf{c}$ to the latter. As for the aggregation function, in
both cases we sum the weighted messages coming from
the neighbor nodes. As such, the update of the user node
embedding after $k$ hops is calculated as:</p>
        <p>$\mathbf{e}_u^{(k)} = \sum_{i \in \mathcal{N}(u)} \omega_{u \to i}\, \mathbf{e}_i^{(k-1)}, \qquad \omega_{u \to i} = \frac{1}{\sqrt{|\mathcal{N}(u)|}\,\sqrt{|\mathcal{N}(i)|}} \qquad (11)$</p>
        <p>$\mathbf{c}_u^{(k)} = \sum_{i \in \mathcal{N}(u)} \widetilde{\omega}_{u \to i}^{(k)}\, \mathbf{c}_i^{(k-1)}, \qquad \widetilde{\omega}_{u \to i}^{(k)} = \cos\left(\mathbf{c}_u^{(k-1)},\, \mathbf{c}_i^{(k-1)} \odot \mathbf{p}_{u,i}\right) \qquad (12)$</p>
        <p>where $\mathbf{p}_{u,i}$ is the opinion embedding extracted from the review
user $u$ wrote for item $i$. Note that $\omega_{u \to i}$ is static and only depends on the
topology of the bipartite graph, while $\widetilde{\omega}_{u \to i}^{(k)}$ varies along with
the exploration hop and depends on the embeddings of
ego/neighbor nodes, and the opinion review embedding.
After $K$ propagation hops, the final embedding
representation is obtained as:</p>
        <p>$\mathbf{e}_u = \sum_{k=0}^{K} \frac{1}{1+k}\, \mathbf{e}_u^{(k)}, \quad \mathbf{e}_i = \sum_{k=0}^{K} \frac{1}{1+k}\, \mathbf{e}_i^{(k)}, \quad \mathbf{c}_u = \sum_{k=0}^{K} \frac{1}{1+k}\, \mathbf{c}_u^{(k)}, \quad \mathbf{c}_i = \sum_{k=0}^{K} \frac{1}{1+k}\, \mathbf{c}_i^{(k)}$</p>
        <p>where we apply the scaling factor $1/(1+k)$ to further
alleviate the oversmoothing problem.</p>
        <p>Figure 2: A schematic overview for EGCF. A statically-weighted GCN network affected by
node representation error (a) is corrected through another
GCN network (b), where an opinion-based embedding is
extracted from each review as edge side information to weight
the importance of the neighbor nodes on their ego nodes.</p>
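        <p>To make the double message-passing schema concrete, the following sketch (illustrative Python under our own naming assumptions, not the authors' implementation) mirrors the statically-weighted propagation of Eq. (11), the opinion-weighted propagation of Eq. (12), and the $1/(1+k)$ layer combination:</p>

```python
# Illustrative sketch of EGCF's double message-passing schema (Eqs. 11-12).
# Function and variable names (propagate_static, opinion, ...) are assumptions.
import numpy as np

def cos(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def propagate_static(E, neigh):
    """Eq. (11): e_u^(k) = sum_{i in N(u)} e_i^(k-1) / sqrt(|N(u)||N(i)|)."""
    E_new = np.zeros_like(E)
    for u, Nu in neigh.items():
        for i in Nu:
            E_new[u] += E[i] / (np.sqrt(len(neigh[u])) * np.sqrt(len(neigh[i])))
    return E_new

def propagate_opinion(C, neigh, opinion):
    """Eq. (12): messages re-weighted by a cosine between the ego embedding
    and the neighbor embedding modulated by the review's opinion embedding."""
    C_new = np.zeros_like(C)
    for u, Nu in neigh.items():
        for i in Nu:
            w = cos(C[u], C[i] * opinion[(u, i)])  # opinion-aware weight
            C_new[u] += w * C[i]
    return C_new

def final_embeddings(layers):
    """Layer combination with the 1/(1+k) scaling that tempers oversmoothing."""
    return sum(E_k / (1.0 + k) for k, E_k in enumerate(layers))
```

        <p>Here `neigh` maps every node of the bipartite graph to its neighbors, and `opinion[(u, i)]` is the review-derived opinion embedding attached to the edge between $u$ and $i$; a real implementation would run both propagations for $K$ hops and combine the resulting layers with `final_embeddings`.</p>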
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experiments and Discussion</title>
      <sec id="sec-3-1">
        <title>4.1. Experimental Setup</title>
        <sec id="sec-3-1-1">
          <title>Datasets.</title>
          <p>We use three popular [43, 44] datasets from Amazon’s Baby, Boys &amp; Girls, and Men categories [18], which contain historical user-item interactions and reviews. We retain only interactions with non-empty reviews, then keep the 20k and 10k most popular items for Baby and Boys &amp; Girls/Men, respectively. Finally, we apply the 5- and 15-core on items and users on Baby/Boys &amp; Girls and Men, respectively. Statistics are in Table 1.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Baselines.</title>
          <p>
            We compare our approach with eight state-of-the-art models spanning several families: (i) traditional CF (BPRMF [42] and MultiVAE [4]); (ii) review-based CF (ConvMF [31] and RMG [37]); (iii) graph-based CF (NGCF [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] and LightGCN [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]); (iv) graph-based CF with attention (GAT [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] and DGCF [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]).
          </p>
        </sec>
        <sec id="sec-3-1-3">
          <title>Reproducibility.</title>
          <p>We adopt the temporal leave-one-out to split the datasets, where the last two recorded interactions of each user are withheld for validation and test. Hyper-parameters are explored following [45], and the opinion embeddings are extracted through a popular pre-trained model<sup>1</sup>. Datasets and codes are publicly available<sup>2</sup>. All models are implemented in Elliot [46]. <sup>1</sup>Please refer to our GitHub repository. <sup>2</sup>https://github.com/sisinflab/Edge-Graph-Collaborative-Filtering.</p>
        </sec>
        <sec id="sec-3-1-4">
          <title>Evaluation protocol.</title>
          <p>
            We measure the model accuracy by adopting the recall (Recall@k), the normalized discounted cumulative gain (nDCG@k), and the mean average recall (MAR@k) [
            <xref ref-type="bibr" rid="ref15 ref8">8, 15</xref>
            ]. Additionally, considering the influence of novel and diverse recommendation lists [47, 48] on both user’s and business’s interests, we also assess beyond-accuracy metrics such as the expected popularity complement (EPC@k) and the expected free discovery (EFD@k), along with indices measuring concentration and coverage, i.e., the 1’s complement of the Gini (Gini@k), the Shannon entropy (SE@k), and the item coverage (iCov@k). Specifically, the EPC@k and the EFD@k refer to long-tail items and stand for the expected number of recommended unknown items which are also relevant, and the expected number of recommended known items which are also relevant, respectively. Furthermore, the Gini@k and the SE@k are used to assess items’ distributional inequality, i.e., how unequally a recommender system shows different items to users, and the iCov@k quantifies the number of items that the model recommends. For all metrics, higher values mean better performance. We leave the assessment of complexity measures for the proposed model to future extensions of the work.
          </p>
        </sec>
        <sec id="sec-3-1-5">
          <title>4.2. Results and Discussion</title>
          <p>Recommendation accuracy (RQ1). Table 2 reports the accuracy metrics, i.e., Recall, nDCG, and MAR, for top-10 lists on Baby, Boys &amp; Girls, and Men over BPRMF, MultiVAE, ConvMF, RMG, NGCF, LightGCN, GAT, DGCF, and EGCF; the best value is in bold, the second-to-best is underlined, and * marks statistically significant differences. Surprisingly, the sole introduction of reviews does not seem to produce a consistent accuracy boost. For instance, the strongest review-based method (i.e., RMG) surpasses BPRMF only for two accuracy metrics on Baby (i.e., 0.0911 vs. 0.0785 and 0.1059 vs. 0.0980, respectively). Contrarily, adopting a graph model can increase the accuracy over traditional CF. When comparing LightGCN with MultiVAE, which obtain the best performance in their respective recommendation families, we observe that the former improves two metrics on Baby by 7% and 9%; however, the observed difference even reverts on Men. The application of attention mechanisms to weight the importance of neighbor nodes is rewarded in Baby and Boys &amp; Girls, where GAT always outperforms NGCF, reaching remarkable results on Baby (i.e., 0.1595 vs. 0.1411) and on Boys &amp; Girls (i.e., 0.1846 vs. 0.1783). Disentangling users’ intents on interacted items (i.e., DGCF) produces even more accurate recommendations than NGCF on all datasets. Nevertheless, LightGCN always performs better than DGCF apart from very few cases on Men, even though DGCF’s calculated accuracy values do not substantially differ from LightGCN’s ones (e.g., on Baby). Noticeably, the proposed model (i.e., EGCF) outperforms the other baselines under all settings and datasets, with near 100% statistical hypothesis tests (i.e., paired t-test) showing that the results significantly differ. This finding further motivates the goodness of the solution. While we observe a substantial accuracy improvement over traditional and review-based approaches (e.g., +12% over MultiVAE on Boys &amp; Girls and +53% over RMG on Baby), introducing an additional GCN-like network guided by users’ reviews is even more beneficial to correct the representation error observable in unweighted graph approaches. Particularly, results show that such correction may lead to small accuracy improvements in some cases (e.g., on Boys &amp; Girls when correcting LightGCN) but also larger ones in other cases (e.g., on Men when correcting LightGCN). Such outcomes suggest that, while keeping the error-affected contribution in the final prediction formula is useful to preserve the superior performance of graph-based models over traditional and review-based approaches, the introduced correction term is useful to gain even more accurate preference predictions than unweighted graph architectures.</p>
          <p>Recommendation novelty and diversity (RQ2). We also assess how novel and diverse recommendation lists are. The two novelty metrics in Table 3 (i.e., the EPC@k and the EFD@k, left side) are discussed along with the concentration and coverage indices (i.e., the Gini@k, the SE@k, and the iCov@k, right side), as in an ideal recommender system a loosely concentrated and large set of recommended items should equally span different ranges of popularity. As previously observed, EGCF is again the best or second-to-best technique. While NGCF is not as capable as LightGCN of proposing long-tail items on Boys &amp; Girls (e.g., 0.2510 vs. 0.3012), the former surpasses the latter for the concentration indices on the same dataset (e.g., 10.5595 vs. 10.1586 for the Shannon entropy). Since NGCF adopts an ego-neighbor interaction component, the concentration of explored and recommended items gets loose. Moreover, neighborhood weighting leads to recommending items from the long tail (e.g., comparing GAT with NGCF, we observe a +17% on Baby). However, such a finding is not consistent with the trend recognized for the concentration and coverage indices (e.g., when comparing LightGCN with DGCF, we notice 0.1304 vs. 0.2051 on Men), as the neighborhood weighting procedure comes at the expense of a limited hop exploration, not allowing such models to explore wider catalog portions. Conversely, injecting user-generated reviews brings new informative content (e.g., RMG recommends a broader and less concentrated range of items from the catalog than DGCF on the Baby dataset). Finally, weighting the neighborhood importance and exploring long-distant user-item interactions through review-enriched content (i.e., EGCF) allows retrieving larger portions of heterogeneous items (e.g., EGCF outperforms LightGCN by +63% on Baby and DGCF by +7% on Boys &amp; Girls) without disregarding less popular items from the long-tail (observing the same models, +3% on Baby and +6% on Boys &amp; Girls). Such outcomes demonstrate that the content enrichment brought by the extracted review features (injected into the representation error correction) allows exploring user-item interactions at multiple hops, leading to more heterogeneous recommendation lists which also include items from the long-tail.</p>
          <p>Effect of hop exploration number (RQ3). Figure 3 displays, for EGCF, the variation of an accuracy and a diversity metric on top-10 recommendation lists when exploring a number of hops in the range 1-4, where even numbers stand for same node type connections (e.g., user-user), while odd numbers refer to opposite node type connections (i.e., user-item). As evident from the histograms of Baby and Boys &amp; Girls, accuracy consistently increases from 1 to 4 hops (this is why we adopt four hop explorations for EGCF on those datasets). The same trend is not observable for Men, where two explored hops seem to provide the highest accuracy boost, motivating the adoption of 2 hop explorations for EGCF on the same dataset. Such behavior could be due to the average number of users’ interacted items in Men (approximately 19, see Table 1): the node refining probably does not require a broad exploration of its neighborhood. As for diversity, the Baby and the Men datasets seem to agree on two exploration hops to produce the most diverse recommendation lists, because they leverage (as previously recalled) user-user and item-item interconnections (and similarities). The trend is also aligned with the Boys &amp; Girls dataset, where user-user and item-item links are exploited even at a higher depth (i.e., four exploration hops). The emerged insights shed light on two main contributions: (i) with the modified neighborhood weighting process, which makes use of reviews to enhance the informative content carried by user-item interactions, EGCF is less limited in the hop exploration, thus providing more accurate recommendations, and (ii) user-user and item-item connections are the keystones on which to build more diverse item recommendation lists.</p>
        </sec>
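      <p>As a concrete companion to the evaluation protocol above, the following sketch (illustrative code under our own assumptions, not the paper's Elliot-based evaluation) computes three of the reported metric families: Recall@k, nDCG@k, and the 1's complement of the Gini index over item exposures.</p>

```python
# Illustrative metric implementations; names and signatures are assumptions.
import numpy as np

def recall_at_k(ranked, relevant, k):
    """Fraction of a user's relevant items retrieved in the top-k list."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / max(len(relevant), 1)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: DCG of the list over the DCG of an ideal list."""
    dcg = sum(1.0 / np.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0

def gini_complement(item_exposures):
    """1 - Gini over how often each catalog item is recommended;
    1.0 means perfectly even exposure across the catalog."""
    x = np.sort(np.asarray(item_exposures, dtype=float))
    n = x.size
    if x.sum() == 0:
        return 1.0
    gini = (2 * np.arange(1, n + 1) - n - 1).dot(x) / (n * x.sum())
    return 1.0 - gini
```

      <p>In this reading, a system that always recommends the same few popular items drives `gini_complement` toward 0, while a system spreading exposure across the catalog pushes it toward 1, matching the "higher is better" convention used for all reported metrics.</p>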
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion and Future Work</title>
      <p>This work proposes Edge Graph Collaborative Filtering
(EGCF), which incorporates users’ opinions extracted
from reviews into the edges of a GCN to weight the
neighborhood importance on the ego node. Extensive
experimental evaluation shows that EGCF outperforms
traditional, review- and graph-based models. The work
is complemented by an analysis of beyond-accuracy
performance and an extensive study on the number of
layers. Leveraging the importance of graph edges through
node-node side information (e.g., users’ reviews) opens
to future directions, namely: (i) study the impact of this
re-weighting by making it a hyper-parameter, and (ii)
analyze the possible application of the proposed technique
to different tasks other than recommendation.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <sec id="sec-5-1">
        <title>The authors acknowledge partial support from the projects PASSEPARTOUT, ServiziLocali2.0, Smart Rights Management Platform, BIO-D, and ERP4.0.</title>
<p>[21] R. van den Berg, T. N. Kipf, M. Welling, Graph convolutional matrix completion, CoRR abs/1706.02263 (2017).
[22] W. Song, Z. Xiao, Y. Wang, L. Charlin, M. Zhang, J. Tang, Session-based social recommendation via dynamic graph attention networks, in: WSDM, ACM, 2019, pp. 555–563.
[23] C. Xu, P. Zhao, Y. Liu, V. S. Sheng, J. Xu, F. Zhuang, J. Fang, X. Zhou, Graph contextualized self-attention network for session-based recommendation, in: IJCAI, ijcai.org, 2019, pp. 3940–3946.
[24] Y. Wu, J. Yang, Dual sequential recommendation integrating high-order collaborative relations via graph attention networks, in: IJCNN, IEEE, 2021, pp. 1–8.
[25] X. Wang, X. He, Y. Cao, M. Liu, T. Chua, KGAT: knowledge graph attention network for recommendation, in: KDD, ACM, 2019, pp. 950–958.
[26] Z. Tao, Y. Wei, X. Wang, X. He, X. Huang, T. Chua, MGAT: multimodal graph attention network for recommendation, Inf. Process. Manag. 57 (2020) 102277.
[27] J. Ma, P. Cui, K. Kuang, X. Wang, W. Zhu, Disentangled graph convolutional networks, in: ICML, volume 97 of Proceedings of Machine Learning Research, PMLR, 2019, pp. 4212–4221.
[28] J. Wu, W. Shi, X. Cao, J. Chen, W. Lei, F. Zhang, W. Wu, X. He, Disenkgat: Knowledge graph embedding with disentangled graph attention network, in: CIKM, ACM, 2021, pp. 2140–2149.
[29] H. Wang, N. Wang, D. Yeung, Collaborative deep learning for recommender systems, in: KDD, ACM, 2015, pp. 1235–1244.
[30] A. Almahairi, K. Kastner, K. Cho, A. C. Courville, Learning distributed representations from reviews for collaborative filtering, in: RecSys, ACM, 2015, pp. 147–154.
[31] D. H. Kim, C. Park, J. Oh, S. Lee, H. Yu, Convolutional matrix factorization for document context-aware recommendation, in: RecSys, ACM, 2016, pp. 233–240.
[32] L. Zheng, V. Noroozi, P. S. Yu, Joint deep modeling of users and items using reviews for recommendation, in: WSDM, ACM, 2017, pp. 425–434.
[33] H. Liu, Y. Wang, Q. Peng, F. Wu, L. Gan, L. Pan, P. Jiao, Hybrid neural recommendation with joint deep representation learning of ratings and reviews, Neurocomputing 374 (2020) 77–85.
[34] Y. Lu, R. Dong, B. Smyth, Coevolutionary recommendation model: Mutual learning between ratings and reviews, in: WWW, ACM, 2018, pp. 773–782.
[35] H. Liu, F. Wu, W. Wang, X. Wang, P. Jiao, C. Wu, X. Xie, NRPA: neural recommendation with personalized attention, in: SIGIR, ACM, 2019, pp. 1233–1236.
[36] X. Wang, I. Ounis, C. Macdonald, Leveraging review properties for effective recommendation, in: WWW, ACM / IW3C2, 2021, pp. 2209–2219.
[37] C. Wu, F. Wu, T. Qi, S. Ge, Y. Huang, X. Xie, Reviews meet graphs: Enhancing user and item representations for recommendation with hierarchical attentive graph neural network, in: EMNLP/IJCNLP (1), Association for Computational Linguistics, 2019, pp. 4883–4892.
[38] J. Gao, Y. Lin, Y. Wang, X. Wang, Z. Yang, Y. He, X. Chu, Set-sequence-graph: A multi-view approach towards exploiting reviews for recommendation, in: CIKM, ACM, 2020, pp. 395–404.
[39] L. Shi, W. Wu, W. Hu, J. Zhou, J. Chen, W. Zheng, L. He, Dualgcn: An aspect-aware dual graph convolutional network for review-based recommender, Knowl. Based Syst. 242 (2022) 108359.
[40] R. He, J. J. McAuley, VBPR: visual bayesian personalized ranking from implicit feedback, in: AAAI, AAAI Press, 2016, pp. 144–150.
[41] Y. Deldjoo, T. D. Noia, D. Malitesta, F. A. Merra, Leveraging content-style item representation for visual recommendation, in: ECIR (2), volume 13186 of Lecture Notes in Computer Science, Springer, 2022, pp. 84–92.
[42] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: bayesian personalized ranking from implicit feedback, in: UAI, AUAI Press, 2009, pp. 452–461.
[43] X. Chen, H. Chen, H. Xu, Y. Zhang, Y. Cao, Z. Qin, H. Zha, Personalized fashion recommendation with visual explanations based on multimodal attention network: Towards visually explainable recommendation, in: SIGIR, ACM, 2019, pp. 765–774.
[44] Z. Wang, W. Ye, X. Chen, W. Zhang, Z. Wang, L. Zou, W. Liu, Generative session-based recommendation, in: WWW, ACM, 2022, pp. 2227–2235.
[45] J. Bergstra, R. Bardenet, Y. Bengio, B. Kégl, Algorithms for hyper-parameter optimization, in: NIPS, 2011, pp. 2546–2554.
[46] V. W. Anelli, A. Bellogín, A. Ferrara, D. Malitesta, F. A. Merra, C. Pomo, F. M. Donini, T. D. Noia, Elliot: A comprehensive and rigorous framework for reproducible recommender systems evaluation, in: SIGIR, ACM, 2021, pp. 2405–2414.
[47] S. Vargas, Novelty and diversity enhancement and evaluation in recommender systems and information retrieval, in: SIGIR, ACM, 2014, p. 1281.
[48] S. Vargas, P. Castells, Rank and relevance in novelty and diversity metrics for recommender systems, in: RecSys, ACM, 2011, pp. 109–116.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Ekstrand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          ,
          <article-title>Collaborative filtering recommender systems</article-title>
          ,
          <source>Found. Trends Hum. Comput. Interact</source>
          .
          <volume>4</volume>
          (
          <year>2011</year>
          )
          <fpage>175</fpage>
          -
          <lpage>243</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Bell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Volinsky</surname>
          </string-name>
          ,
          <article-title>Matrix factorization techniques for recommender systems</article-title>
          ,
          <source>Computer</source>
          <volume>42</volume>
          (
          <year>2009</year>
          )
          <fpage>30</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chua</surname>
          </string-name>
          ,
          <article-title>Neural collaborative filtering</article-title>
          , in: WWW, ACM,
          <year>2017</year>
          , pp.
          <fpage>173</fpage>
          -
          <lpage>182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          , T. Jebara,
          <article-title>Variational autoencoders for collaborative filtering</article-title>
          , in: WWW, ACM,
          <year>2018</year>
          , pp.
          <fpage>689</fpage>
          -
          <lpage>698</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Kipf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          ,
          <article-title>Semi-supervised classification with graph convolutional networks</article-title>
          , in: ICLR (Poster), OpenReview.net,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ying</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Eksombatchai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Graph convolutional neural networks for web-scale recommender systems</article-title>
          , in: KDD, ACM,
          <year>2018</year>
          , pp.
          <fpage>974</fpage>
          -
          <lpage>983</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Feng</surname>
          </string-name>
          , T. Chua,
          <article-title>Neural graph collaborative filtering</article-title>
          , in: SIGIR, ACM,
          <year>2019</year>
          , pp.
          <fpage>165</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Lightgcn: Simplifying and powering graph convolution network for recommendation</article-title>
          , in: SIGIR, ACM,
          <year>2020</year>
          , pp.
          <fpage>639</fpage>
          -
          <lpage>648</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Ultragcn: Ultra simplification of graph convolutional networks for recommendation</article-title>
          , in: CIKM, ACM,
          <year>2021</year>
          , pp.
          <fpage>1253</fpage>
          -
          <lpage>1262</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. B.</given-names>
            <surname>Letaief</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>How powerful is graph convolution for recommendation?</article-title>
          , in: CIKM, ACM,
          <year>2021</year>
          , pp.
          <fpage>1619</fpage>
          -
          <lpage>1629</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Velickovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cucurull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liò</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Graph attention networks</article-title>
          , in: ICLR (Poster), OpenReview.net,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tseng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kristinsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. L.</given-names>
            <surname>Murphey</surname>
          </string-name>
          ,
          <article-title>Personalized session-based recommendation using graph attention networks</article-title>
          ,
          <source>in: IJCNN</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <article-title>Sequential recommendation with context-aware collaborative graph attention networks</article-title>
          ,
          <source>in: IJCNN</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Disenhan:
          <article-title>Disentangled heterogeneous graph attention network for recommendation</article-title>
          , in: CIKM, ACM,
          <year>2020</year>
          , pp.
          <fpage>1605</fpage>
          -
          <lpage>1614</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chua</surname>
          </string-name>
          ,
          <article-title>Disentangled graph collaborative filtering</article-title>
          ,
          <source>in: SIGIR</source>
          , ACM,
          <year>2020</year>
          , pp.
          <fpage>1001</fpage>
          -
          <lpage>1010</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Measuring and relieving the over-smoothing problem for graph neural networks from the topological view</article-title>
          ,
          <source>in: AAAI</source>
          , AAAI Press,
          <year>2020</year>
          , pp.
          <fpage>3438</fpage>
          -
          <lpage>3445</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Towards deeper graph neural networks with differentiable group normalization</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>McAuley</surname>
          </string-name>
          ,
          <article-title>Justifying recommendations using distantly-labeled reviews and fine-grained aspects</article-title>
          ,
          <source>in: EMNLP/IJCNLP (1)</source>
          ,
          <source>Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>188</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Recommender systems based on user reviews: the state of the art</article-title>
          ,
          <source>User Model. User-Adapt. Interact.</source>
          <volume>25</volume>
          (
          <year>2015</year>
          )
          <fpage>99</fpage>
          -
          <lpage>154</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Srifi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oussous</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Lahcen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mouline</surname>
          </string-name>
          ,
          <article-title>Recommender systems based on collaborative filtering using review texts - A survey</article-title>
          ,
          <source>Inf.</source>
          <volume>11</volume>
          (
          <year>2020</year>
          )
          <fpage>317</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>