<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Adaptive Influence Diffusion Graph Neural Networks for Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhiyong Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fiberhome Telecommunication Technologies Co.,LTD</institution>
          ,
          <addr-line>Wuhan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Xidian University</institution>
          ,
          <addr-line>Xi'an, 710071</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>9</fpage>
      <lpage>0009</lpage>
      <abstract>
        <p>Recommender systems have become a crucial intelligent tool for providing personalized services to users. Existing recommendation algorithms often face sparse data and an unbalanced distribution of user interactions because they rely on user behavior to generate data. In addition, because graph-learning-based recommendation algorithms capture user-item interactions by aggregating neighbor information equally, they ignore the heterogeneity between user and item nodes as well as the variability in the influence that different nodes exert on the target node. To address these issues, we propose an adaptive influence diffusion graph neural network (AIDGR). Specifically, we derive a user similarity graph and an item relevance graph from the user-item bipartite graph, and model the diffusion of influence between similar users and between related items through a diffusion model. In addition, to model the heterogeneity between nodes and the variability of influence between nodes, we design a dual-attention mechanism that assigns different influence weights to different types of relationships, so as to achieve adaptive propagation of information both between heterogeneous nodes and between different nodes of the same type. Experimental results on several real-world datasets demonstrate the superiority and effectiveness of the proposed model.</p>
      </abstract>
      <kwd-group>
        <kwd>recommendation system</kwd>
        <kwd>graph diffusion</kwd>
        <kwd>deep learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Collaborative filtering (CF) has become the most widely used technique in the recommendation
domain; it analyzes users' historical interaction information to learn the latent traits of users and
items and predict users' future preferences [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ]. However, in real-world recommendation
scenarios, CF-based recommendation algorithms often face the challenge of data sparsity [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This is
because recommender systems rely on user behavior to generate data, yet most users generate
only a limited amount of behavioral data, which significantly impacts the effectiveness of CF-based
recommendation algorithms.
      </p>
      <p>
        In recent years, graph neural networks (GNNs) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] have shown great advantages in
recommendation tasks by aggregating users' neighborhood information. For example, GC-MC [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and
NGCF [6] construct user-item bipartite graphs from user-item interaction data and use the graph
structure to propagate user and item embedding information over it. However, many existing
methods treat all neighboring nodes equally, thus ignoring the differences between different
neighbors and producing suboptimal recommendation results.
      </p>
      <p>Fortunately, as shopping platforms evolve, more and more people prefer to express their
opinions about the items they have purchased on these platforms. These reviews often have a
significant impact on the shopping behavior of similar users, because an item may be more appealing
to a user if its past consumers have spending habits similar to the target user's [7]. In addition, when
users purchase items, related items also become more attractive to them. Thus, modeling user
similarity and item synergy helps us represent target user preferences, construct and learn user
preferences, alleviate the data sparsity problem, and improve recommendation performance.</p>
      <p>In summary, we develop a new recommendation model that constructs a user similarity
collaborative graph and an item relevance collaborative graph for users and items, respectively, from
user-item historical interactions, in order to capture the similarity relationships between users and the
correlations between items implied by the interaction information. A higher-order influence diffusion
model is then used to model the diffusion of influence between similar users and the synergistic
attraction between related items. In addition, considering the heterogeneous influence of neighbors
and the differences in the influence of different nodes, we propose a dual-attention model that
aggregates neighbor information so as to refine the different influences of neighbors on the target
user and to achieve adaptive information propagation among different types of neighbor nodes.
Finally, the diffused information and the representations obtained by the attention mechanism are
concatenated for recommendation prediction. We conducted extensive experiments on several
real-world datasets and verified the effectiveness of the proposed method in representation learning.</p>
      <p>[Figure 1. Overview of the AIDGR model: attention modules and linear layers feed the final prediction component.]</p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem Formalization</title>
      <p>Based on the user interaction data, we construct a bipartite graph B = (U, V, E_UV) consisting of
two different types of node sets (the user set U = {u_1, u_2, ..., u_n} and the item set
V = {v_1, v_2, ..., v_m}) and an edge set E_UV, where n = |U| and m = |V| are the numbers of users
and items. We denote the interaction matrix as R ∈ {0,1}^(n×m), where r_{u,v} ∈ {0,1} and
r_{u,v} = 1 denotes that there is an interaction between user u and item v.</p>
      <p>Formally, the recommendation algorithm aims to complete the interaction matrix R ∈ {0,1}^(n×m)
between users and items based on the user-item interaction data. In this paper, we predict user-item
interactions on the original bipartite graph using the inner product of the latent representations of
users and items.</p>
    </sec>
    <sec id="sec-3">
      <title>3. AIDGR model</title>
      <p>Figure 1 shows an overview of AIDGR, which consists of three main modules. The first module
is feature representation based on higher-order graph diffusion. In this module, we construct two
auxiliary graphs from the interaction information of users and items: a user similarity graph and an
item relevance graph. We then employ a higher-order diffusion model to learn the initial user and
item representations from the two auxiliary graphs. The second module is dual-attention based
embedding aggregation, in which we develop a dual-attention mechanism to assign different
weights to neighbors and then aggregate the neighbor information according to these weights to
update the feature representations of users and items. The last module is the prediction module,
where we pass the diffusion-based initial feature representations and the aggregated neighbor feature
representations into a fully connected network to generate the final feature representations. Finally,
the interaction between users and items is predicted by the inner product of the user and item
representations.</p>
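      <p>As an orientation aid, the following hypothetical sketch composes the three modules in the order described above; the module objects are placeholders for the components detailed in the following subsections, not the paper's actual code:</p>
      <preformat>
# Hypothetical composition of the three AIDGR modules described above.
def aidgr_forward(users, items, diffusion, dual_attention, predictor):
    x_u, x_v = diffusion(users, items)       # Module 1: higher-order graph diffusion
    e_u, e_v = dual_attention(x_u, x_v)      # Module 2: dual-attention aggregation
    return predictor(e_u, e_v)               # Module 3: fully connected + inner product
      </preformat>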
      <sec id="sec-3-1">
        <title>3.1. Feature Representation Module Based on Higher-Order Graph Diffusion</title>
        <p>In this section, we learn the initial representation of users and items from two collaborative
graphs based on a higher-order diffusion model.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.1.1. Collaborative Graph Construction</title>
        <p>We first construct the user similarity graph G_U^s by measuring the similarity between each
pair of users based on their historical interactions:</p>
        <p>Sim(u_i, u_j) = |N(u_i) ∩ N(u_j)| / √(|N(u_i)| × |N(u_j)|)   (1)</p>
        <p>where N(u_i) is the set of items that user u_i interacts with. If Sim(u_i, u_j) &gt; η, we
insert an edge between u_i and u_j, where η is an adjustable threshold. Similarly, we compute the
correlation between two items as follows:</p>
        <p>Rel(v_i, v_j) = |N(v_i) ∩ N(v_j)| / |N(v_i) ∪ N(v_j)|   (2)</p>
        <p>where N(v_i) denotes the set of users interacting with item v_i. If Rel(v_i, v_j) &gt; ς, we add an edge
between v_i and v_j, where ς is likewise an adjustable threshold. Thresholding these scores yields
the user similarity graph G_U^s and the item relevance graph G_V^r.</p>
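        <p>A sketch of this construction is given below, assuming the readings of Eqs. (1) and (2) reconstructed above (a cosine-style overlap for users and a Jaccard-style overlap for items; the exact normalizations were garbled in the source):</p>
        <preformat>
# Collaborative graph construction from a binary interaction matrix R (n x m).
import numpy as np

def user_similarity_edges(R, eta):
    """Insert an edge (i, j) whenever Sim(u_i, u_j) > eta, per Eq. (1)."""
    inter = R @ R.T                            # |N(u_i) ∩ N(u_j)|
    deg = R.sum(axis=1)                        # |N(u_i)|
    sim = inter / (np.sqrt(np.outer(deg, deg)) + 1e-12)
    i, j = np.where(np.triu(sim, k=1) > eta)   # upper triangle: each pair once
    return list(zip(i, j))

def item_relevance_edges(R, varsigma):
    """Insert an edge (i, j) whenever Rel(v_i, v_j) > varsigma, per Eq. (2)."""
    inter = R.T @ R                            # |N(v_i) ∩ N(v_j)|
    deg = R.sum(axis=0)                        # |N(v_i)|
    rel = inter / (np.add.outer(deg, deg) - inter + 1e-12)   # Jaccard overlap
    i, j = np.where(np.triu(rel, k=1) > varsigma)
    return list(zip(i, j))
        </preformat>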
      </sec>
      <sec id="sec-3-3">
        <title>3.1.2. Higher-order diffusion models</title>
        <p>In this module, we design a T-layer graph diffusion network to model the diffusion of latent
feature representations over the collaborative graphs. Given a user u ∈ U, x_u^{s,t} represents the
latent feature representation of user u on G_U^s at layer t. We update the representation of u on
G_U^s at layer t+1 based on diffusion as follows:</p>
        <p>x_u^{s,t+1} = σ( ∑_{u′∈N_U^s(u)} (W_1^{s,t} x_{u′}^{s,t} + W_2^{s,t} (x_u^{s,t} ⊙ x_{u′}^{s,t})) )   (3)</p>
        <p>where W_1^{s,t} and W_2^{s,t} are the learnable parameters of layer t, σ is the LeakyReLU
activation function, N_U^s(u) denotes the neighbors of u on G_U^s, and u′ is a neighbor node of u on
G_U^s. Similarly, we use a higher-order diffusion model to capture the synergistic influence between
related items on G_V^r. For a given item v ∈ V, we update the feature representation of v as follows:</p>
        <p>x_v^{r,t+1} = σ( ∑_{v′∈N_V^r(v)} (W_1^{r,t} x_{v′}^{r,t} + W_2^{r,t} (x_v^{r,t} ⊙ x_{v′}^{r,t})) )   (4)</p>
        <p>where W_1^{r,t} and W_2^{r,t} are the learnable parameters of layer t, x_v^{r,t} denotes the latent
feature representation of v on G_V^r at layer t, σ is the LeakyReLU activation function, N_V^r(v)
denotes the neighbors of v on G_V^r, and v′ is a neighboring node of v on G_V^r. After T graph
diffusion layers, we obtain the initial user representation x_u^{s,T} and the initial item
representation x_v^{r,T}. These representations are then passed into the dual-attention based
embedding aggregation module for adaptive propagation of user and item information.</p>
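        <p>The sketch below implements one user-side diffusion layer of Eq. (3) in PyTorch, assuming the operator lost in extraction between x_u and x_{u′} is an element-wise product and that adj is the binary adjacency matrix of G_U^s; the item-side layer over G_V^r has the same form with W_1^{r,t} and W_2^{r,t}:</p>
        <preformat>
# One graph-diffusion layer, Eq. (3); stack T of these to obtain x^{s,T}.
import torch
import torch.nn as nn

class DiffusionLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W1 = nn.Linear(dim, dim, bias=False)   # W_1^{s,t}
        self.W2 = nn.Linear(dim, dim, bias=False)   # W_2^{s,t}
        self.act = nn.LeakyReLU()

    def forward(self, x, adj):
        # By linearity, sum_{u'} W2(x_u ⊙ x_{u'}) = W2(x_u ⊙ sum_{u'} x_{u'}).
        neigh_sum = adj @ x                          # sum of neighbor features
        return self.act(adj @ self.W1(x) + self.W2(x * neigh_sum))
        </preformat>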
      </sec>
      <sec id="sec-3-4">
        <title>3.2. Embedding Aggregation Based on Dual Attention</title>
        <p>After modeling the diffusion of influence between similar users and related items, we introduce
the user-item interaction graph B to model the interaction information between users and items.
Specifically, taking user u as an example, there are two different types of neighbor nodes for
user u, i.e., the user neighbors N_U^s(u) in G_U^s and the item neighbors N_V^B(u) in B. Similarly,
item v has item neighbors N_V^r(v) in G_V^r and user neighbors N_U^B(v) in B. In order to enable
information transfer between different types of nodes, we introduce the learnable parameters W_U
and W_V to map the initially obtained feature representations of users and items into the same
embedding space, computed as follows:</p>
        <p>e_u^0 = W_U x_u^{s,T},  u ∈ U   (5)</p>
        <p>e_v^0 = W_V x_v^{r,T},  v ∈ V   (6)</p>
        <p>In order to refine the influence of the different neighbors of users and items, we propose a
dual-attention model.</p>
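        <p>A minimal sketch of the mappings in Eqs. (5) and (6), with W_U and W_V as plain learnable linear maps into the shared embedding space (dimensions are illustrative):</p>
        <preformat>
# Project diffused user/item features into one embedding space, Eqs. (5)-(6).
import torch.nn as nn

class SharedSpaceMap(nn.Module):
    def __init__(self, diff_dim, emb_dim):
        super().__init__()
        self.W_U = nn.Linear(diff_dim, emb_dim, bias=False)   # W_U
        self.W_V = nn.Linear(diff_dim, emb_dim, bias=False)   # W_V

    def forward(self, x_user, x_item):
        return self.W_U(x_user), self.W_V(x_item)             # e_u^0, e_v^0
        </preformat>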
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The impact of different neighbors on the same user</title>
      <p>Given that the users u′ are the friend neighbors of u in G_U^s, we aggregate the latent
embeddings of these friend neighbors N_U^s(u) as follows:</p>
      <p>e_{N_U^s(u)}^l = ∑_{u′∈N_U^s(u)} α_{u,u′}^l e_{u′}^{l−1}   (7)</p>
      <p>where e_{u′}^{l−1} denotes the latent embedding of friend neighbor u′ at layer l−1, and
α_{u,u′}^l is the weight corresponding to u′, calculated as follows:</p>
      <p>α_{u,u′}^l = exp(σ(W^l (e_u^{l−1} ⊕ e_{u′}^{l−1}))) / ∑_{u′∈N_U^s(u)} exp(σ(W^l (e_u^{l−1} ⊕ e_{u′}^{l−1})))   (8)</p>
      <p>where W^l is the learnable parameter of layer l, and σ is the LeakyReLU activation function.
Similarly, given user u, N_V^B(u) is the set of item neighbors of u on B. The latent embeddings of
these item neighbors are aggregated as follows:</p>
      <p>e_{N_V^B(u)}^l = ∑_{v′∈N_V^B(u)} α_{u,v′}^l e_{v′}^{l−1}   (9)</p>
      <p>α_{u,v′}^l = exp(σ(W^l (e_u^{l−1} ⊕ e_{v′}^{l−1}))) / ∑_{v′∈N_V^B(u)} exp(σ(W^l (e_u^{l−1} ⊕ e_{v′}^{l−1})))   (10)</p>
      <p>where e_{v′}^{l−1} denotes the latent embedding of item neighbor v′ at layer l−1, α_{u,v′}^l is
the weight corresponding to v′, W^l is the learnable parameter of layer l, and σ is the LeakyReLU
activation function.</p>
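      <p>A sketch of this first-level attention for one user, per Eqs. (7)-(8), assuming W^l maps the concatenated pair e_u^{l−1} ⊕ e_{u′}^{l−1} to a scalar score; the same module serves the item-neighbor channel of Eqs. (9)-(10):</p>
      <preformat>
# First-level neighbor attention, Eqs. (7)-(10).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighborAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(2 * dim, 1, bias=False)   # W^l, scores a concatenated pair
        self.act = nn.LeakyReLU()

    def forward(self, e_u, e_neigh):
        # e_u: (d,) target embedding; e_neigh: (k, d) neighbor embeddings at layer l-1
        pairs = torch.cat([e_u.expand_as(e_neigh), e_neigh], dim=-1)   # (k, 2d)
        alpha = F.softmax(self.act(self.W(pairs)).squeeze(-1), dim=0)  # Eq. (8)/(10)
        return alpha @ e_neigh                                         # Eq. (7)/(9)
      </preformat>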
      <sec id="sec-4-1">
        <title>Influence of different types of neighbor nodes</title>
        <p>Different types of neighbors of the same user have different impacts; for this reason we propose
a second-level attention mechanism to aggregate the information obtained from the two different
perspectives: friend neighbors and item neighbors. Specifically, the user obtains the aggregated
embedding as follows:</p>
        <p>AGG_u^l = β_{N_U^s(u)}^l e_{N_U^s(u)}^l + β_{N_V^B(u)}^l e_{N_V^B(u)}^l   (11)</p>
        <p>where e_{N_U^s(u)}^l and e_{N_V^B(u)}^l are the aggregated embeddings of the friend neighbors and
item neighbors of user u, respectively, and β_{N_U^s(u)}^l and β_{N_V^B(u)}^l are the attention weights
of the friend neighbors and item neighbors, respectively, computed as follows:</p>
        <p>β_{N_U^s(u)}^l = exp(σ(W^l (e_u^{l−1} ⊕ e_{N_U^s(u)}^{l−1}))) / ∑_{g∈{N_U^s(u), N_V^B(u)}} exp(σ(W^l (e_u^{l−1} ⊕ e_g^{l−1})))   (12)</p>
        <p>β_{N_V^B(u)}^l = exp(σ(W^l (e_u^{l−1} ⊕ e_{N_V^B(u)}^{l−1}))) / ∑_{g∈{N_U^s(u), N_V^B(u)}} exp(σ(W^l (e_u^{l−1} ⊕ e_g^{l−1})))   (13)</p>
        <p>where W^l is the learnable parameter of layer l and σ is the LeakyReLU activation function. The
embedding of user u at layer l is then updated as follows:</p>
        <p>e_u^l = σ(W^l (e_u^{l−1} ⊕ AGG_u^l))   (14)</p>
        <p>Similarly, given item v, we can obtain the updated embedding of v in the same way:</p>
        <p>e_v^l = σ(W^l (e_v^{l−1} ⊕ AGG_v^l))   (15)</p>
        <p>where W^l is the learnable parameter of layer l, σ is the LeakyReLU activation function, and
AGG_v^l is computed in a way analogous to AGG_u^l.</p>
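        <p>The second level can be sketched the same way, per Eqs. (11)-(14): the two channel summaries (friend and item neighbors) are weighted by β and the result is concatenated with e_u^{l−1} for the layer update (shapes are illustrative and per-user for clarity):</p>
        <preformat>
# Second-level (channel) attention and layer update, Eqs. (11)-(14).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttentionUpdate(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W_score = nn.Linear(2 * dim, 1, bias=False)  # scores a channel, Eqs. (12)-(13)
        self.W_upd = nn.Linear(2 * dim, dim, bias=False)  # W^l in Eq. (14)
        self.act = nn.LeakyReLU()

    def forward(self, e_u, e_friend_ch, e_item_ch):
        channels = torch.stack([e_friend_ch, e_item_ch])               # (2, d)
        pairs = torch.cat([e_u.expand_as(channels), channels], dim=-1)
        beta = F.softmax(self.act(self.W_score(pairs)).squeeze(-1), dim=0)
        agg = beta @ channels                                          # Eq. (11)
        return self.act(self.W_upd(torch.cat([e_u, agg], dim=-1)))     # Eq. (14)
        </preformat>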
      </sec>
      <sec id="sec-4-2">
        <title>3.3. Prediction and Model Optimization</title>
        <p>After completing the context-consistent neighbor information aggregation, we obtain the layered
embeddings of users and items, e_u^l and e_v^l, where l = 0, 1, 2, ..., L. In order to better model
the latent features of users and items, we consider both the first and the last layers of the
embeddings, because the first layer retains the original diffused features of users and items, while
the last layer provides a finer representation of the interaction features through the dual-attention
mechanism. The final representations of users and items are therefore:</p>
        <p>e_u = W_u σ(e_u^0 ⊕ e_u^L),  e_v = W_v σ(e_v^0 ⊕ e_v^L)   (16)</p>
        <p>where W_u and W_v are learnable weight matrices and σ is the activation function. Finally, we
obtain the recommendation results by the inner product of the user and item feature
representations:</p>
        <p>r̂_{u,v} = e_u · e_v   (17)</p>
        <p>In order to learn the AIDGR model parameters, we need to specify an objective function for
optimization. For implicit feedback, the most widely used loss function is the cross-entropy loss,
defined as:</p>
        <p>L = −∑_{(u,v)∈E_UV} [r_{u,v} log r̂_{u,v} + (1 − r_{u,v}) log(1 − r̂_{u,v})]   (18)</p>
        <p>where E_UV is the set of all observed interactions in the training set and r_{u,v} is the value
corresponding to the pair (u, v) in the interaction matrix R. To optimize the objective function, we
use mini-batch adaptive moment estimation (Adam) [8] as the optimizer in our implementation,
together with a dropout strategy [9] to mitigate overfitting.</p>
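        <p>A sketch of one training step under Eqs. (17)-(18); the model interface and the sigmoid squashing of the inner product (needed for a well-defined cross-entropy, a detail the paper leaves implicit) are assumptions:</p>
        <preformat>
# One mini-batch Adam step with the cross-entropy objective, Eqs. (17)-(18).
import torch
import torch.nn.functional as F

def train_step(model, optimizer, users, items, labels):
    """users/items are index tensors; labels are 0/1 interaction indicators."""
    e_u, e_v = model(users, items)                  # final representations e_u, e_v
    r_hat = torch.sigmoid((e_u * e_v).sum(-1))      # r̂_{u,v} = e_u · e_v, squashed
    loss = F.binary_cross_entropy(r_hat, labels.float())   # Eq. (18)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # Adam [8]
        </preformat>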
      </sec>
      <sec id="sec-4-3">
        <title>3.4. Discussion</title>
        <p>
          Space complexity. The model parameters consist of two parts: the user and item embeddings
Θ_1 = [P, Q], and the parameter set
Θ_2 = {{W_1^{s,t}, W_2^{s,t}, W_1^{r,t}, W_2^{r,t}}_{t=1}^{T}, {W_U, W_V}, {W^l}_{l=1}^{L}, {W_u, W_v}}. Since most embedding-based
models [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] need to store embeddings for each user and item, the space complexity of Θ_1 is the same
as that of traditional embedding-based models and grows linearly with the numbers of users and
items. The parameters in Θ_2 are shared among all users and items, and the dimension of each
parameter is much smaller than the numbers of users and items, so this additional storage cost is a
negligible constant. Therefore, the overall space complexity of AIDGR, from Θ_1 and Θ_2, is the same
as that of a traditional embedding model.
        </p>
        <p>Time complexity. The time cost of the AIDGR model comes from two main components: the
feature representation module based on higher-order graph diffusion and the aggregated embedding
module based on dual attention. Given M users, N items, and T diffusion layers, assume that the
average number of interacted items per user is n_b and the average number of interacting users per
item is m_b. Before entering the feature representation module based on higher-order graph
diffusion, we need to evaluate the similarity between users and the synergy between items, whose
modelling time cost can be expressed as O(M² + N²). Assume further that the average number of
similar users per user is m_s and the average number of related items per item is n_r. The time
spent on the embedding updates of users and items based on higher-order graph diffusion is then
O(M·m_s·D + N·n_r·D). For the dual-attention aggregated embedding module at each layer, the
main time cost is the transfer of information from neighboring nodes on the different graphs to
users and items. First, the information of the neighboring nodes from the different graphs is
computed; the time is spent on the attention scores between each pair of neighboring nodes, with a
time complexity of O(M·(m_s + n_b)·D + N·(n_r + m_b)·D). Then we aggregate information from
the different types of neighboring nodes based on the attention mechanism, whose time complexity
is approximately O(M + N) since there are only two types of neighboring nodes in practice. Since in
practice m_s, m_b, n_r, n_b ≪ min{M, N}, the time overhead grows linearly with the numbers of
users and items, and linearly with the diffusion depth and the number of dual-attention layers.
Therefore, the total time complexity of AIDGR is acceptable.</p>
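        <p>For a rough feel of the dominant terms, the snippet below instantiates the complexity expressions above with illustrative sizes (none of these numbers are from the paper):</p>
        <preformat>
# Back-of-the-envelope instantiation of the complexity terms above.
M, N, D = 100_000, 20_000, 64   # users, items, embedding size (assumed)
n_b, m_b = 20, 100              # avg items per user / users per item (assumed)
m_s, n_r = 15, 10               # avg similar users / related items (assumed)

graph_build = M**2 + N**2                                 # O(M^2 + N^2) similarity pass
diffusion = M * m_s * D + N * n_r * D                     # per diffusion layer
attention = M * (m_s + n_b) * D + N * (n_r + m_b) * D     # per dual-attention layer
print(f"{graph_build:.2e} {diffusion:.2e} {attention:.2e}")
        </preformat>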
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Experiment</title>
      <p>In this section, we describe the datasets, evaluation metrics, experimental settings, and
experimental results of this work.</p>
      <sec id="sec-5-1">
        <title>4.1. Experimental dataset</title>
        <p>Datasets. We apply our model to two publicly accessible datasets, namely Yelp and Amazon.
Table 1 summarizes the statistics of the datasets. For each dataset, we use 80% of the data as the
training set and the remaining 10% and 10% as the validation set and the test set for final
performance evaluation, respectively.</p>
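        <p>A minimal sketch of this 80%/10%/10% split (the shuffling and seed are illustrative; the paper does not specify the splitting procedure beyond the ratios):</p>
        <preformat>
# Random 80/10/10 split of interaction records into train/validation/test.
import random

def split_interactions(interactions, seed=42):
    rng = random.Random(seed)
    data = list(interactions)
    rng.shuffle(data)
    n = len(data)
    return (data[: int(0.8 * n)],              # training set
            data[int(0.8 * n): int(0.9 * n)],  # validation set
            data[int(0.9 * n):])               # test set
        </preformat>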
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Evaluation metrics</title>
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Baseline algorithms</title>
        <p>The proposed model AIDGR is compared with the following baselines.</p>
        <p>FM [10]: A unified model based on latent factors that can utilize user and item attributes. In
practice, we use the user and item features as its inputs.</p>
        <p>NCF [11]: A deep learning based recommendation model that utilizes multi-layer perceptrons
to learn the user-item interaction function.</p>
        <p>GCN [12]: This model uses spectral graph convolution operators to learn the local graph
structure and node features, and performs semi-supervised learning directly on graph-structured
data.</p>
        <p>NGCF [6]: A graph-based recommendation model that models higher-order connectivity in the
user-item graph and explicitly injects collaborative signals into the embedding process.</p>
        <p>LightGCN [13]: This model simplifies the standard GCN for the recommendation task and
proposes a lightweight graph convolutional network.</p>
        <p>DiffNet++ [14]: This model achieves strong performance in social recommendation tasks by
modeling the diffusion of user interest and social influence.</p>
        <p>DGCF [15]: This model decomposes the embeddings of users and items into multiple semantic
factors and unifies the different semantics through an attention mechanism to model the diverse
intents behind user behaviors.</p>
      </sec>
      <sec id="sec-5-4">
        <title>4.4. Analysis of results</title>
      </sec>
      <sec id="sec-5-5">
        <title>4.4.1. Comparison with the baseline algorithms</title>
        <p>Table 2 shows the comparison of the different methods on the two datasets. The following
observations can be drawn from the results:</p>
        <p>First, FM and NCF perform poorly on these two datasets, probably because traditional
collaborative filtering based learning algorithms have difficulty comprehensively modeling the
connectivity relationships between nodes, in contrast to the strong modeling capability of graph
structures on interaction data. The outperformance of DiffNet++ over traditional graph learning
methods suggests that graph diffusion is able to move beyond the limitation of one-hop neighboring
nodes and thus capture richer graph properties. The superior performance of DGCF indicates that,
compared with traditional GNN models that model user behavior with a single intent, DGCF can
better describe user preferences by modeling user behavior with multiple intents.</p>
        <p>Second, the proposed AIDGR model outperforms all baselines. The reasons are as follows:
1) AIDGR's higher-order diffusion operation over the collaborative graphs effectively captures the
influence between similar users and related items, enriching the feature representations of users and
items and thus effectively alleviating the data sparsity problem; 2) the dual-attention mechanism
effectively differentiates the heterogeneity among nodes and the diversity of the influence that
neighboring nodes exert on the target node, achieving adaptive information propagation.</p>
      </sec>
      <sec id="sec-5-6">
        <title>4.4.2. Ablation analysis</title>
        <p>Influence of different components: AIDGR consists of two key components, namely 1) a
higher-order influence diffusion model, and 2) a dual-attention mechanism module. To study the
impact of each component, we designed two AIDGR variants as follows:</p>
        <p>AIDGR-Diff: removes the higher-order influence diffusion model from AIDGR.</p>
        <p>AIDGR-DAtt: replaces the dual-attention model with a standard GNN model.</p>
        <p>Figure 2 shows that AIDGR-Diff performs worse than AIDGR, which confirms the important
role of higher-order influence diffusion in learning latent user and item representations. The
diffusion model improves recommendation performance from two perspectives. First, the user
similarity and item relevance graphs contain useful information reflecting user influence and the
collaborative appeal of items. Second, the diffusion process allows users and items to aggregate
information from implicit neighbors with similar interests and item attributes.</p>
        <p>AIDGR also achieves higher performance than AIDGR-DAtt, which suggests that neighbors
cannot all be treated equally during information aggregation. An explanation for this phenomenon is
that not all neighbors have the same impact on a user, and it is crucial to distinguish these
heterogeneous impacts.</p>
      </sec>
      <sec id="sec-5-7">
        <title>4.4.3. Parametric sensitivity analysis</title>
        <p>In this subsection, we investigate how the performance of the proposed model varies with
several hyperparameters, including the embedding dimension d, the threshold η for the user
similarity graph, and the threshold ς for the item correlation graph. Experiments show that when
d is small, the performance usually increases as d increases. However, when d grows beyond a
certain value, the performance decreases. In addition, the optimal dimension d varies across
datasets: AIDGR performs best on Yelp when d = 32 and on Amazon when d = 64. An explanation
for this result is that larger values of d tend to lead to overfitting.</p>
        <p>We then study the effect of the threshold η on the two datasets. Specifically, when η = 0.5,
AIDGR performs best on Yelp; when η = 0.3, it performs best on Amazon. As the threshold
gradually increases from 0.1 to 0.9, the performance first increases and then gradually decreases.
When η = 0.1, the model introduces a lot of noise, which reduces its performance; when η = 0.9,
very little valid information is captured, which also leads to performance degradation. Therefore,
an appropriate η is needed to ensure model performance. The effect of the threshold ς on the item
correlation graph is similar to that of η.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion</title>
      <p>In this paper, we propose a new recommendation model that constructs two collaborative
similarity graphs for users and items from user-item historical interactions, capturing the similarity
relationships between users and the correlations between items implied by the interaction
information. It uses a higher-order influence diffusion model to model the diffusion of influence
between similar users and the synergistic attraction between related items, alleviating the data
sparsity and unbalanced interaction distribution problems faced by recommendation. In addition,
considering the heterogeneity of neighbors and the differences in the influence of different
neighbors, we propose a dual-attention model that aggregates neighbor information to refine the
different influences of neighbors on the target nodes and to achieve adaptive information
propagation among different neighbor nodes. Our extensive experiments show that the proposed
model outperforms state-of-the-art methods and verify the effectiveness of the proposed scheme.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Koren</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Factorization meets the neighborhood: a multifaceted collaborative filtering model[C]//Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining</article-title>
          .
          <year>2008</year>
          :
          <fpage>426</fpage>
          -
          <lpage>434</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Koren</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bell</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Volinsky</surname>
            <given-names>C</given-names>
          </string-name>
          .
          <article-title>Matrix factorization techniques for recommender systems</article-title>
          [J].
          <source>Computer</source>
          ,
          <year>2009</year>
          ,
          <volume>42</volume>
          (
          <issue>8</issue>
          ):
          <fpage>30</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Adomavicius</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuzhilin</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions[J]</article-title>
          .
          <source>IEEE transactions on knowledge and data engineering</source>
          ,
          <year>2005</year>
          ,
          <volume>17</volume>
          (
          <issue>6</issue>
          ):
          <fpage>734</fpage>
          -
          <lpage>749</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Wu</surname>
            <given-names>Z</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>F</given-names>
          </string-name>
          , et al.
          <article-title>A comprehensive survey on graph neural networks[J]</article-title>
          .
          <source>IEEE transactions on neural networks and learning systems</source>
          ,
          <year>2020</year>
          ,
          <volume>32</volume>
          (
          <issue>1</issue>
          ):
          <fpage>4</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Van den Berg</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kipf</surname>
            <given-names>T N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welling</surname>
            <given-names>M</given-names>
          </string-name>
          .
          <article-title>Graph convolutional matrix completion</article-title>
          [J].
          <source>arXiv preprint arXiv:1706.02263</source>
          ,
          <year>2017</year>
          ,
          <volume>2</volume>
          (
          <issue>8</issue>
          ):
          <fpage>9</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Wang X, He X, Wang M, et al. Neural graph collaborative filtering[C]//Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2019: 165-174.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Sarwar B, Karypis G, Konstan J, et al. Item-based collaborative filtering recommendation algorithms[C]//Proceedings of the 10th International Conference on World Wide Web. 2001: 285-295.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Kingma D P, Ba J L. Adam: A method for stochastic optimization[C]//ICLR 2015: International Conference on Learning Representations. 2015.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15(1): 1929-1958.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Rendle S. Factorization machines[C]//2010 IEEE International Conference on Data Mining. IEEE, 2010: 995-1000.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] He X, Liao L, Zhang H, et al. Neural collaborative filtering[C]//Proceedings of the 26th International Conference on World Wide Web. 2017: 173-182.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks[J]. arXiv preprint arXiv:1609.02907, 2016.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] He X, Deng K, Wang X, et al. LightGCN: Simplifying and powering graph convolution network for recommendation[C]//Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020: 639-648.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Wu L, Li J, Sun P, et al. DiffNet++: A neural influence and interest diffusion network for social recommendation[J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 34(10): 4753-4766.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Wang X, Jin H, Zhang A, et al. Disentangled graph collaborative filtering[C]//Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020: 1001-1010.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>