Scoring Prediction Model Based on Fusion of Text and Temporal Features

Haoqian Li

Xuesong Su

Wenguang Zheng

Jiayi Song

Yingyuan Xiao

School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China

Technology Inspection Center of Shengli Oilfield, SINOPEC, Dongying, Shandong, China

The Internet now reaches nearly every household and has become inseparable from daily life. It provides users with a wide range of online products and services; online platforms are used far more than before, and users increasingly depend on online consumption. A central task in developing the e-commerce ecosystem is to increase the purchase rate of goods. To reduce information overload and meet users' diverse needs, a personalized recommendation system that uses review text is an effective way to increase users' purchase rate, and it also alleviates the data sparsity and cold-start problems. Many deep recommendation models based on review text have emerged in this field, but they still have shortcomings, such as a lack of multi-feature fusion and no assessment of feature importance.

Keywords: review text; recommendation system; feature fusion; rating prediction

1. Introduction

With the development of the Internet, we have entered the era of big data, in which our information is stored by every application we register with. When we log in to browse a website, the site records our browsing history; when we shop online, it records the products we purchase; and when we give feedback, it records our comments on those products. A growing number of studies argue that, because reviews explain users' opinions, they should help to infer latent dimensions for predicting ratings or purchases. Schemes that incorporate reviews for rating prediction have since evolved from simple regularization methods to neural network methods.

One of the main research directions in recommender systems is to improve prediction by using latent information features, especially in cold-start settings where interaction data may be sparse or noisy. Existing rating prediction algorithms have shown that introducing review text into the rating prediction task can greatly improve recommendation accuracy. To strengthen feature fusion and focus on the important interaction features, and thereby improve the accuracy of user-item rating prediction, this paper proposes a multi-feature fusion-based rating prediction model (2TFRS). The proposed algorithm models rating prediction as a text matching problem, using user and item review text fused with temporal features. First, the interaction between features is enhanced by learning the important matches between user text and item text; second, the model combines review timing information with user and item rating information. Finally, the predicted rating is produced by a regression layer.

2. Related works

In this section, we introduce classical recommendation models of recent years that incorporate deep learning techniques, along with the innovations and construction ideas of our proposed model.

In 2016, word2vec [1], a technique from natural language processing, was borrowed by the Item2vec algorithm, which treats a user's behavior sequence as a sentence in order to learn representations. In the same year, Google released YouTubeDNN [2], which introduced the classical two-stage recommendation architecture of recall and ranking and provided ideas for subsequent industrial-grade recommendation systems. Many other classical models were proposed in the same period, such as ConvMF [3], PNN [4], DNN [5], and DeepCrossing [6].

Many scholars have found that taking the attention mechanism into account when modeling rating prediction greatly improves prediction accuracy. Seo et al. proposed an attention-based convolutional neural network model (D-Attn [7]) to extract latent representations of users and items. In 2017, L. Zheng et al. proposed DeepCoNN [8], which learns user behavior and item properties from user and item review texts. In 2018, C. Chen et al. proposed NARRE [9], which extends DeepCoNN with an attention mechanism that assigns a weight to each review. In 2020, Parisa Abolfath and Saeedeh Momtazi proposed MPRS [10], which focuses on user-item interaction features. In 2022, Peilin Yang et al. proposed MAN [11], a deep learning-based main-auxiliary network in which the auxiliary network attends to deeper meaning at the word-vector level to help the main network generate rating predictions.

Our work builds on this model and adds new ideas. To account for the influence of time decay on recommendation results, we use the temporal information of reviews as weights on the embedding vectors, compute local interactions between vector pairs, apply convolutional operations, and feed the result into a regression layer to obtain the predicted ratings.

3. Our proposed model

First, we consider each data record as a tuple $(u, v, r_{uv}, w_{uv}, t_{uv})$ that represents the review written by user u for item v, including the rating $r_{uv}$, the review text $w_{uv}$, and the review time $t_{uv}$. To construct the user-item matching matrix, a review document-based approach is used. All the reviews written by user u are concatenated into a single document, denoted as $D_u$, consisting of n words. Similarly, all the review texts of item v are concatenated into a single document, denoted as $D_v$.
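The document construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `build_documents`, the record layout `(user, item, rating, text, time)`, and the whitespace tokenization are assumptions made here for clarity.

```python
from collections import defaultdict


def build_documents(records):
    """Concatenate review tokens per user and per item.

    `records` is an iterable of (user, item, rating, text, time) tuples;
    this layout is assumed for illustration only.
    """
    user_docs = defaultdict(list)
    item_docs = defaultdict(list)
    for user, item, _rating, text, _time in records:
        tokens = text.lower().split()  # naive whitespace tokenization
        user_docs[user].extend(tokens)  # all reviews written by the user
        item_docs[item].extend(tokens)  # all reviews written for the item
    return dict(user_docs), dict(item_docs)


records = [
    ("u1", "v1", 5.0, "great sound", 1),
    ("u1", "v2", 3.0, "average cable", 2),
    ("u2", "v1", 4.0, "good value", 3),
]
user_docs, item_docs = build_documents(records)
```

Each user document and each item document is then a flat word list, ready to be mapped to embedding vectors.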

Next, we use the word-vector matching matrix $M \in \mathbb{R}^{n \times m}$ of each pair of user u and item v as input to the CNN architecture, where n is the length of the user text document $D_u$, m is the length of the item text document $D_v$, and each element $M_{pq}$ represents the similarity of the p-th word in the user text document to the q-th word in the item text document. The first layer of our proposed model is the embedding layer, which maps each word in the review text to a d-dimensional vector. The words of each user review text and each item review text are mapped by a word embedding function, which can be initialized with the GloVe [12] embeddings pre-trained on Wikipedia. In Equation 1, the word embedding vectors of the p-th user word and the q-th item word are the user embedding vector, denoted as α, and the item embedding vector, denoted as β, respectively.

In this model, we consider that a user's preferences may change over time, so we propose a weighting function that accounts for time decay. A user's recent review behavior represents the user's recent preferences, and an item's recent reviews often reflect its recent quality; weighting them more heavily can improve recommendation performance.

Here, Δt is the time difference between the time at which user u commented on item v and the current time. As Δt increases, the weight of the embedding vector becomes smaller, which means that a user's commenting behavior from long ago has less influence on the predicted rating. The user embedding vector with weights added is denoted as ε, and the item embedding vector with weights added is denoted as η.
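As an illustration of such a time-decay weight, the sketch below assumes an exponential form, which is a common choice for time-aware weighting but is not confirmed as the function used in this paper. The function names, the decay rate `lam`, and the use of days as the time unit are all hypothetical. The sketch only demonstrates the stated property that the weight shrinks as Δt grows.

```python
import math


def time_decay_weight(delta_t_days, lam=0.01):
    """Assumed exponential decay: weight is 1 at delta_t = 0 and falls toward 0."""
    if delta_t_days < 0:
        raise ValueError("delta_t_days must be non-negative")
    return math.exp(-lam * delta_t_days)


def apply_weight(embedding, weight):
    """Scale every component of an embedding vector by the time weight."""
    return [weight * x for x in embedding]


recent = time_decay_weight(10)   # review written 10 days ago
old = time_decay_weight(1000)    # review written 1000 days ago
weighted = apply_weight([0.2, -0.4, 0.6], recent)
```

With any positive `lam`, the older review receives the smaller weight, so its embedding contributes less.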

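To better capture word-level semantics for matching, we compute the similarity between two words as the cosine similarity of their word embedding vectors. These similarities form the user-item matching matrix, which gives a joint representation of each user-item pair and captures their joint semantic information.

The similarity is calculated as shown in Equation 2:

$$M_{pq} = \frac{\varepsilon_p \cdot \eta_q}{\lVert \varepsilon_p \rVert \, \lVert \eta_q \rVert} \qquad (2)$$

where $\varepsilon_p$ is the weighted embedding vector of the p-th word in the user document and $\eta_q$ is the weighted embedding vector of the q-th word in the item document.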
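The construction of the matching matrix from pairwise cosine similarities can be sketched as follows. This is a minimal pure-Python illustration; the function names are hypothetical, the toy vectors are arbitrary, and returning 0 for a zero-length vector is a convention assumed here.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matching_matrix(user_vectors, item_vectors):
    """Entry [p][q] compares the p-th user word with the q-th item word."""
    return [[cosine(u, v) for v in item_vectors] for u in user_vectors]


user_vectors = [[1.0, 0.0], [0.0, 1.0]]              # n = 2 user words
item_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]  # m = 3 item words
M = matching_matrix(user_vectors, item_vectors)      # 2 x 3 matrix
```

The result has one row per user word and one column per item word, which is the shape the CNN expects as input.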

The above is the first layer of the architecture of our proposed model. Next, we input the user-item matching matrix constructed above into the convolutional neural network (CNN) architecture.

[Figure: architecture of the proposed model. Labels: User Review Text, Item Review Text, Embedding Layer, F(α), F(β), CNN, Rating Score]

The CNN framework consists of a convolutional layer, a pooling layer, and a fully connected layer. Each convolutional layer uses a filter $K_j$ to generate a feature map $z_j$, defined as Equation 3:

$$z_j = M * K_j + b_j \qquad (3)$$

where M is the user-item matching matrix, the symbol $*$ is the convolution operator, and $b_j$ is the bias term.

We apply a max pooling operation to each feature map obtained from the convolutional layer and use the maximum value in each pooling window as one of the features of the corresponding kernel. By repeating the convolutional and max pooling layers, the network filters out interference while extracting the features. The result of the final max pooling layer is passed to a fully connected layer and then to a single-neuron layer with a linear activation function, the regression layer, which outputs the user-item predicted rating as shown in Equation 4:

$$\hat{r}_{uv} = W^{\top} O + b \qquad (4)$$

where O is the result of the fully connected layer, W contains the weights of the regression layer, and b is the bias term.
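The convolution, max pooling, and regression steps can be illustrated with a small pure-Python forward pass. This is a sketch under simplifying assumptions (one filter, one convolution and pooling stage, no fully connected layer or dropout, and hand-picked weights), not the network used in the experiments; all function names and values are hypothetical.

```python
def conv2d_valid(matrix, kernel, bias=0.0):
    """Slide `kernel` over `matrix` without padding and add `bias`.

    As in most deep learning libraries, the kernel is not flipped
    (strictly speaking this is cross-correlation).
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(matrix) - k_rows + 1
    out_cols = len(matrix[0]) - k_cols + 1
    return [
        [
            sum(
                matrix[i + a][j + b] * kernel[a][b]
                for a in range(k_rows)
                for b in range(k_cols)
            ) + bias
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


def max_pool(feature_map, size):
    """Keep the maximum of each non-overlapping size x size window."""
    row_starts = range(0, len(feature_map) - size + 1, size)
    col_starts = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in col_starts
        ]
        for i in row_starts
    ]


def regression(features, weights, bias):
    """Linear output neuron: weighted sum of the features plus a bias."""
    return sum(w * o for w, o in zip(weights, features)) + bias


# Toy 4x4 matching matrix and a 2x2 filter (values chosen arbitrarily).
M = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
K = [[1, 0], [0, 1]]
feature_map = conv2d_valid(M, K)               # 3x3 feature map
pooled = max_pool(feature_map, size=2)         # 1x1 after pooling
features = [value for row in pooled for value in row]  # flatten to a vector
rating = regression(features, weights=[0.5], bias=1.0)
```

In a full model the filter values, regression weights, and biases would be learned by gradient descent rather than set by hand.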

The method proposed in this paper is a rating prediction model based on the fusion of textual and temporal features, and it is suited to regression prediction based on user-item similarity.

4. Experiments

We used the public Amazon Review 5-core dataset for evaluation in each experiment. Each dataset was randomly divided into a training set, a validation set, and a test set in the ratio 80:10:10. The review data are grouped by category, as shown in Table 1.
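A random 80:10:10 split of this kind can be sketched as below. The function name, the fixed seed, and the way leftover records are assigned to the test set are assumptions made for illustration.

```python
import random


def split_dataset(records, seed=42):
    """Shuffle the records and split them 80:10:10 into train, validation, test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


train, val, test = split_dataset(range(100))
```

Using integer arithmetic for the split sizes avoids floating-point rounding surprises on large datasets.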

All review texts were first tokenized with the Stanford CoreNLP tokenizer; stop words and punctuation marks were then removed, and finally the word embedding vectors were initialized from the pre-trained glove.6B.50d word vectors built from Wikipedia. In the CNN, we set the number of convolutional layers to 7 and the number of convolution kernels in each layer to 64. We set the batch size to 128 and the dropout rate to 0.5.

The root mean square error (RMSE) is used to evaluate the performance of our proposed algorithm. Let N be the total number of data points to be tested. The RMSE is then defined as Equation 5:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\hat{r}_i - r_i\right)^2} \qquad (5)$$

where $\hat{r}_i$ is the predicted rating and $r_i$ is the true rating of the i-th test data point.
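The RMSE metric can be computed directly from its definition. The following is a minimal sketch; the function name and the example ratings are illustrative only and are not taken from the experiments.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Each prediction is off by exactly one star, so the RMSE is 1.0.
error = rmse([2.0, 4.0, 5.0], [1.0, 3.0, 4.0])
```

Lower values indicate predictions closer to the true ratings, and large individual errors are penalized more heavily than small ones.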

To better assess the performance of our proposed model, we also tested the DeepCoNN, NARRE, MPRS, and MAN models on subsets of the latest version of the Amazon dataset. The root mean square error (RMSE) values obtained by our model and by the comparison models on five categories of the Amazon data are listed in Table 1:

As shown in the table, our proposed model outperforms the baselines on all five data subsets. Our model obtains its largest relative improvement (3.15%) on the AZ-IV (Amazon Instant Video) category of the Amazon dataset.
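A relative improvement of this kind is conventionally computed as the reduction in RMSE divided by the baseline RMSE. The sketch below assumes that definition, which is not stated explicitly in the text; the function name is hypothetical and the RMSE values in the usage line are made up for illustration, not results from the experiments.

```python
def relative_improvement(baseline_rmse, model_rmse):
    """Percentage by which the model's RMSE is lower than the baseline's."""
    if baseline_rmse <= 0:
        raise ValueError("baseline_rmse must be positive")
    return (baseline_rmse - model_rmse) / baseline_rmse * 100.0


# Hypothetical values: a baseline RMSE of 1.0 and a model RMSE of 0.9.
gain = relative_improvement(1.0, 0.9)
```

A positive value means the model has the lower error; a negative value would mean the baseline is better.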

5. Conclusion

Drawing on a large body of prior work on recommendation systems, we propose a new feature fusion-based model for rating prediction. The model represents users and items separately from user reviews and item reviews by embedding the words as vectors, uses the temporal information as weights on the embedding vectors, and then computes the similarity of the weighted user-item vector pairs to form a matching matrix. A CNN architecture and a regression network are then used to predict the user's rating of an item.

6. References

[1] T. Mikolov, et al. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

[2] P. Covington, J. Adams, E. Sargin. Deep neural networks for YouTube recommendations. Proceedings of the 10th ACM Conference on Recommender Systems, 2016.

[3] D. Kim, C. Park, J. Oh, S. Lee, H. Yu. Convolutional matrix factorization for document context-aware recommendation. Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16), 2016.

[4] Y. Qu, et al. Product-based neural networks for user response prediction. 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, 2016.

[5] W. Zhang, T. Du, J. Wang. Deep learning over multi-field categorical data. European Conference on Information Retrieval, Springer, Cham, 2017.

[6] Y. Shan, et al. Deep Crossing: Web-scale modeling without manually crafted combinatorial features. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2016.

[7] S. Seo, J. Huang, H. Yang, Y. Liu. Representation learning of users and items for review rating prediction using attention-based convolutional neural network. Proceedings of the 3rd International Workshop on Machine Learning Methods for Recommender Systems (MLRec), 2017.

[8] L. Zheng, V. Noroozi, P. S. Yu. Joint deep modeling of users and items using reviews for recommendation. Proceedings of the 10th ACM International Conference on Web Search and Data Mining (WSDM '17), 2017, pp. 425-434.

[9] C. Chen, M. Zhang, Y. Liu, S. Ma. Neural attentional rating regression with review-level explanations. Proceedings of the 2018 World Wide Web Conference, 2018, pp. 1583-1592.

[10] P. Abolfath Beygi Dezfouli, S. Momtazi, M. Dehghan. Deep neural review text interaction for recommendation systems. Applied Soft Computing Journal 100 (2021) 106985.

[11] P. Yang, et al. MAN: Main-auxiliary network with attentive interactions for review-based recommendation. Applied Intelligence, 2022, pp. 1-16.

[12] J. Pennington, R. Socher, C. D. Manning. GloVe: Global Vectors for Word Representation. EMNLP, 2014.