Scoring Prediction Model Based on Fusion of Text and Temporal
Features
Haoqian Li1, Xuesong Su2, Wenguang Zheng1*, Jiayi Song1 and Yingyuan Xiao1
1
    School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China
2
    Technology Inspection Center of Shengli Oilfield SINOPEC Dongying, Shandong, China

                  Abstract
                  Nowadays, the Internet has reached nearly every household and has become inseparable
                  from daily life. It provides users with a wide range of online products and services, online
                  platforms are used far more than before, and users have gradually become dependent on
                  online consumption. To promote the development of the e-commerce ecosystem, the main
                  task is to increase the purchase rate of goods. To reduce information overload and meet the
                  diverse needs of users, a personalized recommendation system that uses review text is an
                  effective way to increase users' purchase rate, and it also helps with the problems of sparse
                  data and cold start. Many deep recommendation models based on review text have emerged
                  in this field, but these models have other problems, such as a lack of fusion of multiple
                  features and a lack of judgment of feature importance.

                  Keywords:
                  review text; recommendation system; feature fusion; rating prediction.

1. Introduction

    With the development of the Internet, we have entered the era of big data, and the applications we
register with record our information at every step: when we log in to an account and browse a website,
the site records our browsing history; when we shop online, it records the products we purchase; and
when we give feedback, it records our comments on those products. A growing number of studies argue
that, since reviews explain users' opinions, they should help to infer latent dimensions for predicting
ratings or purchases. Schemes that incorporate reviews for rating prediction have since evolved from
simple regularization methods to neural network methods.
    One of the main directions of research in recommender systems is to improve prediction by using
latent information features, especially in cold-start settings where interaction data may be sparse or
noisy. Existing rating prediction algorithms have shown that introducing review text into the rating
prediction task can greatly improve the accuracy of recommendations. To strengthen feature fusion and
to focus on the importance of interaction features, and thereby improve the accuracy of user-item rating
prediction, this paper proposes a multi-feature fusion-based rating prediction model (2TFRS). Our
proposed algorithm models the rating prediction problem as a text matching problem using user-item
review text fused with temporal features. First, the interaction between features is enhanced by learning
methods that capture important matches between user text and item text; second, the model also
incorporates review timing information and user and item rating information; finally, the predicted
rating is produced by a regression layer.


ICBASE2022@3rd International Conference on Big Data & Artificial Intelligence & Software Engineering, October 21-23, 2022,
Guangzhou, China
EMAIL: 1369227010@qq.com (Haoqian Li); wenguangz@tjut.edu.cn (Wenguang Zheng)
ORCID: 0000-0002-7604-9893 (Haoqian Li)
             ©️ 2022 Copyright for this paper by its authors.
             Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
             CEUR Workshop Proceedings (CEUR-WS.org)


2. Related works

   In this section, we introduce classical recommendation models from recent years that incorporate
deep learning techniques, together with the innovations and design ideas of our proposed model.
   In 2016, the Item2vec algorithm, which adapts word2vec [1] from the field of natural language
processing, treated users' behavior sequences as sentences in order to learn representations. In the same
year, Google released YoutubeDNN [2], which introduced the classical recommendation system
architecture divided into two phases, recall and ranking, and provided ideas for subsequent
industrial-grade recommendation systems. Many other classical models were proposed in the same
period, such as ConvMF [3], PNN [4], DNN [5] and DeepCrossing [6]. Many scholars have found that
prediction accuracy improves considerably when an attention mechanism is taken into account in
modeling the rating prediction problem. Seo et al. proposed an attention-based convolutional neural
network model (D-Attn [7]) to extract latent representations of users and items. In 2017, L. Zheng et al.
proposed the DeepCoNN [8] model, which learns user behavior and item properties from user-item
review texts. In 2018, C. Chen et al. proposed the NARRE [9] model, which builds on DeepCoNN and
adds an attention mechanism that assigns a weight to each review. In 2020, Parisa Abolfath and Saeedeh
Momtazi proposed the MPRS [10] model, which focuses on user-item interaction features. In 2022,
Peilin Yang et al. proposed MAN [11], a deep learning-based main-auxiliary network in which the
auxiliary network focuses on deeper meaning at the word-vector level to help the main network
generate rating predictions. This paper builds on this model and presents new ideas: considering the
influence of the time decay factor on recommendation results, we use the temporal information of
reviews as weights for the embedding vectors, perform local interactions between vector pairs, apply
convolutional operations, and feed the result into a regression layer to obtain the predicted ratings.

3. Our proposed model

   First, we represent each data record as a tuple $(u, v, r_{uv}, c_{uv}, t_{uv})$ describing a review
written by user $u$ for item $v$, comprising the rating $r_{uv}$, the review text $c_{uv}$ and the comment
time $t_{uv}$. To construct the user-item matching matrix, a review document-based approach is used.
First, all the reviews written by user $u$ are concatenated into a single document, denoted $D_u$,
consisting of $n$ words. Similarly, all the review texts of item $v$ are merged into one document $D_v$,
where $m$ is the length of the document.
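
The document construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' preprocessing code: the function name `build_documents`, the field order of the record tuples and the whitespace tokenisation are all assumptions made for the example.

```python
from collections import defaultdict


def build_documents(records):
    """Concatenate review texts into one document per user and one per item.

    `records` is an iterable of (user, item, rating, review_text, timestamp)
    tuples; this field order is an assumption made for the sketch.
    """
    user_words = defaultdict(list)
    item_words = defaultdict(list)
    for user, item, _rating, text, _timestamp in records:
        tokens = text.lower().split()  # naive whitespace tokenisation (assumption)
        user_words[user].extend(tokens)  # every review written by this user
        item_words[item].extend(tokens)  # every review written about this item
    return dict(user_words), dict(item_words)


# Toy records, invented for illustration only.
records = [
    ("u1", "v1", 5.0, "great phone case", 1_600_000_000),
    ("u1", "v2", 3.0, "battery drains fast", 1_650_000_000),
    ("u2", "v1", 4.0, "sturdy case", 1_640_000_000),
]
user_docs, item_docs = build_documents(records)
n = len(user_docs["u1"])  # length of the document of user u1
m = len(item_docs["v1"])  # length of the document of item v1
```

Each user document collects the words of all reviews that user wrote, and each item document collects the words of all reviews written about that item, so the same review contributes to both.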
    Next, we use the word-vector matching matrix $M$ for each pair of user $u$ and item $v$ as input to
the CNN architecture; each element $M_{pq}$ of the matrix represents the similarity of the $p$-th word
$w_p$ in the user document $D_u$ to the $q$-th word $w_q$ in the item document $D_v$. The first layer
of our proposed model is the embedding layer, which maps each word in the review text to a
d-dimensional vector. The words of each user review text and of each item review text are embedded by
a word embedding function, for which the pre-trained GloVe [12] embeddings trained on Wikipedia can
be used. The $\alpha_p$ and $\beta_q$ in Equation 1 represent the word embedding vectors of word $w_p$
and word $w_q$ respectively, with the user embedding vector denoted as α and the item embedding
vector as β. Because a user's preferences may change over time, we propose a weighting function that
takes time decay into account: a user's recent review behavior represents the user's recent preferences,
and an item's recent reviews often represent its recent quality, which can improve recommendation
performance.

                                                       =                                                 (1)

   Where Δt is the time difference between the time at which user u commented on item v and the
current time. As Δt increases, the weight of the embedding vector becomes smaller, which means that a
user's commenting behavior from long ago has less influence on the predicted rating. The weighted user
embedding vector is denoted as ε and the weighted item embedding vector is denoted as η.
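
To make the role of Δt concrete, the following sketch computes a weight that decreases as a review gets older and applies it to an embedding vector. The exponential form, the `decay_rate` parameter, the day-based time unit and both function names are assumptions chosen purely for illustration; any monotonically decreasing function of Δt would show the same behavior, and this is not presented as the weighting function of Equation 1.

```python
import math

SECONDS_PER_DAY = 86_400


def time_decay_weight(review_time, current_time, decay_rate=0.01):
    """Return a weight in (0, 1] that shrinks as the review gets older.

    ASSUMPTION: exponential decay exp(-decay_rate * delta_t_days) is used only
    as an example of a decreasing function of the time difference.
    """
    delta_t_days = max(0.0, (current_time - review_time) / SECONDS_PER_DAY)
    return math.exp(-decay_rate * delta_t_days)


def weight_embedding(embedding, weight):
    """Scale every component of a word embedding by its time-decay weight."""
    return [weight * x for x in embedding]


now = 1_700_000_000  # hypothetical "current time" as a Unix timestamp
recent = time_decay_weight(now - 10 * SECONDS_PER_DAY, now)   # 10 days old
old = time_decay_weight(now - 1000 * SECONDS_PER_DAY, now)    # 1000 days old
weighted = weight_embedding([0.2, -0.4, 0.1], recent)
```

With these settings the ten-day-old review keeps most of its weight while the thousand-day-old review is almost entirely discounted, which is the qualitative effect described above.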
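   To better capture word meanings for matching, we compute the similarity between two words as
the cosine similarity of their word embedding vectors. These similarities form the user-item matching
matrix, which gives a joint representation of the user-item pair and captures their joint semantic
information.
   The similarity is calculated as shown in Equation 2:

   $$M_{pq} = \cos(\varepsilon_p, \eta_q) = \frac{\varepsilon_p \cdot \eta_q}{\lVert \varepsilon_p \rVert \, \lVert \eta_q \rVert} \qquad (2)$$

   Here $\varepsilon_p$ is the weighted embedding of the p-th word of the user document and $\eta_q$ is the
weighted embedding of the q-th word of the item document.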
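
A minimal sketch of the matching-matrix construction follows, assuming the word embeddings are already available as plain Python lists. The function names and the toy two-dimensional vectors are hypothetical, and returning 0 for a zero-length vector is a convention chosen for the sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_a * norm_b)


def matching_matrix(user_vectors, item_vectors):
    """Entry [p][q] is the similarity of user word p and item word q."""
    return [[cosine_similarity(e, h) for h in item_vectors] for e in user_vectors]


# Toy embeddings: n = 3 user-document words, m = 2 item-document words.
user_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
item_vectors = [[2.0, 0.0], [0.0, 1.0]]
M = matching_matrix(user_vectors, item_vectors)  # shape n x m
```

The result has one row per user-document word and one column per item-document word. One property worth keeping in mind when implementing this step: cosine similarity does not change when either vector is multiplied by a positive scalar, so the way per-word weights enter the matrix depends on how the weighting is defined.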

   The above is the first layer of the architecture of our proposed model. Next, we input the user-item
matching matrix constructed above into the convolutional neural network (CNN) architecture.

[Figure 1 diagram, blocks from bottom to top: User Review Text and Item Review Text; Embedding Layer;
F(α) and F(β); CNN; Rating Score]

Figure 1 Scoring prediction model based on text and temporal feature fusion

   The CNN framework consists of convolutional layers, pooling layers, and a fully connected layer.
Each convolutional layer uses a filter $K_j$ to generate a feature map $z_j$, defined as Equation 3:

   $$z_j = M * K_j + b_j \qquad (3)$$

   where $M$ is the input to the layer (the user-item matching matrix for the first layer), the symbol $*$
is the convolution operator, and $b_j$ is the bias term.
   We apply a max pooling operation to each feature map obtained from the convolutional layer and
use the maximum value in each pooling window as one of the features of the corresponding kernel. By
repeating the convolutional and max pooling layers, we filter out interference and at the same time
extract each feature at a certain scale. The result of the final max pooling layer is passed to a fully
connected layer and then to a single-neuron layer with a linear activation function, the regression layer.
The regression layer outputs the user-item predicted rating $\hat{r}_{uv}$, calculated as in Equation 4:

   $$\hat{r}_{uv} = W O + b \qquad (4)$$

   where O is the result of the fully connected layer, W contains the weights of the regression layer, and
$b$ is the bias term.
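
The convolution, max pooling and regression steps can be traced on a small example. The sketch below is a simplified forward pass under several assumptions: a single convolutional layer with one hand-picked 2 x 2 filter, no activation function, 2 x 2 non-overlapping pooling, and the flattened pooled map standing in for the fully connected output O. All values and function names are invented for illustration, and nothing is trained.

```python
def conv2d_valid(matrix, kernel, bias):
    """Slide `kernel` over `matrix` (no padding, stride 1) and add `bias`.

    As in most deep-learning libraries, the kernel is not flipped, so this is
    technically a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(matrix) - kh + 1
    out_w = len(matrix[0]) - kw + 1
    return [
        [
            sum(matrix[i + a][j + c] * kernel[a][c]
                for a in range(kh) for c in range(kw)) + bias
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size):
    """Non-overlapping max pooling; incomplete trailing windows are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + c]
                for a in range(size) for c in range(size))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def regression(features, weights, bias):
    """Single linear neuron: weighted sum of the features plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Toy 5 x 5 matching matrix (values invented for illustration).
M = [
    [1.0, 0.0, 0.5, 0.2, 0.1],
    [0.0, 1.0, 0.3, 0.4, 0.0],
    [0.2, 0.1, 1.0, 0.0, 0.6],
    [0.7, 0.3, 0.0, 1.0, 0.2],
    [0.1, 0.5, 0.4, 0.2, 1.0],
]
K = [[1.0, 0.0], [0.0, 1.0]]          # filter that responds to diagonal matches
z = conv2d_valid(M, K, bias=0.0)      # 4 x 4 feature map
pooled = max_pool(z, size=2)          # 2 x 2 pooled map
O = [x for row in pooled for x in row]  # flattened features
rating = regression(O, weights=[0.5, 0.5, 0.5, 0.5], bias=1.0)
```

The diagonal filter responds most strongly where consecutive words of the user document match consecutive words of the item document, which is the kind of local matching pattern the convolutional layers are meant to pick up.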


   The method proposed in this paper is a rating prediction model based on the fusion of textual and
temporal features, and it is suitable for regression prediction based on user-item similarity.

4. Experiments

    We used the public Amazon Review 5-core dataset for evaluation in each experiment. Each dataset
was randomly divided into a training set, a validation set and a test set in the ratio 80:10:10. The review
data were grouped by product category, as shown in Table 1.
    All review texts were first tokenized with the Stanford CoreNLP tokenizer, then stop words and
punctuation marks were removed, and finally the word embedding vectors were taken from the
GloVe.6B.50d word vectors pre-trained on Wikipedia. In the CNN, we set the number of convolutional
layers to 7 and the number of convolution kernels in each layer to 64; the batch size is set to 128 and the
dropout rate to 0.5. The root mean square error (RMSE) is used to evaluate the performance of our
proposed algorithm. Let N be the total number of data points to be tested. Then the RMSE is defined as
in Equation 5:

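   $$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{(u,v)} \left(\hat{r}_{uv} - r_{uv}\right)^2} \qquad (5)$$

   where $\hat{r}_{uv}$ is the predicted rating and $r_{uv}$ is the true rating of user u for item v.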
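
As a small worked example of the metric, the sketch below computes the RMSE for four made-up predictions. The function name `rmse` and the sample ratings are illustrative only.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


predicted = [4.0, 3.0, 5.0, 2.0]  # hypothetical model outputs
actual = [5.0, 3.0, 4.0, 4.0]     # hypothetical ground-truth ratings
score = rmse(predicted, actual)   # squared errors 1, 0, 1, 4 -> sqrt(6 / 4)
```

Because the errors are squared before averaging, the single prediction that is off by two stars contributes four times as much as each prediction that is off by one.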

    To better evaluate the performance of our proposed model, we tested the DeepCoNN model, the
NARRE model, the MPRS model, and the MAN model on subsets of the latest version of the Amazon
dataset. The root mean square error (RMSE) values obtained by our model and by the comparison
models on five categories of the Amazon data are listed in Table 1:

Table 1
RMSE results of the compared methods on different datasets
  Dataset                        DeepCoNN    MPRS      MAN       2TFRS (relative improvement)
  Cell Phones and Accessories    1.4129      1.0129    1.1022    1.0743 (2.79%)
  Toys and Games                 0.9024      0.8959    0.8384    0.8271 (1.13%)
  Office Products                0.8692      0.7887    0.7727    0.7536 (1.91%)
  Amazon Instant Video           1.1132      1.1022    0.9786    0.9471 (3.15%)
  Digital Music                  0.9006      0.8760    0.8701    0.8502 (1.99%)

   As shown in the table, the performance of our proposed model is superior to the baselines after
testing on five data subsets. Our model obtains the largest relative improvement (3.15%) on the
Amazon Instant Video (AZ-IV) category.

5. Conclusion

   After drawing on a large number of excellent papers on recommendation systems, we propose a new
model based on feature fusion for rating prediction. The model represents users and items separately
through their reviews: it embeds the words as vectors, uses the temporal information as weights on the
embedding vectors, and then computes the similarity of the weighted user-item vector pairs to form a
matching matrix. A CNN architecture and a regression network are then used to predict the user's rating
of an item.


6. References

[1] T. Mikolov, et al., Efficient estimation of word representations in vector space, arXiv preprint
     arXiv:1301.3781, 2013
[2] P. Covington, J. Adams, E. Sargin, Deep neural networks for YouTube recommendations, in:
     Proceedings of the 10th ACM Conference on Recommender Systems, 2016
[3] D. Kim, C. Park, J. Oh, S. Lee, H. Yu, Convolutional matrix factorization for document
     context-aware recommendation, in: Proceedings of the 10th ACM Conference on Recommender
     Systems – RecSys '16, 2016
[4] Y. Qu, et al., Product-based neural networks for user response prediction, in: 2016 IEEE 16th
     International Conference on Data Mining (ICDM), IEEE, 2016
[5] W. Zhang, T. Du, J. Wang, Deep learning over multi-field categorical data, in: European
     Conference on Information Retrieval, Springer, Cham, 2017
[6] Y. Shan, et al., Deep Crossing: Web-scale modeling without manually crafted combinatorial
     features, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
     Discovery and Data Mining, ACM, 2016
[7] S. Seo, J. Huang, H. Yang, Y. Liu, Representation learning of users and items for review rating
     prediction using attention-based convolutional neural network, in: Proceedings of the 3rd
     International Workshop on Machine Learning Methods for Recommender Systems – MLRec, 2017
[8] L. Zheng, V. Noroozi, P.S. Yu, Joint deep modeling of users and items using reviews for
     recommendation, in: Proceedings of the 10th ACM International Conference on Web Search and
     Data Mining – WSDM '17, 2017, pp. 425–434
[9] C. Chen, M. Zhang, Y. Liu, S. Ma, Neural attentional rating regression with review-level
     explanations, in: Proceedings of the 2018 World Wide Web Conference, 2018, pp. 1583–1592
[10] P. Abolfath Beygi Dezfouli, S. Momtazi, M. Dehghan, Deep neural review text interaction for
     recommendation systems, Applied Soft Computing Journal 100 (2021) 106985
[11] P. Yang, et al., MAN: Main-auxiliary network with attentive interactions for review-based
     recommendation, Applied Intelligence, 2022, pp. 1–16
[12] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in:
     EMNLP, 2014

