<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Scoring Prediction Model Based on Fusion of Text and Temporal Features</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Haoqian Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xuesong Su</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wenguang Zheng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiayi Song</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yingyuan Xiao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science and Engineering, Tianjin University of Technology</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Technology Inspection Center of Shengli Oilfield SINOPEC Dongying</institution>
          ,
          <addr-line>Shandong</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>145</fpage>
      <lpage>149</lpage>
      <abstract>
        <p>The Internet now reaches nearly every household and is inseparable from daily life. It provides users with a wide range of online products and services, online platforms are used far more than before, and users increasingly depend on online consumption. To promote the development of the e-commerce ecosystem, the main task is to increase the purchase rate of goods. To reduce information overload and meet the diverse needs of users, a personalized recommendation system that performs recommendation using review text is an effective way to increase the purchase rate, and it also mitigates the problems of sparse data and cold start. Many deep recommendation models based on review text have emerged in this field, but these models still have shortcomings, such as a lack of fusion of multiple features and no judgment of the relative importance of features.</p>
      </abstract>
      <kwd-group>
        <kwd>review text</kwd>
        <kwd>recommendation system</kwd>
        <kwd>feature fusion</kwd>
        <kwd>rating prediction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>With the development of the Internet, we have entered the era of big data, in which
applications record information at every point of interaction: when we log in to an account and
browse a website, the site records our browsing history; when we shop online, the purchased
products are recorded; and when we give feedback, our comments on the products are recorded as
well. A growing number of studies argue that, since reviews explain users' opinions, they should
help to infer latent dimensions for predicting ratings or purchases. Schemes that incorporate
reviews have since evolved from simple regularization methods to neural network methods for
rating prediction.</p>
      <p>One of the main research directions in recommender systems is to improve prediction by using
latent information features, especially in cold-start settings where interaction data may be sparse
or noisy. Existing rating prediction algorithms have shown that introducing review text into
the rating prediction task can improve the accuracy of recommendations. To strengthen feature
fusion and to focus on the important interaction features, and thereby improve the
accuracy of user-item rating prediction, this paper proposes a multi-feature fusion-based rating prediction
model (2TFRS). The proposed algorithm models rating prediction as a text matching
problem, using user and item review text fused with temporal features. First, the interaction between
features is enhanced by learning the important matches between user text and item text; second,
the model combines review timing information with user and item rating information. Finally, the
predicted ratings are produced by a regression layer.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>In this section, we introduce the classical recommendation models incorporating deep learning
techniques in recent years and the innovative points and construction ideas of our proposed model.</p>
      <p>
        The word2vec [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] algorithm from the field of natural language processing and its 2016 adaptation Item2vec
treat a user's behavior sequence as a sentence and learn representations from it. Also in 2016,
Google released YoutubeDNN [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] which introduced the classical recommendation system architecture,
divided into two phases of recall and ranking, providing ideas for subsequent industrial-grade
recommendation systems. Many other classical models were proposed in the same period, such as ConvMF [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
PNN [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], DNN [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], DeepCrossing [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], etc. Many scholars have found that prediction accuracy improves considerably when an
attention mechanism is used in modeling the rating prediction problem. Seo et al. proposed an
attention-based convolutional neural network model (D-Attn) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to extract latent representations of users and items. In 2017, L. Zheng et al.
proposed the DeepCoNN [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] model to learn user behavior and item properties jointly from user and item review
texts. In 2018, C. Chen et al. proposed the NARRE [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] model, which extends DeepCoNN with an attention mechanism that assigns a weight to
each review. In 2020, Parisa Abolfath and Saeedeh Momtazi proposed the MPRS [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] model, which focuses on user-item interaction features. In 2022,
Peilin Yang et al. proposed MAN [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], a deep learning-based main-auxiliary network in which the auxiliary network attends to the
deeper meaning at the word-vector level and thereby helps the main network generate rating predictions.
This paper builds on this model and presents new ideas: to account for the influence of the time
decay factor on recommendation results, we use the temporal information of reviews as weights on the
embedded vectors, compute local interactions between vector pairs, apply convolutional operations,
and feed the result into the regression layer to obtain predicted scores.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Our proposed model</title>
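      <p>First, we treat each data record as a tuple
<inline-formula><tex-math>(u, v, r_{uv}, d_{uv}, t_{uv})</tex-math></inline-formula>
representing the review written by user u for item v, comprising the rating
<inline-formula><tex-math>r_{uv}</tex-math></inline-formula>, the review text
<inline-formula><tex-math>d_{uv}</tex-math></inline-formula>, and the review time
<inline-formula><tex-math>t_{uv}</tex-math></inline-formula>. To
construct the user-item matching matrix, a review-document-based approach is used. All the reviews
written by user u are concatenated into a single document, denoted
<inline-formula><tex-math>D_u</tex-math></inline-formula>, consisting of n words.
Similarly, all the review texts of item v are concatenated into one document
<inline-formula><tex-math>D_v</tex-math></inline-formula>, where m is the length of
the document.</p>
      <p>
        Next, we use the word vector matching matrix
<inline-formula><tex-math>M</tex-math></inline-formula>
for each pair of user u and item v as input to the
CNN architecture. Each element
<inline-formula><tex-math>M_{pq}</tex-math></inline-formula>
of the matrix represents the similarity of the pth word in the user document
<inline-formula><tex-math>D_u</tex-math></inline-formula>
to the qth word in the item document
<inline-formula><tex-math>D_v</tex-math></inline-formula>. The first layer of our
proposed model is the embedding layer, which maps each word in the review text to a
d-dimensional vector. The words of each user review text and of each item review text are
mapped by a word embedding function
<inline-formula><tex-math>\phi</tex-math></inline-formula>, for which the pre-trained GloVe [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
embeddings trained on Wikipedia can be used:
        <disp-formula><label>(1)</label><tex-math>\alpha_p = \phi(w_p^{u}), \qquad \beta_q = \phi(w_q^{v})</tex-math></disp-formula>
      </p>
      <p>In Equation 1,
<inline-formula><tex-math>\alpha_p</tex-math></inline-formula> and
<inline-formula><tex-math>\beta_q</tex-math></inline-formula>
represent the word embedding vectors of word
<inline-formula><tex-math>w_p^{u}</tex-math></inline-formula> and word
<inline-formula><tex-math>w_q^{v}</tex-math></inline-formula>
respectively, with the user embedding vector denoted as α and the item embedding vector
as β. Because a user's preferences may change over time, we propose a
weighting function that takes time decay into account: a user's recent review behavior represents
the user's recent preferences, and an item's recent reviews often represent the recent quality of the item,
so weighting by recency can improve recommendation performance.</p>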
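      <p>The review-document construction and embedding lookup described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sample records, the whitespace tokenizer, the names build_documents and embed, and the three-dimensional toy table standing in for pre-trained GloVe vectors are all assumptions made for the example.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

# Hypothetical review records: (user, item, rating, review_text, unix_time).
RECORDS = [
    ("u1", "v1", 5.0, "great sound quality", 1_600_000_000),
    ("u1", "v2", 2.0, "poor battery", 1_650_000_000),
    ("u2", "v1", 4.0, "good sound", 1_655_000_000),
]

# Toy 3-dimensional table standing in for pre-trained GloVe vectors (assumption).
TOY_EMBEDDINGS = {
    "great": [0.9, 0.1, 0.0],
    "good": [0.8, 0.2, 0.0],
    "poor": [-0.7, 0.1, 0.1],
    "sound": [0.0, 0.9, 0.2],
    "quality": [0.1, 0.6, 0.5],
    "battery": [0.0, 0.2, 0.9],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback for out-of-vocabulary words


def build_documents(records):
    """Concatenate the words of all reviews per user and per item."""
    user_docs, item_docs = defaultdict(list), defaultdict(list)
    for user, item, _rating, text, _time in records:
        words = text.lower().split()  # simple whitespace tokenizer (assumption)
        user_docs[user].extend(words)
        item_docs[item].extend(words)
    return dict(user_docs), dict(item_docs)


def embed(document, table=TOY_EMBEDDINGS):
    """Map every word of a document to its d-dimensional vector."""
    return [table.get(word, UNKNOWN) for word in document]


user_docs, item_docs = build_documents(RECORDS)
alpha = embed(user_docs["u1"])  # n x d user embedding
beta = embed(item_docs["v1"])   # m x d item embedding
print(len(alpha), len(beta))
```
]]></preformat>
      <p>Here alpha and beta play the roles of the user and item embeddings α and β; in practice the toy table would be replaced by real pre-trained vectors.</p>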
      <p>In the weighting function, Δt is the time difference between the time at which user u commented on
item v and the current time. As Δt increases, the weight applied to the embedding vector becomes smaller,
which means that a user's commenting behavior from long ago has less influence on the predicted score.
The user embedding vector with weights applied is denoted as ε, and the item embedding vector with
weights applied is denoted as η.</p>
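      <p>As an illustration of a weight that shrinks as Δt grows, the sketch below uses an exponential decay. The exponential form, the rate parameter, the day-based time unit, and the function names decay_weight and apply_weight are assumptions chosen for the example; the weighting function actually used by the model may differ.</p>
      <preformat><![CDATA[
```python
import math

SECONDS_PER_DAY = 86_400


def decay_weight(review_time, current_time, rate=0.001):
    """Assumed exponential decay: the weight shrinks as the time gap grows."""
    delta_days = max(current_time - review_time, 0) / SECONDS_PER_DAY
    return math.exp(-rate * delta_days)


def apply_weight(vectors, weight):
    """Scale every word vector of one review by that review's weight."""
    return [[weight * x for x in vec] for vec in vectors]


now = 1_700_000_000  # hypothetical "current time" as a Unix timestamp
recent = decay_weight(now - 10 * SECONDS_PER_DAY, now)
old = decay_weight(now - 1000 * SECONDS_PER_DAY, now)
print(recent > old)  # the older review receives the smaller weight
```
]]></preformat>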
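      <p>To better capture word meaning for matching, we compute the similarity
between two words as the cosine similarity of their word embedding vectors, and use these similarities to
construct a user-item matching matrix that gives a joint representation of the user-item pair and captures
their joint semantic information.</p>
      <p>The similarity is calculated as shown in Equation 2:
        <disp-formula><label>(2)</label><tex-math>M_{pq} = \frac{\varepsilon_p \cdot \eta_q}{\lVert \varepsilon_p \rVert \, \lVert \eta_q \rVert}</tex-math></disp-formula>
where <inline-formula><tex-math>\varepsilon_p</tex-math></inline-formula> is the weighted embedding vector of the pth word in the user document,
<inline-formula><tex-math>\eta_q</tex-math></inline-formula> is the weighted embedding vector of the qth word in the item document, and
<inline-formula><tex-math>M_{pq}</tex-math></inline-formula> is the corresponding element of the matching matrix.</p>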
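      <p>A cosine-similarity matching matrix of this kind can be sketched in a few lines. The function names cosine and matching_matrix and the two-dimensional sample vectors are hypothetical; treating a zero-norm vector as having similarity 0 is a choice made for this sketch, not something stated in the paper.</p>
      <preformat><![CDATA[
```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def matching_matrix(user_vectors, item_vectors):
    """Entry [p][q] compares user word p with item word q (n x m matrix)."""
    return [[cosine(u, v) for v in item_vectors] for u in user_vectors]


user_vectors = [[1.0, 0.0], [0.0, 2.0]]              # n = 2 words
item_vectors = [[3.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # m = 3 words
M = matching_matrix(user_vectors, item_vectors)
for row in M:
    print([round(value, 3) for value in row])
```
]]></preformat>
      <p>The result has one row per word of the user document and one column per word of the item document, which is the shape the CNN receives as input.</p>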
      <p>The above is the first layer of the architecture of our proposed model. Next, the user-item
matching matrix constructed above is fed into the convolutional neural network (CNN) architecture.</p>
      <p>[Figure: architecture of the proposed model. Recoverable labels: Embedding Layer, F(α), F(β), CNN, Rating Score.]</p>
      <sec id="sec-3-1">
        <title>User Review Text</title>
      </sec>
      <sec id="sec-3-2">
        <title>Item Review Text</title>
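        <p>The CNN framework consists of convolutional layers, pooling layers, and a fully connected layer.
Each convolutional layer uses a filter <inline-formula><tex-math>K_j</tex-math></inline-formula> to generate a feature map
<inline-formula><tex-math>z_j</tex-math></inline-formula>, defined as Equation 3:
          <disp-formula><label>(3)</label><tex-math>z_j = M * K_j + b_j</tex-math></disp-formula>
where <inline-formula><tex-math>M</tex-math></inline-formula> is the user-item matching matrix, the symbol
<inline-formula><tex-math>*</tex-math></inline-formula> is the convolution operator, and
<inline-formula><tex-math>b_j</tex-math></inline-formula> is the bias term.</p>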
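        <p>The feature-map computation can be illustrated with a small pure-Python sketch. It assumes a single filter, stride 1, and no padding, and it computes the sliding dot product that deep learning libraries conventionally call convolution; the name conv2d_valid, the 2 x 2 filter, and the sample matrix are hypothetical.</p>
        <preformat><![CDATA[
```python
def conv2d_valid(matrix, kernel, bias=0.0):
    """Slide the kernel over the matrix (no padding, stride 1) and add the bias."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(matrix) - k_rows + 1
    out_cols = len(matrix[0]) - k_cols + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Dot product of the kernel with the window whose corner is (i, j).
            total = sum(
                matrix[i + a][j + b] * kernel[a][b]
                for a in range(k_rows)
                for b in range(k_cols)
            )
            row.append(total + bias)
        feature_map.append(row)
    return feature_map


M = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
K = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2 x 2 filter
Z = conv2d_valid(M, K, bias=0.1)
print(Z)
```
]]></preformat>
        <p>A 3 x 3 input and a 2 x 2 filter give a 2 x 2 feature map; a real model would learn many such filters per layer.</p>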
        <p>We apply a max pooling operation to each feature map obtained from the convolutional
layer and use the maximum value in each pooling window as one of the features of the corresponding
kernel. By repeating the convolutional and max pooling layers, the network filters out interference
and extracts features at the same time, downsampling them by a fixed ratio. The result of the final max pooling
layer is passed to a fully connected layer and then to a single-neuron layer with a linear activation
function, the regression layer, which outputs the user-item predicted rating as shown in Equation 4:
          <disp-formula><label>(4)</label><tex-math>\hat{r}_{uv} = W O + b</tex-math></disp-formula>
where <inline-formula><tex-math>\hat{r}_{uv}</tex-math></inline-formula> is the predicted rating of user u for item v,
O is the result of the fully connected layer, W contains the weights of the regression layer, and
<inline-formula><tex-math>b</tex-math></inline-formula> is the bias term.</p>
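        <p>The pooling and regression steps can be sketched as follows. This is a simplified illustration under stated assumptions: the pooling windows are non-overlapping 2 x 2 blocks, the fully connected layer is replaced by a plain flattening of the pooled map, and the weights, bias, and the names max_pool and regression_layer are hypothetical.</p>
        <preformat><![CDATA[
```python
def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(0, cols - size + 1, size)
        ]
        for i in range(0, rows - size + 1, size)
    ]


def regression_layer(features, weights, bias):
    """Single neuron with linear activation: weighted sum plus bias."""
    return sum(w * o for w, o in zip(weights, features)) + bias


Z = [
    [0.1, 0.9, 0.3, 0.2],
    [0.4, 0.2, 0.8, 0.1],
    [0.5, 0.6, 0.0, 0.7],
    [0.3, 0.1, 0.2, 0.4],
]
pooled = max_pool(Z)                    # 2 x 2 map of window maxima
O = [x for row in pooled for x in row]  # flattening stands in for the fully connected layer
score = regression_layer(O, weights=[1.0, 1.0, 1.0, 1.0], bias=0.5)
print(pooled, score)
```
]]></preformat>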
        <p>In summary, the method proposed in this paper is a rating prediction model based on the fusion
of textual and temporal features, and it is suited to regression prediction based
on user-item similarity.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>We used the public Amazon Review 5-core dataset to evaluate each
experiment. Each dataset was randomly divided into a training set, a validation set, and a test set in the ratio
80:10:10. The review data are grouped by category, as shown in Table 1.</p>
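      <p>A random 80:10:10 split of this kind can be sketched as below. The function name split_dataset, the fixed seed, and the use of integer record identifiers are assumptions for illustration; the paper does not describe how the split was implemented.</p>
      <preformat><![CDATA[
```python
import random


def split_dataset(records, seed=0):
    """Shuffle a copy of the records and cut it 80:10:10."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * 8 // 10
    n_valid = len(shuffled) // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # remainder goes to the test set
    return train, valid, test


train, valid, test = split_dataset(range(1000))
print(len(train), len(valid), len(test))
```
]]></preformat>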
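      <p>All review texts were first tokenized with the Stanford CoreNLP tokenizer; stop words and
punctuation marks were then removed, and finally the word embedding
vectors were obtained from the pre-trained glove.6B.50d word vectors trained on Wikipedia. In the CNN, we set the number of
convolutional layers to 7 and the number of convolution kernels in each layer to 64, with a
batch size of 128 and a dropout rate of 0.5. The root mean square error (RMSE) is used to
evaluate the performance of our proposed algorithm. Let N be the total number of data points to be
tested. The RMSE is then defined as in Equation 5:
        <disp-formula><label>(5)</label><tex-math>\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{r}_i - r_i \right)^2}</tex-math></disp-formula>
where <inline-formula><tex-math>\hat{r}_i</tex-math></inline-formula> is the predicted rating and
<inline-formula><tex-math>r_i</tex-math></inline-formula> is the true rating of the ith test data point.</p>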
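      <p>The RMSE metric is straightforward to compute; a minimal sketch follows. The function name rmse and the sample ratings are hypothetical and are not results from the experiments.</p>
      <preformat><![CDATA[
```python
import math


def rmse(predicted, actual):
    """Root mean square error over N paired predictions and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - r) ** 2 for p, r in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


# Hypothetical predicted and true ratings for four test points.
value = rmse([4.0, 3.0, 5.0, 2.0], [5.0, 3.0, 4.0, 4.0])
print(round(value, 4))
```
]]></preformat>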
      <p>To better assess the performance of our proposed model, we also tested the
DeepCoNN model, the NARRE model, the MPRS model, and the MAN model on subsets of the latest
version of the Amazon dataset. The root mean square error (RMSE) values obtained by our model and by the
comparison models on five categories of the Amazon data are
listed in Table 1.</p>
      <p>As shown in the table, our proposed model outperforms the baseline
models on all five data subsets. Our model obtains its largest relative
improvement (3.15%) on the Amazon Instant Video (AZ-IV) category.</p>
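      <p>A relative improvement in RMSE is commonly computed as the reduction in error divided by the baseline error. The sketch below assumes that definition, which the paper does not state explicitly, and the RMSE values in it are hypothetical rather than taken from Table 1.</p>
      <preformat><![CDATA[
```python
def relative_improvement(baseline_rmse, model_rmse):
    """Percentage reduction in RMSE relative to the baseline (lower RMSE is better)."""
    return (baseline_rmse - model_rmse) / baseline_rmse * 100.0


# Hypothetical values: a baseline RMSE of 1.00 and a model RMSE of 0.95.
gain = relative_improvement(1.00, 0.95)
print(round(gain, 2))
```
]]></preformat>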
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Drawing on a large body of prior work on recommendation systems, we propose a new
model based on feature fusion for rating prediction. The model represents users and items separately
from user reviews and item reviews: it embeds the word vectors, uses the temporal information as
weights on the embedded vectors, and then computes the similarity of the weighted user-item vector
pairs to form a matching matrix. A CNN architecture and a regression network are then used to predict
the user's rating of an item.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>Tomas</given-names>
          </string-name>
          , et al.
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Covington</surname>
            ,
            <given-names>Paul</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Jay</given-names>
            <surname>Adams</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Emre</given-names>
            <surname>Sargin</surname>
          </string-name>
          .
          <article-title>Deep neural networks for YouTube recommendations</article-title>
          .
          <source>Proceedings of the 10th ACM Conference on Recommender Systems</source>
          .
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Convolutional matrix factorization for document context-aware recommendation</article-title>
          ,
          <source>in: Proceedings of the 10th ACM Conference on Recommender Systems - RecSys '16</source>
          ,
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>Yuanru</given-names>
          </string-name>
          , et al.
          <article-title>Product-based neural networks for user response prediction</article-title>
          .
          <source>2016 IEEE 16th International Conference on Data Mining (ICDM)</source>
          . IEEE,
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] Zhang, Weinan,
          <string-name>
            <given-names>Tianming</given-names>
            <surname>Du</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jun</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Deep learning over multi-field categorical data</article-title>
          .
          <source>European conference on information retrieval</source>
          . Springer, Cham,
          <year>2017</year>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Shan</surname>
            ,
            <given-names>Ying</given-names>
          </string-name>
          , et al.
          <article-title>Deep crossing: Web-scale modeling without manually crafted combinatorial features</article-title>
          .
          <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . ACM,
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Seo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          , Y. Liu,
          <article-title>Representation learning of users and items for review rating prediction using attention-based convolutional neural network</article-title>
          ,
          <source>in: Proceedings of the 3rd International Workshop on Machine Learning Methods for Recommender Systems - MLRec</source>
          ,
          <year>2017</year>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Noroozi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Joint deep modeling of users and items using reviews for recommendation</article-title>
          ,
          <source>in: Proceedings of the 10th ACM International Conference on Web Search and Data Mining - WSDM '17</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>425</fpage>
          -
          <lpage>434</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Y. Liu, S. Ma,
          <article-title>Neural attentional rating regression with review-level explanations</article-title>
          ,
          <source>in: Proceedings of the 2018 World Wide Web Conference</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1583</fpage>
          -
          <lpage>1592</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Parisa Abolfath Beygi</given-names>
            <surname>Dezfouli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Saeedeh</given-names>
            <surname>Momtazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mehdi</given-names>
            <surname>Dehghan</surname>
          </string-name>
          .
          <article-title>Deep neural review text interaction for recommendation systems</article-title>
          ,
          <source>Applied Soft Computing Journal</source>
          <volume>100</volume>
          (
          <year>2021</year>
          )
          <fpage>106985</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Peilin</given-names>
          </string-name>
          , et al.,
          <article-title>MAN: Main-auxiliary network with attentive interactions for review-based recommendation</article-title>
          ,
          <source>Applied Intelligence</source>
          (
          <year>2022</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          ,
          <source>EMNLP</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>