=Paper= {{Paper |id=Vol-3132/Paper_6.pdf |storemode=property |title=Comparative Analysis of Basic Approaches to Implementing Model-Based Recommendation Systems Based on Implicit Economic Information |pdfUrl=https://ceur-ws.org/Vol-3132/Paper_6.pdf |volume=Vol-3132 |authors=Yurii Kryvenchuk,Viktoriia Lakiza,Yuliia Bidak,Iryna Myskiv |dblpUrl=https://dblp.org/rec/conf/iti2/KryvenchukLBM21 }} ==Comparative Analysis of Basic Approaches to Implementing Model-Based Recommendation Systems Based on Implicit Economic Information == https://ceur-ws.org/Vol-3132/Paper_6.pdf
Comparative Analysis of Basic Approaches to Implementing
Model-Based Recommendation Systems Based on Implicit
Economic Information
Yurii Kryvenchuk, Viktoriia Lakiza, Yuliia Bidak and Iryna Myskiv

Lviv Polytechnic National University, Profesorska Street 1, Lviv, 79013, Ukraine

                Abstract
                The paper considers ways to solve the problem of information overload on the Internet.
                Recommendation systems developed by other researchers are also reviewed. The main
                algorithms used in recommendation systems are analyzed: content-based, demographic-based
                and collaborative filtering. Two types of data that help to form an overall rating in a
                recommendation system are considered. The main problems that arise when working with
                recommendation systems are discussed, and the tasks of recommendation systems are analyzed
                in detail. The paper describes the step-by-step creation of a recommendation system and
                identifies the main requirements it must meet. The study presents a similarity matrix
                calculated from the entire recommendation vector, and the personalization of the
                recommendations is also calculated. The matrix factorization method is analyzed, as is the
                rating derived from the user profile. To obtain results for the proposed models, the
                authors offer their own web service for finding movies, where users can search for movies
                and view detailed information about them, including their ratings. Recommendations in this
                system are based on implicit feedback, and the service receives the user's ID to make
                personalized recommendations. The implemented recommendation methods are also analyzed:
                Linear Regression Prediction, Content-Based Prediction, Collaborative-Filtering Prediction
                User-Based and Collaborative-Filtering Prediction Item-Based.

                Keywords
                Machine Learning, Artificial Intelligence, neural network, intelligent technologies, sum of
                error squares, recommendation system, content-based approach.

1. Introduction
    Today, the problem of information overload on the Internet remains open. The amount of information
on the Internet is growing exponentially every day [1, 5, 18]. Recommendation systems are a relatively
young field. The field received a major boost in 2006, when Netflix launched the Netflix Prize data
analysis competition. Around the same time, the annual RecSys conference on recommender systems
began, and it is still held today [3, 7].
    The study aims to describe models in which user behaviour is characterized by the user's own
ratings, which are then used to recommend content to that user [6, 2]. If the problem is treated as a
classification or regression task, the list of applicable algorithms is quite extensive. Therefore, the
study focuses on several algorithms that work on accurate data [4, 8].
    Before starting the study, a list of characteristics that can describe any recommendation system is
given.

Information Technology and Implementation (IT&I-2021), December 01–03, 2021, Kyiv, Ukraine
EMAIL: yurkokryvenchuk@gmail.com (Yu. Kryvenchuk); viktoriia.v.lakiza@lpnu.ua (V. Lakiza); yuliia.bidak.knm.2019@lpnu.ua
(Yu.Bidak); myskiviryna@i.ua (I. Myskiv); inem.news@gmail.com (Yu. Malynovskyy)
ORCID: 0000-0002-2504-5833 (Yu. Kryvenchuk); 0000-0002-6764-8536 (V. Lakiza); 0000-0002-9780-1546 (Yu.Bidak); 0000-0002-
3761-2276 (I. Myskiv); 0000-0002-7139-5623 (Yu. Malynovskyy)
           ©️ 2022 Copyright for this paper by its authors.
           Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
           CEUR Workshop Proceedings (CEUR-WS.org)



    • Subject of recommendations: what is recommended. It can be anything: movies, music,
       products, news, articles, books, videos, people and more.
    • Purpose of recommendations: why something is recommended, for example gathering or
       providing information, training, or meeting new people.
    • Recommendation context: what the user is doing at the moment, for example browsing products,
       listening to music, or communicating with people.
    • Source of recommendation: who recommends, for example the audience, similar users, or experts.
    • Degree of personalization. Non-personal recommendations are the same for you as for all other
       users; they allow targeting by region or time but do not take individual preferences into
       account. A more advanced option takes your current session into account: after you have viewed
       several products, similar products are recommended to you. Personal recommendations use all
       available information about the customer, including their purchase history.
    • Transparency. People trust recommendations more when they understand what they are based
       on. This reduces the risk of running into a system that simply promotes particular goods or
       services.
    • Recommendation format. Recommendations can be shown in a pop-up window, as a sorted list in a
       certain part of the site, as a strip across the screen, or in some other way.
    • Algorithms. Despite the many available algorithms, they all come down to a few basic approaches.
       The most classic are:
          ◦ Summary-based (non-personal);
          ◦ Content-based (models based on the product description);
          ◦ Collaborative (collaborative filtering);
          ◦ Matrix factorization (methods based on matrix decompositions).
   To generate recommendations, collaborative filtering systems must relate two fundamentally
different kinds of objects: items and users. Therefore, the aim of this study is to compare the two main
approaches to collaborative filtering: the neighborhood approach and latent factor models.
Neighbourhood methods focus on relationships between items or between users.
   This study is concerned with modelling a user's preferences from the same user's ratings of
similar items [9, 12].
   Latent factor models, such as matrix factorization (SVD), offer an alternative approach: they map
both items and users into the same latent factor space. The latent space explains ratings by
characterizing products and users with factors inferred automatically from user feedback.
   Matrix decomposition methods [8, 10] combine ease of implementation with relatively high
accuracy. This made them the method of choice on the largest public dataset, the Netflix data.
Latent factor models (LFMs) are an approach to collaborative filtering whose holistic goal is to
uncover latent features that explain the observed ratings; examples include pLSA, neural
networks, latent Dirichlet allocation, and models induced by factorizing the user-item rating
matrix (also known as SVD models) [11, 15]. Recently, models based on matrix factorization have
gained popularity due to their attractive accuracy and scalability.

2. Materials and methods
   In information retrieval, matrix decomposition methods are well established for identifying latent
semantic factors. However, applying them to explicit ratings in collaborative filtering is difficult
because of the high proportion of missing values [13, 16]. Conventional matrix decomposition is
undefined when knowledge about the matrix is incomplete. Moreover, carelessly fitting only the
relatively few known entries is highly prone to overfitting. Earlier work relied on imputation, filling
in the missing ratings to make the rating matrix dense [14, 17]. However, imputation can be very
expensive, as it significantly increases the amount of data. In addition, inaccurate imputation can
considerably distort the data. Therefore, more recent work suggests modelling only the observed
ratings directly, while avoiding overfitting through a suitably regularized model.
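The idea of modelling only the observed ratings, with regularization instead of imputation, can be illustrated with a minimal sketch of regularized matrix factorization trained by stochastic gradient descent. This is an illustration under stated assumptions, not the authors' implementation: the helper names `factorize` and `predict`, the number of factors `k`, the learning rate `lr`, the penalty weight `reg` and the epoch count are all hypothetical choices.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.05, epochs=2000, seed=0):
    """Hypothetical sketch: learn user factors P (n_users x k) and item factors Q (n_items x k).

    `ratings` holds only the observed (user, item, rating) triples; missing cells
    are never imputed, they are simply absent from the list.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on (r - p_u . q_i)^2 + reg * (|p_u|^2 + |q_i|^2),
                # with the constant factor 2 folded into lr; the L2 penalty
                # keeps the model from overfitting the few known entries
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


# Usage: 3 users x 3 items; the (user 2, item 2) rating is unknown
observed = [(0, 0, 5), (0, 1, 3), (0, 2, 1), (1, 0, 4), (1, 1, 3), (1, 2, 1), (2, 0, 1), (2, 1, 2)]
P, Q = factorize(observed, n_users=3, n_items=3)
print(round(predict(P, Q, 2, 2), 2))  # estimate for the missing cell
```

Only the observed triples ever enter the update loop, which is exactly what distinguishes this approach from imputation-based methods.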



2.1. Tasks of the recommendation system
    The task of a recommendation system is to inform the user about the product that is likely to
interest him most at a particular time. The customer receives recommendations about the products he
needs, and the service earns money: depending on the business model, recommendation systems can be
profitable in different ways [7, 12]. The first option is the direct sale of goods. Another is an
increase in the number of users, which in turn raises advertising revenue, and so on.
    The previous section discussed the main principles, problems and objectives of recommendation
systems. This section focuses on preparing for practical implementation [10, 16]. The first step is
to define the requirements that the recommendation system must meet.
    1. Coverage. Coverage is the percentage of items in the test set that the recommendation system
        is able to recommend.
    2. Personalization. Personalization measures how many of the same items the recommendation
        system shows to different users: the fewer shared items, the more personalized the
        recommendations. An example calculation starts from the recommendation matrix in Table 1.

Table 1
Requirement for the recommendation system - personalization
                       A            B              C                   D            X             Z
        0              1            1              1                   1            0             0
        1              1            1              1                   0            1             0
        2              1            1              1                   0            0             1

   Binary values encode two states: 1 means the item was recommended to the user, 0 means it was
not. The next step is to calculate the similarity matrix between users (Table 2); its values are the
cosine similarities of the users' recommendation vectors.

Table 2
Similarity matrix for users
                                           0                           1                    2
            0                              1                         0,75                 0,75
            1                            0,75                          1                  0,75
            2                            0,75                        0,75                   1

   The similarity matrix is calculated from each user's whole recommendation vector. The next step
is to average the upper triangle of the matrix (excluding the diagonal) and subtract the result from
one: Personalization = 1 - 0,75 = 0,25. A high score means that the model provides highly
personalized recommendations.
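The coverage and personalization calculations above can be sketched in a few lines of Python. The sketch assumes cosine similarity between binary recommendation vectors (which reproduces the 0,75 values of Table 2) and defines coverage as the share of catalog items that appear in at least one recommendation list; the function names and the dictionary input format are hypothetical.

```python
import math
from itertools import combinations


def coverage(recommendations, catalog):
    """Hypothetical helper: share of catalog items recommended to at least one user.

    recommendations: {user_id: [item, ...]}; catalog: list of all test-set items.
    """
    recommended = set()
    for items in recommendations.values():
        recommended.update(items)
    return len(recommended & set(catalog)) / len(catalog)


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def personalization(matrix):
    """1 minus the mean of the upper triangle (i < j) of the user-user cosine similarity matrix."""
    sims = [cosine(matrix[i], matrix[j]) for i, j in combinations(range(len(matrix)), 2)]
    return 1.0 - sum(sims) / len(sims)


# Table 1: rows are users 0-2, columns are items A, B, C, D, X, Z
table1 = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 1],
]
print(personalization(table1))  # 0.25
```

Each pair of users shares three of their four recommended items, so every off-diagonal cosine is 3 / (2 · 2) = 0,75, matching Table 2.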
   3. Estimation of similarity
   The similarity estimate measures how similar the items recommended to a user are to each other.
It uses item features (such as movie genres) to calculate similarity. Consider the example in
Figure 1.




Figure 1: Example of defining movie id

   Table 3 lists the features of the objects (films) recommended to a user of the recommendation
system, namely the genres of the movies recommended to the first user. Figure 2 shows the similarity
score of the films received by this user.
   The higher the score, the more similar to each other the recommended movies are. Next, the
metrics that determine the quality of a recommendation system should be considered: Recall and
Precision at k. These metrics were traditionally used for binary classification algorithms; today
they are also one of the effective ways to evaluate the quality of a recommendation system. To apply
them, it is necessary to state whether each recommendation interested the user or not. A rating on a
1-5 scale is usually used for this.

Table 3
Representation of features on the film using the recommendation system
        movieId                        Action                 Comedy                     Romance
          3                              0                       1                          0
          7                              0                       1                          0
          5                              0                       1                          0
          9                              1                       0                          0
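The paper does not give a formula for the similarity score; one common choice, assumed here, is intra-list similarity: the mean pairwise cosine similarity between the feature vectors of the recommended items. A minimal sketch using the genre indicators of Table 3 (the function names are hypothetical):

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def intra_list_similarity(features, recommended):
    """Hypothetical helper: mean cosine similarity over all pairs of recommended items."""
    pairs = list(combinations(recommended, 2))
    return sum(cosine(features[a], features[b]) for a, b in pairs) / len(pairs)


# Table 3: genre indicators (Action, Comedy, Romance) per movieId
genres = {3: [0, 1, 0], 7: [0, 1, 0], 5: [0, 1, 0], 9: [1, 0, 0]}
print(intra_list_similarity(genres, [3, 7, 5, 9]))  # 0.5
```

The value 0.5 arises because the three comedies are pairwise identical (three pairs with similarity 1) while movie 9 shares no genre with them (three pairs with similarity 0).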




    Figure 2: Score for films offered by the recommendation system
    To convert ratings into binary labels, it is enough to treat all values above a certain threshold
as relevant, for example 3.5 (the threshold can be any value, depending on the problem). The next step
is to choose k. Since recommendation systems usually return a ranked list of recommended products,
only the first k items are considered.
    Precision at k is the percentage of the top k recommended items that are relevant to the user;
recall at k is the percentage of all relevant items that appear among the top k.
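The thresholding and top-k steps described above can be sketched as follows. It is a minimal illustration, assuming that predicted scores and true ratings are available as dictionaries keyed by item; the function name and input format are hypothetical, and the 3.5 threshold is the example value from the text.

```python
def precision_recall_at_k(predicted, actual, k=5, threshold=3.5):
    """Hypothetical helper returning (precision@k, recall@k).

    predicted: {item: predicted score}; actual: {item: true rating}.
    An item counts as relevant if its true rating is above `threshold`.
    """
    relevant = {item for item, rating in actual.items() if rating > threshold}
    # Keep only the k items with the highest predicted scores
    top_k = sorted(predicted, key=predicted.get, reverse=True)[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


predicted = {"a": 4.8, "b": 4.1, "c": 3.9, "d": 2.0, "e": 1.5}
actual = {"a": 5, "b": 3, "c": 4, "d": 4, "e": 1}
print(precision_recall_at_k(predicted, actual, k=3))
```

Here items a, c and d are relevant; the top 3 by predicted score are a, b and c, so two of the three recommendations are hits (precision 2/3) and two of the three relevant items are retrieved (recall 2/3).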

3. Experiments
   The main part of working with recommendation systems is data. To review the algorithms, use the
Deskdrop dataset, which includes 12-month records from CI & T's Internal Communication platform
(DeskDrop). It includes information about 73 thousand users who interacted with 3000 articles
distributed on the platform and includes 2 files: shared_articles.csv; users_interactions.csv.
   Their structure is examined below. The shared_articles.csv file contains information about the
articles shared on the platform: each article has its original URL, title, content as plain text,
language, and information about the user who published it. Each timestamped record has one of two
possible events: the content was shared and is available to users, or the content was removed and is
no longer available to users. Table 4 shows sample data with the timestamp, event type, content ID,
author (user) ID, author session ID and author user agent.

Table 4
Representation of file features shared_articles.csv
   timestamp        eventType         contentId          authorPersonId  authorSessionId  authorUserAgent
1  1459411468       ContentShared     -4011547382        38732923901     243872438932     NaN
2  1459411469       ContentShared     -3834093833        37239832892     894173187267     NaN
3  1459411470       ContentShared     -3736267384        56712348938     -21378327824     NaN
4  1459411471       ContentShared     -3284737777        23923802332     327632872398     NaN
5  1459411471       ContentShared     -5671839300        23983298320     23932893232      NaN
   The users_interactions.csv file stores information about user interactions with articles. It
includes the following interaction types: View, Like, Comment Created, Follow (the user will be
notified of new comments on the article) and Bookmark (the user saved the article to return to it in
the future). Table 5 shows a sample of users_interactions.csv with the timestamp, interaction type,
person (user) ID, session ID, content ID and user agent.
   The next stage is data transformation: each interaction type is assigned a certain weight
(Figure 3) that reflects the strength of the user's interest in a particular article.

 Table 5
 Representation of file features users_interactions.csv
       timestamp         eventType    PersonId           SessionId     contentId          userAgent
1      12782187          View         -2383298233        1872414232    3223898921         NaN
2      14789898          Follow       83893722323        3213313132    2392841894         Mozilla
3      12873891          View         31839212323        2298283933    8023974873         NaN




 Figure 3: Weights assigned to interaction types

    A common problem in recommender systems is the cold-start problem, so only users with 5 or more
 interactions are kept. On the DeskDrop platform, a user can view an article several times and
 interact with it in different ways each time, which is why a new column is created that reflects the
 overall strength of the user's interaction with the article by summing the weights of all its interactions.
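The weighting, cold-start filtering and summation steps can be sketched with the Python standard library. The weights in `EVENT_WEIGHTS` are illustrative assumptions only, since the actual values from Figure 3 are not given in the text; the cold-start rule is read here as counting raw interaction records per user, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical weights: the real values from Figure 3 are not reproduced in the text
EVENT_WEIGHTS = {"View": 1.0, "Like": 2.0, "Bookmark": 2.5, "Follow": 3.0, "Comment Created": 4.0}


def build_event_strength(interactions, min_interactions=5):
    """Hypothetical helper mirroring the steps in the text.

    interactions: iterable of (person_id, content_id, event_type) records.
    1) cold-start filter: keep only users with >= min_interactions records;
    2) map each event type to its weight;
    3) sum the weights per (person_id, content_id) pair (EventStrength in Table 6).
    """
    rows = list(interactions)
    per_user = Counter(person for person, _, _ in rows)
    strength = defaultdict(float)
    for person, content, event in rows:
        if per_user[person] >= min_interactions:
            strength[(person, content)] += EVENT_WEIGHTS[event]
    return dict(strength)


log = [
    ("u1", "a", "View"), ("u1", "a", "Like"), ("u1", "b", "View"),
    ("u1", "c", "Follow"), ("u1", "a", "View"),
    ("u2", "a", "View"), ("u2", "b", "View"),  # only 2 records: dropped by the cold-start rule
]
print(build_event_strength(log))  # {('u1', 'a'): 4.0, ('u1', 'b'): 1.0, ('u1', 'c'): 3.0}
```

User u1 viewed article a twice and liked it once, so its strength is 1 + 2 + 1 = 4; user u2 is excluded entirely.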
    Table 6 presents the resulting data on user interaction with content: the person ID, the content ID
 and the aggregated interaction strength (EventStrength).

 Table 6
 Representation of interaction between users and certain content
           personId                      ContentId                          EventStrength
   0       -231789239                    -89762372                          1.00000
   1       -998327887                    -83478183                          1.00000
   2       -932834343                    -23873277                          3.16943

    The following is a list of the most popular algorithms.

 3.1. Overview of basic algorithms and models in recommendation systems
    1. Popularity model
    This is the most common model because of its simplicity. It is not personalized at all: it simply
 recommends the most popular (highest-rated) items or content to every user. In general, it offers
 good recommendations that most users will like and find interesting.
    On the Recall@5 metric it scores about 24%, which means that for 24 percent of the items the user
 interacted with, the system ranked the item in the top 5. For Recall@10, the figure reaches about
 37% (Figure 4).
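A minimal sketch of the popularity model and of the Recall@k evaluation described above, assuming event strengths in the `{(person_id, content_id): strength}` form of Table 6 and a held-out list of (user, item) test pairs. Items the user already interacted with in training are excluded; the helper names and this simplified evaluation protocol are assumptions, not the authors' exact setup.

```python
from collections import defaultdict


def popularity_ranking(train_strength):
    """Hypothetical helper: rank items by total event strength across all users."""
    totals = defaultdict(float)
    for (_, item), strength in train_strength.items():
        totals[item] += strength
    # Highest total first; ties broken by item id so the order is deterministic
    return sorted(totals, key=lambda item: (-totals[item], str(item)))


def recommend_popular(ranking, seen, k):
    """Same global list for everyone, minus the items this user has already seen."""
    return [item for item in ranking if item not in seen][:k]


def recall_at_k(train_strength, test_pairs, k):
    """Share of held-out (user, item) pairs whose item lands in that user's top-k list."""
    ranking = popularity_ranking(train_strength)
    seen = defaultdict(set)
    for user, item in train_strength:
        seen[user].add(item)
    hits = sum(
        1 for user, item in test_pairs
        if item in recommend_popular(ranking, seen[user], k)
    )
    return hits / len(test_pairs)


train = {("u1", "a"): 3.0, ("u2", "a"): 2.0, ("u2", "b"): 1.0, ("u3", "c"): 4.0, ("u3", "b"): 1.0}
test = [("u1", "c"), ("u3", "a"), ("u2", "d")]
print(popularity_ranking(train))   # ['a', 'c', 'b']
print(recall_at_k(train, test, 1))
```

Because the ranking is shared by all users, the model needs no per-user training, which is why it is such a common baseline.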
    2. Content-based filtering model.
    This model uses content attributes to recommend articles similar to those the user has already
 interacted with. TF-IDF, a popular technique in search engines, is commonly used to work with text.
 It converts unstructured text into a vector in which each position corresponds to a word and holds
 that word's TF-IDF weight. To build a user profile, take all the articles the user interacted with,
 extract their main words (their TF-IDF vectors) and weight them by the strength of the user's
 interaction with each article (the more the user interacted with an article, the more important its
 keywords become).
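The TF-IDF user profile described above can be sketched without external libraries. The sketch assumes whitespace tokenization, raw term frequency times the natural-log inverse document frequency, and a profile that is the interaction-strength-weighted average of the item vectors; candidate articles are ranked by cosine similarity to the profile. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: {item_id: text}. Returns {item_id: {term: tf * log(N / df)}}."""
    tokenized = {item: text.lower().split() for item, text in docs.items()}
    n = len(docs)
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    return {
        item: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for item, tokens in tokenized.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def user_profile(vectors, interactions):
    """interactions: {item_id: event_strength}; stronger interactions weigh more."""
    profile = Counter()
    total = sum(interactions.values())
    for item, strength in interactions.items():
        for t, w in vectors[item].items():
            profile[t] += w * strength / total
    return dict(profile)


def recommend_content(vectors, interactions, k):
    """Rank articles the user has not seen by similarity to the user profile."""
    profile = user_profile(vectors, interactions)
    candidates = [i for i in vectors if i not in interactions]
    return sorted(candidates, key=lambda i: cosine(profile, vectors[i]), reverse=True)[:k]


docs = {1: "python machine learning", 2: "python web development",
        3: "machine learning models", 4: "cooking recipes pasta"}
vectors = tfidf_vectors(docs)
print(recommend_content(vectors, {1: 1.0}, k=2))  # [3, 2]
```

Article 3 shares two informative words with the user's article, article 2 shares one, and article 4 shares none, which gives the ranking 3, 2, 4.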

Figure 4: Metrics Recall @ 5 for the model by popularity
   This method achieved Recall@5 = 0.162 (16.2 percent) and Recall@10 = 0.261 (about 26 percent)
(Figure 5). As can be seen in Figure 6, this model, despite being more difficult to implement, showed
worse results than the simpler popularity model.




Figure 5: Recall@5 metric for the content-based filtering model
   3. Collaborative model.
   This model is divided into two types:
    • Memory-based: this model uses previous user interactions with articles to find users with
       similar preferences and uses them for future recommendations.
    • Model-based: this model uses different machine learning methods and models (neural networks,
       Bayesian networks) to cluster users and find common preferences between them.
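A memory-based (user-based) variant can be sketched as follows, assuming interaction strengths stored as `{user: {item: strength}}`. Neighbours are the most similar users by cosine similarity, and each unseen item is scored by the similarity-weighted sum of the neighbours' interaction strengths; the neighbourhood size and function names are hypothetical.

```python
import math


def cosine_sparse(a, b):
    """Cosine similarity of two sparse {item: strength} vectors."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def user_based_scores(strength, target, n_neighbors=2):
    """Hypothetical helper: score items `target` has not seen using its nearest users."""
    neighbours = sorted(
        ((cosine_sparse(strength[target], strength[u]), u) for u in strength if u != target),
        reverse=True,
    )[:n_neighbors]
    scores = {}
    for sim, u in neighbours:
        if sim <= 0:
            continue  # users with no overlap contribute nothing
        for item, s in strength[u].items():
            if item not in strength[target]:
                scores[item] = scores.get(item, 0.0) + sim * s
    return scores


strength = {
    "alice": {"a": 1.0, "b": 1.0},
    "bob": {"a": 1.0, "b": 1.0, "c": 2.0},
    "carol": {"d": 3.0},
}
print(user_based_scores(strength, "alice"))  # only item "c" is scored
```

Bob shares Alice's history, so his extra item c is recommended; Carol has no overlap with Alice and is ignored.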
   Next, a model-based system built on matrix factorization is evaluated. As shown in Figure 6, it
achieves Recall@5 of 33% and Recall@10 of 46%.




   Figure 6: Recall@5 metric for the collaborative model
   4. Hybrid model
   The last and most advanced model combines the two previous ones (collaborative and content-based
filtering). This model showed the best results, namely Recall@5 = 34.2% and Recall@10 = 47.9%
(Figure 7).
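The paper does not specify how the two models are combined; one simple option, assumed here, is a weighted sum of the collaborative and content-based scores after min-max normalization. The weight `cf_weight` and the function names are hypothetical.

```python
def normalize(scores):
    """Min-max scale a {item: score} dict to [0, 1] so the two models are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {item: (s - lo) / span if span else 1.0 for item, s in scores.items()}


def hybrid_rank(cf_scores, cb_scores, cf_weight=0.5, k=10):
    """Hypothetical hybrid: weighted sum of normalized collaborative and content-based scores."""
    cf, cb = normalize(cf_scores), normalize(cb_scores)
    items = set(cf) | set(cb)
    combined = {
        i: cf_weight * cf.get(i, 0.0) + (1 - cf_weight) * cb.get(i, 0.0)
        for i in items
    }
    # Highest combined score first; ties broken by item id for a stable order
    return sorted(combined, key=lambda i: (-combined[i], str(i)))[:k]


cf_scores = {"a": 10.0, "b": 5.0, "c": 0.0}
cb_scores = {"a": 0.0, "b": 1.0, "c": 0.5}
print(hybrid_rank(cf_scores, cb_scores))  # ['b', 'a', 'c']
```

Item b is second-best for both models and ends up first overall, which illustrates how a hybrid can favour items that both approaches consider reasonable.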




Figure 7: Metrics Recall @ 5 in the hybrid model
   Table 7 compares the results of the main models used in recommendation systems.


Table 7
Comparison of basic models in recommendation systems
                                  recall@5                             recall@10
   Content-based                  0,16                                 0,26
   Popularity                     0,24                                 0,37
   Collaborative filtering        0,33                                 0,46
    From the results in Table 7, together with the hybrid model's scores reported above, it can be
concluded that a hybrid model should be used in further development of the system to obtain the
best results. It is also worth giving an example of a more modern method that has gained popularity,
namely matrix factorization. To begin with, factorization is the decomposition of a matrix into a
product of simpler matrices (its principal components). Take, for example, a table whose columns
correspond to movie titles and whose rows correspond to users, with each cell holding a user's rating
for that film (Table 8).
Table 8
User ratings for specific movies
                 Avenger        Thor           DeadPool       Avatar          Rocky          Titanic
 Pumba           4              5              3              3               1              -
 Henry           5              -              3              2               -              4
 Jerry           1              2              2              -               4              2
 Tom             3              4              -              2               4              1
 Timon           4              2              3              5               3              -

    A dash in a cell means that the user has not watched the movie; the task is to predict the user's
rating after watching it.
    Accordingly, in Figure 9 the initial matrix is marked in blue; let us call it V. The two matrices
into which the initial matrix is decomposed are called W and H. The general form of the
decomposition is then $V_{m \times n} \approx W_{m \times k} \, H_{k \times n}$, where m is the number of
users, n is the number of movies and k is the number of components. Multiplying W and H yields an
approximation of V in which the empty cells take on values that correspond to the users' predicted
ratings for the given products. There are three main methods of decomposing matrices; their
comparison is shown in Figure 8.
    As can be seen from Figure 8, the SVD and NNMF methods work best. The choice between them depends
only on the dataset, but they differ in one significant respect: SVD works with numbers ranging from
minus infinity to plus infinity, and its results can fall anywhere in the same range, whereas NNMF
works only with non-negative numbers.
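A non-negative factorization of Table 8 can be sketched with NumPy. The paper does not say which NNMF variant it uses; this sketch assumes Lee-Seung multiplicative updates restricted to the observed cells by a 0/1 mask (a weighted NMF), with k = 2 components. After fitting, the product W @ H fills the dash cells with predicted ratings, and all entries stay non-negative. The function name and hyperparameters are hypothetical.

```python
import numpy as np

# Table 8 (rows: Pumba, Henry, Jerry, Tom, Timon; columns: Avenger, Thor,
# DeadPool, Avatar, Rocky, Titanic); np.nan marks a movie the user has not rated
V = np.array([
    [4, 5, 3, 3, 1, np.nan],
    [5, np.nan, 3, 2, np.nan, 4],
    [1, 2, 2, np.nan, 4, 2],
    [3, 4, np.nan, 2, 4, 1],
    [4, 2, 3, 5, 3, np.nan],
])


def masked_nmf(V, k=2, iters=2000, seed=0, eps=1e-9):
    """Hypothetical sketch: V (m x n) ~ W (m x k) @ H (k x n) with W, H >= 0,
    fitted to the observed cells only via masked multiplicative updates."""
    M = ~np.isnan(V)              # True where a rating is observed
    X = np.where(M, V, 0.0)       # observed ratings, zeros elsewhere
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + 0.1  # strictly positive start keeps W, H non-negative
    H = rng.random((k, n)) + 0.1
    for _ in range(iters):
        # Each update multiplies by a non-negative ratio, so signs never flip
        H *= (W.T @ X) / (W.T @ (M * (W @ H)) + eps)
        W *= (X @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H


W, H = masked_nmf(V)
predicted = W @ H  # dense matrix: former dash cells now hold predicted ratings
print(np.round(predicted, 1))
```

Because the mask removes unknown cells from both the numerator and the denominator, no imputed values influence the fit, in line with modelling only the observed ratings.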

4. Work results
    To obtain results for the proposed models, a web service for finding movies was created, where
users can search for movies and view detailed information about them, including their ratings.
Recommendations are generated from data on the user's interaction with the site. The MovieLens
dataset was used to build the service. Recommendations are based on implicit feedback: the client
side collects information about user clicks and records it in an object that consists of the movie
title, the number of clicks on the movie, and its rating. After the user has watched several movies,
this information is sent to the server, where the object is used as test data (Figure 9). As can be
seen from Figure 10, the server also receives the user's ID in order to make personalized
recommendations. As the initial data, the client sends an object with 10 films that may interest the
user and receives a list of recommendations in return. The system implements several recommendation
methods, and the desired method is selected by calling it explicitly. The following recommendation
methods are implemented in the proposed system: Linear Regression Prediction, Content-Based
Prediction, Collaborative-Filtering Prediction User-Based and Collaborative-Filtering Prediction
Item-Based.
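The implementation of the service's methods is not given in the paper. As an illustration of what an item-based collaborative-filtering prediction could look like, the following hypothetical sketch predicts a user's rating for a movie as the similarity-weighted average of that user's ratings of other movies, with item-item cosine similarity computed over users who rated both. The dictionary format and function names are assumptions.

```python
import math


def item_cosine(ratings, i, j):
    """Cosine similarity between items i and j over the users who rated both."""
    common = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    ni = math.sqrt(sum(ratings[u][i] ** 2 for u in common))
    nj = math.sqrt(sum(ratings[u][j] ** 2 for u in common))
    return dot / (ni * nj)


def predict_item_based(ratings, user, item):
    """Hypothetical helper: similarity-weighted average of the user's other ratings.

    Returns None when no rated item has positive similarity to `item`.
    """
    num = den = 0.0
    for other, r in ratings[user].items():
        if other == item:
            continue
        sim = item_cosine(ratings, item, other)
        if sim > 0:
            num += sim * r
            den += sim
    return num / den if den else None


# ratings: {user: {movie: rating}}
ratings = {
    "u1": {"a": 5, "b": 4},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"b": 3, "c": 4},
}
print(predict_item_based(ratings, "u1", "c"))  # between u1's ratings of 4 and 5
```

Item-item similarities depend only on the rating matrix, so in a service they can be precomputed once and reused for every incoming user request.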

Figure 8: Comparison of basic decomposition methods




Figure 9: Customer feedback information, where each line corresponds to the addition of a new
review for a new movie.

Figure 10: Output of movies that are offered to the user through personalized analysis

5. Discussion of results
   In an information-saturated world, recommender systems play an essential role in limiting the
user's interaction to potentially interesting information. This paper presents a comparative analysis
of the main approaches to implementing such systems. Several basic methods were compared during the
study; the most popular are popularity-based, content-based, hybrid and matrix factorization models.
Figure 11 shows the results of this study. The results in Figure 11 show that the hybrid approach and
matrix factorization give the best accuracy. If personalized recommendations are needed and feasible,
these methods are preferable to neural networks and other approaches, since they remain transparent
and easy to implement. If personalization is not required, a popularity-based system is sufficient for
most tasks and significantly simplifies the procedure.




Figure 11: Evaluation of the results of the use of methods in the construction of recommendation
systems

6. Conclusions
   Today, during the pandemic, many businesses have their own recommendation pages; examples include
such giants as Amazon, Google and LinkedIn. In this paper, much attention was paid to matrix
factorization methods for recommendation systems, namely for the reconstruction of the rating table.
Based on these methods, data analysis for the selected dataset was performed. Each of the studied
methods has its own characteristics and is worth noting because it is useful for a specific range of
goals set by the developer of a recommendation system. Thus, the proposed models make it possible to
focus on the characteristics of the object that determine the rating of the product or service the
user is looking for. Applying the proposed algorithms made it possible to choose the best option for
building a recommendation system that models the user's behaviour.

7. References
[1] S. Bo, Ch. Haiyan, A Survey of k Nearest Neighbor Algorithms for Solving the Class
     Imbalanced Problem, Wireless Communications and Mobile Computing, Vol. 2021, 2021.
     https://doi.org/10.1155/2021/5520990.
[2] A. Trabelsi, Z. Elouedi, and E. Lefevre, Decision tree classifiers for evidential attribute values
     and class labels, Fuzzy Sets and Systems, vol. 366, 2019, pp. 46–62.
[3] M. T. Jones, Recommender systems, Introduction to approaches and algorithms. Retrieved
     November 25, 2017. https://www.ibm.com/developerworks/library/os-recommender1/
[4] N. Shakhovska, N. Boyko, P. Pukach, The information model of cloud data warehouses.
     Advances in Intelligent Systems and Computing (AISC), 871, 2019, pp. 182–191
[5] N. Kunanets, O. Vasiuta, N. Boiko, Advanced Technologies of Big Data Research in Distributed
     Information Systems, in: Proceedings of the 14th International conference "Computer sciences
     and Information technologies", 2019, pp. 71-76. doi: https://doi.org/10.1109/STC-CSIT.2019.8929756
[6] A. Tejeda-Lorente, C. Porcel, E. Peis, R. Sanz, E. Herrera-Viedma, A quality-based
     recommender system to disseminate information in a university digital library, Information
     Sciences, 266, 2014, pp. 52-69.
[7] C. Aggarwal, Recommender Systems. Springer, 2016, p. 498. doi:
     https://doi.org/10.1007/978-3-319-29659-3
[8] B. Hallinan, T. Striphas, Recommended for you: The Netflix Prize and the production of
     algorithmic culture. New Media & Society, 18 (1), 2014, pp. 117–137. doi:
     https://doi.org/10.1177/1461444814538646
[9] G. Adomavicius, J. Bockstedt, S. Curley, J. Zhang, De-Biasing User Preference Ratings in
     Recommender Systems, in: Proceedings of the Joint Workshop on Interfaces and Human
     Decision Making for Recommender Systems co-located with ACM Conference on
     Recommender Systems, 2–9. Available at: http://ceur-ws.org/Vol-1253/paper1.pdf
[10] I. Gunes, C. Kaleli, A. Bilge, H. Polat, Shilling attacks against recommender systems: a
     comprehensive survey. Artificial Intelligence Review, 42 (4), 2014, pp. 767–799. doi:
     https://doi.org/10.1007/s10462-012-9364-9
[11] Y. Wang, L. Qian, F. Li, L. Zhang, A Comparative Study on Shilling Detection Methods for
     Trustworthy Recommendations. Journal of Systems Science and Systems Engineering, 27 (4),
     2018, pp. 458–478. doi: https://doi.org/10.1007/s11518-018-5374-8
[12] K. Patel, A. Thakkar, C. Shah, K. Makvana, A State of Art Survey on Shilling Attack in
     Collaborative Filtering Based Recommendation System. Smart Innovation, Systems and
     Technologies, 2018, pp. 377–385. doi: https://doi.org/10.1007/978-3-319-30933-0_38
[13] W. Zhou, J. Wen, M. Gao, H. Ren, P. Li, Abnormal Profiles Detection Based on Time Series and
     Target Item Analysis for Recommender Systems. Mathematical Problems in Engineering, 2015,
     pp. 1–9. doi: https://doi.org/10.1155/2015/490261
[14] M. Gao, Q. Yuan, B. Ling, Q. Xiong, Detection of Abnormal Item Based on Time Intervals for
     Recommender Systems. The Scientific World Journal, 2014, pp. 1–8. doi:
     https://doi.org/10.1155/2014/845897
[15] M. Gao, R. Tian, J. Wen, Q. Xiong, B. Ling, L. Yang, Item Anomaly Detection Based on
     Dynamic Partition for Time Series in Recommender Systems. PLOS ONE, 10 (8), 2015, e0135155.
     doi: https://doi.org/10.1371/journal.pone.0135155
[16] O. Chala, L. Novikova, L. Chernyshova, Method for detecting shilling attacks in e-commerce
     systems using weighted temporal rules. EUREKA: Physics and Engineering, 5, 2017, pp. 29–36.
     doi: https://doi.org/10.21303/2461-4262.2019.00983
[17] V. Levykin, O. Chala, Method of determining weights of temporal rules in Markov logic network
     for building knowledge base in information control systems. EUREKA: Physics and
     Engineering, 5, 2018, pp. 3–10. doi: https://doi.org/10.21303/2461-4262.2018.00713
[18] S. Chalyi, V. Leshchynskyi, I. Leshchynska, Method of forming recommendations using
     temporal constraints in a situation of cyclic cold start of the recommender system. EUREKA:
     Physics and Engineering, 4, 2019, pp. 34–40. doi:
     https://doi.org/10.21303/2461-4262.2019.00952
