=Paper=
{{Paper
|id=Vol-3132/Paper_6.pdf
|storemode=property
|title=Comparative Analysis of Basic Approaches to Implementing Model-Based Recommendation Systems Based on Implicit Economic Information
|pdfUrl=https://ceur-ws.org/Vol-3132/Paper_6.pdf
|volume=Vol-3132
|authors=Yurii Kryvenchuk,Viktoriia Lakiza,Yuliia Bidak,Iryna Myskiv
|dblpUrl=https://dblp.org/rec/conf/iti2/KryvenchukLBM21
}}
==Comparative Analysis of Basic Approaches to Implementing Model-Based Recommendation Systems Based on Implicit Economic Information ==
Comparative Analysis of Basic Approaches to Implementing
Model-Based Recommendation Systems Based on Implicit
Economic Information
Yurii Kryvenchuk, Viktoriia Lakiza, Yuliia Bidak and Iryna Myskiv
Lviv Polytechnic National University, Profesorska Street 1, Lviv, 79013, Ukraine
Abstract
The paper considers ways to solve the problem of Internet congestion. Recommendation systems
proposed by other researchers are also reviewed. The main algorithms used in recommendation
systems are analyzed: content-based, demographic-based, and collaborative filtering. Two types
of data that help to form an overall rating in a recommendation system are considered. The main
problems that arise when working with recommendation systems are considered, and the tasks of
recommendation systems are analyzed in detail. The paper describes the step-by-step creation of
a recommendation system and identifies the main requirements that it must meet. The study
presents a similarity matrix, which is calculated from the entire recommendation vector. The
personalization of the recommendations is also calculated. The matrix factorization method is
analyzed, and the rating that follows from the user profile is considered. To obtain results for
the proposed models, the work presents its own web service for finding movies, where the user
can search for movies and view detailed information about them or their rating. Recommendations
in this system are based on implicit feedback, and the system receives the user's id to make
personalized recommendations. The implemented recommendation methods are also analyzed: Linear
Regression Prediction, Content-Based Prediction, User-Based Collaborative-Filtering Prediction,
and Item-Based Collaborative-Filtering Prediction.
Keywords
Machine Learning, Artificial Intelligence, neural network, intelligent technologies, sum of
error squares, recommendation system, content-based approach.
1. Introduction
Today, the problem of Internet congestion remains open. The amount of information on the
Internet is growing exponentially every day [1, 5, 18]. Recommendation systems are a relatively
young field. It all started in 2006, when Netflix launched the Netflix Prize data analysis competition.
Around the same time, the annual RecSys conference on recommender systems began, which is still
held today [3, 7].
The study aims to describe models in which a user's ratings serve as the components that convey
the characteristics of the user's behaviour and are used to recommend content to that user [6, 2].
If the problem is treated as a classification or regression task, the list of applicable algorithms is
quite long. Therefore, the study focuses on the operation of several algorithms on real data [4, 8].
Before starting the study, a list of characteristics that can describe any recommendation system is
given.
Information Technology and Implementation (IT&I-2021), December 01–03, 2021, Kyiv, Ukraine
EMAIL: yurkokryvenchuk@gmail.com (Yu. Kryvenchuk); viktoriia.v.lakiza@lpnu.ua (V. Lakiza); yuliia.bidak.knm.2019@lpnu.ua
(Yu.Bidak); myskiviryna@i.ua (I. Myskiv); inem.news@gmail.com (Yu. Malynovskyy)
ORCID: 0000-0002-2504-5833 (Yu. Kryvenchuk); 0000-0002-6764-8536 (V. Lakiza); 0000-0002-9780-1546 (Yu.Bidak); 0000-0002-
3761-2276 (I. Myskiv); 0000-0002-7139-5623 (Yu. Malynovskyy)
©️ 2022 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)
Subject of recommendations - what is recommended. It can be anything: movies, music,
products, news, articles, books, videos, people and more.
Purpose of recommendations - what the recommendation is for, for example gathering or
providing information, training, or meeting new people.
Recommendation context - what the user is doing at the moment, for example browsing products,
listening to music, or communicating with people.
Source of recommendation - who recommends: users from a similar audience or experts.
Degree of personalization. Non-personal recommendations are those in which you are recommended
the same items as all other users. They allow targeting by region or time but do not consider
your preferences. A more advanced option uses data from your current session: you have
viewed several products, and similar products are recommended to you.
Personal recommendations use all available information about the customer, including the
history of purchases.
Transparency. People trust recommendations more when they understand what they are based
on. This reduces the risk of encountering a system that merely promotes certain goods or
services.
Recommendation format. This can be a pop-up window, a sorted list shown in certain parts
of the site, a bar on the screen, or something else.
Algorithms. Despite the many available algorithms, they all come down to a few basic approaches.
The most classic are:
Summary-based (non-personal);
Content-based (models based on the product description);
Collaborative (collaborative filtering);
Matrix factorization (methods based on matrix decompositions).
To produce recommendations, collaborative filtering systems must relate two fundamentally
different kinds of objects: items and users. Therefore, the aim of this study is to compare the two
main approaches to collaborative filtering: the neighbourhood approach and latent factor models.
Neighbourhood methods focus on relationships between items or between users. The relevance of
this study lies in modelling user preferences based on the ratings that the same user gave to
similar items [9, 12].
Latent factor models, such as matrix factorization (SVD), offer an alternative approach, transforming
both items and users into the same latent factor space. The latent space explains ratings by
characterizing items and users by factors that are inferred automatically from user feedback.
Matrix decomposition methods [8, 10] combine ease of implementation with relatively high
accuracy. This made them the preferred technique for the largest public data set, the Netflix
data. Latent factor models (LFMs) approach collaborative filtering with the holistic goal of
identifying latent features that explain the observed ratings; examples include pLSA, neural
networks, latent Dirichlet allocation, and models induced by factorization of the user-item rating
matrix (also known as SVD models) [11, 15]. Recently, models based on matrix factorization have
gained popularity due to their attractive accuracy and scalability.
2. Materials and methods
In information retrieval, matrix decomposition methods are used to identify latent semantic
factors. However, applying them to explicit ratings in collaborative filtering is difficult because of
the large proportion of missing values [13, 16]. The conventional matrix decomposition is undefined
when knowledge of the matrix is incomplete. Moreover, carelessly fitting only the relatively few
known entries is prone to overfitting. Earlier work relied on imputation, filling in the gaps to make
the rating matrix dense [14, 17]. However, imputation can be very expensive, as it significantly
increases the amount of data. In addition, the data can be distorted considerably by inaccurate
imputation. Thus, more recent works suggest modelling only the observed ratings directly, while
avoiding overfitting through a regularized model.
2.1. Tasks of the recommendation system
The task of the recommendation system is to inform the user about the product that may interest
him most at a particular time. The customer receives recommendations about the product he needs,
and the service earns revenue. Depending on the business model, recommendation systems can be
profitable in different ways [7, 12]. The first option is the direct sale of goods. Another is growth in
the number of users, which in turn increases revenue from advertising, and so on.
In the previous section, the main principles, problems and objectives of recommendation systems
were discussed. This section focuses on preparing for practical implementation [10, 16]. The first
step is to define the requirements that the recommendation system must meet.
1. Coverage. Coverage is the percentage of items that the recommendation system is able to
recommend on a test set.
2. Personalization. Personalization shows how many identical items the recommendation system
shows to different users. The data for calculating personalization are given in Table 1.
Table 1
Requirement for the recommendation system - personalization
User  A  B  C  D  X  Z
0     1  1  1  1  0  0
1     1  1  1  0  1  0
2     1  1  1  0  0  1
Binary variables define two states (1: the item was recommended to the user; 0: it was not). The
next step is to calculate the similarity matrix for users, shown in Table 2.
Table 2
Similarity matrix for users
User  0     1     2
0     1     0,75  0,75
1     0,75  1     0,75
2     0,75  0,75  1
The similarity matrix is calculated from the whole recommendation vector of each user. The next
step is to calculate the average of the upper triangle of the matrix and subtract it from one:
Personalization = 1 – 0,75 = 0,25. A high score means that the model provides highly personalized
recommendations.
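The calculation above can be sketched in a few lines of Python. The paper does not name the similarity measure; cosine similarity is assumed here because it reproduces the values of Table 2 from the data of Table 1. The function names are illustrative.

```python
import numpy as np

# Rows are users 0..2, columns are items A, B, C, D, X, Z (values from Table 1).
recommended = np.array([
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 1],
], dtype=float)


def cosine_similarity_matrix(rows: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a matrix (assumed measure)."""
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    unit_rows = rows / norms
    return unit_rows @ unit_rows.T


def personalization(rows: np.ndarray) -> float:
    """One minus the mean similarity over the upper triangle (diagonal excluded)."""
    similarity = cosine_similarity_matrix(rows)
    upper = similarity[np.triu_indices(len(rows), k=1)]
    return float(1.0 - upper.mean())


similarity = cosine_similarity_matrix(recommended)
score = personalization(recommended)
print(np.round(similarity, 2))  # off-diagonal entries are 0.75, as in Table 2
print(round(score, 2))          # 0.25
```

Each pair of users shares three of four recommended items, so every off-diagonal similarity is 3/4 and the personalization score is 0.25.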
3. Estimation of similarity
The similarity assessment determines how similar the items recommended to a user are to one
another. It uses item features (such as genres for movies) to calculate similarity. Consider the
example in Figure 1.
Figure 1: Example of defining movie id
Table 3 lists the features of the objects (films) recommended to the user by the recommendation
system, namely the genres of the movies recommended to the first user. Figure 2 shows the
similarity score of the films received by the user of the recommendation system.
The higher the score, the more similar the movies the user receives. Next, the metrics that
determine the quality of the recommendation system should be considered: Recall and Precision
at k. These metrics were commonly used for binary classification algorithms and are now one of the
effective ways to determine the quality of a recommendation system. In this case, it is necessary to
say whether a recommendation interested the user or not. A rating of 1-5 is usually used for this.
Table 3
Representation of the features of the films offered by the recommendation system
movieId  Action  Comedy  Romance
3        0       1       0
7        0       1       0
5        0       1       0
9        1       0       0
Figure 2: Score for films offered by the recommendation system
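As a rough illustration of how such a similarity score could be obtained from the genre features of Table 3, the sketch below averages the pairwise cosine similarity of the recommended movies. The choice of cosine similarity and the function name are assumptions, and the result is not claimed to match the value in Figure 2.

```python
import numpy as np

# Genre flags (Action, Comedy, Romance) for the recommended movies in Table 3.
genres_by_movie = {
    3: [0, 1, 0],
    7: [0, 1, 0],
    5: [0, 1, 0],
    9: [1, 0, 0],
}


def intra_list_similarity(feature_rows: np.ndarray) -> float:
    """Mean pairwise cosine similarity of the items in one recommendation list.

    Assumes every item has at least one non-zero feature.
    """
    norms = np.linalg.norm(feature_rows, axis=1, keepdims=True)
    unit_rows = feature_rows / norms
    similarity = unit_rows @ unit_rows.T
    upper = similarity[np.triu_indices(len(feature_rows), k=1)]
    return float(upper.mean())


features = np.array(list(genres_by_movie.values()), dtype=float)
list_similarity = intra_list_similarity(features)
print(list_similarity)  # 0.5: three identical comedy pairs, three unrelated pairs
```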
To translate the rating into binary form, it is enough to treat all values above a certain level
as positive. For example, take the value of 3.5 (this can be any value, depending on the problem).
The next step is to determine 'k'. Since recommendation systems usually return a list of
recommended products, only the first 'k' items are considered.
Precision at k shows the percentage of the top 'k' recommendations that were correct and
relevant to the user, while Recall at k shows the percentage of all relevant items that appear
among the top 'k' recommendations.
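A minimal sketch of both metrics follows, assuming the 3.5 threshold mentioned above. The item ids and ratings are made-up illustration data, not taken from the paper.

```python
from typing import Dict, List, Tuple


def precision_recall_at_k(
    ranked_items: List[str],
    true_ratings: Dict[str, float],
    k: int,
    threshold: float = 3.5,
) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for one user's ranked recommendation list."""
    # An item is relevant when its true rating is above the threshold.
    relevant = {item for item, rating in true_ratings.items() if rating > threshold}
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: the system's ranking and the user's actual ratings.
ranked = ["m1", "m2", "m3", "m4", "m5"]
ratings = {"m1": 5.0, "m2": 2.0, "m3": 4.0, "m4": 3.0, "m5": 4.5, "m6": 4.0}

precision, recall = precision_recall_at_k(ranked, ratings, k=3)
print(precision, recall)
```

Here two of the top three items (m1, m3) are relevant, so precision at 3 is 2/3; four items are relevant overall, so recall at 3 is 2/4.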
3. Experiments
The main part of working with recommendation systems is data. To review the algorithms, the
DeskDrop dataset is used, which includes 12 months of logs from CI&T's Internal Communication
platform (DeskDrop). It includes about 73 thousand logged user interactions with more than 3000
articles shared on the platform and consists of 2 files: shared_articles.csv; users_interactions.csv.
Their structure should be considered for analysis. The shared_articles.csv file contains
information about the articles shared on the platform, where each article has its original url, title,
content as plain text, language, and information about the user who published the article. Also, each
timestamped record has one of two possible event types: the content is shared and available to users,
or the content has been deleted and is not available to users. Table 4 shows data with the timestamp,
event type, content ID, author ID, author session ID, and author user agent.
Table 4
Representation of the features of the shared_articles.csv file
   timestamp   eventType      contentId    authorPersonId  authorSessionId  authorUserAgent
1  1459411468  ContentShared  -4011547382  38732923901     243872438932     Nan
2  1459411469  ContentShared  -3834093833  37239832892     894173187267     Nan
3  1459411470  ContentShared  -3736267384  56712348938     -21378327824     Nan
4  1459411471  ContentShared  -3284737777  23923802332     327632872398     Nan
5  1459411471  ContentShared  -5671839300  23983298320     23932893232      Nan
The users_interactions.csv file stores information about user interaction with articles. This
dataset includes the following types of interactions: views, likes, comments, follows (the user will
be notified of new comments on the article), and bookmarks (the user saved the article to return to
it in the future). Table 5 shows the users_interactions.csv dataset with the timestamp, type of
interaction, user ID, user session ID, content ID, and user agent.
The next stage is the transformation of the data, in which each type of interaction is given a
certain weight (Figure 3) that reflects the user's interest in a particular article.
Table 5
Representation of the features of the users_interactions.csv file
   timestamp  eventType  PersonId     SessionId   contentId   userAgent
1  12782187   View       -2383298233  1872414232  3223898921  Nan
2  14789898   Follow     83893722323  3213313132  2392841894  Mozilla
3  12873891   View       31839212323  2298283933  8023974873  Nan
Figure 3: Giving certain weights for interaction
A common problem in recommendation systems is the cold start problem, so only users who have
5 or more interactions should be considered. On the DeskDrop platform, a user can view an article
several times and interact with it each time, which is why a new column should be created that
reflects the user's interaction with the article by summing up all types of interactions.
Table 6 presents data on user interaction with the content, showing what the resulting columns
look like.
Table 6
Representation of interaction between users and certain content
   personId    ContentId  EventStrength
0  -231789239  -89762372  1.00000
1  -998327887  -83478183  1.00000
2  -932834343  -23873277  3.16943
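The transformation described above (weighting interaction types, dropping users with too few interactions, and summing per user and article) can be sketched as follows. The weight values are hypothetical placeholders, since the actual weights are given only in Figure 3, and counting raw interaction records for the cut-off is also an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical weights per interaction type; the paper's values are in Figure 3.
EVENT_WEIGHTS = {"View": 1.0, "Like": 2.0, "Bookmark": 2.5, "Follow": 3.0, "Comment": 4.0}
MIN_INTERACTIONS = 5  # cold-start cut-off stated in the text


def build_event_strength(
    interactions: List[Tuple[str, str, str]],
) -> Dict[Tuple[str, str], float]:
    """Sum the weights of all interactions per (person, content) pair.

    interactions holds (person_id, content_id, event_type) records.
    Users with fewer than MIN_INTERACTIONS records are skipped.
    """
    per_user = Counter(person for person, _, _ in interactions)
    strength: Dict[Tuple[str, str], float] = defaultdict(float)
    for person, content, event in interactions:
        if per_user[person] >= MIN_INTERACTIONS:
            strength[(person, content)] += EVENT_WEIGHTS[event]
    return dict(strength)


# Made-up log: u1 has five interactions, u2 has only one and is dropped.
log = [
    ("u1", "a1", "View"), ("u1", "a1", "Like"), ("u1", "a2", "View"),
    ("u1", "a3", "View"), ("u1", "a3", "Follow"), ("u2", "a1", "View"),
]
event_strength = build_event_strength(log)
print(event_strength)
```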
The following is a list of the most popular algorithms.
3.1. Overview of basic algorithms and models in recommendation systems
1. Model by popularity
This is the most common model because of its simplicity. It is not personalized at all: it simply
recommends to the user the most popular (highest-rated) items or content. In general, it
offers good recommendations that most users will like and find interesting.
For this model, Recall@5 is about 24%, which means that for about 24 percent of the items
with which the user interacted, the system ranked the item in the top 5. With
Recall@10, the figure reaches 37% (Figure 4).
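A minimal sketch of a popularity model, under the assumption that popularity is the total interaction strength per article and that articles the user has already seen are filtered out. The paper does not spell out either detail, and the data are made up.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def recommend_popular(
    event_strength: Dict[Tuple[str, str], float],
    seen_by_user: Set[str],
    top_n: int = 5,
) -> List[str]:
    """Return the most popular articles that the user has not interacted with."""
    popularity: Dict[str, float] = defaultdict(float)
    for (_, content), strength in event_strength.items():
        popularity[content] += strength
    # Sort by descending popularity; ties are broken by article id for determinism.
    ranked = sorted(popularity, key=lambda content: (-popularity[content], content))
    return [content for content in ranked if content not in seen_by_user][:top_n]


# Made-up (person, content) -> strength data.
strengths = {
    ("u1", "a1"): 3.0, ("u2", "a1"): 1.0, ("u2", "a2"): 2.5,
    ("u3", "a3"): 1.0, ("u3", "a2"): 1.0,
}
print(recommend_popular(strengths, seen_by_user={"a2", "a3"}))  # ['a1']
print(recommend_popular(strengths, seen_by_user=set(), top_n=2))  # ['a1', 'a2']
```

The same ranking is returned to every user apart from the filtering of seen articles, which is why the model is not personalized.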
2. Content-based filtering model.
This model uses the attributes of the content to recommend to the user articles similar to
those with which he has already interacted. TF-IDF, a popular technique in search engines, is
commonly used to work with text. This technique converts unstructured text into a vector, where
each word is represented by a position in the vector and the value at that position reflects how
relevant the word is to the article. To prepare a user profile, take all the articles the user
interacted with, extract the main words in them, and multiply them by the weight of each article
relative to the user (the more the user interacted with the article, the more important its
keywords will be).
Figure 4: Metrics Recall @ 5 for the model by popularity
This method achieved Recall@5 = 0.162 (16.2 percent) and Recall@10 = 0.261 (about 26
percent) (Figure 5). As you can see in Figure 6, this model, although more difficult to
implement, showed worse results than the simpler popularity model.
Figure 5: Recall@5 metric for the content-based filtering model
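The content-based approach described above can be sketched with a small hand-written TF-IDF in pure Python. This is an illustrative simplification (whitespace tokenization, no stop words, plain logarithmic IDF), not the implementation used in the study; in practice a library vectorizer would normally be used. The article texts are made up.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Map each article id to a sparse {word: tf * idf} vector."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    doc_freq = Counter(word for words in tokenized.values() for word in set(words))
    vectors = {}
    for doc_id, words in tokenized.items():
        counts = Counter(words)
        vectors[doc_id] = {
            word: (count / len(words)) * math.log(len(docs) / doc_freq[word])
            for word, count in counts.items()
        }
    return vectors


def user_profile(vectors, strengths: Dict[str, float]) -> Dict[str, float]:
    """Average the article vectors, weighted by the user's interaction strength."""
    total = sum(strengths.values())
    profile: Dict[str, float] = Counter()
    for doc_id, strength in strengths.items():
        for word, weight in vectors[doc_id].items():
            profile[word] += weight * strength / total
    return dict(profile)


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_by_content(docs, strengths, top_n: int = 5) -> List[str]:
    """Rank unseen articles by cosine similarity to the user profile."""
    vectors = tfidf_vectors(docs)
    profile = user_profile(vectors, strengths)
    candidates = [doc_id for doc_id in docs if doc_id not in strengths]
    candidates.sort(key=lambda doc_id: cosine(profile, vectors[doc_id]), reverse=True)
    return candidates[:top_n]


articles = {
    "a1": "python machine learning tutorial",
    "a2": "deep learning with python",
    "a3": "healthy cooking recipes",
    "a4": "machine learning models in production",
}
content_recs = recommend_by_content(articles, strengths={"a1": 3.0})
print(content_recs)  # ['a2', 'a4', 'a3']
```

The cooking article shares no words with the user's profile and therefore ends up last.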
3. Collaborative model.
This model is divided into two types:
Memory-based - this model uses previous user interactions with articles to find users with
similar preferences and uses them for recommendations in the future.
Model-based - this model uses different machine learning methods and models (neural networks,
Bayesian networks) to cluster users and find common preferences between them.
Next, a system based on the matrix factorization model is evaluated. Figure 6 shows its
results: Recall@5 is 33% and Recall@10 is 46%.
Figure 6: Recall@5 metric for the collaborative model
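The memory-based variant can be illustrated with a small user-based sketch: unseen articles are scored by a similarity-weighted sum of other users' interaction strengths. Cosine similarity and the matrix values are assumptions made for illustration; this is not the matrix factorization model evaluated in Figure 6.

```python
import numpy as np


def user_based_scores(interactions: np.ndarray, user_index: int) -> np.ndarray:
    """Score every article for one user from the rows of similar users.

    interactions is a users x articles matrix of interaction strengths.
    Articles the user has already interacted with get a score of -inf.
    """
    norms = np.linalg.norm(interactions, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for users with no interactions
    unit_rows = interactions / norms
    similarity = unit_rows @ unit_rows[user_index]
    similarity[user_index] = 0.0  # the user must not vote for himself
    scores = similarity @ interactions
    scores[interactions[user_index] > 0] = -np.inf
    return scores


# Made-up interaction strengths: rows are users, columns are articles.
strength_matrix = np.array([
    [3.0, 1.0, 0.0, 0.0],  # user 0
    [2.0, 1.0, 4.0, 0.0],  # user 1 overlaps with user 0
    [0.0, 0.0, 1.0, 5.0],  # user 2 has nothing in common with user 0
])
cf_scores = user_based_scores(strength_matrix, user_index=0)
best_article = int(np.argmax(cf_scores))
print(best_article)  # 2
```

User 0 shares articles only with user 1, so the article that user 1 interacted with most strongly (column 2) is recommended.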
4. Hybrid model
The last and most advanced model combines the two previous models (collaborative and
content-based filtering). This model showed the best results, namely Recall@5 = 34.2% and
Recall@10 = 47.9% (Figure 7).
Figure 7: Metrics Recall @ 5 in the hybrid model
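One simple way to combine the two models is a weighted sum of their normalized scores. The paper does not state how the scores are combined, so the min-max normalization and the equal weights below are assumptions, and the scores are made up.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {item: 0.0 for item in scores}
    return {item: (value - low) / (high - low) for item, value in scores.items()}


def hybrid_ranking(
    cf_scores: Dict[str, float],
    cb_scores: Dict[str, float],
    cf_weight: float = 0.5,
    cb_weight: float = 0.5,
) -> List[str]:
    """Rank items by a weighted sum of collaborative and content-based scores."""
    cf, cb = min_max(cf_scores), min_max(cb_scores)
    combined = {
        item: cf_weight * cf.get(item, 0.0) + cb_weight * cb.get(item, 0.0)
        for item in set(cf) | set(cb)
    }
    return sorted(combined, key=lambda item: (-combined[item], item))


# Hypothetical scores produced by the two underlying models for one user.
collaborative = {"a1": 0.9, "a2": 0.4, "a3": 0.1}
content_based = {"a2": 0.8, "a3": 0.6, "a4": 0.2}
hybrid = hybrid_ranking(collaborative, content_based)
print(hybrid)  # ['a2', 'a1', 'a3', 'a4']
```

Article a2 is ranked first because both models score it reasonably well, even though neither collaborative filtering alone nor content alone would have put every other item below it.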
Table 7 shows the results of the comparison of the main models in recommendation
systems.
Table 7
Comparison of basic models in recommendation systems
Model                    recall@5  recall@10
Content-based            0,16      0,26
Popularity               0,24      0,37
Collaborative filtering  0,33      0,46
Given the results in Table 7, it can be concluded that a hybrid model should be used in further
development of the system to obtain the best results. An example of a more modern method that
has gained popularity should also be given, namely matrix factorization. To begin with, consider
what factorization is. Factorization is the decomposition of a matrix into its components. Take for
example a table where the columns correspond to the names of the films and the rows to users,
with the cells holding the users' ratings of these films (Table 8).
Table 8
User ratings for specific movies
User   Avenger  Thor  DeadPool  Avatar  Rocky  Titanic
Pumba  4        5     3         3       1      -
Henry  5        -     3         2       -      4
Jerry  1        2     2         -       4      2
Tom    3        4     -         2       4      1
Timon  4        2     3         5       3      -
If there is a dash at the place of evaluation, it means that the user has not watched this movie and
the task is to predict his impressions after watching.
Accordingly, in Figure 9 the initial matrix is marked in blue; let us call it V. The two
matrices into which the initial matrix is decomposed are called W and H. The general form of the
expression is therefore $V_{m \times n} \approx W_{m \times k} \cdot H_{k \times n}$, where k is the
number of components. This means that multiplying the matrices W and H gives an approximation of
the matrix V in which the empty cells take on values that correspond to the predicted
ratings of users for certain products. There are three main methods of decomposing matrices;
their comparison is shown in Figure 8.
As can be seen from Figure 8, the SVD and NNMF methods work best. The choice between them
depends only on the data set, but they have one significant difference. SVD works with numbers
ranging from minus infinity to plus infinity, and its result can span the same range of
numbers, whereas the NNMF method works only with non-negative numbers.
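To make the decomposition of V into W and H concrete, the sketch below factorizes the ratings of Table 8 with plain stochastic gradient descent over the observed cells only. This is a simplified stand-in chosen for illustration, not SVD or NNMF as compared in Figure 8; the number of components, learning rate, regularization and number of steps are arbitrary assumptions.

```python
import numpy as np

users = ["Pumba", "Henry", "Jerry", "Tom", "Timon"]
movies = ["Avenger", "Thor", "DeadPool", "Avatar", "Rocky", "Titanic"]
# np.nan marks a dash in Table 8 (the user has not watched the movie).
V = np.array([
    [4, 5, 3, 3, 1, np.nan],
    [5, np.nan, 3, 2, np.nan, 4],
    [1, 2, 2, np.nan, 4, 2],
    [3, 4, np.nan, 2, 4, 1],
    [4, 2, 3, 5, 3, np.nan],
])


def factorize(V, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Fit V (m x n) as W (m x k) times H (k x n) on the observed cells only."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    observed = [(i, j) for i in range(m) for j in range(n) if not np.isnan(V[i, j])]
    for _ in range(steps):
        for i, j in observed:
            error = V[i, j] - W[i] @ H[:, j]
            w_old = W[i].copy()
            # Gradient step with L2 regularization on both factors.
            W[i] += lr * (error * H[:, j] - reg * W[i])
            H[:, j] += lr * (error * w_old - reg * H[:, j])
    return W, H


def observed_rmse(V, W, H):
    """Root mean squared error over the cells that contain a rating."""
    mask = ~np.isnan(V)
    diff = (V - W @ H)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


W, H = factorize(V)
V_hat = W @ H  # dense matrix: the former dashes now hold predicted ratings
print(np.round(V_hat, 1))
print("predicted rating of Titanic for Pumba:", round(float(V_hat[0, 5]), 1))
```

Because the loss is computed only on observed ratings, no imputation is needed; the regularization term is what keeps the few known entries from being overfitted.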
4. Work results
To obtain results for the proposed models, a web service for finding movies was created, where
the user can search for movies and view detailed information about them, including the movie
rating. Based on the data about the user's interaction with the site, recommendations can be created.
The MovieLens dataset was used to build the service. Recommendations are based on implicit
feedback. To do this, the client side collects information about user clicks and records it
in an object that consists of the name of the movie, the number of clicks on this
movie, and its rating. After the user has viewed several movies, the information is sent to
the server, where the object is used as test data (Figure 9). As can be seen from Figure 10, the server
also receives the user id to make personalized recommendations. In response, an object with
10 films that may interest the user is sent to the client, which thus receives the list of
recommendations. The system implements several recommendation methods, so a particular
method of your choice must be called to get results. The following recommendation methods are
implemented in the proposed system: Linear Regression Prediction, Content-Based Prediction,
User-Based Collaborative-Filtering Prediction, Item-Based Collaborative-Filtering Prediction.
Figure 8: Comparison of basic decomposition methods
Figure 9: Customer feedback information, where each line corresponds to the addition of a new
review for a new movie.
Figure 10: Output of movies that are offered to the user through personalized analysis
5. Discussion of results
In an information-saturated world, recommendation systems play an essential role in limiting the
user's interaction to potentially interesting information. In this paper, a comparative analysis of
the main approaches to implementing systems of this kind was carried out. Several basic methods
were compared during the study: popularity-based, content-based, hybrid, and matrix factorization.
Figure 11 shows the results of this study. The results in Figure 11 show that the models based on
the hybrid approach and matrix factorization give the best accuracy. If personalized
recommendations are needed and resources allow, these models are preferable to neural networks
and other approaches, since they remain transparent and easy to implement. If personalization
is not required, using a popularity model is sufficient for most tasks. It also
significantly simplifies implementation.
Figure 11: Evaluation of the results of the use of methods in the construction of recommendation
systems
6. Conclusions
Today, during a pandemic, many businesses have their own recommendation pages. For example, we
can name such giants as Amazon, Google, and LinkedIn. In this paper, much attention was paid to
methods based on matrix decompositions for recommendation systems, namely for the reconstruction
of the rating table. Based on these methods, data analysis for the selected dataset was performed.
Each of the studied methods has its own characteristics and is worth noting because it is helpful
for a specific range of goals set by the developer of a recommendation system. Thus, the proposed
models make it possible to focus on the characteristics of the object that determine the rating of
the product or service the user is looking for. Applying the proposed algorithms makes it possible
to choose the best option for creating a recommendation system that offers the user a behaviour
model.
7. References
[1] S. Bo, Ch. Haiyan, A Survey of k Nearest Neighbor Algorithms for Solving the Class
Imbalanced Problem, Wireless Communications and Mobile Computing, Vol. 2021, 2021.
https://doi.org/10.1155/2021/5520990.
[2] A. Trabelsi, Z. Elouedi, and E. Lefevre, Decision tree classifiers for evidential attribute values
and class labels, Fuzzy Sets and Systems, vol. 366, 2019, pp. 46–62.
[3] M. T. Jones, Recommender systems, Introduction to approaches and algorithms. Retrieved
November 25, 2017. https://www.ibm.com/developerworks/library/os-recommender1/
[4] N. Shakhovska, N. Boyko, P. Pukach, The information model of cloud data warehouses.
Advances in Intelligent Systems and Computing (AISC), 871, 2019, pp. 182–191
[5] N. Kunanets, O. Vasiuta, N. Boiko, Advanced Technologies of Big Data Research in Distributed
Information Systems, in: Proceedings of the 14th International conference "Computer sciences
and Information technologies", 2019, pp. 71-76. DOI: 10.1109/STC-CSIT.2019.8929756
[6] A. Tejeda-Lorente, C. Porcel, E. Peis, R. Sanz, E. Herrera-Viedma, A quality-based
recommender system to disseminate information in a university digital library, Information
Sciences, 266, 2014, pp. 52-69.
[7] C. Aggarwal, Recommender Systems. Springer, 2016, p. 498. doi:
https://doi.org/10.1007/978-3-319-29659-3
[8] B. Hallinan, T. Striphas, Recommended for you: The Netflix Prize and the production of
algorithmic culture. New Media & Society, 18 (1), 2014, pp. 117–137. doi:
https://doi.org/10.1177/1461444814538646
[9] G. Adomavicius, J. Bockstedt, S. Curley, J. Zhang, De-Biasing User Preference Ratings in
Recommender Systems, in: Proceedings of the Joint Workshop on Interfaces and Human
Decision Making for Recommender Systems co-located with ACM Conference on
Recommender Systems, 2–9. Available at: http://ceur-ws.org/Vol-1253/paper1.pdf
[10] I. Gunes, C. Kaleli, A. Bilge, H. Polat, Shilling attacks against recommender systems: a
comprehensive survey. Artificial Intelligence Review, 42 (4), 2014, pp. 767–799. doi:
https://doi.org/10.1007/s10462-012-9364-9
[11] Y. Wang, L. Qian, F. Li, L. Zhang, A Comparative Study on Shilling Detection Methods for
Trustworthy Recommendations. Journal of Systems Science and Systems Engineering, 27 (4),
2018, pp. 458–478. doi: https://doi.org/10.1007/s11518-018-5374-8
[12] K. Patel, A. Thakkar, C. Shah, K. Makvana, A State of Art Survey on Shilling Attack in
Collaborative Filtering Based Recommendation System. Smart Innovation, Systems and
Technologies, 2018, pp. 377–385. doi: https://doi.org/10.1007/978-3-319-30933-0_38
[13] W. Zhou, J. Wen, M. Gao, H. Ren, P. Li, Abnormal Profiles Detection Based on Time Series and
Target Item Analysis for Recommender Systems. Mathematical Problems in Engineering, 2015,
pp. 1–9. doi: https://doi.org/10.1155/2015/490261
[14] M. Gao, Q. Yuan, B. Ling, Q. Xiong, Detection of Abnormal Item Based on Time Intervals for
Recommender Systems. The Scientific World Journal, 2014, pp. 1–8. doi:
https://doi.org/10.1155/2014/845897
[15] M. Gao, R. Tian, J. Wen, Q. Xiong, B. Ling, L. Yang, Item Anomaly Detection Based on
Dynamic Partition for Time Series in Recommender Systems. PLOS ONE, 10 (8), 2015, pp.135-
155. doi: https://doi.org/10.1371/journal.pone.0135155
[16] O. Chala, L. Novikova, L. Chernyshova, Method for detecting shilling attacks in e-commerce
systems using weighted temporal rules. EUREKA: Physics and Engineering, 5, 2017, pp. 29–36.
doi: https://doi.org/10.21303/2461-4262.2019.00983
[17] V. Levykin, O. Chala, Method of determining weights of temporal rules in Markov logic network
for building knowledge base in information control systems. EUREKA: Physics and
Engineering, 5, 2018, pp. 3–10. doi: https://doi.org/10.21303/2461-4262.2018.00713
[18] S. Chalyi, V. Leshchynskyi, I. Leshchynska, Method of forming recommendations using
temporal constraints in a situation of cyclic cold start of the recommender system. EUREKA:
Physics and Engineering, 4, 2019, pp. 34–40. doi:
https://doi.org/10.21303/2461-4262.2019.00952