=Paper=
{{Paper
|id=Vol-2933/paper12
|storemode=property
|title=Comparison of the Effectiveness of Various Algorithms on a Recommendation System (short paper)
|pdfUrl=https://ceur-ws.org/Vol-2933/paper12.pdf
|volume=Vol-2933
|authors=Gulnara Bektemyssova,Yerassyl Akhmer
}}
==Comparison of the Effectiveness of Various Algorithms on a Recommendation System (short paper)==
<pdf width="1500px">https://ceur-ws.org/Vol-2933/paper12.pdf</pdf>
<pre>
    Comparison of the Effectiveness of Various Algorithms
             on a Recommendation System

                       Bektemyssova Gulnara and Akhmer Yerassyl

            International Information Technology University, Almaty, Kazakhstan


      Abstract. Recommender systems attempt to identify users’ interests by proposing
      related products or resources that customers may be interested in. Recommender
      methods have attracted attention in the fields of information technology,
      e-commerce, and so on, essentially by drawing on a shared collection of past
      decisions that have led consumers to find information of interest. This research
      focuses on three common types of recommendation systems: Collaborative
      Filtering, Content-Based Filtering, and Hybrid recommendation systems. For the
      purposes of this analysis, the well-known MovieLens dataset has been used. The
      assessment considered both the quantitative and qualitative dimensions of the
      recommendation systems. This paper surveys various recommendation approaches
      and the fundamental techniques behind them. Every algorithm in this field has
      both benefits and drawbacks. The goal of the research is to put various
      algorithms to the test in order to find the right one for the structure of the
      dataset and the researchers’ goals.


      Keywords: Recommender Systems, Collaborative Filtering, Content-Based
      Filtering, E-Commerce, Hybrid Recommendation System.


1   Introduction
Recommender systems are an integral part of e-commerce today. The active
transition from traditional offline sales to online sales is making machine
learning technologies and recommendation algorithms more and more popular
in retail [1]. Recommendations simplify shopping for store customers and allow
sellers to increase customer loyalty by saving customers time and taking an
individual approach to product offerings, as well as to expand the product
matrix and raise the average purchase value. Unlike e-commerce, grocery chains
cannot observe how customers react to promoted products in real time. However,
thanks to loyalty programs and receipt databases, it is possible to build a
recommendation system from scratch [2].
     In this paper, we look at various concepts of recommender systems. We
introduce how they work, outline their theoretical background, and discuss
the strengths and limitations of each of them. A comparative analysis of these
algorithms is carried out in terms of the accuracy of the results obtained and
performance.

 Copyright © 2021 for this paper by its authors. Use permitted under
 Creative Commons License Attribution 4.0 International (CC BY 4.0).

     In the first part, we address the two main methodologies of recommender
systems: collaborative and content-based approaches. The following two parts
then go through different collaborative filtering methods, such as user-user,
item-item, and matrix factorization. The part that follows presents
content-based approaches and their operation. Finally, we go over how to
assess a recommender system.
     In retail, three types of recommendations are commonly used: content-based,
collaborative, and hybrid. Accordingly, recommendation systems are frequently
divided into three large categories:
     • Content-Based systems, which use keywords to propose products to
         a client that are close to those the client has historically favored [7];
     • Collaborative Filtering methods, which propose products based on what
         customers with similar tastes have recently seen or purchased;
     • Hybrid Recommendation methods, which combine Content-Based and
         Collaborative Filtering techniques to overcome some of the
         shortcomings that occur in the above-mentioned systems.

2   Approaches

2.1 Collaborative filtering
Collaborative recommendation is most likely the most widely used and most
mature of the approaches. Collaborative recommender frameworks aggregate
item ratings or suggestions, identify commonalities among customers based
on their ratings, and produce new suggestions based on inter-user correlations.
This approach may be Memory-Based Collaborative Filtering, which compares
customers’ rating histories using similarity or other metrics, or Model-Based
Collaborative Filtering, which derives a model from past ratings and uses it to
make predictions [3, 4].
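
The memory-based variant described above can be illustrated with a minimal
sketch. The paper does not state which similarity measure or neighbourhood
size it uses, so the cosine similarity, the parameter k, the function name
predict_user_based, and the toy rating matrix below are all assumptions made
for illustration only.

```python
import numpy as np

def predict_user_based(ratings, user, item, k=2):
    """Predict ratings[user, item] from the k most similar users who rated item.

    ratings: 2-D array, rows = users, columns = items, 0 = not rated.
    Returns None when no positively similar neighbour has rated the item.
    """
    target = ratings[user]
    sims = []
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue
        # Cosine similarity over the full rating vectors (unrated counts as 0).
        denom = np.linalg.norm(target) * np.linalg.norm(ratings[other])
        if denom == 0:
            continue
        sims.append((float(target @ ratings[other]) / denom, other))
    # Keep the k nearest neighbours that have a positive similarity.
    top = [(s, o) for s, o in sorted(sims, reverse=True)[:k] if s > 0]
    if not top:
        return None
    # Similarity-weighted average of the neighbours' ratings for the item.
    weighted = sum(s * ratings[o, item] for s, o in top)
    return float(weighted / sum(s for s, _ in top))

# Hypothetical toy matrix: 3 users x 4 items.
toy_matrix = np.array([
    [5, 4, 0, 1],
    [5, 5, 4, 1],
    [1, 1, 2, 5],
], dtype=float)

pred_k1 = predict_user_based(toy_matrix, user=0, item=2, k=1)
pred_k2 = predict_user_based(toy_matrix, user=0, item=2, k=2)
```

With k=1 the prediction simply copies the rating of the single most similar
user; with k=2 the less similar user pulls the estimate towards their own
lower rating.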

2.2 Content-based filtering
Even though Collaborative Filtering is well known and effective, it has
drawbacks. One of them is the sparsity problem, which arises when users
give few or no ratings; in this situation, the model is unable to produce
reasonable suggestions. To address the sparsity problem, studies suggest
Content-Based Recommender Systems, which focus on the analysis of auxiliary
data such as text, photographs, and videos, as well as customers’ profiles [5].
Suppose someone loves science fiction, romance, and action films but not
fantasy films. Over time, the algorithm could collect this knowledge and
conclude that the client has a high approval rating for genres such as science
fiction, romance, and action, and a negative rating for fantasy. The algorithm
could even discover which actors the client likes and dislikes. Even from a
small amount of feedback, the customer’s preferences may be inferred in this
manner. The key difference between Content-Based Filtering and Collaborative
Filtering is that Collaborative Filtering proposes new products depending on
the taste of customers who have shared preferences for many other products,
while Content-Based Filtering focuses on the analysis of item data and does
not depend on the preferences of other clients.
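
The genre example above can be sketched as a small user-profile computation.
This is only an illustration under stated assumptions: the four-genre
vocabulary, the neutral rating of 3, and the helper names genre_vector,
user_profile, and score are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical, shortened genre vocabulary for the example.
GENRES = ["Sci-Fi", "Romance", "Action", "Fantasy"]

def genre_vector(genres):
    """Binary indicator vector over GENRES for one film."""
    return np.array([1.0 if g in genres else 0.0 for g in GENRES])

def user_profile(rated, neutral=3.0):
    """Build a genre profile from (genres, rating) pairs.

    Ratings above `neutral` raise the weight of a film's genres,
    ratings below it lower the weight.
    """
    profile = np.zeros(len(GENRES))
    for genres, rating in rated:
        profile += (rating - neutral) * genre_vector(genres)
    return profile

def score(profile, genres):
    """Cosine similarity between the user profile and a film's genres."""
    vec = genre_vector(genres)
    denom = np.linalg.norm(profile) * np.linalg.norm(vec)
    return float(profile @ vec / denom) if denom else 0.0

# A user who likes science fiction, romance, and action but not fantasy.
history = [({"Sci-Fi", "Action"}, 5), ({"Romance"}, 4), ({"Fantasy"}, 1)]
profile = user_profile(history)
scifi_score = score(profile, {"Sci-Fi"})
fantasy_score = score(profile, {"Fantasy"})
```

No other user's ratings appear anywhere in this computation, which is the
point of contrast with Collaborative Filtering made in the text.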

2.3 Hybrid recommender system
The term “hybrid recommendation strategy” applies to a recommendation
system that combines two or more recommendation methods in order to
achieve better results while minimizing the disadvantages of each individual one.
Collaborative filtering is often paired with another method.

3   Related works
When working with items containing textual data, content-based systems
yield more accurate outcomes. However, these systems are incapable
of distinguishing between a well-written text description and a poorly written
one, particularly when similar terms are used [8]. Furthermore, these systems
are sometimes constrained by the over-similarity (overspecialization) issue:
when a system suggests products that correlate highly with a customer’s
profile, the client is likely to be recommended products that are nearly
identical to those already seen [10]. Besides that, when a new customer enters
the system with few or no ratings, he or she is very likely to be given
low-accuracy suggestions (this is known as the cold-start or new-user problem)
[8]. As mentioned in [10], content-based systems need a large number of ratings
before they can recommend products to a consumer with high precision.
Collaborative Filtering methods, in contrast to content-based systems, suffer
from bias due to the sparsity problem [8]. Since the number of items on
e-commerce websites is immense, even the most active users normally rate only
a small portion of the available items. This implies that even some popular
products have very few ratings and therefore have a low probability of being
suggested by the system [8, 9]. Collaborative Filtering systems, like
content-based systems, need a large amount of relevant data on a user account
before producing accurate predictions. Furthermore, new products must be rated
by a wide range of users; otherwise, the recommender system (RS) would be
unable to suggest them [11]. In particular, RS face technical challenges:
given the massive quantities of data available on websites and apps, a
significant amount of computing effort is required to generate
suggestions [9].

4   Preliminary experiments
For the preliminary study, we used the ‘MovieLens 1M Dataset’ [6]. The dataset
includes 1,000,209 anonymous ratings of roughly 3,900 movies submitted by 6,040
MovieLens users who joined the site in 2000. We selected two files: ratings
and movies. The ratings file has four fields: UserID (ranging from 1 to 6040),
MovieID (ranging from 1 to 3952), Rating (on a 5-star scale), and Timestamp
(in seconds since the epoch). Each user has at least 20 ratings. The movies
file has three fields: MovieID, Title, and Genres. Titles are identical to
those provided by IMDB (including year of release). Genres are pipe-separated
and chosen from a fixed list of categories that includes Children’s, Comedy,
Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, Musical, Mystery,
Romance, Sci-Fi, Thriller, War and Western.
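
The two record layouts described above could be read with a sketch like the
following. The "::" field delimiter is an assumption based on the dataset's
published README rather than on anything stated in this paper, and the helper
names parse_rating and parse_movie, as well as the two sample lines, are
illustrative only.

```python
from typing import NamedTuple

class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: int      # 1 to 5 stars
    timestamp: int   # seconds since the epoch

def parse_rating(line):
    """Parse one ratings record: UserID::MovieID::Rating::Timestamp (assumed layout)."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return Rating(int(user_id), int(movie_id), int(rating), int(timestamp))

def parse_movie(line):
    """Parse one movies record: MovieID::Title::Genres, with pipe-separated genres."""
    movie_id, title, genres = line.strip().split("::")
    return int(movie_id), title, genres.split("|")

# Illustrative sample lines in the assumed layout.
sample_rating = parse_rating("1::1193::5::978300760\n")
sample_movie = parse_movie("1::Toy Story (1995)::Animation|Children's|Comedy\n")
```

Keeping the release year inside the title string mirrors how the dataset
stores titles; splitting it out would be a separate step.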
     We conducted a preliminary exploratory study of the dataset. Figure 1 depicts
the histogram of the average ratings given by users. As we can see, this plot
resembles a normal distribution with a strong left tail. The majority of users have
average ratings between 3.5 and 4.


                      Fig. 1. Histogram of users’ average scoring.

    Fig. 2 depicts a histogram of the number of items rated by each user.
According to this histogram, most users rate just a few items.


                         Fig. 2. Histogram of items rated by users.
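
The quantities behind Fig. 1 and Fig. 2 (each user's average rating and the
number of items each user rated) can be computed with a sketch such as the one
below. The helper names per_user_stats and histogram, the bin edges, and the
six toy ratings are assumptions for illustration and are not the paper's data.

```python
from collections import defaultdict

def per_user_stats(ratings):
    """ratings: iterable of (user_id, movie_id, rating).

    Returns {user_id: (number_of_ratings, mean_rating)}.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for user_id, _movie_id, rating in ratings:
        totals[user_id][0] += 1
        totals[user_id][1] += rating
    return {u: (n, s / n) for u, (n, s) in totals.items()}

def histogram(values, edges):
    """Count values falling in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

# Hypothetical toy ratings: (user_id, movie_id, rating).
toy_ratings = [(1, 10, 4), (1, 11, 3), (2, 10, 5), (3, 12, 2), (3, 10, 4), (3, 11, 3)]
stats = per_user_stats(toy_ratings)
# Histogram of per-user mean ratings; the last edge is nudged so 5.0 is included.
mean_hist = histogram([mean for _, mean in stats.values()], [1, 2, 3, 4, 5.01])
# Histogram of how many items each user rated.
count_hist = histogram([n for n, _ in stats.values()], [1, 2, 3, 4])
```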


5   Results and discussion
     Quantitative analysis starts by examining the RMSE and MAE errors of a
Collaborative Filtering-based and a Hybrid system; the Content-Based Filtering
approach is left out of this comparison because it cannot be easily evaluated
with such error metrics. In this section, we select the top-recommended movies
from both methods for ten clients and compute the RMSE for each method. The
RMSE graph for ten clients in Fig. 3 suggests that the hybrid model has a
relatively lower RMSE. The average RMSE plot in Fig. 4 also illustrates the
hybrid system’s advantage.


         Fig. 3. RMSE of collaborative filtering and hybrid recommendation system.


      Fig. 4. Average RMSE of collaborative filtering and hybrid recommendation system.
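
For reference, the two error measures used in this section follow their
standard definitions: RMSE is the square root of the mean squared difference
between actual and predicted ratings, and MAE is the mean absolute difference.
The sketch below uses made-up toy values, not the paper's results, and the
function names rmse and mae are chosen here for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error between two equal-length rating sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)

# Hypothetical ratings for four films and a model's predictions for them.
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 3.0]
toy_rmse = rmse(actual, predicted)  # sqrt((0.25 + 0 + 1 + 1) / 4) = 0.75
toy_mae = mae(actual, predicted)    # (0.5 + 0 + 1 + 1) / 4 = 0.625
```

RMSE penalizes large individual errors more heavily than MAE, which is why
the two measures can rank systems differently on the same predictions.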

    Next, we consider 5 batches of users, each containing 5 users, for
whom we run the same test. The MAE calculated for these sets of users is
shown in Fig. 5, and the comparison shows that the Hybrid system performs
comparatively better. Fig. 6 shows the average MAE of the Collaborative
Filtering and Hybrid Recommendation Systems.


  Fig. 5. MAE of collaborative filtering and hybrid recommendation system for 5 sets of users.


   Fig. 6. Average MAE of collaborative filtering based and hybrid recommendation system
                                     for 5 sets of users.


     Fig. 7 shows that Collaborative Filtering can predict which films a client is
likely to rate highly. However, it has no means of suggesting movies related
to a specific film that the consumer likes: the genres are all over the place,
as shown by the genre column. In this part, we take User 1 and propose the top
20 movies that he is likely to rate highly.


     Fig. 7. Collaborative filtering based recommendation system’s top 20 suggested films
                                       for a particular user.

     A Content-Based Filtering recommendation framework, on the other hand,
is able to suggest movies that are very similar to a specified one, as seen in
Fig. 8, but it has very little insight into whether a client will like them or
not. In this part, we select the movie Toy Story (1995), with Movie ID 1, and
propose the top 20 films that are closest to it.


       Fig. 8. Top 20 content-based filtering recommendation system recommendations
                                       for a specific film.

     We get the best outcome with a hybrid system. In this part, we take User
ID 1 and the movie Toy Story (1995), with Movie ID 1, and suggest the top 20
films that are close to Toy Story and are still likely to be rated highly by
User 1. As a result, we may infer that a hybrid recommendation system
outperforms a standalone Collaborative Filtering or Content-Based Filtering
recommendation system in both qualitative and quantitative terms.
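
The paper does not say how its hybrid system combines the two methods. One
possible combination that matches the behaviour described above is a cascade:
shortlist films by content similarity to the chosen film, then re-rank the
shortlist by the rating a collaborative model predicts for the user. The
function hybrid_top_n, its parameters, and the toy similarity and prediction
tables below are all hypothetical.

```python
def hybrid_top_n(target_movie, user_id, similarity, predict_rating,
                 candidates=50, n=20):
    """Cascade hybrid: content-based shortlist, collaborative re-ranking.

    similarity: {movie_id: {other_movie_id: content similarity}}
    predict_rating: callable (user_id, movie_id) returning a predicted rating.
    """
    neighbours = similarity[target_movie]
    # Step 1: the films most similar in content to the target film.
    shortlist = sorted(
        (m for m in neighbours if m != target_movie),
        key=lambda m: neighbours[m],
        reverse=True,
    )[:candidates]
    # Step 2: order the shortlist by the user's predicted rating.
    ranked = sorted(shortlist, key=lambda m: predict_rating(user_id, m),
                    reverse=True)
    return ranked[:n]

# Hypothetical toy data: content similarities to film 1, predictions for user 1.
toy_similarity = {1: {2: 0.9, 3: 0.8, 4: 0.1, 5: 0.7}}
toy_predicted = {(1, 2): 3.0, (1, 3): 4.5, (1, 4): 5.0, (1, 5): 4.0}

top = hybrid_top_n(1, 1, toy_similarity,
                   lambda u, m: toy_predicted[(u, m)],
                   candidates=3, n=2)
```

In the toy data, film 4 has the highest predicted rating but is dropped in
step 1 because it is not similar to the target film, so the result contains
only films that satisfy both criteria.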

6   Conclusion
Three techniques were applied to the same dataset in this analysis to build
a recommendation method. Using the well-known MovieLens dataset, we
examined various recommendation mechanisms: Collaborative Filtering,
Content-Based Filtering, and Hybrid recommendation systems. We compared
all three recommendation mechanisms using a quantitative and qualitative
assessment on the dataset. The need for a combined quantitative and qualitative
analysis reflects the fact that Content-Based Filtering processes cannot be
easily evaluated. Furthermore, for any recommender system, the qualitative
analysis is vital. That is why, in addition to the conventional methodology,
we developed our own assessment process. We found that a hybrid recommendation
system outperforms a traditional recommendation system in all the scenarios
we tested. The study as a whole leaves room for additional research. In the
recommendation method, for instance, we did not take into account any
demographic details about the client. Taking these into account would add a
further dimension of complexity to the hybrid recommendation framework.
Furthermore, we only considered genre in our Content-Based Filtering
recommendations, but one could also look at the production team as well as
movie ratings for further similarities. A comparison of various Collaborative
Filtering-based approaches, along with consistency tests, would also be of
interest.

References
1. Kalitin D.V. Artificial neural networks [Electronic resource]: tutorial. Moscow: MISiS
    Publishing House, 2018. 88 p.
2. Ricci F., Rokach L. and Shapira B. (2011). Introduction to Recommender Systems Handbook.
    Springer, pp. 1–35.
3. Markovsky I. (2012). Low-Rank Approximation: Algorithms, Implementation, Applications.
    Springer. ISBN 978-1-4471-2226-5.
4. Takacs G., Pilaszy I., Nemeth B. and Tikk D. (2009). Scalable Collaborative Filtering
    Approaches for Large Recommender Systems. Journal of Machine Learning Research,
    10:623–656.
5. Brusilovsky P. (2007). The Adaptive Web. p. 325. ISBN 978-3-540-72078-2.
6. MovieLens dataset, https://grouplens.org/datasets/movielens
7. Konstan J.A. and Riedl J. (2012). Recommender systems: from algorithms to user experience.
    User Model. User-Adapt. Interact., 22(1-2):101–123.
8. Adomavicius G. and Tuzhilin A. (2005). Toward the next generation of recommender
    systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl.
    Data Eng., 17(6):734–749.
9. Pu P., Chen L. and Hu R. (2012). Evaluating recommender systems from the user’s
    perspective: survey of the state of the art. User Model. User-Adapt. Interact.,
    22(4-5):317–355.
10. Lu L., Medo M., Yeung C. H., Zhang Y.-C., Zhang Z.-K. and Zhou T. (2012). Recommender
    systems. Physics Reports, 519(1):1–49.
11. Ning X., Desrosiers C. and Karypis G. (2015). A comprehensive survey of
    neighborhood-based recommendation methods. In Recommender Systems Handbook,
    pages 37–76.



</pre>