Analysis of Recommendation System Methods for Accuracy of Predicted Estimates

Nataliya Boyko, Petro Telishevskyi and Beata Kushka

Lviv Polytechnic National University, Profesorska Street 1, Lviv, 79013, Ukraine

Abstract
The Internet plays a major role in people's lives today. Online, people communicate with friends, meet new people, develop themselves, buy goods and services, and spend time on entertainment such as watching movies and listening to music. Because there is so much content on the Internet, it is difficult for people to choose what they want at a given moment. Web services therefore use recommendation systems of various levels, which help us pick out of thousands of items the small amount of content that interests us and reject what does not. Recommendation systems were proposed long ago, but they have been actively developed and applied only recently, with the rapid growth of the Internet. The most successful recommendation systems are based on collaborative filtering. This paper investigates the methods used in collaborative filtering, namely User-based, Item-based and SVD. Experimental studies are conducted on data using these recommendation methods, followed by a comparative characterization between the two methods after the experiments.

Keywords
Data Mining, Recommended System, Algorithms, SVD, method, Item-Based Method, NMF, RMSE

1. Introduction

One of the manifestations of information uncertainty is uncertainty caused by data gaps. The objective characteristics of certain processes can be changed or even distorted when some data are lost during receipt, transmission or storage [3, 8]. Such missing data need to be recovered and, importantly, the recovery algorithms need to be chosen carefully, because incorrect or insufficiently reliable recovery can cause more damage than the data gaps themselves [1, 5].
There are algorithms that fill gaps with the necessary information, such as the Hot Deck method, the Bartlett method, resampling algorithms, Zet, Zetbraid, EM estimation, regression modeling and value prediction. A feature of these algorithms is that the gaps are filled with values selected by the algorithm itself [2, 4].

Recommendation systems try to solve the problem of information overload on the Internet with the help of classification techniques and information retrieval. Using various techniques, they search for and recommend to users information that will be of interest to them. These systems are widely used today in marketing, social networking and entertainment. Corporations use recommendation systems to increase traffic to their sites and to increase sales. For example, here are statistics from well-known companies:
• At Netflix, 2/3 of the movies watched by users were offered by the system.
• Google has improved its click-through rate (CTR) by 38 percent.
• Amazon makes 35% of its product sales through recommendations.

COLINS-2021: 5th International Conference on Computational Linguistics and Intelligent Systems, April 22–23, 2021, Kharkiv, Ukraine
EMAIL: nataliya.i.boyko@lpnu.ua (N. Boyko); petro.telishevskyi.kn.2017@lpnu.ua (P. Telishevskyi); beatakushka@yahoo.fr (B. Kushka)
ORCID: 0000-0002-6962-9363 (N. Boyko); 0000-0002-6187-740X (P. Telishevskyi); 0000-0002-4080-4607 (B. Kushka)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

The most popular systems work on the explicit feedback model: users leave reviews about a product or service, and recommendations are built from these reviews. In this work, we explore a different approach, namely implicit feedback.
This model predicts the user's attitude to a particular product or service from the user's behavior. As a result, we can automatically generate a content rating and use it for other users.

A recommendation system is a system that, out of a vast amount of information, recommends the content that matches our needs. Users of a resource that offers recommendations express their interests through ratings [6, 10]. On video-watching resources, recommendation systems store preferences in user profiles, because these profiles hold the user's ratings of the resource's content. When a user rates an item, new recommendations are calculated for that user and compared with the profile [3, 12].

Ratings can be explicit or implicit. A rating that the user gives directly to content of interest is called explicit. A rating inferred from purchases or from the pages the user visited is called implicit.

Most recommendation systems use one of three approaches:
• Collaborative filtering;
• Content-based filtering;
• Hybrid filtering.

2. Methods

Today there are many technologies and tools for analysis and classification. These include machine learning, Big Data, Natural Language Processing (NLP), and recommendation systems. In many cases, such systems are created to predict and analyze user behavior. Recommendation systems are designed to simplify, improve and speed up the user's search for the necessary content. They are very important because they let the user interact only with content that may be of interest, and therefore increase the efficiency of the system.

Wang P. said that: "Recommendation systems are systems that aim to select specific objects that meet user requirements, where each of these objects is stored in computer memory and is characterized by a set of attributes." Using specially collected information, recommendation systems can predict whether the user is interested in given content, rank content, or choose a specific set of N items that may be of interest to the user. These statements imply two problems that recommendation systems solve. The first is the prediction problem: predicting whether given content will interest the user. The second is the top-N problem: choosing a set of items that might interest the user. This helps companies increase profits by reducing purchases of goods that will not be sold any time soon.

Modern recommendation systems provide high forecasting accuracy, but only if there is sufficient information about the user and the user's preferences. When a user does not provide enough information, the system is unable to make a correct prediction. Lack of sufficient information can lead to the following problems.
• The problem of similarity arises when it is difficult to determine the similarity between users because the number of known user features is smaller than a quality recommendation requires. Researcher Yu also drew an interesting conclusion about this problem: content suffers from it no less than users do, because content items are typically described by few features, which is often not enough for popular feature-based and content-based algorithms.
• The cold start problem occurs when a newly registered user has not yet rated any content and the system has no information about that user.
• The problem of changing taste. For example, today a buyer is looking for a product for himself and tomorrow for a gift for his mother, so the system may form an incorrect idea of the user's tastes.
• The problem of new things occurs when the system does not know about specific content because no one has rated it yet.
• Unpredictable things depend on the user's individual preferences, especially in music. It is difficult for the system to evaluate such content, because everyone can react to it differently.
Basic algorithms in recommendation systems:
• Content-based. This approach recommends to users things similar to those they have liked in the past. Keyword search is often used to find similar things. For example, the online music service Pandora, within the Music Genome Project, created for each of its songs a vector of approximately 450 features. This made it possible to use more standard machine learning approaches that output the likelihood that the user would like the music.
• Demographic-based. Another approach applies when a lot of information about users is available, as on Facebook or LinkedIn. This information can be used to set the direction of recommendations regardless of previous user behavior.
• Collaborative filtering. This method is based on user-content interaction. It analyzes only the ratings and ignores information about the content or the user. The key idea is that similar users like similar things: if users have watched the same movies, there is a high probability that the same recommendations will suit them in the future. All that is needed is some kind of rating that arises from the interaction of user and content. There are two types of such data:
  • Explicit: the available score directly reflects the user's rating or preferences. Popular services let users rate things of interest in different ways. For example, Netflix and YouTube use a like / dislike system, while the Amazon and AliExpress trading platforms use a star system with a maximum of 5 stars.
  • Implicit: the service collects information about a user's interaction with content from clicks, views, or purchases. It also takes time into account, such as how long the user spent on a page, along with many other factors.

A striking example is online cinemas, where each film has a certain rating. The rating shows the user's preferences for the film in numerical terms.
The only problem is that some users do not leave any feedback, so the recommendation system may be inaccurate for them. Implicit evaluation is used in this case because it increases the amount of information about the user.

Currently, many commercial sites on the Internet have their own recommendation pages. Ahead of others are sites such as Netflix, Amazon, Google and LinkedIn. Many researchers have also presented their own versions of recommendation systems. There is a lot of research on recommendation systems today, but all of it faces the problem of insufficient data. Implicit evaluation improves the results, because it does not require any additional action from the user.

There are several major problems when working with recommendation systems. Problems 1 and 2 were encountered at the beginning of the development of recommendation systems and are now successfully solved by modern algorithms. The main remaining problem is the cold start.
1. The problem of similarity. A hybrid algorithm is used to solve this problem, so information about the user is not always needed to give an accurate and adequate recommendation.
2. The problem of lack of data. It is solved by using implicit evaluation together with a collaborative algorithm. This is usually sufficient for recommendations with high accuracy.
3. The cold start problem. Consider new content that has not yet received enough ratings to be recommended. This problem can be solved with a content-based or hybrid algorithm: thanks to the categories assigned to the content when it is added, it can be shown along with similar content. However, not all content is labeled with enough classes or features to be used in recommendations.

In this case, text mining uses several basic algorithms. Here is an example based on recommending websites.
1. Preprocessing stage. Here the keywords are extracted, the theme of the page is identified, and stop words are discarded.
2. TF-IDF (TF: term frequency, IDF: inverse document frequency). At this stage, the calculation is performed for each word $w$ in each document $d$:

$$ tf(w, d) = \frac{n_{wd}}{n_d}, $$

where $n_{wd}$ is the number of occurrences of word $w$ in document $d$, and $n_d$ is the total number of words in the document. This method is used to discard redundant information and not store it in memory, because in the relevance feedback algorithm only the 200 words with maximum weight will suffice.
3. Relevance feedback. This algorithm brings us back to the cold start problem. It is based on the pages that the user likes, without using overall page ratings. It is usually used only for new content, because it is less efficient than more traditional algorithms. The first step is to compute TF-IDF for the sites that the user likes (if the user has just started using the service, it is good practice to let the user mark several sites, movies, books and so on that he has visited and liked). The second step is to find the similarity of a page to the user's preferences, which is calculated as the scalar product of the vector of user weights and the vector of the site:

$$ k(u, d) = \sum_{w \in W_u} V_{u,w} \cdot tfidf(w, d), $$

where $W_u$ is the set of words from the profile of user $u$ and $V_{u,w}$ is the weight of word $w$ in that profile. As a result, this algorithm makes it possible to measure the similarity of any page that contains text to any user who has certain preferences.

The cold start problem for a user arises when the system meets the user for the first time and has no information about that user in its memory. This problem can occur constantly. The first example is a user who has not registered, so information about him is stored in cookies, which users can always delete. The system will then treat such users as new. The second problem arises when a user searches for goods for someone else.
For example, a user comes to the site in search of a particular product and needs the relevant recommendations only until the purchase is made. After that, the user will need a different product. As a result, the user remains the same, but his tastes can be radically different. The most common solution to this problem is to use geolocation. When a user has just registered or is using the service for the first time, the system shows information that is popular in the user's area of residence at that point in time. After a few likes or interactions with the content, the system will be able to show more accurate suggestions.

2.1. User-based method

A method based on the assumption that the user will like products selected by users similar to him (Formula 1) [11, 13].

$$ \hat{r}_{ui} = \bar{r}_i + \frac{1}{\sum_{i' \in R(u)} sim(i, i')} \sum_{i' \in R(u)} sim(i, i') \, (r_{u,i'} - \bar{r}_{i'}). \tag{1} $$

The degree of similarity $sim(i, i')$ is calculated from the rating matrix $R$; here $R(u)$ denotes the set of items rated by user $u$. The most commonly used similarity metrics are the Pearson correlation and the cosine similarity of the rows (columns) of the matrix (Formulas 2-3).

$$ sim(u, u') = \frac{\sum_{i \in R(u) \cap R(u')} (r_{u,i} - \bar{r}_u)(r_{u',i} - \bar{r}_{u'})}{\sqrt{\sum_{i \in R(u) \cap R(u')} (r_{u,i} - \bar{r}_u)^2} \, \sqrt{\sum_{i \in R(u) \cap R(u')} (r_{u',i} - \bar{r}_{u'})^2}}. \tag{2} $$

$$ sim(u, u') = \frac{\sum_{i \in R(u) \cap R(u')} r_{u,i} \, r_{u',i}}{\sqrt{\sum_{i \in R(u) \cap R(u')} r_{u,i}^2} \, \sqrt{\sum_{i \in R(u) \cap R(u')} r_{u',i}^2}}. \tag{3} $$

2.2. Item-based method

A method based on the assumption that the user will like products similar to those he has already chosen. User A is characterized by the objects that he has viewed or rated. For each of the selected / rated objects, m neighboring objects are defined, i.e. the m objects most similar in terms of user views / ratings. When building a recommendation system for movies, m takes values from 10 to 30. All neighboring objects are combined into a collection, from which the objects already viewed or rated by user A are excluded. The top n recommendations are then built from what remains of the collection.
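The neighbor-collection procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `cosine` and `recommend_top_n` are hypothetical, item similarity is taken to be the cosine similarity between item columns of the rating matrix (with 0 meaning "not rated"), and candidates are ranked by accumulating similarity weighted by the user's own rating, a rule the text does not specify.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two equal-length rating vectors (0 = not rated)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend_top_n(R, user, m=2, n=2):
    """Item-based top-n: gather the m nearest neighbors of every item the
    user rated, drop items the user already rated, rank what remains."""
    n_items = len(R[0])
    cols = [[row[j] for row in R] for j in range(n_items)]  # item columns
    seen = [i for i in range(n_items) if R[user][i] > 0]
    scores = {}
    for i in seen:
        sims = [(cosine(cols[i], cols[j]), j) for j in range(n_items) if j != i]
        neighbors = sorted(sims, reverse=True)[:m]  # m most similar items
        for s, j in neighbors:
            if s > 0 and R[user][j] == 0:  # exclude viewed / rated items
                # Assumed ranking rule: similarity weighted by the user's rating.
                scores[j] = scores.get(j, 0.0) + s * R[user][i]
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Toy rating matrix: rows are users, columns are items.
R = [
    [5, 4, 0, 0, 0],
    [5, 5, 4, 0, 0],
    [0, 0, 0, 5, 4],
    [0, 0, 1, 4, 5],
    [4, 5, 5, 0, 0],
]
print(recommend_top_n(R, user=0))  # [2]
```

In this toy matrix, user 0 rated items 0 and 1, whose nearest unrated neighbor is item 2, so item 2 is the only candidate returned. For a realistic movie data set, m would be set in the 10 to 30 range mentioned above.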
Thus, in the item-based approach, all users who liked one or another object from the collection take part in creating recommendations (Formula 4) [14, 15].

$$ \hat{r}_{ui} = \bar{r}_i + \frac{1}{\sum_{i' \in R(u)} sim(i, i')} \sum_{i' \in R(u)} sim(i, i') \, (r_{u,i'} - \bar{r}_{i'}). \tag{4} $$

2.3. SVD algorithm

Almost all collaborative filtering algorithms have shortcomings such as cold start, triviality of recommendation results, and so on. One of the fairly new algorithms that reduces the impact of typical collaborative filtering problems is the SVD algorithm, which was created to improve the results of conventional algorithms [16, 17]. SVD is a matrix factorization method that is usually used to reduce the number of features of a data set by reducing the dimensionality of the space from n to k, where k