Recommendation System Based on a Compact Hybrid User Model Using Fuzzy Logic Algorithms

Nina Khairova1, Nataliia Sharonova1, Dmytro Sytnikov2, Mykyta Hrebeniuk2, Polina Sytnikova2

1 National Technical University “Kharkiv Polytechnic Institute”, Kyrpychova str. 2, Kharkiv, 61002, Ukraine
2 Kharkiv National University of Radio Electronics, Nauky Ave. 14, Kharkiv, 61166, Ukraine

Abstract
The paper presents an algorithm designed to address the challenges of traditional collaborative filtering methods by integrating a compact hybrid user model. This model incorporates hybrid features, demographic information, and fuzzy logic principles to improve recommendation accuracy. A key contribution of this work is an approach for calculating user similarity using fuzzy logic algorithms. By working with fuzzy concepts, the proposed approach captures the inherent uncertainty and imprecision in user preferences, leading to more nuanced and accurate recommendations. Experimental evaluations on the widely used MovieLens dataset compare the proposed algorithm with traditional collaborative filtering techniques such as Pearson correlation and cosine similarity. The dataset, which contains both user ratings and demographic details, serves as a comprehensive testbed for assessing recommendation systems. The results of the experiments demonstrate the superiority of the proposed approach in capturing user similarities and enhancing recommendation accuracy. Through the integration of hybrid user models, demographic data, and fuzzy logic principles, the proposed algorithm offers a promising approach for enhancing recommendation accuracy across diverse application domains.
Keywords
Recommendation system, fuzzy logic, hybrid feature, compact hybrid user model, fuzzy user model, similarity, fuzzy distance

COLINS-2024: 8th International Conference on Computational Linguistics and Intelligent Systems, April 12–13, 2024, Lviv, Ukraine
khairova.nina@gmail.com (N. Khairova); nvsharonova@ukr.net (N. Sharonova); dmytro.sytnikov@nure.ua (D. Sytnikov); mykyta.hrebeniuk@nure.ua (M. Hrebeniuk); polina.sytnikova@nure.ua (P. Sytnikova)
ORCID: 0000-0002-9826-0286 (N. Khairova); 0000-0002-8161-552X (N. Sharonova); 0000-0003-1240-7900 (D. Sytnikov); 0009-0008-0989-7957 (M. Hrebeniuk); 0000-0002-6688-4641 (P. Sytnikova)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

With the proliferation of data-driven technologies and the increasing volume of information across various domains, the demand for efficient recommendation systems has grown substantially. These systems play a vital role in assisting users in discovering relevant items or content based on their preferences and behavior. Collaborative filtering, a widely used approach in recommendation systems, leverages user interactions and similarities to generate personalized recommendations. However, traditional collaborative filtering methods often face challenges related to scalability and accuracy, particularly when dealing with sparse or incomplete data.

To address these challenges, hybrid recommendation systems have emerged as a promising solution by integrating multiple recommendation techniques, such as collaborative filtering and demographic filtering. By combining collaborative data with demographic information, hybrid systems aim to enhance recommendation accuracy and mitigate sensitivity to data sparsity. Additionally, the incorporation of fuzzy logic allows for a more nuanced representation of user preferences, accommodating the inherently fuzzy nature of human decision-making.

This paper explores the development and evaluation of a hybrid recommendation system that integrates collaborative filtering with demographic information and fuzzy logic. The system utilizes a compact user model, incorporating genre interest indicators (GII) derived from user ratings and demographic attributes such as age, gender, and profession. By hybridizing at both the feature and model levels, the system aims to improve recommendation accuracy while maintaining scalability.

Experimental evaluations are conducted using the MovieLens dataset, comprising user ratings for movies along with demographic information. The performance of the proposed hybrid recommendation system is compared against traditional collaborative filtering methods, including Pearson correlation and cosine similarity. The experiments aim to assess the effectiveness of the fuzzy distance function in capturing user similarities and improving recommendation accuracy.

Through this research, insights into the efficacy of hybrid recommendation systems and the impact of incorporating fuzzy logic are gained. The findings contribute to advancing the understanding of recommendation system design and optimization, with implications for enhancing user experiences across various applications and domains.

2. Related works

The landscape of recommender systems has been shaped by extensive research efforts over the past decade, with a focus on improving recommendation accuracy and addressing various challenges [1]. Collaborative filtering (CF) has remained a cornerstone in recommendation system research, owing to its ability to provide personalized recommendations based on user-item interactions [2]. Hybrid recommendation systems, which integrate multiple recommendation techniques, have emerged as a promising approach to overcome the limitations of individual methods.
The works [3,4] explore the landscape of hybrid recommender systems, offering insights into different hybridization techniques and identifying trends in hybrid recommender system research. Paper [5] describes a hybrid music recommendation system that combines collaborative filtering with content-based filtering techniques. By integrating user preferences with item features such as genre and artist, such systems aim to provide more diverse and personalized music recommendations for automated playlist continuation. Another area of exploration involves weighting strategies [6] for recommender systems that cluster items based on genres. By assigning weights to items within each cluster, these systems aim to enhance recommendation accuracy by giving more importance to relevant genres in the user's preferences. The work [7] introduces the concept of the genre interest indicator (GII), a hybrid feature that combines user ratings and movie genres and represents user preferences at the model level.

Another important part of a recommendation system is the calculation of similarity. The study [8] evaluates different similarity measures used in collaborative filtering-based recommender systems. Traditional measures such as Pearson correlation and cosine similarity, as well as more advanced techniques like adjusted cosine similarity and the Jaccard coefficient, are reviewed and compared experimentally to evaluate their performance in recommendation tasks.

Additionally, the integration of demographic information into recommendation systems has shown promise. To address the cold-start problem in recommender systems, another line of work leverages user demographic attributes to provide personalized recommendations to new users with limited interaction history. By incorporating demographic information such as age, gender, and location into recommendation algorithms, the system aims to mitigate the cold-start problem and offer relevant recommendations [9,10].
Specifically tailored for movie recommendations, a demographic collaborative recommender system leverages demographic information such as age and gender to enhance collaborative filtering algorithms, offering accurate and personalized movie recommendations to users [10].

The incorporation of fuzzy logic principles into recommendation algorithms has garnered attention for its capacity to model the inherent uncertainty in user preferences. The paper [11] provides an overview of fuzzy logic techniques applied in recommender systems, discussing how fuzzy logic can address uncertainty and imprecision in user preferences and item attributes. Various fuzzy logic-based recommendation approaches are reviewed, along with their effectiveness in improving recommendation accuracy. By integrating fuzzy logic with CF techniques, as demonstrated in [12,13], recommendation accuracy can be improved, highlighting the importance of incorporating fuzzy concepts in user modeling. The work [14] addresses this problem by using the fuzzy C-means method and comparing its performance against other clustering techniques used in user-based collaborative filtering recommendation systems. By incorporating fuzzy logic principles, the system offers a more nuanced understanding of user preferences, leading to more accurate recommendations.

In summary, the integration of collaborative filtering, hybrid recommendation techniques, and fuzzy logic principles has led to significant advancements in recommendation system effectiveness.

3. Methods and materials

Formally, we have $M$ users, $U = \{u_1, \ldots, u_M\}$, who give explicit or implicit ratings to $K$ items, $S = \{s_1, \ldots, s_K\}$, such as news, web pages, books, games, or movies. The spaces $S$ and $U$ are large and can be very large in some cases. Each user $u_i$, where $i = 1, \ldots, M$, has rated a subset of items $S_i$. The rating of user $u_c$ for item $s_k$, where $k = 1, \ldots, K$, is denoted as $r_{c,k}$.
All available ratings are collected in an $M \times K$ user-item matrix denoted as $R$.

The architecture of a recommendation system can be centralized or distributed. In this work, we assume a centralized architecture where the recommendation system is in one specific place. During the development of a recommendation system, the following five phases can be identified:
1. Data collection
2. User model formation
3. Similarity computation
4. Neighbor selection
5. Predictions and recommendations

3.1 Data collection

The recommendation system should have as much information as possible about users to provide them with satisfactory results from the very beginning. This information includes user interests, origin, habits, personal data, and other details. Typically, three types of data can be collected from users in addition to product descriptions: demographic information provided during registration, explicit ratings (expressing users' opinions on items) for a subset of available items, and implicit data from user behavior on the service. Implicit ratings are obtained by interpreting user behavior or choices, assigning a rating or preference based on viewing data, purchase history, or other information access patterns. Additionally, the recommendation system should have access to a database of the items being evaluated (in our case, movies).

3.2 User model formation

Memory-based collaborative filtering is more accurate than model-based recommendation, but it scales poorly. In addition, actual user preferences may not always be captured solely through ratings, and therefore some item content descriptions are needed. This can be achieved by building a hybrid user model [3] that integrates user ratings with some item content descriptions. In user-based, memory-based collaborative filtering, the user model is a vector of ratings that grows as the user interacts with the system over time.
This huge amount of data requires a very large space and an extremely long processing time. Searching the entire database at query time to find the best set of neighbors is computationally expensive. On the other hand, model-based collaborative filtering derives a model from a group of users, and that model may be far from the actual preferences of the individual users. Memory-based collaborative filtering is simple, provides recommendations with high accuracy, and allows for easy addition of new data, but it becomes computationally expensive as the size of the input data set increases. Ultimately, the user may leave the website before processing is complete. Applying only model-based collaborative filtering to such sparse data reduces the cost of online processing, but often at the price of recommendation accuracy.

One of the common threads in current recommendation system research is the need to combine recommendation methods to mitigate sparsity and scalability problems. However, most common hybridization methods create two separate models and implement an online process for each filtering technique separately. Finally, some merging is used to obtain the result. What if we build a user model according to a certain filtering technique, and then apply another filtering method to the created model? In that case, only one online filtering process (collaborative filtering) is needed, while the other filtering method (content-based filtering) is used to densify the data. To achieve this, the use of hybrid features is proposed.

In our methodology, we incorporate the concept of the Genre Interest Indicator (GII) [6] to enhance the user model formation process. The GII is a measure of a user's interest in specific genres of items, such as movies. It is calculated from the explicit ratings provided by users for items belonging to different genres.
This approach allows us to capture nuanced user preferences beyond simple ratings, enabling the recommendation system to better understand user tastes. To implement the GII, we utilize a hybrid approach that combines collaborative filtering with demographic data. Specifically, we leverage explicit user ratings to link users to genres, while also incorporating demographic information such as age, gender, and profession. This hybrid user model provides a more accurate representation of user preferences by considering both explicit ratings and demographic factors.

3.2.1 Combining collaborative and demographic data

The compact user model described above uses the genre interest indicator (GII) to build a model by linking explicit ratings to genres. However, the assertion that two people are similar is based not only on whether they have similar opinions on a particular topic but also on other factors, such as their background and personal data. In many cases, the ratings given by some users are not sufficient to describe them adequately. Therefore, a hybrid user model with age, gender, and profession as demographic information, in addition to GII, may be a good choice for creating more accurate and individual recommendations. Combining features of collaborative and demographic filtering allows for considering explicit ratings without relying solely on them, thus reducing the sensitivity of collaborative filtering to the number of ratings [8]. Conversely, it enables having demographic information about users that would otherwise be unavailable. Moreover, most current hybrid recommendation systems are weighted systems, where the online process is realized for each filter separately and then some merging is used to obtain the final result. In this work, we will attempt to introduce hybridization at two different levels, namely at the model level and at the approach level.
Figure 1 illustrates the hybrid user model that we introduce to obtain hybrid collaborative/demographic filtering.

Figure 1: Hybrid user model structure

Accordingly, the hybrid user model consists of age, gender, and profession as demographic information, and GII for genres, as shown in Table 1.

Table 1
An example of a hybrid user model

| № | Gender | Age | Profession | Degree | Drama | … | Thriller |
|---|---|---|---|---|---|---|---|
| 1 | Male | 24 | Programmer | Master's | $GII_{u_i}(G_1)$ | … | $GII_{u_i}(G_N)$ |
| … | … | … | … | … | … | … | … |

3.2.2 Fuzzy user model

Fuzzy sets were introduced as a generalization of classical crisp sets in order to deal with fuzzy concepts such as "young," "rich," "tall," etc. Instead of the rigid membership of elements in a crisp set (1 if an element belongs to the set and 0 otherwise), a fuzzy set allows elements to have a partial degree of membership, i.e., any value in the interval [0,1]. In the theory of fuzzy sets, a fuzzy subset $A$ of the universe of discourse $U$ is described by a membership function $\mu_A(x): x \in U \to [0,1]$, which represents the degree of membership of $x$ in the set $A$. Fuzzy logic refers to all theories and technologies that use fuzzy sets.

For recommendation systems, most user preferences are fuzzy, so fuzzy logic is an appropriate tool for representing these preferences. In these methods, each object is represented by a set of primitive propositions, whose truth is determined in the object space by a value in the interval [0,1]. For example, a proposition could be "This movie is a comedy." The value associated with this proposition indicates the degree to which this movie is a comedy.

The crisp description of age and GII in the hybrid user model (Table 1) does not reflect the real case of human decisions. For example, the distance between two users aged 15 and 19 is 4, while both users belong to the same age group, namely teenagers. These fuzzy characteristics need to be taken into account when comparing users.
Below we will discuss how to implement the hybrid user model and introduce a fuzzy distance function for finding the nearest neighbors. The fuzzy user model will help create a set of neighbors as close as possible to the active user. However, to build a fuzzy model, it is first necessary to label the features of the user model. First of all, age is divided into three fuzzy sets: young, adult, and old (Figure 2), with the following membership functions:

$$A_{Young}(a) = \begin{cases} 1, & a \le 20 \\ \dfrac{35 - a}{15}, & 20 < a \le 35 \\ 0, & a > 35 \end{cases} \tag{1}$$

$$A_{Adult}(a) = \begin{cases} 0, & a \le 20,\ a > 60 \\ \dfrac{a - 20}{15}, & 20 < a \le 35 \\ 1, & 35 < a \le 45 \\ \dfrac{60 - a}{15}, & 45 < a \le 60 \end{cases} \tag{2}$$

$$A_{Old}(a) = \begin{cases} 0, & a \le 45 \\ \dfrac{a - 45}{15}, & 45 < a \le 60 \\ 1, & a > 60 \end{cases} \tag{3}$$

Figure 2: Membership function for the age feature

The values of gender and profession are considered as fuzzy points with a membership value of one. Finally, GII is divided into six fuzzy sets: very bad (VB), bad (B), average (AV), good (G), very good (VG), and excellent (E), with the following membership functions (Figure 3):

$$B_{VB}(a) = \begin{cases} 1 - a, & a \le 1 \\ 0, & a > 1 \end{cases} \tag{4}$$

$$B_{A(i)}(a) = \begin{cases} 0, & a \le i - 2,\ a > i \\ a - i + 2, & i - 2 < a \le i - 1 \\ i - a, & i - 1 < a \le i \end{cases} \qquad i = 2, 3, 4, 5 \tag{5}$$

Here, $A(i) = B, AV, G, VG$ for $i = 2, 3, 4, 5$ respectively.

$$B_{E}(a) = \begin{cases} 0, & a \le 4 \\ a - 4, & 4 < a \le 5 \end{cases} \tag{6}$$

Figure 3: Membership function for the GII feature

3.3 Similarity computation

After building a user model, a recommendation system compares the active user to the available database according to the corresponding similarity function. Based on the calculated similarity values, a connection is established between the active user and other users, allowing the recommendation system to form a set of neighbors for the active user. The choice of similarity function depends on the application and is based on the nature of the user model's features.
Some similarity function modifiers have been introduced to refine or enhance the recommendation system's ability to find close neighbors. It should be noted that similarity calculations for collaborative filtering can be performed between items instead of users. This work only discusses user-based similarity (user-to-user similarity), as it is the most popular.

The similarity between two users is a measure of how similar they are to each other. Formally, the similarity function $sim(\mathbf{u}_x, \mathbf{u}_y)$ for a set of users $U$ is a function with a non-negative value: $sim: U \times U \to \mathbb{R}^{+} \cup \{0\}$. Here, we distinguish between $u_x$ and $\mathbf{u}_x$ based on their context: $u_x$ refers to user $x$, while $\mathbf{u}_x$ represents the feature vector of the model of user $x$. The similarity function may have some of the following properties:

(P1) Identity: $\forall u_x \in U,\ sim(\mathbf{u}_x, \mathbf{u}_x) > 0$
(P2) Positivity: $\forall u_x, u_y \in U,\ u_x \ne u_y,\ sim(\mathbf{u}_x, \mathbf{u}_y) \ge 0$
(P3) Symmetry: $\forall u_x, u_y \in U,\ sim(\mathbf{u}_x, \mathbf{u}_y) = sim(\mathbf{u}_y, \mathbf{u}_x)$

Any function that satisfies (P1) is a similarity function. Although symmetry is a convenient property, it is not satisfied in all applications. Non-negativity is not satisfied for two standard examples: correlation coefficients and scalar products.

Different similarity functions $sim(\mathbf{u}_x, \mathbf{u}_y)$ between users $u_x$ and $u_y$ have been used in collaborative filtering research. The most popular similarity function for memory-based collaborative filtering is the Pearson correlation coefficient [7], where the similarity between two users is based only on their common ratings $S_{xy}$.
The Pearson correlation coefficient:

$$corr(\mathbf{u}_x, \mathbf{u}_y) = \frac{\sum_{s_k \in S_{xy}} (r_{x,k} - m_x)(r_{y,k} - m_y)}{\sqrt{\sum_{s_k \in S_{xy}} (r_{x,k} - m_x)^2 \sum_{s_k \in S_{xy}} (r_{y,k} - m_y)^2}} \tag{7}$$

Another similarity function is the cosine similarity function [7], which considers two users as two vectors in an $|S_{xy}|$-dimensional space:

$$cosine(\mathbf{u}_x, \mathbf{u}_y) = \frac{\sum_{s_k \in S_{xy}} r_{x,k} \times r_{y,k}}{\sqrt{\sum_{s_k \in S_{xy}} (r_{x,k})^2 \sum_{s_k \in S_{xy}} (r_{y,k})^2}} \tag{8}$$

Dissimilarity, on the other hand, is the opposite of similarity and is related to the concept of distance; the two terms are used interchangeably: small distances mean small differences, and large distances mean large differences. Formally, the distance function $dis(\mathbf{u}_x, \mathbf{u}_y)$ for a set of users $U$ is a function $dis: U \times U \to \mathbb{R}^{+} \cup \{0\}$. The $dis$ function may have some of the following properties:

(P1) Identity or reflexivity: $\forall u_x \in U,\ dis(\mathbf{u}_x, \mathbf{u}_x) = 0$
(P2) Positivity: $\forall u_x, u_y \in U,\ u_x \ne u_y,\ dis(\mathbf{u}_x, \mathbf{u}_y) > 0$
(P3) Symmetry: $\forall u_x, u_y \in U,\ dis(\mathbf{u}_x, \mathbf{u}_y) = dis(\mathbf{u}_y, \mathbf{u}_x)$
(P4) Uniqueness or definiteness: $dis(\mathbf{u}_x, \mathbf{u}_y) = 0 \Rightarrow u_x = u_y$
(P5) Triangle inequality: $\forall u_x, u_y, u_z \in U,\ dis(\mathbf{u}_x, \mathbf{u}_z) \le dis(\mathbf{u}_x, \mathbf{u}_y) + dis(\mathbf{u}_y, \mathbf{u}_z)$

Generally, identity and positivity are crucial for determining the correct distance function. Formulas (7) and (8) are not suitable if the model includes diverse features, because these formulas consider only the items rated by both users. The Euclidean distance function provides another way of computing differences for recommendation systems, and it operates on numerical features:

$$dis(\mathbf{u}_x, \mathbf{u}_y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2} \tag{9}$$

where $x_j$ is the $j$-th feature of $\mathbf{u}_x$, $y_j$ is the $j$-th feature of $\mathbf{u}_y$, and $n$ is the number of features.
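The three functions in formulas (7)-(9) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary-based rating layout, the function names, and the choice to return 0.0 when there are no common items or the denominator vanishes are all assumptions made here. The user means $m_x$ and $m_y$ are taken over all of a user's ratings, which is how the paper later defines the mean in formula (17).

```python
import math

def mean_rating(ratings):
    """Average of all ratings of one user (assumed meaning of m_x)."""
    return sum(ratings.values()) / len(ratings)

def pearson(rx, ry):
    """Formula (7): Pearson correlation over the common items S_xy.
    rx, ry map item id -> rating (hypothetical data layout)."""
    common = set(rx) & set(ry)
    if not common:
        return 0.0  # assumption: no overlap means no evidence of similarity
    mx, my = mean_rating(rx), mean_rating(ry)
    num = sum((rx[k] - mx) * (ry[k] - my) for k in common)
    den = math.sqrt(sum((rx[k] - mx) ** 2 for k in common)
                    * sum((ry[k] - my) ** 2 for k in common))
    return num / den if den else 0.0

def cosine(rx, ry):
    """Formula (8): cosine of the two rating vectors restricted to S_xy."""
    common = set(rx) & set(ry)
    if not common:
        return 0.0
    num = sum(rx[k] * ry[k] for k in common)
    den = math.sqrt(sum(rx[k] ** 2 for k in common)
                    * sum(ry[k] ** 2 for k in common))
    return num / den if den else 0.0

def euclidean(x, y):
    """Formula (9): Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((xj - yj) ** 2 for xj, yj in zip(x, y)))
```

Note that `pearson` and `cosine` only look at co-rated items, which is exactly why the text considers them unsuitable for a model that mixes ratings with demographic features, whereas `euclidean` accepts any numeric feature vector.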
3.3.1 Fuzzy distance function

Using a fuzzy user model has many advantages, but how can we compare two user models that have many fuzzy features? In general, each feature has many fuzzy sets. The choice of distance function is an important issue for the system and depends largely on the problem itself. For the hybrid user model in Figure 1, a vector with $N$ features represents the user, and therefore a local fuzzy distance should be found for each feature. Thus, for each pair of users, we have $N$ local fuzzy distances.

The global fuzzy distance can be obtained by two methods. The first method uses the fuzzy IF-THEN rule: IF ($x_1$ is close to $y_1$) and ($x_2$ is close to $y_2$) ... and ($x_N$ is close to $y_N$) THEN ($\mathbf{u}_x$ is similar to $\mathbf{u}_y$). In this case, the global fuzzy distance is defined as:

$$fdis(\mathbf{u}_x, \mathbf{u}_y) = \min\{fdis(x_1, y_1), fdis(x_2, y_2), \ldots, fdis(x_N, y_N)\} \tag{10}$$

The second method considers each local fuzzy distance as an opinion. The global fuzzy distance is then the global opinion of all, which requires an aggregation operator from fuzzy logic. The aggregation operator can be the average of the $N$ local fuzzy distances:

$$fdis(\mathbf{u}_x, \mathbf{u}_y) = \frac{\sum_{i=1}^{N} fdis(x_i, y_i)}{N} \tag{11}$$

Formula (10) works poorly for the hybrid user model because it considers only the feature with the minimum distance and ignores other features.

According to fuzzy set and concept distances, we need a local fuzzy distance metric, $fdis(x_i, y_i)$, which satisfies the following conditions:
A. Zero value for the same feature values.
B. Zero value for different feature values in the same fuzzy set with the same membership values.
C. Minimized distance between any two feature values that belong to the same fuzzy set and have close membership values.
D. Maximized distance between any two feature values that belong to two different fuzzy sets.

Condition (A) is a fundamental requirement for any distance function.
To clarify condition (B), let's assume that we have two users who are 40 and 35 years old, respectively. Both users have a membership value of 1 for the “adult” category (Figure 2). The distance between them is 5, but they are similar users in terms of fuzzy sets. To make the distance between two such users zero, we need another term that gives a zero value for this and similar cases. What really makes these two users similar is their equal membership values in one fuzzy set. We then define a corresponding fuzzy distance function that satisfies all four above-mentioned conditions.

Let $\mathbf{a}$ and $\mathbf{b}$ be the membership vectors corresponding to two crisp values $a$ and $b$ for a given feature with $l$ fuzzy sets. The fuzzy distance between $a$ and $b$ is defined as

$$fdis(a, b) = dis(\mathbf{a}, \mathbf{b}) \times dif(a, b) \tag{12}$$

where $dif(a, b)$ is simply the difference operator, $\mathbf{a}$ and $\mathbf{b}$ are vectors of size $l$, and $dis(\mathbf{a}, \mathbf{b})$ is any vector metric distance. In this work, the Euclidean distance is used to calculate $dis(\mathbf{a}, \mathbf{b})$:

$$dis(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{j=1}^{l} (a_j - b_j)^2} \tag{13}$$

where $a_j$ is the membership value of feature $a$ for its fuzzy set $j$.

Example
Let's assume we need to calculate the fuzzy distance between two users who have ages:
a) 35 and 40
b) 45 and 60
c) 18 and 23

Case a: $\mathbf{a} = \langle 0, 1, 0 \rangle$, $\mathbf{b} = \langle 0, 1, 0 \rangle$. Then
$dis(\mathbf{a}, \mathbf{b}) = \sqrt{(0 - 0)^2 + (1 - 1)^2 + (0 - 0)^2} = 0$
$dif(35, 40) = 40 - 35 = 5$
$fdis(35, 40) = 0 \times 5 = 0$ (similar users)

Case b: $\mathbf{a} = \langle 0, 1, 0 \rangle$, $\mathbf{b} = \langle 0, 0, 1 \rangle$. Then
$dis(\mathbf{a}, \mathbf{b}) = \sqrt{(0 - 0)^2 + (1 - 0)^2 + (0 - 1)^2} = \sqrt{2}$
$dif(60, 45) = 60 - 45 = 15$
$fdis(60, 45) = \sqrt{2} \times 15$ (opposite users)

Case c: $\mathbf{a} = \langle 1, 0, 0 \rangle$, $\mathbf{b} = \langle 0.8, 0.2, 0 \rangle$. Then
$dis(\mathbf{a}, \mathbf{b}) = \sqrt{(0.8 - 1)^2 + (0.2 - 0)^2 + (0 - 0)^2} = 0.283$
$dif(23, 18) = 23 - 18 = 5$
$fdis(23, 18) = 0.283 \times 5$ (close users)

The example results show that formula (12) satisfies all four conditions required of a local fuzzy distance function.
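The worked example can be reproduced with a short Python sketch that combines the age membership functions (1)-(3) with the local fuzzy distance of formulas (12) and (13). The function names are hypothetical, and taking $dif(a, b)$ as the absolute difference is an assumption; the paper only calls it "the difference operator", and the example always subtracts the smaller value from the larger one.

```python
import math

def age_memberships(a):
    """Membership vector <young, adult, old> per equations (1)-(3)."""
    if a <= 20:
        young = 1.0
    elif a <= 35:
        young = (35 - a) / 15
    else:
        young = 0.0

    if a <= 20 or a > 60:
        adult = 0.0
    elif a <= 35:
        adult = (a - 20) / 15
    elif a <= 45:
        adult = 1.0
    else:
        adult = (60 - a) / 15

    if a <= 45:
        old = 0.0
    elif a <= 60:
        old = (a - 45) / 15
    else:
        old = 1.0
    return (young, adult, old)

def fuzzy_distance(a, b, memberships=age_memberships):
    """Formula (12): Euclidean distance between the membership vectors
    (formula (13)) multiplied by the crisp difference.
    Assumption: dif(a, b) is taken as |a - b|."""
    va, vb = memberships(a), memberships(b)
    dis = math.sqrt(sum((p - q) ** 2 for p, q in zip(va, vb)))
    return dis * abs(a - b)

def global_fuzzy_distance(local_distances):
    """Formula (11): average of the N local fuzzy distances."""
    return sum(local_distances) / len(local_distances)

# The three cases of the worked example.
for pair in [(35, 40), (45, 60), (18, 23)]:
    print(pair, round(fuzzy_distance(*pair), 3))
```

Any other feature can reuse `fuzzy_distance` by passing its own membership function, for example one built from equations (4)-(6) for GII.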
For the fuzzy approach, the global fuzzy distance between two users can therefore be obtained by aggregating the local fuzzy distances with formula (11).

3.4 Neighbor selection

After calculating similarity values, the system ranks users according to their similarity to the active user to obtain a set of neighbors for that user. The size of the neighbor set can be fixed, by choosing the first N users, or variable, by selecting users whose similarity exceeds a certain threshold. This work distinguishes between the set of neighbors and the set of actual recommendations. The output of the neighbor set is the same as mentioned earlier (distance function), and a priority set is used to refine it.

3.5 Predictions and recommendations

At this stage, the recommendation system assigns a predicted rating to every item that has been seen by the set of neighbors but not by the active user. The predicted rating $pr_{x,k}$ indicates the expected interest of user $u_x$ in item $s_k$ and is usually computed as an aggregate of the ratings that the neighbors of $u_x$ gave to the same item $s_k$, the simplest being their average:

$$pr_{x,k} = \frac{\sum_{u_y \in N_x} r_{y,k}}{|N_x|} \tag{14}$$

where $N_x$ denotes the set of neighbors of $u_x$ who rated item $s_k$. Some other examples of aggregation functions:

$$pr_{x,k} = k \sum_{u_y \in N_x} sim(\mathbf{u}_x, \mathbf{u}_y) \times r_{y,k} \tag{15}$$

$$pr_{x,k} = m_x + k \sum_{u_y \in N_x} sim(\mathbf{u}_x, \mathbf{u}_y) \times (r_{y,k} - m_y) \tag{16}$$

The multiplier $k$ (not to be confused with the item index $k$) serves as a normalizing coefficient and is typically chosen as $k = 1 / \sum_{u_y \in N_x} |sim(\mathbf{u}_x, \mathbf{u}_y)|$, and $m_y$ is the average rating of user $u_y$:

$$m_y = \frac{1}{|S_y|} \sum_{s_k \in S_y} r_{y,k} \tag{17}$$

The weighted sum (16) is the most commonly used aggregation function for predicting ratings. Since users typically use rating scales differently, this prediction formula compensates for variations in the rating scale.
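As an illustration of formulas (16) and (17), the following Python sketch computes a mean-centered, similarity-weighted prediction. The data layout (one dictionary of ratings per user, one dictionary of similarity weights) and the fallback to the active user's mean when no neighbor rated the item are assumptions made for this example. The paper does not state how a fuzzy distance is turned into the weight $sim(\mathbf{u}_x, \mathbf{u}_y)$, so the weights are simply passed in.

```python
def mean_rating(ratings):
    """Formula (17): average of all ratings given by one user."""
    return sum(ratings.values()) / len(ratings)

def predict(active, neighbors, sims, item):
    """Formula (16): m_x plus the normalized, similarity-weighted sum of the
    neighbors' deviations from their own means.

    active    -- item id -> rating for the active user u_x
    neighbors -- neighbor id -> (item id -> rating); hypothetical layout
    sims      -- neighbor id -> similarity weight sim(u_x, u_y)
    """
    m_x = mean_rating(active)
    # N_x: only neighbors who actually rated the item contribute.
    raters = [(sims[u], r) for u, r in neighbors.items() if item in r]
    norm = sum(abs(s) for s, _ in raters)  # 1 / norm is the multiplier k
    if norm == 0:
        return m_x  # assumption: fall back to the active user's mean
    weighted = sum(s * (r[item] - mean_rating(r)) for s, r in raters)
    return m_x + weighted / norm

active = {"a": 4, "b": 2}
neighbors = {"u1": {"a": 4, "c": 2}, "u2": {"b": 1, "c": 5}}
sims = {"u1": 0.5, "u2": 0.5}
print(predict(active, neighbors, sims, "c"))
```

In this toy case both neighbors have a mean of 3, their deviations for item "c" are -1 and +2, and the equal weights give a prediction half a point above the active user's own mean.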
Mean-centering in this way keeps the predicted ratings for a given user close to the average rating of that user. Based on the predicted ratings of the items that have been seen by the neighbors of the active user but not yet rated by the active user, the recommendation system sorts these items in descending order of predicted rating to form a prediction list for the active user $u_a$:

$$PredSet(u_a) = \{ s_k \mid s_k \in S_i,\ u_i \in N_a,\ s_k \notin S_a \} \tag{18}$$

The rank of item $s_k$, $Rank_{u_a}(s_k)$, is the position of item $s_k$ in the prediction list for active user $u_a$. Accordingly, we can define the recommendation list $RecmSet(u_a)$ for the active user $u_a$ as the set of the $N_r$ items with the highest predicted rating in $PredSet(u_a)$, which is given by:

$$RecmSet(u_a) = \{ s_k \mid s_k \in PredSet(u_a),\ Rank_{u_a}(s_k) \le N_r \} \tag{19}$$

It is expected that objects with the highest rating will be the most preferred, so the user is likely to explore objects in an ordered list, starting from the top, hoping to find interesting objects.

4. Experiments

The experiments were conducted using the Python programming language in the Jupyter Notebook environment as a standalone interface for the analyst. The software product is primarily intended for determining the best locations based on machine learning models.

Computation hardware:
• OS Microsoft Windows 10
• Intel Core i5 7300HQ 2.5 GHz – 3.5 GHz
• 16 GB of RAM
• SSD storage drive
• Graphics card: Nvidia GeForce 1050Ti

Dataset description:
• The original MovieLens dataset was used, comprising 100,000 ratings by 943 users for 1682 movies.
• Ratings range from 1 (poor) to 5 (excellent).
• Each user rated a minimum of 20 movies.
• Demographic data (age, gender, occupation, zip code) are available for all users.
• Movie information includes title, release date, video release date, and genre (e.g., Action, Comedy, Drama).

Experiment design:
1. User selection criteria:
• Included users who rated at least 60 movies.
• Split the ratings of each user into two groups:
a) Group 1 (20 movies for building the user model).
b) Group 2 (40 movies for testing).
• Resulted in 497 eligible users providing 84,596 ratings.
2. Random split generation:
• Created five random splits of training and active users.
• Each split involved selecting 50 active users and utilizing the remaining 447 users as training users.
• These splits were labeled as split-1, split-2, ..., split-5 for subsequent cross-validation.
3. Cross-validation procedure:
• Conducted five-fold cross-validation, repeating experiments five times, once with each split.
• Each split served as a distinct training and testing dataset.
4. Training and testing phases:
• Training phase: utilized the set of training users (447 users) to find neighbors for the active user.
• Testing phase: divided the ratings of each active user randomly into two sets:
a) Training ratings (34%)
b) Test ratings (66%)
Training ratings were used to model the user, while test ratings remained unseen for prediction evaluation.

5. Results

In this experiment, we run a recommendation system using the fuzzy distance (FD) and compare its results with classical systems (Pearson correlation and cosine similarity). The size of the neighbor set is kept at 30 for all experiments. The system is run over the entire database of training users, even if this takes a long time. The difference between gender (profession) values is 0 if both users have the same gender (profession) and 1 otherwise. This is consistent with our aim of placing differing values as far apart as possible.
In addition, age values are normalized so that they fall within the same range as the GII, i.e., [0, 5]. Each age value is multiplied by (5/MAX), where MAX is the age of the oldest user in the system and is at least 60.

The system selects movies from the set of test ratings of the active user one by one and predicts a rating for each of them based on the set of all neighbors who rated the same movie. Once the predicted ratings are obtained, the system compares them with the actual ratings provided by the active user. Figures 4-8 show the percentage of correct predictions obtained for the fifty active users. Each graph shows the share of ratings that the system predicted correctly out of the total number of test ratings available for the active user.

Table 2
MAE and Coverage – Split 1

| | FD | Cosine | Pearson |
|---|---|---|---|
| MAE | 0.737490 | 0.762880 | 0.763961 |
| Coverage | 0.989313 | 0.982093 | 0.974973 |

Table 3
Comparison of prediction results – Split 1

| | Greater | Same | Smaller |
|---|---|---|---|
| FD with Cosine | 22 | 6 | 22 |
| FD with Pearson | 29 | 5 | 16 |

Figure 4: Split 1

Table 4
MAE and Coverage – Split 2

| | FD | Cosine | Pearson |
|---|---|---|---|
| MAE | 0.760839 | 0.789274 | 0.787979 |
| Coverage | 0.984859 | 0.982352 | 0.976602 |

Table 5
Comparison of prediction results – Split 2

| | Greater | Same | Smaller |
|---|---|---|---|
| FD with Cosine | 26 | 1 | 23 |
| FD with Pearson | 25 | 5 | 20 |

Figure 5: Split 2

Table 6
MAE and Coverage – Split 3

| | FD | Cosine | Pearson |
|---|---|---|---|
| MAE | 0.743868 | 0.785782 | 0.788918 |
| Coverage | 0.980292 | 0.978136 | 0.971512 |

Table 7
Comparison of prediction results – Split 3

| | Greater | Same | Smaller |
|---|---|---|---|
| FD with Cosine | 26 | 4 | 20 |
| FD with Pearson | 28 | 2 | 20 |

Figure 6: Split 3

Table 8
MAE and Coverage – Split 4

| | FD | Cosine | Pearson |
|---|---|---|---|
| MAE | 0.787537 | 0.837564 | 0.832438 |
| Coverage | 0.987336 | 0.976205 | 0.970355 |

Table 9
Comparison of prediction results – Split 4

| | Greater | Same | Smaller |
|---|---|---|---|
| FD with Cosine | 29 | 5 | 16 |
| FD with Pearson | 25 | 10 | 15 |

Figure 7: Split 4

Table 10
MAE and Coverage – Split 5

| | FD | Cosine | Pearson |
|---|---|---|---|
| MAE | 0.751824 | 0.765969 | 0.768245 |
| Coverage | 0.975372 | 0.976775 | 0.968327 |

Table 11
Comparison of prediction results – Split 5

| | Greater | Same | Smaller |
|---|---|---|---|
| FD with Cosine | 24 | 3 | 23 |
| FD with Pearson | 25 | 5 | 20 |

Figure 8: Split 5

6. Discussions

The results of the experiment conducted for each random split of 50 active users and 447 training users are shown in Figures 4-8 and Tables 2-11, which present:
• a table with the Mean Absolute Error (MAE), which measures the accuracy of the recommendation system, and Coverage [14], which measures the percentage of items for which the recommendation system can provide predictions;
• a table comparing the developed algorithm with the classical algorithms (Pearson correlation and cosine similarity), where the comparison measure is the percentage of correctly predicted ratings for movies from the test dataset for each of the 50 active users. The table shows the number of users in each group: the Greater group contains users for whom the developed algorithm achieved a higher percentage of correctly predicted ratings than the classical method, the Same group contains users with an equal percentage, and the Smaller group contains users with a lower percentage;
• a graph showing the percentage of correctly predicted ratings obtained with the implemented algorithms for each user.

Based on the obtained data, the following conclusions can be drawn:
• the Mean Absolute Error (MAE) of the developed algorithm is lower than that of both classical approaches in all five splits, indicating that the predictions generated by the recommendation system deviate less from the true ratings given by the active user;
• the percentage of items for which the recommendation system can provide predictions (Coverage) [15] remained at the same level: it was higher than for both classical approaches in splits 1-4, and in split 5 it was slightly lower than for cosine similarity (0.975372 against 0.976775) while still higher than for Pearson correlation;
• in all 5 runs, more users fell into the Greater group than into the Smaller group, except for the comparison with cosine similarity in split 1, where the two groups were equal (22 users each).

Higher prediction values suggest that a better set of neighboring users has been found, which increases the accuracy of the recommendation system.

7.
Conclusions

In this study, we proposed and evaluated a recommendation system that leverages fuzzy logic for both user model formation and similarity computation. The research outcomes indicate several key findings:

Enhanced user modeling: By incorporating fuzzy logic into the user modeling process, we were able to capture nuanced user preferences beyond simple ratings. The integration of fuzzy sets for demographic attributes and the genre interest indicator resulted in a more accurate representation of user tastes.

Improved similarity computation: The introduction of fuzzy distance metrics facilitated a more robust comparison between user models, considering the partial degree of membership in fuzzy sets. This approach addressed the limitations of traditional distance functions, particularly in handling diverse and imprecise user features.

Superior recommendation accuracy: Experimental results demonstrated that the recommendation system utilizing fuzzy logic outperformed classical approaches, such as Pearson correlation and cosine similarity. The system achieved lower Mean Absolute Error (MAE) and higher prediction accuracy, indicating its effectiveness in providing personalized recommendations.

Stable coverage: Despite the introduction of fuzzy logic, the recommendation system maintained stable coverage, ensuring that a wide range of items could be recommended to users. This suggests that the proposed approach strikes a balance between accuracy and coverage, which is essential for practical recommendation systems.

In summary, the integration of fuzzy logic in user modeling and distance computation proved to be a promising approach for enhancing the performance of recommendation systems.

8. References

[1] D. Roy, M. Dutta, A systematic review and research perspective on recommender systems, Journal of Big Data, Vol. 9, Issue 59 (2022). doi:10.1186/s40537-022-00592-5.
[2] P. Dahiya, N. Duhan, Comparative Analysis of Various Collaborative Filtering Algorithms, International Journal of Computer Sciences and Engineering, Vol. 7, Issue 8 (2019) 347-351. doi:10.26438/ijcse/v7i8.347351.
[3] E. Çano, M. Morisio, Hybrid Recommender Systems: A Systematic Literature Review, Intelligent Data Analysis, Vol. 21, Issue 6 (2017) 1487-1524. doi:10.3233/IDA-163209.
[4] M. Fauzi, S. Putra, A. A. Stephanie, I. S. Edbert, D. Suhartono, Hybrid Approaches for Customer Segmentation and Product Recommendation, Proceedings of the International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS), 2023, pp. 324-329. doi:10.1109/ICIMCIS60089.2023.10348619.
[5] A. Vall, M. Dorfer, H. Eghbal-zadeh, M. Schedl, K. Burjorjee, G. Widmer, Feature-combination hybrid recommender systems for automated music playlist continuation, User Modeling and User-Adapted Interaction, Vol. 29 (2019) 527-572. doi:10.1007/s11257-018-9215-8.
[6] S. Fremal, F. Lecron, Weighting Strategies for a Recommender System Using Item Clustering Based on Genres, Expert Systems with Applications, Vol. 77 (2017) 105-113. doi:10.1016/j.eswa.2017.01.031.
[7] P. E. Sytnikova, M. O. Hrebeniuk, Recommendation system based on a compact hybrid user model, Automated control systems and automation devices, Vol. 179 (2023) 32-42. doi:10.20837/0135-1710.2023.179.032.
[8] F. Fkih, Similarity measures for Collaborative Filtering-based Recommender Systems: Review and experimental comparison, Journal of King Saud University - Computer and Information Sciences, Vol. 34, Issue 9 (2022) 7646-7669. doi:10.1016/j.jksuci.2021.09.014.
[9] H. Zitouni, On Solving Cold Start Problem in Recommender Systems Using Web of Data, Proceedings of the International Conference on Pattern Analysis and Intelligent Systems (PAIS), Vol. 4, 2022, pp. 1-8. doi:10.1109/PAIS56586.2022.9946899.
[10] P. Kumari, G. Kaur, P. Singh, A. Kumar, Movie Recommendation System for Cold-Start Problem Using User's Demographic Data, Proceedings of the International Conference on Electrical, Computer and Energy Technologies (ICECET), ISSN 0973-6107, 2023, pp. 1-5. doi:10.1109/ICECET58911.2023.10389506.
[11] R. Y. Toledo, L. Martinez, Fuzzy Tools in Recommender Systems: A Survey, International Journal of Computational Intelligence Systems, Vol. 10 (2017) 776-803. doi:10.2991/ijcis.2017.10.1.52.
[12] M. Panwar, A. Sharma, O. P. Mahela, B. Khan, Fuzzy Inference System Based Intelligent Food Recommender System, Proceedings of the Asian Conference on Innovation in Technology (ASIANCON), Vol. 3, 2023, pp. 1-6. doi:10.1109/ASIANCON58793.2023.10270048.
[13] F. Kaviani, Recommender System in Social Networks using Fuzzy Logic, Proceedings of the International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), Vol. 3, 2023, pp. 1-7. doi:10.1109/ICECCME57830.2023.10253120.
[14] H. Koohi, K. Kiani, User based Collaborative Filtering using fuzzy C-means, Measurement, Vol. 91 (2016) 134-139. doi:10.1016/j.measurement.2016.05.058.
[15] K. Najmani, L. Ajallouda, El H. Benlahmar, N. Sael, A. Zellou, Offline and Online Evaluation for Recommender Systems, Proceedings of the International Conference on Intelligent Systems and Computer Vision (ISCV), 2022, pp. 1-5. doi:10.1109/ISCV54655.2022.9806059.