<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Movie Recommendation System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Weronika Wołowczyk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ewa Szymik</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Applied Mathematics, Silesian University of Technology</institution>
          ,
          <addr-line>Kaszubska 23, 44100 Gliwice</addr-line>
          ,
          <country country="PL">POLAND</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IVUS2024: Information Society and University Studies 2024</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The goal of the project is to deliver personalized movie suggestions based on user preferences by analyzing and processing a dataset of movies. The project's primary stages are data cleaning (removing unnecessary dataset columns), exploratory data analysis (visualizing several dataset characteristics), and building a recommendation system based on soft set theory. The core of the project is a recommendation system that makes movie suggestions based on user input. Users are asked to state their preferences concerning actors, genres, and keywords. A soft set-based classification method is then applied to score and rank the films according to these preferences. The system calculates a total score for each movie based on its attributes and returns the top five suggestions. Two further methods are introduced: recommending the 5 movies most similar to a given title, and predicting movie ratings from their features using the k-nearest neighbours (KNN) algorithm. The first method searches for the most similar movies based on their attributes; the second predicts a movie's rating by analyzing the votes of the k films with the most similar features. Overall, the project applies soft sets to provide personalized suggestions and the k-nearest neighbours algorithm to analyse data and predict data attributes.</p>
      </abstract>
      <kwd-group>
        <kwd>movie recommendation</kwd>
        <kwd>soft set</kwd>
        <kwd>data analysis</kwd>
        <kwd>personalized recommendations</kwd>
        <kwd>data preprocessing</kwd>
        <kwd>KNN algorithm</kwd>
        <kwd>vote predictions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The growing number of films produced each year makes it difficult for viewers to
choose a movie that suits their taste. With so many streaming services, users
have a rich library of content available to them, and deciding what to watch
becomes more and more complicated. That is why there is a need for recommendation systems: to
provide personalized content and, as a result, to improve the user experience by recommending only
the movies that might spark interest and match the viewers’ individual preferences.</p>
      <p>Collaborative filtering, content-based recommendation, and hybrid methods are the typical
techniques employed by existing recommendation systems. Collaborative filtering
relies on the preferences of other like-minded users, whereas content-based filtering recommends items
to the user based on item descriptions. Hybrid methods combine both approaches to leverage
their strengths and mitigate their weaknesses.</p>
      <p>This paper explores the use of soft set theory as an approach to movie recommendation
systems. Soft sets, introduced by Molodtsov in 1999, are mathematical models used for reasoning
under conditions of uncertainty and vagueness. Unlike fuzzy sets, in which each element
has a degree of membership between 0 and 1, a soft set describes a universe through a family of
ordinary subsets indexed by parameters: each parameter is mapped to the subset of elements
that satisfy it. Soft sets are particularly useful in
fields such as decision-making and artificial intelligence, where uncertainty and vagueness are
common. In the context of movie selection, they provide a flexible classification method that
can accommodate the varied nature of user preferences.</p>
      <p>The KNN (k-nearest neighbors) algorithm is a simple, non-parametric method for classification
tasks in machine learning. It operates on the principle of proximity: for the element currently
being tested, it finds the closest objects in feature space. Because they are similar in their
features, these objects are called neighbors. Neighbors are drawn from the set of objects used to
train the algorithm. The predicted class is the one that occurs most often among the
neighbors. Most often, the distance between elements is calculated using the Euclidean or
Manhattan metric.</p>
      <p>The KNN classifier is used, first, to recommend the 5 movies most similar to the one provided by
a user and, second, to predict movie ratings based on the vote_average attribute of similar movies. The steps of
the algorithm are normalizing the data, splitting the data into a training set and a test set, fitting the
model, and evaluating the accuracy of the predictions to determine the effectiveness of the model.</p>
      <p>Overall, the main goal of this project is the development of a personalized recommendation
system that leverages user-defined preferences for genres, actors, and keywords. By applying
soft set theory, the system calculates a total score for each movie based on its alignment with
user preferences and displays the five films with the highest score, resulting in customized
movie suggestions. The second goal is the recommendation of the 5 most similar movies: users are
asked to input a title, and the system finds movies with similar features. The third goal is the
prediction of movie ratings based on their features: the system finds similar movies in the
training set and, based on their vote_average attribute, predicts the rating of movies from the test set.
The second and third goals both use the K-nearest neighbours algorithm.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>Soft set methodology offers a flexible approach to handling uncertainty and making decisions based
on multiple parameters. Its simplicity and adaptability make it a powerful tool for various
applications, including recommendation systems.</p>
      <sec>
        <title>2.1. Soft Set</title>
        <p>A soft set $(F, E)$ over a universal set $U$ is a pair in which $F$ is a mapping given by $F : E \to P(U)$.
Here, $E$ is a set of parameters, and $P(U)$ denotes the power set of $U$. For each parameter $e \in E$,
$F(e)$ is a subset of $U$.</p>
      </sec>
      <sec id="sec-2-1">
        <title>2.2. Mathematical Model</title>
        <p>• Step 1: Define the Universal Set</p>
        <p>Let $U$ represent a universal set containing the elements that need to be analyzed and
categorized. In a movie recommendation system, $U$ is a set of films:</p>
        <disp-formula><tex-math>U = \{f_1, f_2, f_3, f_4, \ldots\}</tex-math></disp-formula>
        <p>• Step 2: Define the Set of Parameters</p>
        <p>Parameters $E$ define the attributes relevant to the elements in $U$. These parameters could be
movie genres:</p>
        <disp-formula><tex-math>E = \{\text{action}, \text{drama}, \text{comedy}, \text{adventure}, \ldots\}</tex-math></disp-formula>
        <p>• Step 3: Define the Mapping</p>
        <p>The mapping $F$ associates each parameter $e \in E$ with a subset of $U$. For instance, if the
parameter is "adventure", $F(\text{adventure})$ might include the films classified as adventure films:</p>
        <disp-formula><tex-math>F(\text{adventure}) = \{f_1, f_2, f_3\}, \qquad F(\text{drama}) = \{f_2, f_4\}</tex-math></disp-formula>
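        <p>As a minimal sketch of Steps 1 to 3, a soft set can be stored as a mapping from each parameter to a set of elements. The film identifiers f1 to f4, the variable names, and the two helper functions below are illustrative assumptions and are not taken from the original implementation.</p>
        <preformat>
```python
# Hypothetical film identifiers and genre parameters, chosen only for illustration.
U = {"f1", "f2", "f3", "f4"}                      # universal set of films
E = {"action", "drama", "comedy", "adventure"}     # set of parameters (genres)

# The mapping F : E -> P(U), stored as a dict from parameter to a subset of U.
F = {
    "adventure": {"f1", "f2", "f3"},
    "drama": {"f2", "f4"},
    "action": set(),
    "comedy": set(),
}


def is_soft_set(F, E, U):
    """Check that F maps every parameter in E to a subset of U."""
    return set(F) == set(E) and all(subset.issubset(U) for subset in F.values())


def parameters_of(film, F):
    """Return, in alphabetical order, the parameters e with film in F(e)."""
    return sorted(e for e, subset in F.items() if film in subset)


print(is_soft_set(F, E, U))      # True
print(parameters_of("f2", F))    # ['adventure', 'drama']
```
</preformat>
        <p>Storing each F(e) as a Python set keeps membership tests cheap, which is what the binary table of the later steps relies on.</p>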
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Constructing the Soft Set</title>
      </sec>
      <sec id="sec-2-3">
        <title>2.4. Decision-Making Using Soft Sets</title>
        <p>• Step 4: Construct the Soft Set $(F, E)$</p>
        <p>The soft set is constructed by pairing each parameter with its corresponding subset:</p>
        <disp-formula><tex-math>(F, E) = \{(\text{adventure}, \{f_1, f_2, f_3\}), (\text{drama}, \{f_2, f_4\}), \ldots\}</tex-math></disp-formula>
        <p>• Step 5: Represent the Soft Set in a Binary Table</p>
        <p>The soft set can be represented in a binary table for easier analysis. Each row corresponds to an
element of $U$, and each column corresponds to a parameter in $E$. An entry is 1 if the
element is associated with the parameter, and 0 otherwise.</p>
        <p>• Step 6: Calculate Selection Values</p>
        <p>Assign a weight to each parameter to reflect its importance. Multiply the binary values by
these weights and sum them for each element of $U$. This gives a selection value
indicating the relevance of each element with respect to the given parameters.</p>
        <p>• Step 7: Determine the Best Choice</p>
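        <p>The elements with the highest selection values are considered the best choices with respect to the
parameters. This ranking can be used for recommendations.</p>
        <sec>
          <title>2.5. Computational Example</title>
          <p>• Class $U$, where $f_i$ are films.</p>
          <p>• Set of parameters $E$ defining movie genres:</p>
          <disp-formula><tex-math>E = \{\text{action}, \text{drama}, \text{crime}, \text{adventure}, \text{science-fiction}, \text{thriller}, \text{fantasy}, \text{western}, \text{animation}, \ldots\}</tex-math></disp-formula>
          <p>• Set of considered parameters $A$:</p>
          <disp-formula><tex-math>A = \{\text{adventure}, \text{fantasy}, \text{animation}\}</tex-math></disp-formula>
          <p>• There are 6 films in the class $U$:</p>
          <disp-formula><tex-math>U = \{f_1, f_2, f_3, f_4, f_5, f_6\}</tex-math></disp-formula>
          <p>• Assumption: $A \subset E$, with the considered parameters denoted</p>
          <disp-formula><tex-math>A = \{e_1, e_2, e_3\}</tex-math></disp-formula>
          <p>• Soft set $(F, A)$:</p>
          <disp-formula><tex-math>F(e_1) = \{f_1, f_2, f_3, f_6\}, \qquad F(e_2) = \{f_1, f_4, f_6\}, \qquad F(e_3) = \{f_1, f_3, f_6\}</tex-math></disp-formula>
          <disp-formula><tex-math>(F, A) = \{(\text{adventure}, \{f_1, f_2, f_3, f_6\}), (\text{fantasy}, \{f_1, f_4, f_6\}), (\text{animation}, \{f_1, f_3, f_6\})\}</tex-math></disp-formula>
          <p>• $c_i$: selection value of object $f_i \in U$.</p>
          <p>• $d_{ij} = w_j \times h_{ij}$: entries of the weighted table, where $h_{ij}$ is the binary table entry for film $f_i$ and parameter $e_j$, and $w_j \in (0, 1]$ is the weight of parameter $e_j$.</p>
        </sec>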
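        <p>The selection values defined above can be sketched in a few lines of Python. The subsets are those listed for F(e1), F(e2) and F(e3) in the example; the film labels f1 to f6, the function names and, in particular, the weights are assumptions made for illustration, since the weights actually used in the project are not stated.</p>
        <preformat>
```python
# Soft set (F, A) of the example: parameter -> films that have it.
F = {
    "adventure": {"f1", "f2", "f3", "f6"},
    "fantasy": {"f1", "f4", "f6"},
    "animation": {"f1", "f3", "f6"},
}
U = ["f1", "f2", "f3", "f4", "f5", "f6"]

# Hypothetical weights in (0, 1]; the paper does not list the values it used.
weights = {"adventure": 1.0, "fantasy": 0.5, "animation": 0.25}


def binary_table(F, U):
    """One row per film, one column per parameter: 1 if film in F(e), else 0."""
    return {film: {e: int(film in F[e]) for e in F} for film in U}


def selection_values(F, U, weights):
    """Weighted sum of the binary entries of each film."""
    table = binary_table(F, U)
    return {film: sum(weights[e] * table[film][e] for e in F) for film in U}


def top_choices(F, U, weights, n=5):
    """Films sorted by decreasing selection value (ties keep the order of U)."""
    scores = selection_values(F, U, weights)
    return sorted(U, key=lambda film: -scores[film])[:n]


scores = selection_values(F, U, weights)
print(scores)
print(top_choices(F, U, weights, n=2))   # ['f1', 'f6']
```
</preformat>
        <p>With any positive weights, f1 and f6 obtain the highest selection value here, because they are the only films that belong to all three subsets.</p>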
        <p>In the table, it is evident that the films that best match the selection parameters are
$f_1$ and $f_6$. The same calculations are performed for actors and keywords.</p>
        <p>[Figure: Visualization of the recommendation system]</p>
        <sec>
          <title>2.6. K-nearest neighbours</title>
          <p>The KNN algorithm is a simple classifier that consists of finding the $K$ elements in a given
dataset that are most similar to the test element. It follows these steps:</p>
          <p>1. Data Collection: Gathering the training data, which will be used to build the
model. Each data point is represented by a set of features and the corresponding
class to be predicted. In this project, the data points are movies from the database, and
the class to be predicted is the vote_average value of the movie.</p>
          <p>2. Determining the Value of Parameter K: The parameter $K$ specifies how
many nearest neighbors will be considered during the classification of a new data
point. Choosing an appropriate value for $K$ can significantly impact the
effectiveness of the model. In the movie recommendation system, $K$ takes values
from 2 to 9.</p>
          <p>3. Calculating Distances: For a new data point whose class is to be predicted,
distances to all points in the training set are calculated. This determines the
similarity between points. In the project, the Manhattan metric is used. For two feature
vectors $x$ and $y$ with $n$ features:</p>
          <disp-formula><tex-math>d(x, y) = \sum_{i=1}^{n} |x_i - y_i|</tex-math></disp-formula>
          <p>4. Selecting K Nearest Neighbors: The next step is to select the $K$ training points
that are closest to the point currently being tested.</p>
          <p>5. Classifying the Point: After selecting the $K$ nearest neighbors, the point is
classified. The method used is a majority vote, in which the class of the new data
point is the dominant class among the $K$ nearest neighbors.</p>
          <p>6. Determining the Accuracy: The final step is to assess the performance of the
KNN model. This is done by splitting the data into a training set and a testing set,
and then comparing the predicted classes with the actual classes in the testing set.</p>
          <p>7. To avoid the dominance of features with larger values, feature normalization is
applied before using KNN:</p>
          <disp-formula><tex-math>x_{\text{norm}} = \frac{x - \min}{\max - \min}</tex-math></disp-formula>
          <p>where $\min$ and $\max$ are the smallest and largest values of the feature.</p>
        </sec>
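        <p>The KNN steps described above can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the project's code: the function names, the two features (runtime and release year) and the five sample movies are hypothetical, and the normalization bounds are taken from the training points only.</p>
        <preformat>
```python
from collections import Counter


def fit_min_max(rows):
    """Return the per-feature minima and maxima of the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(row, lows, highs):
    """Rescale one row with (x - min) / (max - min); constant features map to 0."""
    return [(x - lo) / (hi - lo) if hi != lo else 0.0
            for x, lo, hi in zip(row, lows, highs)]


def manhattan(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: manhattan(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training movies: (runtime in minutes, release year).
train_raw = [(90, 1995), (95, 1998), (150, 2010), (160, 2012)]
train_y = [6, 6, 7, 7]          # rounded vote averages used as classes

lows, highs = fit_min_max(train_raw)
train_x = [apply_min_max(row, lows, highs) for row in train_raw]
query = apply_min_max((100, 2000), lows, highs)

print(knn_predict(train_x, train_y, query, k=3))   # 6
```
</preformat>
        <p>Returning the indices in <italic>order</italic> instead of taking the vote would give the most similar movies themselves, which is the idea behind recommending titles similar to a given one.</p>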
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <p>This section focuses on the experiments conducted to develop a machine learning
model for movie recommendations. By testing various algorithms, we aim to enhance
the accuracy and effectiveness of our recommendation system. Our goal is to better
understand the key factors that contribute to successful movie recommendations,
ultimately improving the user experience.</p>
      <sec id="sec-3-1">
        <title>3.1. Database description</title>
        <p>The dataset used in this study was sourced from Kaggle.com, a widely recognized
open-access platform with a large collection of publicly available datasets.
The dataset is titled "TMDb Movies Dataset". It contains 10856 records in total, each with
21 columns.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Evaluation Metric</title>
        <p>The model is evaluated using the accuracy metric. Accuracy is the most popular metric;
it shows how often the classifications of an ML model are correct overall:</p>
        <disp-formula><tex-math>\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}</tex-math></disp-formula>
        <p>where $TP$ (True Positives) are instances that were accurately identified as
positive, $TN$ (True Negatives) are instances that were accurately identified as
negative, $FN$ (False Negatives) are instances where positive cases were incorrectly
identified as negative, and $FP$ (False Positives) are instances where negative cases
were incorrectly identified as positive.</p>
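        <p>Because the predicted class (the rounded vote average) can take more than two values, accuracy in practice reduces to the fraction of test movies whose class is predicted correctly; in the binary case this coincides with the ratio of TP + TN to all instances. The sketch below shows both forms. The function names and the sample labels are assumptions made for illustration.</p>
        <preformat>
```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def binary_accuracy(tp, tn, fp, fn):
    """Accuracy from the four confusion-matrix counts of a binary classifier."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical rounded vote averages for five test movies.
y_true = [6, 7, 5, 6, 8]
y_pred = [6, 6, 5, 7, 8]

print(accuracy(y_true, y_pred))                      # 0.6
print(binary_accuracy(tp=40, tn=45, fp=5, fn=10))    # 0.85
```
</preformat>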
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Model analysis</title>
        <p>The figures above depict the accuracy of our KNN model across different $K$ values
using standard and min-max normalization techniques. Our target variable, the rounded vote
average, is difficult to predict.</p>
        <p>Higher $K$ values generally lead to improved model performance, indicating more stable
predictions as more neighbors are considered. Additionally, standard normalization
slightly outperforms min-max normalization when applied to features like runtime
and release year.</p>
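        <p>The two scalings can be compared on a single feature. The sketch assumes that "standard normalization" refers to z-score standardization (subtracting the mean and dividing by the standard deviation), which the text does not state explicitly; the function names and the runtimes are likewise hypothetical.</p>
        <preformat>
```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std. dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Min-max scaling to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


runtimes = [80, 100, 120]        # hypothetical runtimes in minutes

print(min_max(runtimes))         # [0.0, 0.5, 1.0]
print(standardize(runtimes))
```
</preformat>
        <p>Min-max scaling bounds every feature to [0, 1], whereas standardization centres each feature at zero with unit variance, so the Manhattan distances between movies differ under the two scalings.</p>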
        <p>In summary, our experiments highlight the effectiveness of higher $K$ values and
standard normalization in enhancing the predictive performance of our movie
recommendation system. These findings emphasize the importance of careful normalization and
$K$ value selection in predictive modeling tasks.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This paper presents the design of a personalized movie recommendation system using
soft set theory and the k-nearest neighbors (KNN) algorithm. The main goal is to
build a system that recommends the top five movies for a given user according to their
preferences. Additional functionalities include suggesting movies similar to a given
title and predicting movie ratings based on data features.</p>
      <p>Soft set theory provides a powerful tool for dealing with the uncertainty and
vagueness associated with users’ preferences. Representing user preferences as soft sets allows
calculating a total score for each movie and making personalized recommendations
that align with users’ individual preferences. This shows the flexibility and efficiency
of soft sets in decision-making processes.</p>
      <p>The use of the K-nearest neighbours algorithm further expands the project. The KNN
classifier identifies movies similar to a user-specified title and predicts movie ratings
based on the ratings of the records closest in the feature space. The effectiveness of those
predictions was checked. The accuracy oscillates for different values of $K$, generally increasing as
$K$ increases and reaching its highest value of 44% when $K$ equals 9 (the accuracy for higher
values of $K$ was not checked).</p>
      <p>Experimental results support the system’s effectiveness in generating personalized
movie recommendations, as the recommended movies do align with the provided
preferences. The accuracy of the KNN classifier reached only 44%, since higher values of
$K$ were not tested and it is hard to predict movie ratings based solely on their features.
Soft set theory and KNN have proven to be a potent combination for creating a
recommendation system that can process diverse user inputs and provide personalized movie
suggestions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Molodtsov</surname>
          </string-name>
          , “
          <article-title>Soft set theory - First results</article-title>
          ,”
          <source>Computers &amp; Mathematics with Applications</source>
          , vol.
          <volume>37</volume>
          , no.
          <issue>4-5</issue>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>31</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ricci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rokach</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Shapira</surname>
          </string-name>
          , “Introduction to Recommender Systems Handbook,” in Recommender Systems Handbook, Springer, Boston, MA,
          <year>2011</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lops</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>De Gemmis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Semeraro</surname>
          </string-name>
          , “
          <article-title>Content-based Recommender Systems: State of the Art and Trends</article-title>
          ,” in Recommender Systems Handbook, Springer, Boston, MA,
          <year>2011</year>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>TMDb</surname>
          </string-name>
          , The Movie Database, available at: https://www.kaggle.com/datasets/juzershakir/tmdb-movies-dataset/data
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>