                         The Impact of Feature Quantity on Recommendation
                         Algorithm Performance
                         Lukas Wegmeth1
                         1
                             Intelligent Systems Group, University of Siegen, Germany


                                        Abstract
                                        Recent model-based Recommender Systems (RecSys) algorithms emphasize using features, also called side
                                        information, in their design, similar to algorithms in Machine Learning (ML). In contrast, some of the most
                                        popular and traditional algorithms for RecSys solely focus on a given user-item-rating relation without including
                                        side information. An essential category of these is matrix factorization-based algorithms, e.g., Singular Value
                                        Decomposition and Alternating Least Squares, which are known to have high performance on RecSys datasets.
                                        This paper aims to provide a performance comparison and assessment of RecSys and ML algorithms when side
                                        information is included. We chose the Movielens-100K dataset for a case study since it is a standard for comparing
                                        RecSys algorithms. We compared six feature sets with varying quantities of features, generated from the
                                        baseline data and evaluated with 19 algorithms: traditional RecSys algorithms, baseline ML algorithms,
                                        Automated Machine Learning (AutoML) pipelines, and state-of-the-art RecSys algorithms that incorporate side information.
                                        The results show that additional features benefit all algorithms we evaluated. However, the correlation between
                                        feature quantity and performance is not monotonic for AutoML and RecSys. In these categories, an analysis of
                                        feature importance revealed that the quality of features matters more than quantity. Throughout our experiments,
                                        the average performance on the feature set with the lowest number of features is ∼6% worse compared to that
                                        with the highest in terms of the Root Mean Squared Error. An interesting observation is that AutoML outperforms
                                        matrix factorization-based RecSys algorithms when additional features are used. Almost all algorithms that
                                        can include side information perform better when using the highest quantity of features. In the other cases,
                                        the performance difference is negligible (<1%). The results show a clear positive trend for the effect of feature
                                        quantity and the critical effects of feature quality on the evaluated algorithms.

                                        Keywords
                                        Feature Engineering, Recommender Systems, Automated Machine Learning




                         1. Introduction
                         Matrix factorization-based Recommender System (RecSys) algorithms are specialized for predicting
                         missing entries, e.g., ratings, in sparsely filled user-item matrices. Many often-used benchmark datasets
                         exist [1, 2], representing such a RecSys task. Some of these datasets include side information, also called
                         features, which are not used by the RecSys algorithms mentioned above. Instead, the data is directly
                         reduced to a sparse user-item matrix, ignoring side information. In contrast, Machine Learning (ML)
                         algorithms are broad in their applications and usually profit from the availability of additional, mean-
                         ingful features [3, 4, 5]. As an extension, Automated Machine Learning (AutoML) techniques further
                         increase ML performance through automated algorithm selection and hyperparameter optimization.
                         Furthermore, the same tasks RecSys algorithms intend to solve can generally be solved by (Auto)ML
                         algorithms. Recent advances in RecSys have led to more sophisticated model-based algorithms, such
                         as Factorization Machines [6] and especially recent Deep Neural Networks (DNNs), which can
                         incorporate side information and are more similar to their ML relatives [7, 8, 9, 10, 11]. However, the
                         performance gap between (Auto)ML and RecSys has not been explicitly researched.
                            Feature engineering is a broad topic that is well documented and researched due to its positive
                         influences on ML [12, 13, 5, 14, 15]. It summarizes many feature-processing techniques that mainly intend
                         to increase prediction performance. Today, such feature engineering techniques are standard for many
                         ML pipelines. Among those techniques is the curation of features by selection and extraction. Additional

                          AICS’24: 32nd Irish Conference on Artificial Intelligence and Cognitive Science, December 09–10, 2024, Dublin, Ireland
                          $ lukas.wegmeth@uni-siegen.de (L. Wegmeth)
                           ORCID: 0000-0001-8848-9434 (L. Wegmeth)
                                       © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR Workshop Proceedings (ceur-ws.org, ISSN 1613-0073)
real-world features and features extracted from existing data often benefit a model’s performance. In
this context, comparing the impact of feature quantity on the performance of RecSys and ML algorithms
provides a meaningful indicator for the effects of feature engineering. However, we could not find
previous comparative studies on the effect of feature quantity on RecSys algorithms.
   Due to the aforementioned positive effects of feature engineering techniques in ML algorithms, we
hypothesized that the same effects could also be shown for RecSys algorithms. Since a performance
comparison between (Auto)ML and RecSys regarding feature quantity is absent from the literature,
two research questions arise. In a RecSys problem setting, how does feature quantity impact the
performance:

        • of ML, AutoML, and RecSys algorithms in general?
        • of RecSys algorithms compared to (Auto)ML algorithms?

We explore these questions through a case study on the Movielens-100K [16] dataset. The code that
produced the results reported in this paper is available on our GitHub repository1 .


2. Method
We evaluated 19 algorithms from nine libraries (Table 1) on six feature sets generated from the Movielens-
100K [16] dataset (Table 2).

Table 1
An overview of the evaluated algorithms and their categories. We evaluated the algorithms as implemented in
their respective libraries. The table also shows whether each algorithm can incorporate side information and
how its hyperparameters were tuned.
    Category                      Library               Algorithm                               Uses side information   Hyperparameter tuning
    Baseline                      Scikit-Learn [17]     Constant Predictor                      -                       -
    Machine Learning              Scikit-Learn [17]     Linear Regressor                        X                       -
                                  Scikit-Learn [17]     K Nearest Neighbors                     X                       SMAC3 [18] (200 runs)
                                  Scikit-Learn [17]     Random Forest Regressor                 X                       Random search (20 iterations per fold)
                                  Scikit-Learn [17]     Histogram Gradient Boosting Regressor   X                       Random search (20 iterations per fold)
                                  XGBoost [19]          Extreme Gradient Boosting Regressor     X                       -
    AutoML                        Auto-Sklearn [20]     Best algorithm varies by fold           X                       One hour search per fold
                                  H2O AutoML [21]       Best algorithm varies by fold           X                       One hour search per fold
                                  FLAML [22]            Best algorithm varies by fold           X                       One hour search per fold
    RecSys Matrix Factorization   Surprise [23]         SVD                                     -                       -
                                  Surprise [23]         SVDpp                                   -                       -
                                  Surprise [23]         KNNBaseline                             -                       -
                                  Lenskit [24]          User-User kNN collaborative filtering   -                       -
                                  Lenskit [24]          Item-Item kNN collaborative filtering   -                       -
                                  Lenskit [24]          Biased Alternating Least Squares        -                       -
    RecSys Models                 LibRecommender [25]   SVDpp                                   -                       Manually
                                  LibRecommender [25]   Wide & Deep                             X                       SMAC3 [18] (300 runs)
                                  LibRecommender [25]   Deep Interest Network                   X                       SMAC3 [18] (100 runs)
                                  MyFM [26]             Bayesian Factorization Machine          X                       -


   Movielens-100K [16] is one of the datasets regularly used in the RecSys community to evaluate
algorithm performance on explicit feedback [27, 28, 29, 30]. The full dataset consists of a table with user IDs and
item IDs and their observed ratings, where each rating has a timestamp. Additionally, each user ID and
item ID contains a set of features specific to them. The user features are their age, gender, occupation,
and (North American) ZIP code. The item features are their movie genre, title, release date, and IMDb
URL. We solve the prediction of ratings as a regression task and measure and compare the performance
of a given algorithm through the Root Mean Squared Error.
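Concretely, the metric can be sketched as follows (a minimal illustration; the function name is ours, not from the paper's code):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error between observed and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Ratings on the 1-5 scale vs. a model's predictions.
print(rmse([4, 3, 5, 1], [3, 3, 5, 2]))  # sqrt((1 + 0 + 0 + 1) / 4) ~ 0.707
```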

1
    https://code.isg.beel.org/recsys-feature-quantity
   To analyze the impact of the number of features on algorithms, we cut and/or enriched the features
of the original dataset and finally grouped them, resulting in six separate feature sets (Table 2). The
default feature set contains most of the basic features of the original dataset. The idea is to use as many
original features as possible that require little to no further processing. For this reason, the item’s title,
IMDb URL, and the user’s ZIP codes were removed.
   From the observed user-item-ratings relation, many statistical features can be engineered. They can
also be calculated separately for the users and the items. We calculated the following nine statistical
features for both users and items: count, mean, median, mode, minimum, maximum, standard deviation,
kurtosis, and skew. This provides a total of 18 additional features. Notably, these features were only
calculated on the training set after splitting the data into a separate training and test set to avoid leaking
information about the training set into the test set. These engineered features should provide helpful
additional information to the algorithms at hand.
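A sketch of this train-only computation, assuming a pandas DataFrame with `userId`, `itemId`, and `rating` columns (function and column names are illustrative; mode and kurtosis would need custom aggregators and are omitted here):

```python
import pandas as pd

STATS = ["count", "mean", "median", "min", "max", "std", "skew"]

def add_rating_stats(train, test):
    """Compute per-user and per-item rating statistics on the training
    split only, then merge them into both splits to avoid leaking
    test-set information into the engineered features."""
    train, test = train.copy(), test.copy()
    for prefix, key in (("u", "userId"), ("i", "itemId")):
        stats = train.groupby(key)["rating"].agg(STATS)
        stats.columns = [f"{prefix}{c.capitalize()}Rating" for c in stats.columns]
        train = train.merge(stats, left_on=key, right_index=True, how="left")
        test = test.merge(stats, left_on=key, right_index=True, how="left")
    return train, test
```

Test-set rows for users or items unseen in training simply receive missing values, which downstream models must handle.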
   Finally, additional real-world data points can be added to the list of available features. For this, we
chose the median and mean household income and population of a period as close as possible to the
release date of the original dataset. A set of these data points grouped by ZIP codes from the US from
2006 to 2010 [31] is the earliest publicly available data directly related to the user features.
   From various combinations of the feature sets mentioned above, we created six combinations to
perform the experiments on. An overview of the sets and their names that we refer to from here on is
listed in Table 2.

Table 2
An overview of the evaluated feature sets that we generated from the base Movielens-100K [16] dataset either
by cutting or enriching features. The columns denote the contained features in the named feature sets shown in
the rows and provide the total number of features.
             features →                     IDs:     user ID, item ID
                                            stats:   user stat. feat., item stat. feat.
                                            basic:   rating timestamp, user occupation, user age, user gender, movie genre, movie release date
                                            ZIP:     user ZIP code, ZIP code income, ZIP code population

             set name ↓                     IDs   stats   basic   ZIP   number of features (categorical features count as one)
             stripped-no-stats              X     -       -       -      2
             stripped-with-stats            X     X       -       -     20
             basic-no-stats                 X     -       X       -      8
             basic-with-stats               X     X       X       -     26
             feature-expansion-no-stats     X     -       X       X     11
             feature-expansion-with-stats   X     X       X       X     29



   We applied additional processing steps to some of the features to make them suitable for the evaluation,
either per feature set or automatically as an algorithm required. Generally, we tried to
stay as close to the original features as possible, making changes only where sensible or necessary. As a
result, we did not always treat the user ID and item ID as categorical features. The movie
genre is provided as a categorical feature by default, which we did not change. However, the user’s
occupation is not provided as a categorical feature, so we transformed it into one. Movie release dates
are provided in a date format, and we converted them to a signed UNIX timestamp representation.
The user’s age and zip code are special cases. As provided in the original set, we treated the age as an
integer for the ’basic’ sets. We removed the ZIP code because it contains some non-numerical entries
and because we did not intend to filter ratings in these sets. For the ‘feature-expansion’ sets, we divided
the age by 18 to create five age categories and then treated the age as a categorical feature. We had to
keep the ZIP code to add the mean and median household income and population features. Finally, we
removed entries with ZIP codes that were not contained in the additional feature sets, which incurs a
loss in observed ratings of 7.05%. The remaining ZIP codes range from ‘00000’ to ‘99999’. To use them
as a feature, we selected only the first digit of each ZIP code and then transformed it into a categorical
feature. We chose to apply this processing step because the first digit in the ZIP code has a geographical
meaning and, therefore, serves as an estimation for the residential area of each user.
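The two 'feature-expansion' transformations just described can be sketched like this (column names are ours; the original code may differ):

```python
import pandas as pd

def expand_user_features(users):
    """Integer-divide age by 18 to form coarse age brackets (five for the
    ages observed in Movielens-100K) and keep only the geographically
    meaningful first ZIP-code digit; treat both as categorical."""
    users = users.copy()
    users["age_group"] = (users["age"] // 18).astype("category")
    users["zip_region"] = users["zip"].str[0].astype("category")
    return users
```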
   To have an equal ground for algorithm comparisons, we applied some constraints. We performed
all experiments on implementations in publicly available libraries to increase accessibility and repro-
ducibility. The Movielens-100K [16] dataset contains explicit ratings as integers ranging from one to
five. Therefore, we only chose algorithms that can take explicit ratings as input and predict ratings
in that same format. We evaluated the algorithms using five-fold cross-validation. Since one of the
research questions is about a comparison of the performance of algorithms against each other, we
performed hyperparameter tuning on algorithms that do not default to a tuned parameter setup for the
Movielens-100K [16] dataset. Depending on the algorithm, we manually tuned the hyperparameters
with a random search or using SMAC3 [18], an all-purpose hyperparameter optimization tool. We set
the time budget of AutoML tools to one hour for each fold. Table 1 lists all libraries and algorithms and
their categories, whether they use side information, and how their hyperparameters were tuned.
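The evaluation protocol can be sketched as below, with a scikit-learn constant predictor standing in for any of the evaluated models (synthetic data; names are illustrative, not the paper's actual harness):

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import KFold

def cross_validate_rmse(model, X, y, n_splits=5, seed=42):
    """Five-fold cross-validation, returning the per-fold RMSE."""
    scores = []
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        scores.append(float(np.sqrt(np.mean((y[test_idx] - pred) ** 2))))
    return scores

# Synthetic 1-5 ratings with a mean (constant) predictor as baseline.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(1, 6, size=100).astype(float)
print(cross_validate_rmse(DummyRegressor(strategy="mean"), X, y))
```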


3. Results
Figure 1 aggregates the results gathered during the experiments and clearly shows that a higher feature
quantity generally results in a lower Root Mean Squared Error (RMSE). In particular, when comparing
evaluations of the highest quantity of features with the lowest, the RMSE is 10% lower for ML, 4% lower
for AutoML, 1% lower for model-based RecSys, and 6% lower overall.

Figure 1: This plot shows the RMSE performance of the algorithm categories as denoted in Table 1 on the feature
sets denoted in Table 2. Each line plots the average RMSE of an algorithm category evaluated on each feature
set represented by their number of features. The vertical lines and labels on the x-axis denote the collected data
points. The first three data points do not include statistical features, while the final three do. The plot shows
that more features result in higher performance in most cases.
[Figure 1 plot: one line per category (ML, AutoML, RecSysMF, RecSysModels, Total); x-axis: number of features (2, 8, 11, 20, 26, 29); y-axis: Root Mean Squared Error, approximately 0.90 to 1.04, lower is better.]


   The evaluation shows that AutoML outperforms the traditionally strong matrix factorization-based
RecSys contenders on most of the evaluated feature sets. On the ‘basic’ feature sets alone,
the RMSE is 1% lower. Notably, however, the RMSE increases between the feature set with
eleven features (‘feature-expansion-no-stats’) and the one with 20 features (‘stripped-with-stats’). In
these special cases, feature quantity alone cannot significantly improve an algorithm’s performance.
A likely reason is the feature importance shown in Figure 2: the feature set with the higher feature
quantity is missing essential features like the rating timestamp or the item genre.
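Gini importances like those in Figure 2 can be read from a fitted Random Forest, presumably via scikit-learn's impurity-based `feature_importances_`; a sketch on synthetic data (feature names are placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=300)  # only column 0 is informative

forest = RandomForestRegressor(n_estimators=50, random_state=1).fit(X, y)
# feature_importances_ is the normalized mean decrease in impurity
# (Gini importance); multiplying by 100 yields percentages as in Figure 2.
importance_pct = 100 * forest.feature_importances_
print(dict(zip(["informative", "noise"], importance_pct.round(1))))
```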
Figure 2: The ordered Gini feature importance in percent. We evaluated these with a Random Forest Regressor
that we fitted on the training data of the largest feature set. The chart shows the impact of each feature on the
trained model. These importance values are not necessarily true for the evaluated algorithms but provide a good
estimation nonetheless. Statistical features are prefixed with ’i’ (item) and ’u’ (user); the other feature names
denote directly whether they are an item or user feature.

[Figure 2 chart: bars of Gini importance in percent (roughly 0 to 25) for each feature of the largest feature set, covering the per-user and per-item rating statistics (e.g., iMeanRating, uMeanRating, iStdRating, iCountRating), item_genre, item_release_date, rating_timestamp, itemId, userId, user_age, user_occupation, user_population, user_zip, user_gender, user_mean_income, and user_median_income.]
   Figure 3 provides a detailed overview of all experiments. It shows that, for RecSys and AutoML
algorithms, the performance difference and ranking across feature sets vary enormously by algorithm,
in contrast to the ML algorithms, where the ranking is mostly the same. Notably, the cross-validation
procedure averages the results of multiple evaluations per algorithm, and the figures show these
aggregations. However, the evaluations of an algorithm differed only marginally in performance (<2%),
so the reported averages are relatively stable. Furthermore,
we observe that the performance of the different feature sets is divided into two groups for AutoML.
One of the groups contains the ‘stripped’ feature sets for which the search could not find a proper
result even when provided with statistical features. In the other group, all other feature sets are tightly
packed together, with a slight lead for the most extensive feature set, indicating the preference for a
good combination and a higher quantity of real-world and statistical features.
   Regarding the model-based RecSys algorithms, the DNN approaches, Wide & Deep and Deep
Interest Network, do not show a clear trend toward more favorable features. The Bayesian
Factorization Machine, however, is one of the most interesting results: its performance increases clearly
with feature quantity. Though a RecSys algorithm by nature, it could by design also solve more general
ML problems. Its results, seen in Figure 3, mix the behavior of ML and AutoML algorithms. Most
significantly, its performance far exceeds that of every other algorithm on every feature set, making it
a prime example of what additional features can do for ML and RecSys.
   As expected, ML algorithms consistently demonstrated improved performance with an increased
number of features. This shows that the provided features are of good enough quality and distribution.
The feature sets have an almost equal ranking order across all ML algorithms.
Figure 3: This figure shows the detailed evaluation results that lead to the main conclusions presented in
this paper. It shows the RMSE of the algorithms listed in Table 1 on the feature sets listed in Table 2. The bar
chart is grouped by algorithm, and each group’s bars are ordered by RMSE ascending. A lower RMSE is better.
Algorithms that can not use additional features perform the same on all feature sets. For these only, the ‘basic’
feature set is plotted. These results are aggregated over algorithm categories in Figure 1.
[Figure 3 chart: horizontal bars of RMSE (approximately 0.85 to 1.15, lower is better), one bar per feature set (basic-no-stats, basic-with-stats, stripped-no-stats, stripped-with-stats, feature-expansion-no-stats, feature-expansion-with-stats), grouped by algorithm: RecSys Models (Bayesian Factorization Machine, Deep Interest Network, Wide & Deep), AutoML (FLAML, H2O, Auto-Sklearn), Machine Learning (XGBoost, Random Forest, Histogram Gradient Boosting, Linear Regression, K Nearest-Neighbors), RecSys Matrix Factorization (SVDpp in LibRecommender and Surprise, SVD, K Nearest Neighbors Baseline, Biased Alternating Least Squares, ItemItem, UserUser), and the Mean Predictor baseline.]


   Overall, the results of our study indicate a clear preference for using more features. The important
caveat concerns feature type and quality. As reported, the relation between feature quantity and
algorithm performance is not monotonic for AutoML and RecSys due to feature importance. The
evaluated model-based RecSys algorithms performed best in this study, but our evaluation shows how
even this gap can be narrowed by introducing additional features.
4. Discussion
We conclude that including statistical features is the most straightforward way to increase an
algorithm’s performance. Figure 2 shows that the mean rating per user and item is remarkably effective.
Additionally, the feature importance analysis shows that some statistical features are significant while
others are comparatively insignificant. This indicates the advantages of feature selection techniques,
which were out of scope for this research.
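As one example of such a technique (not used in this paper), univariate selection with scikit-learn can drop weakly related features:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)  # only column 3 is useful

# Keep the single feature whose F-statistic against y is highest.
selector = SelectKBest(score_func=f_regression, k=1).fit(X, y)
print(selector.get_support())  # boolean mask of retained columns
```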
   We obtained only simple additional features in this work, yet they positively impacted the
performance of most of the tested algorithms. The results may be significantly better if more care and
time are given to feature engineering. In addition to introducing new features, there are numerous
considerations regarding refining existing features, including those introduced in this work. One such
consideration is that user IDs and item IDs are technically categorical features and should be treated
as such, which was not always the case within our experiments due to constraints in the algorithms.
There are also different ways to represent such categorical features, which should be explored in this
context. For example, we consciously decided to represent age as a categorical feature for the
‘feature-expansion’ feature sets. These decisions have a potentially enormous impact on the performance
of any algorithm and should be made carefully.
   The biggest challenge in applying the findings of this paper is likely finding suitable new
features. Finding good data that supplements an existing dataset is hard; however, collecting
such feature data from the start should be reasonably easy when recording a new dataset. Given our
results, we encourage future dataset collection efforts to include as many features as possible. Our
work has shown that, in our scenario, if the goal is prediction performance, a good selection of
features combined with dedicated tuning will very likely lead to better results. For future work
on algorithm evaluations, we propose that pipelines include feature engineering in terms of feature
quantity, because it may significantly improve results.


References
 [1] Z. Zaier, R. Godin, L. Faucher, Evaluating recommender systems, in: 2008 International Conference
     on Automated Solutions for Cross Media Content and Multi-Channel Distribution, 2008, pp. 211–
     217. doi:10.1109/AXMEDIS.2008.21.
 [2] J. L. Herlocker, J. A. Konstan, L. G. Terveen, J. T. Riedl, Evaluating collaborative filtering recom-
     mender systems, ACM Trans. Inf. Syst. 22 (2004) 5–53. URL: https://doi.org/10.1145/963770.963772.
     doi:10.1145/963770.963772.
 [3] V. Sugumaran, K. Ramachandran, Effect of number of features on classification of roller bearing
     faults using svm and psvm, Expert Systems with Applications 38 (2011) 4088–4096. URL:
     https://www.sciencedirect.com/science/article/pii/S0957417410010298. doi:10.1016/j.eswa.2010.09.072.
 [4] A. L. Blum, P. Langley, Selection of relevant features and examples in machine learning, Artificial
     Intelligence 97 (1997) 245–271. URL: https://www.sciencedirect.com/science/article/pii/
     S0004370297000635. doi:10.1016/S0004-3702(97)00063-5.
 [5] S. Khalid, T. Khalil, S. Nasreen, A survey of feature selection and feature extraction techniques in
     machine learning, in: 2014 Science and Information Conference, IEEE, 2014, pp. 372–378.
 [6] S. Rendle, Factorization machines, in: 2010 IEEE International Conference on Data Mining, IEEE,
     2010, pp. 995–1000.
 [7] H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado,
     W. Chai, M. Ispir, et al., Wide & deep learning for recommender systems, in: Proceedings of the
     1st workshop on deep learning for recommender systems, 2016, pp. 7–10.
 [8] G. Zhou, X. Zhu, C. Song, Y. Fan, H. Zhu, X. Ma, Y. Yan, J. Jin, H. Li, K. Gai, Deep interest
     network for click-through rate prediction, in: Proceedings of the 24th ACM SIGKDD international
     conference on knowledge discovery & data mining, 2018, pp. 1059–1068.
 [9] H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado,
     W. Chai, M. Ispir, R. Anil, Z. Haque, L. Hong, V. Jain, X. Liu, H. Shah, Wide and deep learning for
     recommender systems, in: Proceedings of the 1st Workshop on Deep Learning for Recommender
     Systems, DLRS 2016, Association for Computing Machinery, New York, NY, USA, 2016, pp. 7–10.
     URL: https://doi.org/10.1145/2988450.2988454. doi:10.1145/2988450.2988454.
[10] M. Gridach, Hybrid deep neural networks for recommender systems, Neurocomputing 413 (2020)
     23–30. URL: https://www.sciencedirect.com/science/article/pii/S0925231220309966. doi:10.1016/j.neucom.2020.06.025.
[11] P. Covington, J. Adams, E. Sargin, Deep neural networks for YouTube recommendations, in:
     Proceedings of the 10th ACM Conference on Recommender Systems, RecSys ’16, Association
     for Computing Machinery, New York, NY, USA, 2016, pp. 191–198. URL: https://doi.org/10.1145/
     2959100.2959190. doi:10.1145/2959100.2959190.
[12] A. Zheng, A. Casari, Feature engineering for machine learning: principles and techniques for data
     scientists, O’Reilly Media, Inc., 2018.
[13] G. Dong, H. Liu, Feature engineering for machine learning and data analytics, CRC Press, 2018.
[14] G. Chandrashekar, F. Sahin, A survey on feature selection methods, Computers & Electrical
     Engineering 40 (2014) 16–28.
[15] C. Seger, An investigation of categorical variable encoding techniques in machine learning: binary
     versus one-hot and feature hashing, 2018.
[16] F. M. Harper, J. A. Konstan, The movielens datasets: History and context, ACM Trans. Interact.
     Intell. Syst. 5 (2015). URL: https://doi.org/10.1145/2827872. doi:10.1145/2827872.
[17] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer,
     R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay,
     Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–
     2830.
[18] M. Lindauer, K. Eggensperger, M. Feurer, A. Biedenkapp, D. Deng, C. Benjamins, T. Ruhkopf, R. Sass,
     F. Hutter, SMAC3: A versatile Bayesian optimization package for hyperparameter optimization,
     2021. arXiv:2109.09831.
[19] T. Chen, C. Guestrin, XGBoost: A scalable tree boosting system, in: Proceedings of the 22nd
     ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16,
     ACM, New York, NY, USA, 2016, pp. 785–794. URL: http://doi.acm.org/10.1145/2939672.2939785.
     doi:10.1145/2939672.2939785.
[20] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, F. Hutter, Efficient and
     robust automated machine learning, in: Advances in Neural Information Processing Systems 28
     (2015), 2015, pp. 2962–2970.
[21] E. LeDell, S. Poirier, H2O AutoML: Scalable automatic machine learning, 7th ICML Workshop
     on Automated Machine Learning (AutoML) (2020). URL: https://www.automl.org/wp-content/
     uploads/2020/07/AutoML_2020_paper_61.pdf.
[22] C. Wang, Q. Wu, FLO: fast and lightweight hyperparameter optimization for automl, CoRR
     abs/1911.04706 (2019). URL: http://arxiv.org/abs/1911.04706. arXiv:1911.04706.
[23] N. Hug, Surprise: A Python library for recommender systems, Journal of Open Source Software 5
     (2020) 2174. URL: https://doi.org/10.21105/joss.02174. doi:10.21105/joss.02174.
[24] M. D. Ekstrand, LensKit for Python: Next-generation software for recommender systems experi-
     ments, in: Proceedings of the 29th ACM International Conference on Information & Knowledge
     Management, CIKM ’20, Association for Computing Machinery, New York, NY, USA, 2020, pp.
     2999–3006. URL: https://doi.org/10.1145/3340531.3412778. doi:10.1145/3340531.3412778.
[25] massquantity, LibRecommender, 2022. URL: https://github.com/massquantity/LibRecommender.
[26] T. Ohtsuki, myFM, 2022. URL: https://github.com/tohtsky/myFM.
[27] S. Forouzandeh, K. Berahmand, M. Rostami, Presentation of a recommender system with ensemble
     learning and graph embedding: a case on movielens, Multimedia Tools and Applications 80 (2021)
     7805–7832.
[28] U. Kuzelewska, Clustering algorithms in hybrid recommender system on movielens data, Studies
     in Logic, Grammar and Rhetoric 37 (2014) 125–139.
[29] Z. Yang, L. Xu, Z. Cai, Z. Xu, Re-scale adaboost for attack detection in collaborative filtering
     recommender systems, Knowledge-Based Systems 100 (2016) 74–88.
[30] M. F. Aljunid, M. Dh, An efficient deep learning approach for collaborative filtering recommender
     system, Procedia Computer Science 171 (2020) 829–836.
[31] University of Michigan Institute for Social Research, Median household income [2006–2010], 2020.
     URL: https://www.psc.isr.umich.edu/dis/census/Features/tract2zip/.