=Paper=
{{Paper
|id=Vol-2947/paper23
|storemode=property
|title=Quality and Diversity of Recommender Systems Users’ Choices: a Simulation Perspective
|pdfUrl=https://ceur-ws.org/Vol-2947/paper23.pdf
|volume=Vol-2947
|authors=Naieme Hazrati,Francesco Ricci
|dblpUrl=https://dblp.org/rec/conf/iir/Hazrati021
}}
==Quality and Diversity of Recommender Systems Users’ Choices: a Simulation Perspective==
Quality and Diversity of Recommender Systems Users’ Choices: a Simulation Perspective

Discussion Paper

Naieme Hazrati, Francesco Ricci
Free University of Bolzano, Bolzano, Italy
nhazrati@unibz.it (N. Hazrati); fricci@unibz.it (F. Ricci)

IIR 2021 – 11th Italian Information Retrieval Workshop, September 13–15, 2021, Bari, Italy

Abstract
Recommender Systems (RSs) generate personalised suggestions for items and can influence the collective choice behaviour of users. The impact of an operational RS on users’ decisions can be assessed by analysing the diversity and quality of the actual choices, e.g., the users’ satisfaction with their choices. However, in order to estimate the potential impact of an RS in new scenarios, or of an RS that has not yet been deployed, simulating user-system interactions can be valuable. We illustrate here a simulation framework consisting of users, items, and alternative RSs. We simulate users’ choices over consecutive time intervals, assuming that an RS influences the users’ choices with its recommendations. We measure global properties of the simulated choices, such as their diversity and quality. The obtained results, and the proposed simulation framework, can be used by a system designer to anticipate the effect of a candidate RS over its long-term usage.

Keywords: recommender system, simulation, choice behaviour, diversity

1. Introduction

The Recommender Systems (RSs) literature has shown that these information filtering techniques can profoundly affect individuals’ choices [1]. Hence, nowadays there is growing attention to, and concern about, how RSs can bias the collective choice behaviour of users. This important topic has been studied mostly by relying on off-line analyses of RS performance and, more recently, by developing algorithmic simulations of the users’ choices for items when users are also exposed to recommendations. Important metrics, such as the diversity and the quality of the bulk of the simulated choices, have been considered [2, 3, 4, 5, 6]. These studies have obtained interesting results, showing their validity and importance for understanding the effect of RSs on users [2, 3]. However, they are far from complete in analysing the effect of alternative usage conditions of the RS, e.g., the number of recommendations or the type of RS. Moreover, their reliability in simulating realistic usage settings could be improved. For instance, some previous simulation studies made critical simplifications in modelling the user-recommendation interaction, such as assuming that users can only choose recommended items, or that users have simple preference models, which are not correctly estimated from the observation of their past behaviour [3, 2].

In this short contribution, we illustrate a flexible simulation framework, initially proposed in [4], that copes with some of the identified limitations: a simple and unrealistic set of possible choices; the use of synthetic data sets of users and items; not taking into account the dynamics of new users and new items. The simulation framework described in this paper addresses these issues by leveraging existing data sets of real logged users’ choices (Amazon data sets), in order to properly analyse the effect of an RS.
We simulate an iterative choice-making procedure of a collection of users in the presence of recommendations produced by alternative RSs, as well as when no RS is used. Simulated users make choices during a time interval (a month), and then the RS is retrained, also considering the data of the simulated choices. This simulation is repeated for a certain number of consecutive time intervals, while new items and new users can also enter the system. The users’ simulated choices are influenced by the users’ utilities for the items, which are estimated by a de-biased rating prediction model [7]. A user’s simulated choices are made among the items in her ‘awareness set’, which contains the items that she is supposed to know, plus the recommended items, which are added to the awareness set. The persuasive effect of the RS is simulated by increasing the perceived utility of the recommended items, which makes the recommendations more likely to be chosen.

Using the above-mentioned simulation framework, we focus on the following important and not yet clearly answered research questions:

- RQ1: How do personalised and non-personalised RSs affect the evolution of choice diversity? What features of the RSs determine their specific impact?
- RQ2: Do personalised RSs suggest items that users rate higher than non-personalised RSs?
- RQ3: Does a larger users’ awareness of the item catalogue, i.e., being aware of a larger number of items, lead to better choices, that is, to higher users’ ratings for the choices?

In the remaining part of this article we first outline the most important modelling aspects of the proposed simulation framework, and we then illustrate its usage by leveraging behaviour data stored in two Amazon data sets.

2. Simulation

We simulate the iterative process of users’ choice making for items. Users select items in monthly time intervals. We use true users’ choice data, collected by an eCommerce platform (Amazon) up to a certain time point 𝑡0, as the starting point of the simulation. We use this initial data set to train an RS, and then we simulate the users’ choices in the successive months. At the end of each month, the RS is re-trained by adding the simulated choices of that month to the training data.

To simulate users’ choices, we estimate the users’ preferences for the items (ratings). This is performed with another model, different from the RS, which computes an unbiased prediction of the ratings that the simulated users would give to the items [7]. This prediction model is trained on the full data set of true users’ choices, in order to model as correctly as possible the preferences of the real users, which are here simulated. This ‘unbiased’ prediction model disentangles the predicted ratings from observation biases, leading to predicted ratings that better represent the users’ intrinsic preferences.

In each monthly interval, the users select items one after another. When a user 𝑢 is simulated to make a choice, first her awareness set 𝐴𝑢 is built. This is the set of items that the user is supposed to know and from which she can choose. We assume that 𝐴𝑢 contains a fixed number of items (e.g., 2000) that are among the most popular ones (most frequently chosen previously) and have the largest estimated ratings for 𝑢. In other words, we assume that 𝑢 has some knowledge of the item catalogue that is not derived from the recommendations; this knowledge is assumed to be influenced by the available information on the items and by the user’s preferences.
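As a concrete illustration, the following Python sketch shows one way an awareness set of this kind could be assembled from item popularity and the unbiased rating predictions. The function and variable names (build_awareness_set, popularity, predicted_rating) and the rule combining popularity and predicted rating are our own assumptions for illustration, not the exact implementation used in the study.

```python
import heapq

def build_awareness_set(user, items, popularity, predicted_rating, size=2000):
    """Illustrative sketch: select `size` items that are both popular and
    highly rated (according to the unbiased rating predictor) for `user`.

    popularity[i]            -- fraction of past choices involving item i
    predicted_rating(u, i)   -- unbiased estimate of the rating u would give i

    The paper only states that the set mixes the most popular items with
    those having the largest estimated ratings for the user; scoring items
    by the product of the two quantities is an assumed combination rule.
    """
    def score(i):
        return popularity.get(i, 0.0) * predicted_rating(user, i)

    # keep the `size` items with the largest combined score
    return set(heapq.nlargest(size, items, key=score))
```

At simulation time, the items recommended to the user would then simply be added to this set before her choice is sampled, as described next.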
Then, an RS suggests a set of items to 𝑢 (50 in our study), which are added to the user’s awareness set. Finally, the user makes a choice based on a multinomial logit choice model (MLM). We adopt this model because it is a simple but effective approach, which has been previously validated. In the MLM, a user’s choice is drawn by using a probability distribution over the possible choices (i.e., the items in 𝐴𝑢):

$p(u \text{ chooses } i) = \frac{e^{v_{ui}}}{\sum_{j \in A_u} e^{v_{uj}}}$

Here, $v_{ui}$ is the utility of the item 𝑖 and, if the item is not recommended, $v_{ui}$ is proportional to the estimated rating of the item, $\hat{r}_{ui}$. Conversely, if the item 𝑖 is recommended, the utility is supposed to be larger, i.e., $v_{ui} \propto \delta \cdot \hat{r}_{ui}$, with $\delta > 1$. That is, if an item is recommended, the persuasive effect of the RS is modelled as if it gives the simulated user the impression that the item is more valuable, and therefore, according to the MLM definition, the item is also more likely to be chosen (a minimal sketch of this sampling step is given at the end of this section).

Five rather different RSs are studied in our simulation:

• PCF - Popularity-based CF: a nearest neighbourhood collaborative filtering RS that suggests the most popular items among the choices of the nearest neighbour users [2].
• LPCF - Low Popularity-based CF: similar to PCF, but it penalises the score of popular items, computed by PCF, by multiplying it by the inverse of their popularity [2].
• FM - Factor Model: a collaborative filtering RS based on matrix factorization [8].
• POP - Popularity-based: a non-personalised RS that recommends the most popular items to the users.
• AR - Average Rating: another non-personalised RS that recommends the items with the highest average ratings.

We analyse the simulated choices by considering metrics that are computed on the set of all the simulated choices of the users and that make clear the variety of the effects of RSs on users’ choice behaviour: (a) the Gini index of the chosen items [2]; (b) Choice Coverage of the catalogue (percentage) [3]; (c) Popularity of the chosen items; (d) average predicted rating of the choices (Choice’s Rating) [3]; (e) Recommendation Coverage of the catalogue (percentage); (f) probability of acceptance of the recommendations (Recommendation Acceptance).

We use two Amazon data sets to conduct the simulation: Apps and Games [9]. These data sets contain timestamped ratings of users for items, distributed over several months. Ratings signal actual choices, as they are normally provided after a purchase. In our analysis, we simulate the choices performed in the last ten months (of the recorded data), while using the previous months’ data to bootstrap the simulation with the required observations of previous users’ choices.
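To make the choice step concrete, the following Python sketch draws one simulated choice from the multinomial logit model defined above, boosting the utility of recommended items by the persuasion factor δ. The names (sample_choice, predicted_rating) and the value of delta are illustrative assumptions, not the actual code or parameters of the framework.

```python
import math
import random

def sample_choice(user, awareness_set, recommended, predicted_rating, delta=1.5):
    """Illustrative sketch of one MLM choice step.

    awareness_set    -- items the user is aware of (recommendations already added)
    recommended      -- subset of awareness_set suggested by the RS
    predicted_rating -- unbiased estimate of the rating the user would give an item
    delta            -- persuasion factor (> 1) applied to recommended items;
                        1.5 is an arbitrary example value, not the paper's setting
    """
    items = list(awareness_set)
    # utility: proportional to the predicted rating, boosted if recommended
    utilities = [
        (delta if i in recommended else 1.0) * predicted_rating(user, i)
        for i in items
    ]
    # multinomial logit: softmax over utilities gives the choice probabilities
    weights = [math.exp(v) for v in utilities]
    total = sum(weights)
    probs = [w / total for w in weights]
    # draw one item according to these probabilities
    return random.choices(items, weights=probs, k=1)[0]
```

In the simulation, this step would be repeated for every choice of every simulated user within a month, after which the RS is retrained on the enlarged choice log.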
3. Results

A selection of the simulation results is shown in Table 1, where the values of the considered metrics are computed at the end of the ten months of simulated choices. LPCF produces a high choice diversity (low Gini), comparable to the diversity observed when no RS is used (Apps data set) and even larger than it (Games data set). Hence, in practice, only by relying on methods that explicitly penalise popular items, as LPCF does, can an RS increase choice diversity compared with a baseline where no RS is influencing users’ choices. In general, personalised RSs produce a larger choice diversity than non-personalised ones, which instead encourage choices of more popular items (Popularity metric) and cover a smaller part of the catalogue (Choice Coverage metric). A sketch of how the Gini index can be computed on the simulated choices is given at the end of this section.

Table 1: Diversity and quality of the simulated choices in the Amazon Apps and Games data sets.

| Metric | Data set | PCF | LPCF | FM | POP | AR | No RS |
|---|---|---|---|---|---|---|---|
| Gini | Apps | 0.82 | 0.76 | 0.80 | 0.89 | 0.91 | 0.74 |
| Gini | Games | 0.89 | 0.83 | 0.89 | 0.96 | 0.96 | 0.89 |
| Choice Coverage | Apps | 0.50 | 0.56 | 0.52 | 0.42 | 0.38 | 0.60 |
| Choice Coverage | Games | 0.20 | 0.27 | 0.20 | 0.13 | 0.13 | 0.191 |
| Recommendation Coverage | Apps | 4.28 | 0.43 | 0.62 | 0.03 | 0.01 | – |
| Recommendation Coverage | Games | 4.43 | 0.24 | 0.41 | 0.04 | 0.01 | – |
| Choice’s Rating | Apps | 4.06 | 4.00 | 3.98 | 4.18 | 4.28 | 3.97 |
| Choice’s Rating | Games | 4.25 | 4.18 | 4.18 | 4.39 | 4.43 | 4.25 |
| Popularity | Apps | 0.00035 | 0.00021 | 0.00024 | 0.00216 | 0.00232 | 0.0001 |
| Popularity | Games | 0.00028 | 0.00018 | 0.00021 | 0.00132 | 0.00138 | 0.0002 |
| Recommendation Acceptance | Apps | 0.41 | 0.34 | 0.34 | 0.49 | 0.56 | – |
| Recommendation Acceptance | Games | 0.51 | 0.41 | 0.43 | 0.57 | 0.57 | – |

With the second research question, we ask whether personalised RSs produce better choices, i.e., with a higher Choice’s Rating, than non-personalised RSs. Surprisingly, non-personalised RSs result in choices of items with a larger predicted rating than personalised RSs. Hence, non-personalised RSs can be strong baselines if the goal is to nudge users to choose items that they will like. This means that, if there is no need to diversify the choices, a non-personalised RS may suffice.

With the third research question, we ask whether a larger awareness set, i.e., a better knowledge of the item catalogue, results in a higher Choice’s Rating. The results, not presented here for lack of space, show that when the awareness set grows, the choices are more diverse, and there is also a clear decrease in the acceptance of the recommendations, which leads to choices with a smaller Choice’s Rating. Hence, being aware of more items does not help users to make better choices, but it does help them to make more diverse ones. This result is due to the fact that, if users make choices among a larger set of options, there is an increased probability of choosing less good, but more diverse, items.
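For reference, the Gini index reported in Table 1 summarises how unevenly the simulated choices are distributed over the item catalogue (values close to 0 mean items are chosen roughly equally often, values close to 1 mean choices concentrate on few items), and Choice Coverage is the fraction of the catalogue chosen at least once. The sketch below shows one standard way such quantities could be computed from a list of simulated choices; the data structures are assumed for illustration and are not taken from the paper.

```python
from collections import Counter

def gini_index(chosen_items, catalogue_size):
    """Gini index of the distribution of simulated choices over items.

    chosen_items   -- list of item ids, one entry per simulated choice
    catalogue_size -- total number of items (never-chosen items count as 0)
    Returns a value in [0, 1]; higher means choices concentrated on fewer items.
    """
    counts = Counter(chosen_items)
    frequencies = sorted(counts.values())
    # include the items that were never chosen
    frequencies = [0] * (catalogue_size - len(frequencies)) + frequencies
    n = len(frequencies)
    total = sum(frequencies)
    if total == 0:
        return 0.0
    # standard Gini formula on the ascending frequency distribution
    cumulative = sum((k + 1) * f for k, f in enumerate(frequencies))
    return (2.0 * cumulative) / (n * total) - (n + 1.0) / n

def choice_coverage(chosen_items, catalogue_size):
    """Fraction of the catalogue that was chosen at least once."""
    return len(set(chosen_items)) / catalogue_size
```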
4. Conclusion

We have illustrated a simulation framework that is able to produce a simulated, but realistic, succession of users’ choices for items. Users’ preference data are extracted from a data set of observed choices. Users are supposed to make choices among two types of items: those that they are likely to know and those that an RS explicitly suggests to them. Users’ simulated choices are determined by a choice model that is based on the estimated utility of the items. This approach makes it possible to study the effect of RSs on users’ choices, going beyond previous studies that were limited to the analysis of the properties and biases of the recommendation algorithms alone [10].

We have obtained interesting findings: for instance, non-personalised RSs can produce choices that are rated higher than those produced by personalised RSs. Moreover, we found that choices are on average rated lower when the awareness set of the simulated users is larger. These, and other findings not described here for lack of space, shed light on the complex effect of exposing users to recommendations. The practical value of this study lies in the possibility to anticipate, without deploying an RS, its potential effect on the users’ choices. Hence, by using the proposed framework, developers can better pre-select candidate RS algorithms, thus also reducing potential undesired negative effects of the system.

References

[1] F. Ricci, L. Rokach, B. Shapira, Recommender systems: introduction and challenges, in: Recommender Systems Handbook, Springer, 2015, pp. 1–34.
[2] D. Fleder, K. Hosanagar, Blockbuster culture’s next rise or fall: The impact of recommender systems on sales diversity, Management Science 55 (2009) 697–712.
[3] Z. Szlávik, W. Kowalczyk, M. Schut, Diversity measurement of recommender systems under different user choice models, in: Fifth International AAAI Conference on Weblogs and Social Media, 2011.
[4] N. Hazrati, M. Elahi, F. Ricci, Simulating the impact of recommender systems on the evolution of collective users’ choices, in: Proceedings of the 31st ACM Conference on Hypertext and Social Media, 2020, pp. 207–212.
[5] S. Yao, Y. Halpern, N. Thain, X. Wang, K. Lee, F. Prost, E. H. Chi, J. Chen, A. Beutel, Measuring recommender system effects with simulated users, arXiv preprint arXiv:2101.04526 (2021).
[6] D. Bountouridis, J. Harambam, M. Makhortykh, M. Marrero, N. Tintarev, C. Hauff, SIREN: A simulation framework for understanding the effects of recommender systems in online news environments, in: Proceedings of the Conference on Fairness, Accountability, and Transparency, ACM, 2019, pp. 150–159.
[7] T. Schnabel, A. Swaminathan, A. Singh, N. Chandak, T. Joachims, Recommendations as treatments: Debiasing learning and evaluation, in: International Conference on Machine Learning, PMLR, 2016, pp. 1670–1679.
[8] Y. Hu, Y. Koren, C. Volinsky, Collaborative filtering for implicit feedback datasets, in: 2008 Eighth IEEE International Conference on Data Mining, IEEE, 2008, pp. 263–272.
[9] R. He, J. McAuley, Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering, in: Proceedings of the 25th International Conference on World Wide Web, 2016, pp. 507–517.
[10] J. Huang, H. Oosterhuis, M. de Rijke, H. van Hoof, Keeping dataset biases out of the simulation: A debiased simulator for reinforcement learning based recommender systems, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 190–199.