
Italian Information Retrieval Workshop, September

10.1145/3308558.3313710

Measuring the Ranking Quality of Recommendations in a Two-Dimensional Carousel Setting

Nicolò Felicioni

Maurizio Ferrari Dacrema

Fernando B. Pérez Maurera


Paolo Cremonesi

ContentWise, Milan, Italy; Politecnico di Milano, Milan, Italy

2021


Movie-on-demand and music streaming services usually provide the user with multiple recommendation lists, i.e., carousels, in a two-dimensional user interface, each generated according to different criteria (e.g., TV series, popular artists, etc.). In this two-dimensional setting, it is not appropriate to use traditional ranking metrics designed for a single ranking list. It is well known that users do not explore a two-dimensional interface one row at a time, but rather focus their attention on a triangular area at the top-left corner. Furthermore, user interfaces frequently hide some items or lists due to space constraints, which can be shown by performing certain actions (e.g., click, swipe). In this paper, we extend the widely used NDCG to a two-dimensional recommendation setting with a formulation that accounts for both the two-dimensional user exploration behaviour and interface-specific design. We also compare the proposed extension against single-list NDCG, highlighting that they can lead to a different choice of the optimal algorithm in offline evaluation.

Keywords: Recommender Systems, User Interface, Evaluation

1. Introduction

Traditionally, in the Information Retrieval and Recommender Systems domains, the objective has been to provide the user with the best possible ranked list of results [ 1, 2, 3 ]. For this reason, many metrics were developed to evaluate the quality of a one-dimensional ranked list. A common assumption is that users will navigate the list according to its order, therefore it is better for a correct recommendation to be at the beginning of the list.

There are however several scenarios that do not fit into these assumptions, mainly when the results are presented in a two-dimensional grid rather than a single list. This is true both in information retrieval [ 4 ] and in recommendation systems applications, in particular for video-on-demand streaming services [ 5, 6, 7 ] and music streaming platforms [ 8, 9 ]. Those services usually provide users with multiple rows of thematically coherent recommendations (e.g., the most popular movies, a specific genre, new releases, and so on, see Figure 1). These rows are referred to as widgets, shelves or as carousels.

A simple way to adapt one-dimensional ranking metrics to a two-dimensional interface is to concatenate all recommendation lists into a single one. This strategy relies on unrealistic assumptions and, we argue, is not appropriate. First, it is known that users do not explore each carousel sequentially from the first to the last, as concatenating them assumes. Rather, users start from the top-left corner of the screen and proceed to explore the items both to the right and to the bottom [10, 11]. This effect is also known as the "golden triangle" or "F-pattern". A visual example from an information retrieval application [4] is shown in Figure 2. Another example from a video streaming service [7] is shown in Figure 3. In addition to this user behaviour, many websites and mobile applications present carousels that are swipeable [9], i.e., the user can swipe horizontally or vertically to reveal more items, as well as lists that were not previously visible. This is a common way to overcome the limited space available in the user interface, allowing it to fit more recommendations and carousels that the user can easily browse. However, this places an additional burden on the user, who has to actively interact with the system to access the recommendations. Hence, it is preferable for a correct recommendation to be visible with the least possible number of user actions, as also noted in [12].

In order to take these factors into account, in this paper we propose to extend the one-dimensional NDCG metric to consider both the two-dimensional user exploration behaviour and the user interface characteristics. We show that the two metrics can lead to different results when used to select which recommenders to use in the carousel interface.

The rest of the paper is organized as follows. In Section 2 we summarize the characteristics of a carousel setting, in Section 3 we formulate an extended version of NDCG, and in Section 4 we perform an offline comparison of the results in a single-list and a carousel interface. Finally, in Section 5 we draw the conclusions.

Figure 2: Heatmap of the number of interactions per position on the screen. Most interacted items are located in the first rows and on the first positions of the list. Values are log-scaled.

Figure 3: Visualization of the number of user interactions on each position on a user interface [7]. A demarcation between the first and second half of the columns is visible.

2. Characteristics of a Carousel Setting

The carousel interface layout and the way it is usually generated by video-on-demand and music streaming platforms have important characteristics that distinguish it from a single-list setup [13]:

Interface: A two-dimensional interface with multiple carousels. Some carousels or recommendations may be hidden due to limited page size and be accessible only via user actions (i.e., clicks, swipes).

Recommendations: The lists shown to the users are generated with different algorithms or providers,¹ and no single post-processing step is applied to them. While each individual recommendation list does not contain duplicates, the same item may appear in multiple carousels.²

User Behaviour: The user will focus on the top-left triangle of the screen rather than exploring the carousels sequentially. Furthermore, users will explore the recommendations in different ways according to which actions they need to perform in order to reveal them.³

While a carousel layout may seem similar to a traditional merge-list embedding, where multiple recommendation lists are combined into one, this is not the case. In a real scenario, there are multiple constraints. First, the carousels may be generated by different content providers, each of them unaware of how the other lists are generated or by whom. This means that the composition of the layout, as well as the recommendations of the other providers, are in general not known. It is for this reason that different carousels may contain similar recommendations. Furthermore, a content provider that wishes to select the optimal carousels to display has limited degrees of freedom and can only alter the content and relative ordering of those it is tasked to provide. Finding strategies to select the optimal carousel layout is a complex problem [14].

¹ A significant example are content aggregators, which combine carousels from different providers: Netflix, YouTube, Prime Video, etc.

² For example, in the Netflix homepage shown in Figure 1, the TV series Space Force appears both in the TV Comedies and in the New Releases carousels.

³ Usually users tend to navigate more easily with simple swipes rather than repeated mouse clicks; hence, as is known, their behaviour will change according to the device.

3. Extending one-dimensional NDCG

One of the most used metrics for ranked list evaluation is the Discounted Cumulated Gain (DCG), as well as its Normalized version (NDCG) [ 15, 16 ]. This metric comes from the information retrieval domain and is widely used to evaluate recommendation systems. The DCG metric relies on two assumptions:

1. highly relevant results are more valuable for a user;

2. within a list of results, it is preferable to have relevant results in the first positions.

Let $N$ be the recommendation list length, i.e., the cutoff, and $rel(i)$ the relevance of the item in position $i$. The DCG is defined as the following discounted sum of gains:

$$DCG = \sum_{i=1}^{N} g(i) \cdot d(i)$$

The function $g$ is responsible for rewarding highly relevant results, while the function $d$ introduces a penalization that should increase the further the item is from the beginning of the list.

One of the most used formulations for the DCG is the following [ 17 ]:

$$DCG = \sum_{i=1}^{N} \frac{2^{rel(i)} - 1}{\log_2(i + 1)}$$

Hence, $g(i) = 2^{rel(i)} - 1$ and $d(i) = \frac{1}{\log_2(i + 1)}$. Notice that this is only one of many possible formulations of the DCG; several other ways of rewarding and discounting results have been proposed in previous research [18, 19]. In the following, we start from this formulation and extend it, since it is one of the most used. Other gain and discount functions can be extended in an analogous way. We leave the analysis of different gains and discounts as future work.
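The one-dimensional DCG and NDCG above can be sketched in a few lines of Python. This is a minimal illustration, assuming graded relevance values listed in ranked order and an ideal list built from the user's relevance values sorted in decreasing order and truncated at the same cutoff $N$; the function names `dcg` and `ndcg` are our own.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """DCG with gain 2^rel - 1 and discount 1 / log2(i + 1), positions i = 1..N."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))


def ndcg(relevances: Sequence[float], user_relevances: Sequence[float]) -> float:
    """Normalize by the DCG of the ideal list: the user's relevance values
    in decreasing order, truncated at the same cutoff N."""
    ideal = sorted(user_relevances, reverse=True)[:len(relevances)]
    best = dcg(ideal)
    return dcg(relevances) / best if best > 0 else 0.0


# Binary relevance, cutoff N = 5, hits at positions 1 and 3; the user has 2 relevant items.
ranked = [1, 0, 1, 0, 0]
print(dcg(ranked))                     # 1/log2(2) + 1/log2(4) = 1.5
print(round(ndcg(ranked, [1, 1]), 4))  # 1.5 / (1 + 1/log2(3)) ≈ 0.9197
```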

In a two-dimensional scenario, the standard DCG definition could be naively adapted in the following way. Let $h$ be the horizontal dimension of the interface (i.e., the length of each carousel) and $v$ the vertical dimension of the interface (i.e., the number of carousels). The carousels can be concatenated into a single list of length $N = v \cdot h$ items, on which the standard DCG formulation can be applied. This strategy assumes that users explore all carousels sequentially, from the first to the last, which, as previously discussed, is not consistent with user behaviour and does not account for the interface navigation constraints. Therefore, we suggest that researchers do not apply this strategy, as it does not represent a realistic scenario.

Thus, inspired by [15], we state the following assumptions that the two-dimensional DCG should satisfy:

1. highly relevant results are more valuable for a user;

2. a relevant result is valuable to the user only when it is first seen;

3. within a grid of results, it is preferable to have relevant results close to the top-left corner;

4. it is preferable that relevant items are immediately visible to the user or can be made visible with few user actions.

In order to account for this set of assumptions, we propose to extend the metric in the following way:

$$2DCG = \sum_{i=1}^{v} \sum_{j=1}^{h} g(i, j) \cdot d(i, j)$$

As in the one-dimensional version, the function $g$ is responsible for rewarding highly relevant results, according to assumptions (1) and (2). The function $d$, instead, should account for the penalty related to the position and to the number of user actions, according to assumptions (3) and (4).

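Inspired by the one-dimensional version, we fix $g(i, j) = 2^{rel(i, j)} - 1$, where $rel(i, j)$ is the relevance of the item in position $j$ of carousel $i$. The discount $d(i, j)$, instead, depends on the position in the layout, leaving ample freedom on how to define it in different use cases.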

The normalized version of this metric, N2DCG, is defined as $N2DCG = 2DCG / I2DCG$, where I2DCG is the 2DCG of the ideal ranking. In a single-list setting, the ideal ranking is the list that contains the relevant items in decreasing order of relevance from the beginning of the list. In the generalized two-dimensional layout, it contains the user's most relevant items, ranked by decreasing relevance in positions with decreasing position discount. The ideal ranking meets the following constraint: for any pair of cells $(i, j)$, $(k, l)$ of the matrix, $rel(i, j) \geq rel(k, l)$ if $d(i, j) > d(k, l)$.
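As an illustration of the 2DCG and N2DCG definitions, the sketch below takes the discount $d(i, j)$ as a parameter, so any of the discounts discussed in this section can be plugged in. It is a minimal sketch under our own assumptions: a layout is a list of carousels (lists of item identifiers), relevance is a dictionary from item to graded relevance, and an item recommended in several carousels is counted only once, in its cell with the highest discount value, following assumption (2). The ideal 2DCG pairs the user's gains, in decreasing order, with the page's cells in decreasing order of discount. All function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Discount = Callable[[int, int], float]  # d(i, j) with 1-indexed carousel i and position j


def gain(rel: float) -> float:
    return 2 ** rel - 1


def two_dcg(grid: Sequence[Sequence[str]], rel: Dict[str, float], d: Discount) -> float:
    """Sum of g(i, j) * d(i, j); a repeated relevant item is counted once,
    at the cell where it has the highest discount value (assumption 2)."""
    best: Dict[str, float] = {}
    for i, carousel in enumerate(grid, start=1):
        for j, item in enumerate(carousel, start=1):
            if rel.get(item, 0) > 0:
                best[item] = max(best.get(item, 0.0), d(i, j))
    return sum(gain(rel[item]) * disc for item, disc in best.items())


def ideal_two_dcg(rel: Dict[str, float], n_rows: int, n_cols: int, d: Discount) -> float:
    """Pair the largest gains with the cells that have the largest discounts."""
    discounts = sorted((d(i, j) for i in range(1, n_rows + 1)
                        for j in range(1, n_cols + 1)), reverse=True)
    gains = sorted((gain(r) for r in rel.values() if r > 0), reverse=True)
    return sum(g * disc for g, disc in zip(gains, discounts))


def n2dcg(grid: List[List[str]], rel: Dict[str, float], d: Discount) -> float:
    n_rows, n_cols = len(grid), max(len(c) for c in grid)
    ideal = ideal_two_dcg(rel, n_rows, n_cols, d)
    return two_dcg(grid, rel, d) / ideal if ideal > 0 else 0.0


def triangle(i: int, j: int) -> float:
    """Golden triangle discount with unit weights, used here only as an example."""
    return 1.0 / math.log2(i + j)


page = [["b", "a"], ["a", "c"]]  # "a" appears twice; only one occurrence counts
print(round(n2dcg(page, {"a": 1, "c": 1}, triangle), 4))  # ≈ 0.6934
```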

Relevance. As stated in assumption (2), a relevant item is valuable for the user only when it is first encountered. This means that if a relevant item appears multiple times, each in a different carousel, it should be considered relevant only in its best position. We define such position as the one with the highest discount value $d(i, j)$, i.e., the least penalized one. The gain $g(i, j)$ should be modified accordingly, so that the other occurrences of the item contribute no gain.

Single List Discount. It is possible to represent the traditional single-list DCG in this formulation by calculating the position that cell $(i, j)$ would occupy if all carousels were concatenated into a single list:

$$d(i, j) = \left(\log_2\left((i - 1) \cdot h + j + 1\right)\right)^{-1}$$
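The single-list discount can be checked numerically: with binary relevance (gain $2^1 - 1 = 1$), summing $d(i, j)$ over the relevant cells of a grid gives the same value as the standard DCG of the concatenated list. The sketch below is our own illustration and assumes rectangular layouts whose carousels all have length $h$ and contain no repeated items.

```python
import math
from typing import List, Set


def single_list_discount(i: int, j: int, h: int) -> float:
    """d(i, j) = 1 / log2((i - 1) * h + j + 1) for 1-indexed carousel i and position j."""
    return 1.0 / math.log2((i - 1) * h + j + 1)


def dcg_on_grid(grid: List[List[str]], relevant: Set[str]) -> float:
    h = len(grid[0])  # every carousel is assumed to have the same length h
    return sum(single_list_discount(i, j, h)
               for i, row in enumerate(grid, start=1)
               for j, item in enumerate(row, start=1) if item in relevant)


def dcg_concatenated(grid: List[List[str]], relevant: Set[str]) -> float:
    flat = [item for row in grid for item in row]  # carousels read one after the other
    return sum(1.0 / math.log2(k + 1)
               for k, item in enumerate(flat, start=1) if item in relevant)


grid = [["a", "b", "c"], ["d", "e", "f"]]
relevant = {"b", "d"}
print(math.isclose(dcg_on_grid(grid, relevant), dcg_concatenated(grid, relevant)))  # True
```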

As previously mentioned, this formulation is not grounded in a realistic scenario because it does not reflect user behaviour (see Figure 4a); therefore, we argue it should not be applied.

Figure 4: Position discount as a function of carousel position and recommendation position: (a) single list; (b) golden triangle behaviour; (c) golden triangle and user actions penalty.

Golden Triangle Discount. In order to account for the golden triangle behaviour, as per assumption (3), the position discount should decrease as the distance of the cell from the top-left corner increases:

$$d(i, j) = \left(\log_2\left(\alpha \cdot i + \beta \cdot j\right)\right)^{-1}$$

The coefficients $\alpha$ and $\beta$ are two weights that can be used to account for different types of user behaviour: $\alpha$ weighs the vertical position (carousel index $i$) and $\beta$ the horizontal one (item position $j$). For instance, let us assume a scenario where users are more inclined to explore the vertical dimension. In this case, $\alpha$ should be set to a low value in order to penalize the vertical dimension less. In order for the discount not to exceed 1 (it equals 1 at the top-left cell when $\alpha = \beta = 1$), $\alpha$ and $\beta$ should be $\geq 1$, since the base of the logarithm is 2 and the argument at the top-left cell is then $\alpha + \beta \geq 2$. Notice that this is true only because we are extending a logarithmic discount function; for other discount functions [18, 19] the constraints can change.

Figure 5: An example interface where 3 carousels, with 4 items each, are visible. A horizontal swipe reveals 4 items, while a vertical swipe reveals one additional carousel.
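A short sketch of the golden triangle discount, assuming the 1-indexed carousel index $i$ and item position $j$ used in the formulas above; the function name is ours. With $\alpha = \beta = 1$, cells on the same anti-diagonal ($i + j$ constant) receive the same discount, which is what produces the triangular iso-discount regions; raising $\beta$ above $\alpha$ penalizes the horizontal dimension more than the vertical one.

```python
import math


def golden_triangle_discount(i: int, j: int, alpha: float = 1.0, beta: float = 1.0) -> float:
    """d(i, j) = 1 / log2(alpha * i + beta * j); alpha weighs carousel i, beta position j."""
    return 1.0 / math.log2(alpha * i + beta * j)


# alpha = beta = 1 on the 3 x 3 top-left corner: equal values along anti-diagonals.
for i in range(1, 4):
    print([round(golden_triangle_discount(i, j), 3) for j in range(1, 4)])
# [1.0, 0.631, 0.5]
# [0.631, 0.5, 0.431]
# [0.5, 0.431, 0.387]

# beta = 2 > alpha = 1: moving right costs more than moving down.
print(golden_triangle_discount(2, 1, beta=2) > golden_triangle_discount(1, 2, beta=2))  # True
```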

The resulting discount is shown in Figure 4b (we set $\alpha = \beta = 1$ for simplicity).

User Actions Discount. Lastly, in order to account for assumption (4), the position discount should decrease the more actions the user needs to perform to make that position visible. In a carousel interface, an initial rectangular portion of the recommendations is immediately shown to the user. We refer to the number of visible items per carousel as $\hat{h}$ and to the number of visible carousels as $\hat{v}$ (see Figure 5). In order to reveal more items, the user needs to perform an action, e.g., a click on desktop or a swipe on mobile devices. Each of these actions reveals a certain number of new items within the currently displayed recommendation lists. Different platforms and devices correspond to different swipe steps, i.e., the number of items revealed by a single swipe. We call this quantity $s_h \in \{1, 2, \ldots, \hat{h}\}$. For example, on Netflix every click replaces all items displayed in the clicked carousel, in which case $s_h = \hat{h}$. The same principle holds for the vertical dimension, where the user can navigate by performing actions that each display new carousels; we denote the vertical step as $s_v$.

Based on this definition, we now add to the triangle discount a term that accounts for the number of actions the user needs to perform in order to see the item. To do so, we define two auxiliary functions. The first one checks whether at least one user action, i.e., a swipe, is needed to see an item in position $p$, given that the interface initially shows $n$ positions:

$$\text{isSwipeNeeded}(p, n) = \begin{cases} 1, & \text{if } p - n > 0 \\ 0, & \text{otherwise} \end{cases}$$

Then, we define a function that counts the number of actions needed to see an item in position $p$, given that each action reveals $s$ new positions:

$$\text{swipes}(p, n, s) = \text{isSwipeNeeded}(p, n) \cdot \left\lceil \frac{p - n}{s} \right\rceil$$

In the particular case where $n = s$, calculating the number of swipes becomes simpler (positions are 1-indexed):

$$\text{swipes}(p, s) = \left\lfloor \frac{p - 1}{s} \right\rfloor$$
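The two auxiliary functions translate directly into code. The sketch below uses our own snake_case names, 1-indexed positions, and integer arithmetic for the ceiling; the usage example reproduces the horizontal dimension of Figure 5, where 4 items are visible and each swipe reveals 4 more.

```python
def is_swipe_needed(p: int, n: int) -> int:
    """1 if position p lies beyond the n initially visible positions, else 0."""
    return 1 if p - n > 0 else 0


def swipes(p: int, n: int, s: int) -> int:
    """Actions needed to reveal position p when n positions are visible
    and each action reveals s more (isSwipeNeeded times the ceiling)."""
    if not is_swipe_needed(p, n):
        return 0
    return -(-(p - n) // s)  # integer ceiling of (p - n) / s


def swipes_when_step_equals_visible(p: int, s: int) -> int:
    """Special case n = s: floor((p - 1) / s)."""
    return (p - 1) // s


# Figure 5, horizontal dimension: 4 visible items, each swipe reveals 4 more.
print([swipes(p, n=4, s=4) for p in range(1, 13)])
# [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

The closed form for $n = s$ agrees with the general definition for every position, which the test below checks over a range of steps.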

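The final discount accounts for both the triangle discount and the number of user actions, as previously defined:

$$d(i, j) = \left(\log_2\left(\alpha \cdot i + \beta \cdot j + \gamma \cdot \text{swipes}(i, \hat{v}, s_v) + \delta \cdot \text{swipes}(j, \hat{h}, s_h)\right)\right)^{-1}$$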

Notice that this formulation accounts for both vertical and horizontal swipes. The coefficients $\alpha$, $\beta$, $\gamma$, $\delta$ are four positive weights that can be used to account for different types of user behaviour. The first two weights ($\alpha$ and $\beta$) control the general penalization of the vertical and horizontal dimensions, respectively. As previously stated, they should be $\geq 1$ in order for the total discount not to exceed 1. By controlling $\gamma$ and $\delta$, instead, it is possible to penalize the user actions needed to reveal a certain item more or less heavily. For example, items presented together in the same carousel may have a similar probability of interaction (see the first 10 elements of the first carousel in Figure 3); hence, the horizontal dimension should be penalized less. Another possibility is that, on a desktop device, a horizontal swipe performed with a mouse click has a higher weight than the same swipe performed with a touch on a mobile device.

For illustrative purposes, let us consider a possible scenario for a mobile device, where the screen shows 4 carousels with 3 recommendations each ($\hat{v} = 4$, $\hat{h} = 3$). We set the horizontal and vertical steps to 1 ($s_h = s_v = 1$), and $\alpha$, $\beta$, $\gamma$, $\delta$ to 1 as well. The resulting discount is shown in Figure 4c.
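Putting the pieces together, the following sketch computes the final discount $d(i, j)$ and prints it for the mobile scenario just described ($\hat{v} = 4$, $\hat{h} = 3$, unit steps and unit weights), on a 6 x 6 portion of the page. It is our own illustration rather than a reference implementation; the keyword names are hypothetical and simply mirror the symbols in the formula.

```python
import math


def swipes(p: int, n: int, s: int) -> int:
    """Actions to reveal 1-indexed position p (n positions visible, s revealed per action)."""
    return 0 if p <= n else -(-(p - n) // s)


def carousel_discount(i: int, j: int, *, v_vis: int, h_vis: int, s_v: int, s_h: int,
                      alpha: float = 1.0, beta: float = 1.0,
                      gamma: float = 1.0, delta: float = 1.0) -> float:
    """Golden triangle term plus weighted vertical and horizontal swipe counts."""
    x = (alpha * i + beta * j
         + gamma * swipes(i, v_vis, s_v)
         + delta * swipes(j, h_vis, s_h))
    return 1.0 / math.log2(x)


scenario = dict(v_vis=4, h_vis=3, s_v=1, s_h=1)  # 4 carousels x 3 items visible
for i in range(1, 7):
    print([round(carousel_discount(i, j, **scenario), 3) for j in range(1, 7)])
# first printed row: [1.0, 0.631, 0.5, 0.387, 0.333, 0.301]
```

Setting `gamma=0` and `delta=0` recovers the golden triangle discount, which makes it easy to see how much each hidden position is additionally penalized.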

4. Experiments

In this section, we provide an example of the different behaviour of NDCG and N2DCG in an offline experimental scenario. We consider a setting where, given a set of recommendation models and a certain number of carousels, the goal is to select which model to use to generate each carousel. We show that the two metrics yield different carousel layouts. In order to represent a scenario where a carousel interface would be used, we selected the widely known MovieLens10M movie recommendation dataset [20], containing 70k users, 10k items and 10M ratings.

The set of models that can be selected, i.e., $M$, contains several simple and widely known models that have been shown to provide competitive results in recent evaluations [21]. As a non-personalized model, we selected a TopPopular recommender. As KNN models, we included ItemKNN [22] and UserKNN [23], both computing the similarity with cosine and shrinkage. We included the graph-based models P3α [24] and RP3β [25], which define a bipartite graph of users and items and simulate a random walk. We added various matrix factorization models, some developed for explicit interactions: PureSVD [2], FunkSVD [21] and Non-negative MF (NMF) [26]; as well as others developed for implicit interactions: MF BPR [27] and IALS [28]. We included the widely known item-based machine learning models SLIM [29] and SLIM BPR, and the more recent EASE [30]. Finally, we included the content-based model ItemKNN CBF, which computes the item similarities from item features using cosine similarity with shrinkage.

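Table 1: Carousel layouts selected by the greedy procedure, from the top carousel (1) to the bottom one (6).

Carousel   Optimizing NDCG   Optimizing N2DCG
1          UserKNN           SLIM
2          FunkSVD           FunkSVD
3          NMF               UserKNN
4          IALS              MF BPR
5          MF BPR            NMF
6          SLIM              IALS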

We split the data by randomly selecting 80% of the interactions for the training set and 10% each for the validation and test sets. Each model was optimized on the validation data, following the best practices and value ranges reported in [21], using a Bayesian search with 50 cases.
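A random split of this kind can be sketched as follows. The sketch assumes interactions are stored as (user, item, rating) tuples and that the split is uniform over all interactions rather than per user; the function name and the fixed seed are our own choices, not details reported in the text.

```python
import random
from typing import List, Sequence, Tuple

Interaction = Tuple[int, int, float]  # (user, item, rating)


def random_split(interactions: Sequence[Interaction], seed: int = 42,
                 train: float = 0.8, validation: float = 0.1
                 ) -> Tuple[List[Interaction], List[Interaction], List[Interaction]]:
    """Shuffle all interactions, then cut 80% / 10% / the remaining ~10%."""
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train * len(shuffled))
    n_val = int(validation * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


data = [(u, i, 4.0) for u in range(10) for i in range(100)]
tr, va, te = random_split(data)
print(len(tr), len(va), len(te))  # 800 100 100
```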

Since the purpose of this paper is not to propose an algorithm for the selection of carousels but to show that the two metrics lead to different results, we rely on a simple greedy strategy. At the beginning, the page is empty and all candidate algorithms are evaluated independently on the validation data. The model with the best recommendation quality is selected as the first carousel. The process repeats for the following carousels; in this case, however, each candidate model is evaluated by taking into account all the previous carousels. According to the definition of relevance provided in Section 3, a correct recommendation of an item by the candidate model may overtake another recommendation of the same item in a previous carousel if it has a better (higher) position discount. For example, a correct recommendation at the end of the second carousel could be overtaken by the same recommendation at the beginning of the third carousel, if the latter has a better position discount.
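The greedy procedure can be sketched as below. This is a minimal illustration under our own assumptions: binary relevance, recommendations precomputed per model and user, one discount function shared by the whole page, and per-user normalization by the ideal 2DCG of the full page. Passing a single-list discount gives the concatenated NDCG variant, while a carousel discount gives N2DCG; in both cases a repeated item counts only at its best (highest-discount) cell, as described above. Names such as `greedy_layout` are hypothetical, and `n_carousels` must not exceed the number of candidate models.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Discount = Callable[[int, int], float]


def layout_2dcg(layout: Sequence[Sequence[str]], relevant: Set[str], discount: Discount) -> float:
    """Binary-relevance 2DCG: each relevant item counts once, at its highest-discount cell."""
    best: Dict[str, float] = {}
    for i, carousel in enumerate(layout, start=1):
        for j, item in enumerate(carousel, start=1):
            if item in relevant:
                best[item] = max(best.get(item, 0.0), discount(i, j))
    return sum(best.values())  # gain 2^1 - 1 = 1 for every relevant item


def ideal_2dcg(n_relevant: int, n_rows: int, n_cols: int, discount: Discount) -> float:
    cells = sorted((discount(i, j) for i in range(1, n_rows + 1)
                    for j in range(1, n_cols + 1)), reverse=True)
    return sum(cells[:n_relevant])


def greedy_layout(candidates: Dict[str, Dict[str, List[str]]], relevant: Dict[str, Set[str]],
                  discount: Discount, n_carousels: int, length: int) -> List[str]:
    """candidates: model -> user -> ranked list; returns models from top to bottom."""
    chosen: List[str] = []
    for _ in range(n_carousels):
        def mean_score(model: str) -> float:
            scores = []
            for user, items in relevant.items():
                layout = [candidates[m][user][:length] for m in chosen + [model]]
                ideal = ideal_2dcg(len(items), n_carousels, length, discount)
                scores.append(layout_2dcg(layout, items, discount) / ideal if ideal > 0 else 0.0)
            return sum(scores) / len(scores)
        chosen.append(max((m for m in candidates if m not in chosen), key=mean_score))
    return chosen


def triangle(i: int, j: int) -> float:
    return 1.0 / math.log2(i + j)


models = {"A": {"u": ["x", "y"]}, "B": {"u": ["z", "x"]}}
print(greedy_layout(models, {"u": {"x", "z"}}, triangle, n_carousels=2, length=2))  # ['B', 'A']
```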

We repeated this procedure first optimizing NDCG, and then optimizing N2DCG. We consider a hypothetical interface with a total of 6 carousels, each composed of 10 items. The interface initially shows 3 carousels and 2 items per carousel. The user can display 1 additional item in a given carousel with each horizontal swipe, and 1 new carousel with each vertical swipe. For this interface, we set $\alpha = \beta = 1$ and $\gamma = \delta = 2$, in order to penalize swipes more heavily.
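For this interface, the two discounts can disagree on which cells matter most. The sketch below is our own illustration, reusing the formulas of Section 3 with $\hat{v} = 3$, $\hat{h} = 2$, $s_v = s_h = 1$, $\alpha = \beta = 1$, $\gamma = \delta = 2$ and $h = 10$ items per carousel. It compares the third item of the first carousel with the first item of the third carousel: the single-list discount favours the former, the carousel discount the latter, which is one way the two metrics can end up preferring different layouts.

```python
import math

V_VIS, H_VIS, S_V, S_H = 3, 2, 1, 1      # visible carousels/items, swipe steps
ALPHA, BETA, GAMMA, DELTA = 1, 1, 2, 2   # weights used in this experiment
H = 10                                   # items per carousel


def swipes(p: int, n: int, s: int) -> int:
    return 0 if p <= n else -(-(p - n) // s)


def carousel_discount(i: int, j: int) -> float:
    return 1.0 / math.log2(ALPHA * i + BETA * j
                           + GAMMA * swipes(i, V_VIS, S_V)
                           + DELTA * swipes(j, H_VIS, S_H))


def single_list_discount(i: int, j: int) -> float:
    return 1.0 / math.log2((i - 1) * H + j + 1)


for cell in [(1, 3), (3, 1)]:
    print(cell, round(carousel_discount(*cell), 3), round(single_list_discount(*cell), 3))
# (1, 3) 0.387 0.5
# (3, 1) 0.5 0.224
```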

The resulting layouts are shown in Table 1. As we can see, the chosen algorithms appear in almost completely different orders in the two layouts. For instance, optimizing N2DCG results in selecting SLIM as the first carousel, while the same algorithm was selected at the bottom of the layout that optimizes one-dimensional NDCG. UserKNN, instead, was the first algorithm when optimizing NDCG, but it is only the third carousel when optimizing N2DCG.

Notice also that the 6 algorithms selected by both procedures are the same; only their order changes. Indeed, NDCG and N2DCG are not expected to produce completely different layouts, but rather to differ more as the layout becomes longer and the effects of user actions become more pronounced.

5. Conclusions

In this paper, we have described a user interface with multiple carousels, typical of movie-on-demand and music streaming services, and, based on its characteristics, proposed an extended version of the widely used NDCG metric. The proposed formulation accounts for the known user behaviour of exploring the page not one row at a time, but focusing on the top-left corner and then navigating in both directions. The proposed formulation also allows penalizing correct recommendations that are only visible to the user after performing actions. Lastly, we showed that the two metrics can lead to the selection of different carousel layouts. Future work includes validating the proposed metric with user studies, as well as applying it to select the optimal carousel layout, by defining which is the best carousel to put in a certain position or which is the best ordering of a given set of carousels. Further studies can also be done on different gain and discount functions, similar to previous research conducted on the one-dimensional DCG.

References

[1] J. L. Herlocker, J. A. Konstan, L. G. Terveen, Riedl, Evaluating collaborative filtering recommender systems, ACM Trans. Inf. Syst. 22 (2004) 5-53. URL: https://doi.org/10.1145/963770.963772. doi:10.1145/963770.963772.

[2] Cremonesi, Koren, Turrin, Performance of recommender algorithms on top-n recommendation tasks, in: X. Amatriain, M. Torrens, P. Resnick, M. Zanker (Eds.), Proceedings of the 2010 ACM Conference on Recommender Systems, RecSys 2010, Barcelona, Spain, September 26-30, 2010, ACM, 2010, pp. 39-46. URL: https://doi.org/10.1145/1864708.1864721. doi:10.1145/1864708.1864721.

[3] Sanderson, W. B. Croft, The history of information retrieval research, Proc. IEEE 100 (2012) 1444-1451. URL: https://doi.org/10.1109/JPROC.2012.2189916. doi:10.1109/JPROC.2012.2189916.

[4] Chierichetti, Kumar, Raghavan, Optimizing two-dimensional search results presentation, in: I. King, W. Nejdl, H. Li (Eds.), Proceedings of the Forth International Conference on Web Search and Web Data Mining, WSDM 2011, Hong Kong, China, February 9-12, 2011, ACM, 2011, pp. 257-266. URL: https://doi.org/10.1145/1935826.1935873. doi:10.1145/1935826.1935873.

[5] Wu, C. V. Alvino, A. J. Smola, Basilico, Using navigation to improve recommendations in real-time, in: S. Sen, Geyer, Freyne, P. Castells (Eds.), Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, September 15-19, 2016, ACM, 2016, pp. 341-348. URL: https://doi.org/10.1145/2959100.2959174. doi:10.1145/2959100.2959174.

[6] Elahi, Chandrashekar, Learning representations of hierarchical slates in collaborative filtering, in: R. L. T. Santos, L. B. Marinho, E. M. Daly, L. Chen, K. Falk, N. Koenigstein, E. S. de Moura (Eds.), RecSys 2020: Fourteenth ACM Conference on Recommender Systems, Virtual Event, Brazil, September 22-26, 2020, ACM, 2020, pp. 703-707. URL: https://doi.org/10.1145/3383313.3418484. doi:10.1145/3383313.3418484.

[7] F. B. Pérez Maurera, M. Ferrari Dacrema, Saule, Scriminaci, Cremonesi, Contentwise impressions: An industrial dataset with impressions included, in: M. d'Aquin, S. Dietze, C. Hauff, E. Curry, P. Cudré-Mauroux (Eds.), CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19-23, 2020, ACM, 2020, pp. 3093-3100. URL: https://doi.org/10.1145/3340531.3412774. doi:10.1145/3340531.3412774.

[8] Gruson, Chandar, Charbuillet, McInerney, Hansen, Tardieu, Carterette, Offline evaluation to make decisions about playlist recommendation algorithms, in: J. S. Culpepper, Moffat, P. N. Bennett, K. Lerman (Eds.), Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, WSDM 2019, Melbourne, VIC, Australia, February 11-15, 2019, ACM, 2019, pp. 420-428. URL: https://doi.org/10.1145/3289600.3291027. doi:10.1145/3289600.3291027.

[9] Bendada, G. Salha, T. Bontempelli, Carousel personalization in music streaming apps with contextual bandits, in: R. L. T. Santos, L. B. Marinho, E. M. Daly, L. Chen, K. Falk, N. Koenigstein, E. S. de Moura (Eds.), RecSys 2020: Fourteenth ACM Conference on Recommender Systems, Virtual Event, Brazil, September 22-26, 2020, ACM, 2020, pp. 420-425. URL: https://doi.org/10.1145/3383313.3412217. doi:10.1145/3383313.3412217.

[10] Kammerer, Gerjets, How the interface design influences users' spontaneous trustworthiness evaluations of web search results: comparing a list and a grid interface, in: C. H. Morimoto, H. O. Istance, A. Hyrskykari, Q. Ji (Eds.), Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications, ETRA 2010, Austin, Texas, USA, March 22-24, 2010, ACM, 2010, pp. 299-306. URL: https://doi.org/10.1145/1743666.1743736. doi:10.1145/1743666.1743736.

[11] Zhao, Chang, F. M. Harper, J. A. Konstan, Gaze prediction for recommender systems, in: S. Sen, Geyer, Freyne, P. Castells (Eds.), Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, September 15-19, 2016, ACM, 2016, pp. 131-138. URL: https://doi.org/10.1145/2959100.2959150. doi:10.1145/2959100.2959150.

[12] Järvelin, S. L. Price, L. M. L. Delcambre, M. L. Nielsen, Discounted cumulated gain based evaluation of multiple-query IR sessions, in: C. Macdonald, I. Ounis, V. Plachouras, I. Ruthven, R. W. White (Eds.), Advances in Information Retrieval, 30th European Conference on IR Research, ECIR 2008, Glasgow, UK, March 30-April 3, 2008. Proceedings, volume 4956 of Lecture Notes in Computer Science, Springer, 2008, pp. 4-15. URL: https://doi.org/10.1007/978-3-540-78646-7_4. doi:10.1007/978-3-540-78646-7_4.

[13] Felicioni, M. Ferrari Dacrema, Cremonesi, A methodology for the offline evaluation of recommender systems in a user interface with multiple carousels, in: J. Masthoff, E. Herder, N. Tintarev, M. Tkalcic (Eds.), Adjunct Publication of the 29th ACM Conference on User Modeling, Adaptation and Personalization, UMAP 2021, Utrecht, The Netherlands, June 21-25, 2021, ACM, 2021, pp. 10-15. URL: https://doi.org/10.1145/3450614.3461680. doi:10.1145/3450614.3461680.

[14] Ferrari Dacrema, Felicioni, Cremonesi, Optimizing the selection of recommendation carousels with quantum computing, in: Proceedings of the Fifteenth ACM Conference on Recommender Systems, 2021. doi:10.1145/3460231.3478853.

[15] Järvelin, J. Kekäläinen, IR evaluation methods for retrieving highly relevant documents, in: E. J. Yannakoudakis, N. J. Belkin, P. Ingwersen, M. Leong (Eds.), SIGIR 2000: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 24-28, 2000, Athens, Greece, ACM, 2000, pp. 41-48. URL: https://doi.org/10.1145/345508.345545. doi:10.1145/345508.345545.

[16] Järvelin, Kekäläinen, Cumulated gain-based evaluation of IR techniques, ACM Trans. Inf. Syst. 20 (2002) 422-446. URL: http://doi.acm.org/10.1145/582415.582418. doi:10.1145/582415.582418.

[17] C. J. C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, G. N. Hullender, Learning to rank using gradient descent, in: L. D. Raedt, S. Wrobel (Eds.), Machine Learning, Proceedings of the Twenty-Second International Conference (ICML 2005), Bonn, Germany, August 7-11, 2005, volume 119 of ACM International Conference Proceeding Series, ACM, 2005, pp. 89-96. URL: https://doi.org/10.1145/1102351.1102363. doi:10.1145/1102351.1102363.

[18] Kanoulas, J. A. Aslam, Empirical justification of the gain and discount function for ndcg, in: D. W. Cheung, I. Song, W. W. Chu, Hu, J. J. Lin (Eds.), Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM 2009, Hong Kong, China, November 2-6, 2009, ACM, 2009, pp. 611-620. URL: https://doi.org/10.1145/1645953.1646032. doi:10.1145/1645953.1646032.

[19] Zhou, Zha, Chang, G. Xue, Learning the gain values and discount factors of discounted cumulative gains, IEEE Trans. Knowl. Data Eng. 26 (2014) 391-404. URL: https://doi.org/10.1109/TKDE.2012.252. doi:10.1109/TKDE.2012.252.

[20] F. M. Harper, J. A. Konstan, The movielens datasets: History and context, ACM Trans. Interact. Intell. Syst. 5 (2016) 19:1-19:19. URL: https://doi.org/10.1145/2827872. doi:10.1145/2827872.

[21] Ferrari Dacrema, Boglio, Cremonesi, Jannach, A troubling analysis of reproducibility and progress in recommender systems research, ACM Trans. Inf. Syst. 39 (2021). URL: https://doi.org/10.1145/3434185. doi:10.1145/3434185.

[22] Sarwar, G. Karypis, Konstan, Riedl, Item-based collaborative filtering recommendation algorithms, in: Proceedings of the 10th International Conference on World Wide Web (WWW '01), 2001, pp. 285-295.

[23] B. M. Sarwar, G. Karypis, J. A. Konstan, Riedl, Item-based collaborative filtering recommendation algorithms, in: V. Y. Shen, N. Saito, M. R. Lyu, M. E. Zurko (Eds.), Proceedings of the Tenth International World Wide Web Conference, WWW 10, Hong Kong, China, May 1-5, 2001, ACM, 2001, pp. 285-295. URL: https://doi.org/10.1145/371920.372071. doi:10.1145/371920.372071.

[24] Cooper, Lee, Radzik, Siantos, Random walks in recommender systems: exact computation and simulations, in: C. Chung, A. Z. Broder, K. Shim, T. Suel (Eds.), 23rd International World Wide Web Conference, WWW '14, Seoul, Republic of Korea, April 7-11, 2014, Companion Volume, ACM, 2014, pp. 811-816. URL: https://doi.org/10.1145/2567948.2579244. doi:10.1145/2567948.2579244.

[25] Paudel, Christoffel, Newell, Bernstein, Updatable, accurate, diverse, and scalable recommendations for interactive applications, ACM Trans. Interact. Intell. Syst. 7 (2017) 1:1-1:34. URL: https://doi.org/10.1145/2955101. doi:10.1145/2955101.

[26] Cichocki, A. H. Phan, Fast local algorithms for large scale nonnegative matrix and tensor factorizations, IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 92-A (2009) 708-721. URL: https://doi.org/10.1587/transfun.E92.A.708. doi:10.1587/transfun.E92.A.708.

[27] Rendle, Freudenthaler, Gantner, Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, in: J. A. Bilmes, A. Y. Ng (Eds.), UAI 2009, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, June 18-21, 2009, AUAI Press, 2009, pp. 452-461. URL: https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=1630&proceeding_id=25.

[28] Hu, Koren, Volinsky, Collaborative filtering for implicit feedback datasets, in: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), December 15-19, 2008, Pisa, Italy, IEEE Computer Society, 2008, pp. 263-272. URL: https://doi.org/10.1109/ICDM.2008.22. doi:10.1109/ICDM.2008.22.

[29] Ning, G. Karypis, SLIM: Sparse linear methods for top-n recommender systems, in: Proceedings of the 11th IEEE International Conference on Data Mining (ICDM '11), 2011, pp. 497-506.

[30] Steck, Embarrassingly shallow autoencoders for sparse data, in: L. Liu, R. W. White, Mantrach, Silvestri, J. J. McAuley, Baeza-Yates, L. Zia (Eds.), The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019, ACM, 2019, pp. 3251-