<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Utilising Crowdsourcing to Assess the Effectiveness of Item-based Explanations of Merchant Recommendations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Denis Krasilnikov</string-name>
          <email>d.i.krasilnikov@tinkoff</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleg Lashinin</string-name>
          <email>o.a.lashinin@tinkoff</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maksim Tsygankov</string-name>
          <email>m.r.tsygankov@tinkoff.ru</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marina Ananyeva</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergey Kolesnikov</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lomonosov Moscow State University</institution>
          ,
          <addr-line>Ulitsa Kolmogorova, 1, bld. 52, Moscow, 119234, Russian Federation</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Research University Higher School of Economics</institution>
          ,
          <addr-line>Myasnitskaya Ulitsa, 20, Moscow, 101000, Russian Federation</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Tinkoff</institution>
          ,
          <addr-line>2-Ya Khutorskaya Ulitsa, 38A, bld. 26, Moscow, 117198, Russian Federation</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>The explainability of recommendations is a common research topic among researchers and providers of recommender systems. Numerous approaches and inference types were developed in order to find explanations for recommendations. For example, we can send users the following recommendation with an explanation: ”Since you recently made a purchase from merchant X, we suggest you merchant Y”. A variety of methods can be used to produce the (X, Y) item pairs with this explanation logic. Despite this, some users might not understand the logical connection between the recommendation Y and explanation X. In this study, we validate 23,000 recommendation explanations with the help of 400 crowdworkers. Additionally, we suggest a novel method for evaluating the quality of the (X, Y) item pair explanations based on crowdworkers' responses. Finally, we evaluate 9 different approaches and produce interesting findings. We hope that, in future research, our method will be expanded upon and further studied for additional types of explanations and domains.</p>
      </abstract>
      <kwd-group>
        <kwd>recommender systems</kwd>
        <kwd>explainable recommendations</kwd>
        <kwd>evaluation study</kwd>
        <kwd>crowdsourcing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>These days, recommender systems are an integral part of many areas in people’s lives. They
are capable of influencing people’s choice of films, clothing, tourist destinations, food- and
health-related habits, and much more. As a rule, such algorithms leverage previous users’
actions in order to show current users items that can potentially be interesting to them. Content
that is personalized in such a way is attractive to users, driving them to interact more with
various online stores, dating services, music, videos, and others.</p>
      <p>
        Recent works highlight the main problems of merchant recommendations. For instance, some banks provide merchant reward systems [1, 2
        <xref ref-type="bibr" rid="ref3">, 3</xref>
        ]. When bank clients make transactions with particular merchants, they receive cashback automatically. With many such offers available, personalization of the rewards section improves user profit [1]. In order to do so, it is possible to find the most relevant offers for each individual user based on their transaction history. For example, some recent works demonstrated experiment results on different real-world transaction datasets [4
        <xref ref-type="bibr" rid="ref5 ref6">, 5, 6</xref>
        ]. These works proved that recommendation models are capable of accurately predicting users’ future behavior by analysing past transactions. However, it is important to note that providing explanations for such merchant recommendations is not well studied yet.
      </p>
      <p>In this paper, we research the problem of offering users explanations as well as merchant recommendations. Formally, we have a dataset D with transaction histories of anonymous users with a number of merchants. Our task is to not only suggest the most appropriate merchants for each user, but to also explain each personalized suggestion.</p>
      <p>There are various types of explanations [7] for results provided by recommender systems. One of them is item-based explanations, where textual patterns consist of a few items, connected by certain conditions. For instance, we can show the user the following message: ”We recommend you merchant Y because you purchased from merchant X”. This communication can be fully defined by the (X, Y) merchant pair. Both of them should be represented in the dataset D. The item Y is received from a recommender model M built on historical transactions. The item X must be in the history of the user for whom we are providing the recommendation explanation. Otherwise, the statement in the communication will obviously be wrong.</p>
      <p>We chose this method due to a number of advantages. Firstly, it does not require additional knowledge about merchants. Secondly, it is quite simple to implement in an interface for testing with real users. Finally, there are many different approaches and heuristics to retrieve the (X, Y) merchant pairs. However, not all pairs may be valid. For example, a merchant pair consisting of a bar and a children’s store can be perceived negatively by real users. To avoid such situations, we suggest pre-screening some merchant pairs using crowdsourcing platforms. If some of them look questionable together when considered by real users, then they should be filtered out using additional labeling. This idea is illustrated in Figure 1.</p>
      <p>In this paper, we research the validation of X → Y pairs for further use in recommender systems for real clients. We provide an extensive survey for 23,000 merchant pairs. 400 crowdworkers share their opinions as if they were seeing these pairs in a scenario with real recommender systems. Based on the results of these surveys, we evaluate 9 approaches for explainable recommendations. This helps us estimate the quality of recommendation explanations in offline experiments based on the feelings of real people. Specifically, the main contributions of this paper can be listed as follows:
• We have asked 400 real people on a crowdsourcing platform to evaluate 23,000 explanation pairs. We share our results in an anonymized dataset.
• We describe a new way to evaluate explanations of merchant recommendations based on users’ feedback. Our approach makes it possible to separate the development of algorithms and the evaluation of explanation quality into independent processes. As a result, we can collect user feedback once and then use it many times for different approaches.
• We provide the results of experiments with 9 recommender algorithms as well as heuristics that generate the most appropriate explanation pairs (X, Y). We demonstrate how the ranking quality of these pairs matches the opinions of real people.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Explanations based on X → Y pairs can be considered a relatively well-studied topic in the research community. For instance, iALS [8] is capable of not only generating high-quality recommendations [9], but can also provide X → Y explanations. To do so, its optimization algorithm learns the weights of item-item similarities. Moreover, iALS takes into account the significance of interactions to each user. Another work [10] demonstrated a way of mining post-hoc explanations. This method is based on association rules and can be applied to any latent factor recommender model. A recent study [11] used a causal rule learning model to retrieve personal post-hoc explanations. Finally, there are studies that analyze the influence of input data on output scores [12, 13].</p>
      <p>Crowdsourcing can be very helpful in the field of explainable recommendations. For instance, crowdworkers can generate textual explanations [14], provide information about cold start items [15] and evaluate the explanations offered for recommendations [16]. In a recent work [17], crowdworkers helped improve the quality of recommendations via a human-in-the-loop framework. The crowdsourced opinions helped increase the accuracy of personalized suggestions. Moreover, according to [18, 19], crowdworkers are capable of evaluating different explanations in flexible experiment settings.</p>
      <p>
        To the best of our knowledge, explainable merchant recommendations are not well-studied in the broad research community. Recent works tend to consider only the quality of merchant recommendations [4
        <xref ref-type="bibr" rid="ref5 ref6">, 5, 6</xref>
        ]. However, we are of the opinion that the explainability of such models is a topic that is worth investigating in future works.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset Collection</title>
      <p>In this section, we describe our approach to collecting and processing data.</p>
      <p>The TTRS dataset (Tinkoff Transaction Recommender System) served as the source for all of our data. This open-source dataset with detailed statistics was provided in [5]. TTRS contains real clients’ transactions with some merchants such as brands, retail chains and services. In the open-source version of the dataset, each transaction contains the user id, merchant id (both columns are anonymized), merchant category and transaction timestamp. As this dataset was provided by us, we can enrich it with additional information. Concretely, merchant names of brands were used in our crowdsourcing experiments.</p>
      <p>We chose only the set of the top 350 most popular merchants. According to the original data, these merchants accounted for about 87% of all transactions. The specifications of this dataset can be found in Table 1.</p>
      <p>Additionally, we create a square sparse matrix A with a shape of (350, 350) to represent the ”relevance” of items. In order to collect this data, we sampled around 23,000 unique (X, Y) pairs randomly, where X ≠ Y and both X and Y belong to the selected set of merchants. This number was determined by our study’s limited budget. Since it is possible to create 349 * 349 = 121,801 pairs in total, we have selected about 19% of all possible pairs.</p>
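<p>As a minimal sketch of this sampling step (the function and variable names are ours, not from the paper), the unique ordered pairs can be drawn without replacement as follows:</p>

```python
import random

def sample_pairs(merchants, n_pairs, seed=42):
    """Sample unique ordered (X, Y) pairs with X != Y from a merchant list."""
    rng = random.Random(seed)
    # enumerate all ordered pairs of distinct merchants, then sample
    all_pairs = [(x, y) for x in merchants for y in merchants if x != y]
    # random.sample draws without replacement, so all sampled pairs are unique
    return rng.sample(all_pairs, n_pairs)
```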
      <p>
        To conduct our survey, we used an internal crowdsourcing platform at Tinkoff. Since machine learning algorithms require a large amount of labeled data [20], machine learning developers can use this platform for their needs. The Tinkoff crowdsourcing platform makes it possible to label datasets based on people’s opinions. We used this platform to ask crowdworkers the following question: ”We recommend you merchant Y because you bought from merchant X. Based on this explanation, do you think the model works correctly?”. This question is preceded by a brief description of both merchants. In addition, respondents were given instructions before starting to answer the questions. The instructions requested crowdworkers to imagine a situation in which they actually made a purchase from merchant X. Afterwards, they received a communication recommending merchant Y. The wording of the question was chosen based on the analysis of previous works [18, 1
        <xref ref-type="bibr" rid="ref9">9, 16</xref>
        ].
      </p>
      <p>The basic intuition is that if the (X, Y) pair seems illogical, the crowdworker will answer in
the negative. If the pair is perceived as logical by the respondent, then we will get a positive
answer from them.</p>
      <p>During the experiment, we asked 400 people to respond to our questions. To improve the quality of the experiment, we asked five different crowdworkers to answer every question. This made it possible for us to determine how many votes are needed to consider the recommendation explanation correct (3 out of 5, 4 out of 5, or 5 out of 5). We received 2,350 pairs where at least 3 out of 5 people said that the model works correctly. Thus, the cell A[X][Y] of the matrix A equals 1 if at least 3 out of 5 respondents label the pair as correct. Otherwise, A[X][Y] is 0.</p>
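<p>The majority-vote aggregation described above can be sketched as follows (a hedged illustration; the vote format and function name are our assumptions, not from the paper):</p>

```python
from collections import defaultdict

def build_ground_truth(votes, min_positive=3):
    """Aggregate crowdworker votes into binary ground-truth labels.

    votes: iterable of (x, y, is_positive) tuples, five votes per (x, y) pair.
    Returns a dict mapping (x, y) -> 1 if at least `min_positive` respondents
    labeled the pair as correct, else 0.
    """
    counts = defaultdict(int)
    for x, y, is_positive in votes:
        counts[(x, y)] += int(is_positive)
    return {pair: int(c >= min_positive) for pair, c in counts.items()}
```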
    </sec>
    <sec id="sec-4">
      <title>4. Evaluating Explanation Quality</title>
      <p>In this section, we describe a new method of evaluating X → Y explanations. We summarize our notation in Table 2. Let’s assume that we have a set of users U and a set of items I = {i_1, …, i_|I|}. We have a training part of a dataset which can be represented by the set of transactions D = {(u, i, t)}, where u, i and t are the user, item and timestamp respectively. For simplicity, we will consider only the subset D′ ⊂ D such that it contains only the maximum timestamp for each (u, i) pair. This makes it so that the recommender system should only predict new merchants for users. The modeling of recurring transactions remains to be researched in future works.</p>
      <p>A user u interacted with a set I(u) of unique items {(i_{u,1}, t_1), …, (i_{u,|I(u)|}, t_{|I(u)|})}, ordered by timestamps. Let’s assume that there is a method E that can generate explainability scores s_{u,X,Y} for each item X from I(u) with respect to a recommended item Y. If we have items X1 and X2, and s_{u,X1,Y} &lt; s_{u,X2,Y}, we will be explaining the recommendation of Y with item X2. It is important to have valid pairs with higher scores and invalid (illogical) pairs with lower scores.</p>
      <p>Therefore, it is possible to compute ranking quality metrics under s_{u,X,Y} for some subset of (X, Y) pairs and users from U. The proposed approach is described in detail in Algorithm 1. The key idea of this method is to take all the items user u interacted with. Then, we leave only the pairs of merchants that meet the following conditions: (a) Y was recommended, (b) X is in the user’s purchase history, (c) the (X, Y) pair is validated by the respondents. If for the user there are at least two candidates X for explaining each recommended item Y, we can sort these candidates by s_{u,X,Y}. Finally, quality ranking metrics compare the sorted lists of candidate explanation pairs with people’s opinions. Higher metric values prove that a method E can effectively retrieve explanation candidates, while low values may indicate that users might dislike certain explanations because they find them incorrect.</p>
      <p>Algorithm 1: The algorithm proposed for evaluating explanation pairs</p>
      <p>Data: a ground truth matrix A; a set of transactions D_u for each user from U; a trained recommender model M; a method E for explanation generation; selected quality ranking metrics.</p>
      <p>Result: calculated ranking quality metrics.</p>
      <p>foreach user u ∈ U do
    generate top-K recommendations R_u with model M;
    foreach Y ∈ R_u do
        compute C_{u,Y} = {X_j ∣ j = 1, …, |I(u)| ∧ A[X_j][Y] ∈ {0, 1}};
        if |C_{u,Y}| = 0 then
            continue;
        else
            let G_{u,Y} = {A[X][Y] ∣ X ∈ C_{u,Y}};
            sort C_{u,Y} and G_{u,Y} by s_{u,X,Y};
            foreach metric ∈ metrics do
                calculate metric(C_{u,Y}, G_{u,Y});
            end
        end
    end
end</p>
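<p>A compact executable sketch of this evaluation loop, under our own assumptions about data layout (the ground truth A is a dict of labeled pairs, and `recommend`, `score` and the metric functions are caller-supplied callables; none of these names come from the paper):</p>

```python
def evaluate_explanations(A, history, recommend, score, users, metrics, k=10):
    """Sketch of the evaluation procedure from Algorithm 1.

    A: dict mapping labeled (X, Y) pairs to 0/1 crowdworker labels.
    history: dict mapping user -> list of items the user interacted with.
    recommend(u, k): returns top-k recommended items for user u.
    score(u, x, y): explainability score s_{u,X,Y} of candidate x for item y.
    metrics: dict mapping metric name -> fn(sorted_labels) -> float.
    Returns per-metric averages over all non-empty candidate lists.
    """
    results = {name: [] for name in metrics}
    for u in users:
        for y in recommend(u, k):
            # candidates: items from the user's history whose pair was labeled
            cands = [x for x in history[u] if (x, y) in A]
            if not cands:
                continue  # skip lists with no labeled candidates
            # sort candidates by explainability score, best first
            cands.sort(key=lambda x: score(u, x, y), reverse=True)
            labels = [A[(x, y)] for x in cands]
            for name, fn in metrics.items():
                results[name].append(fn(labels))
    return {name: sum(v) / len(v) for name, v in results.items() if v}
```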
    </sec>
    <sec id="sec-5">
      <title>5. Methods to Rank Explanation Pairs</title>
      <p>In this section, we briefly describe the methods of generating explanation pairs which we
included in our work.</p>
      <p>
        • Random. This method simply generates random scores s_{u,X,Y}. It is included in order to calculate the relative improvement of other approaches.
• Chrono. Some sequential recommenders assume that a user’s future interactions are caused by their recent interactions [21, 2
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Chrono is a heuristic approach that works under this assumption. Specifically, it gives higher scores to the most recent items. The most recent item has the highest score, because of the assumption that future test interactions are mostly caused by the last interaction. Formally, s_{u,X,Y} = t(X) / max(T_u), where T_u is the set of all transaction dates for user u.
• MostPop. This baseline ranks items according to their popularity in the training part of the dataset. This popularity is defined as the total number of transactions.
• PersonTop. In this approach, we calculate the personal frequencies of user interactions with every item. The more a user purchased from a certain merchant, the higher the value of s_{u,X,Y}. If a user buys from two different merchants an equal number of times, the order between them is defined by their popularity, similar to MostPop.
• Similar Category MostPop. Merchants in our dataset have categories. People may consider it reasonable if they see merchants X and Y from one category. Therefore, we can assign higher scores to merchants from the same category and lower to those from different ones. Formally, s_{u,X,Y} = 2 + p if X and Y belong to the same category, and s_{u,X,Y} = p if X and Y are from different categories, where p ∈ [0, 1] is the score from MostPop.
• Implicit ALS [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This model not only shows competitive performance on top-n recommendations [9], but is also capable of generating explanations for recommendations.
• Similar Category + iALS. This method is similar to Similar Category MostPop, with items sorted according to iALS scores.
• Association Rules. This method makes it possible to generate explanations for any recommender model [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. There are different metrics to compute the rules. In our work, we include confidence, support and leverage.
• EASE [23]. This approach is a shallow autoencoder that has an item-item weight matrix W. This matrix can be considered a method to calculate s_{u,X,Y}. Formally, s_{u,X,Y} = W[X][Y].
      </p>
      <p>It is important to note that Random, MostPop, Association Rules and EASE do not depend on D_u. They rank all items, and the relative order of items does not change if new interactions are made. Alternatively, Chrono, PersonTop, iALS and Similar Category + iALS take into account the set of user interactions and rank explanation pairs personally.</p>
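<p>Two of the simplest heuristics above can be sketched in a few lines (an illustration under our own assumptions: timestamps are numeric, popularity scores are pre-normalised to [0, 1], and all names are ours):</p>

```python
def chrono_score(timestamps, x):
    """Chrono: score item x by its last-interaction timestamp,
    normalised by the user's maximum timestamp, so the most
    recent item gets the highest score (1.0)."""
    return timestamps[x] / max(timestamps.values())

def simcat_mostpop_score(pop, category, x, y):
    """Similar Category MostPop: 2 + popularity for same-category
    (X, Y) pairs, bare popularity otherwise."""
    p = pop[x]  # MostPop score, assumed to lie in [0, 1]
    return 2 + p if category[x] == category[y] else p
```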
    </sec>
    <sec id="sec-6">
      <title>6. Experiments</title>
      <p>We use the feedback collected from the crowdworkers to evaluate the explanation quality of different heuristics and algorithms. The choice of recommender model that provides recommendations is not the focus of our work. Therefore, we take an MF-based method, iALS [8], which is a powerful recommender model for the top-n recommendation task [9, 24] and is capable of generating explanations for its recommendations. We used the last month of user transactions as a test set and the penultimate month as a validation set to determine the best model hyperparameters.</p>
      <p>To evaluate the ranking quality, we use standard metrics such as Recall@K, NDCG@K and MAP@K. If the list of candidates is smaller than a particular K, we simply pad the end of the list with zeros. Furthermore, we do not calculate ranking metrics for lists of candidates if they lack any examples that can create a valid (X, Y) pair.</p>
      <p>6.1. Results</p>
      <p>The results of our experiments are provided in Table 3. The rows are sorted by Recall@1, and the best value is boldfaced. Since it is difficult to explain the recommendation of any merchant Y by the most popular merchant X, MostPop clearly performs the poorest at ordering explanations. As AR uses pairs of items, its results are slightly superior to MostPop. Chrono, AR, SimCatMostPop, EASE and PersonTop take into account some logical heuristics, including different item-to-item relations or the notion that customers frequently purchase from particular merchants. Therefore, these approaches produced relatively good results. AR provided quality results because it takes into account both X and Y separately in addition to the pair X → Y. However, iALS-based models take the top of the leaderboard. The most accurate ranking was produced by implicit ALS. An interesting point to consider is that this method was able to generate rankings that were more accurate than category-based sorting. SimCatIALS performed worse than iALS, possibly due to the fact that two merchants in the same category can have a different target audience.</p>
      <p>It is important to note that the best-performing method, iALS, has a Recall@1 of 0.37. It means that in 63% of cases, this model retrieves a pair (X, Y) that is labeled as incorrect by crowdworkers. On the other hand, this result is about 20% better on the Recall@1 metric than the results of MostPop.</p>
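<p>The zero-padding convention for short candidate lists can be illustrated with a minimal Recall@K implementation (our own sketch, not code from the paper):</p>

```python
def recall_at_k(labels, k):
    """Recall@K over a ranked 0/1 label list.

    Lists shorter than K are padded with zeros at the end, matching the
    evaluation convention described above; returns 0.0 when the list
    contains no relevant items at all.
    """
    padded = (list(labels) + [0] * k)[:k]  # zero-pad, then truncate to K
    total_relevant = sum(labels)
    return sum(padded) / total_relevant if total_relevant else 0.0
```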
    </sec>
    <sec id="sec-7">
      <title>7. Limitations and Future Work</title>
      <p>Our research has some limitations that we plan to overcome in future works. Firstly, the number
of available merchants can be very large and it can be expensive to label most of the provided
item pairs. This problem can be potentially addressed if we can find a way to predict the
respondent’s answers based on partial data labeling. Secondly, we did not study the use of
unlabeled pairs. For instance, the people’s opinions can be predicted by factorising the matrix A. Finally, we considered only a small set of approaches for ranking explanation candidates. Approaches with causal explanations [25] are something to consider in future work.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>In this paper, we studied the validation of explanations for merchant recommendations. We validated the explanation pairs using a crowdsourcing platform. This made it possible for us to attempt a new approach to evaluating the quality of explanations of recommendations in the offline scenario. We also considered 9 different approaches for generating explanations and compared them based on the data gathered from crowdworkers. The results of our experiments have shown that even well-known approaches may generate invalid explanations that are considered illogical by real users. We hope that this method will allow researchers to develop explainable models and test them in an offline scenario based on data gathered from crowdsourcing.</p>
      <p>[11] S. Xu, Y. Li, S. Liu, Z. Fu, X. Chen, Y. Zhang, Learning post-hoc causal explanations for recommendation, arXiv preprint arXiv:2006.16977 (2020).
[12] W. Cheng, Y. Shen, L. Huang, Y. Zhu, Incorporating interpretability into latent factor models via fast influence analysis, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, KDD ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 885–893. URL: https://doi.org/10.1145/3292500.3330857. doi:10.1145/3292500.3330857.
[13] V. W. Anelli, A. Bellogín, T. Di Noia, F. M. Donini, V. Paparella, C. Pomo, An analysis of local explanation with lime-rs (2022).
[14] S. Chang, F. M. Harper, L. G. Terveen, Crowd-based personalized natural language explanations for recommendations, in: Proceedings of the 10th ACM Conference on Recommender Systems, 2016, pp. 175–182.
[15] D.-G. Hong, Y.-C. Lee, J. Lee, S.-W. Kim, Crowdstart: Warming up cold-start items using crowdsourcing, Expert Systems with Applications 138 (2019) 112813. doi:10.1016/j.eswa.2019.07.030.
[16] P. Kouki, J. Schaffer, J. Pujara, J. O’Donovan, L. Getoor, Personalized explanations for hybrid recommender systems, in: Proceedings of the 24th International Conference on Intelligent User Interfaces, 2019, pp. 379–390.
[17] A. Ghazimatin, S. Pramanik, R. Saha Roy, G. Weikum, Elixir: learning from user feedback on explanations to improve recommender models, in: Proceedings of the Web Conference 2021, 2021, pp. 3850–3860.
[18] K. Balog, F. Radlinski, Measuring recommendation explanation quality: The conflicting goals of explanations, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 329–338.
[19] X. Chen, Y. Zhang, J.-R. Wen, Measuring ”why” in recommender systems: a comprehensive survey on the evaluation of explainable recommendation, arXiv preprint arXiv:2202.06466 (2022).
[20] A. Drutsa, V. Farafonova, V. Fedorova, O. Megorskaya, E. Zerminova, O. Zhilinskaya, Practice of efficient data collection via crowdsourcing at large-scale, arXiv preprint arXiv:1912.04444 (2019).
[21] W.-C. Kang, J. McAuley, Self-attentive sequential recommendation, in: 2018 IEEE International Conference on Data Mining (ICDM), IEEE, 2018, pp. 197–206.
[22] R. He, J. McAuley, Fusing similarity models with markov chains for sparse sequential recommendation, in: 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, 2016, pp. 191–200.
[23] H. Steck, Embarrassingly shallow autoencoders for sparse data, arXiv preprint arXiv:1905.03375 (2019).
[24] M. Ferrari Dacrema, S. Boglio, P. Cremonesi, D. Jannach, A troubling analysis of reproducibility and progress in recommender systems research, ACM Transactions on Information Systems (TOIS) 39 (2021) 1–49.
[25] S. Xu, Y. Li, S. Liu, Z. Fu, Y. Ge, X. Chen, Y. Zhang, Learning causal explanations for recommendation, in: The 1st International Workshop on Causality in Search and Recommendation, 2021.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N. Ranjbar</given-names>
            <surname>Kermany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pizzato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Scott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Leontjeva</surname>
          </string-name>
          ,
          <article-title>A multi-stakeholder recommender system for rewards recommendations</article-title>
          ,
          <source>in: Proceedings of the 16th ACM Conference on Recommender Systems</source>
          , RecSys '22, Association for Computing Machinery, New York, NY, USA,
          <year>2022</year>
          , pp.
          <fpage>484</fpage>
          -
          <lpage>487</lpage>
          . URL: https://doi.org/10.1145/3523227.3547388. doi:10.1145/3523227.3547388.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Neussner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ginina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kryvinska</surname>
          </string-name>
          , et al.,
          <article-title>Novel approaches to increasing customer loyalty: Example of “cashback” in austria</article-title>
          ,
          <source>J Fin Mark</source>
          .
          <year>2022</year>
          ;
          <volume>6</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Poleshchuk</surname>
          </string-name>
          ,
          <article-title>Increasing bank customers' loyalty through innovative loyalty programs</article-title>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Reibman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <article-title>Sequential recommendation model for next purchase prediction</article-title>
          ,
          <source>arXiv preprint arXiv:2207.06225</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kolesnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lashinin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pechatov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kosov</surname>
          </string-name>
          ,
          <article-title>TTRS: Tinkoff transactions recommender system benchmark</article-title>
          ,
          <source>arXiv preprint arXiv:2110.05589</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Christensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Pcard: Personalized restaurants recommendation from card payment transaction records</article-title>
          ,
          <source>in: The World Wide Web Conference</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2687</fpage>
          -
          <lpage>2693</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kouki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schafer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pujara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>O'Donovan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Getoor</surname>
          </string-name>
          ,
          <article-title>Personalized explanations for hybrid recommender systems</article-title>
          ,
          <source>in: Proceedings of the 24th International Conference on Intelligent User Interfaces</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>379</fpage>
          -
          <lpage>390</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Volinsky</surname>
          </string-name>
          ,
          <article-title>Collaborative filtering for implicit feedback datasets</article-title>
          ,
          <source>in: 2008 Eighth IEEE International Conference on Data Mining, IEEE</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>263</fpage>
          -
          <lpage>272</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rendle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Krichene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          ,
          <article-title>Revisiting the performance of iALS on item recommendation benchmarks</article-title>
          ,
          <source>in: Proceedings of the 16th ACM Conference on Recommender Systems</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>427</fpage>
          -
          <lpage>435</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Peake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Explanation mining: Post hoc interpretability of latent factor models for recommendation systems</article-title>
          ,
          <source>in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>2060</fpage>
          -
          <lpage>2069</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>