<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>X4SR: Post-Hoc Explanations for Session-based Recommendations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jyoti Narwariya</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Priyanka Gupta</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Garima Gupta</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lovekesh Vig</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gautam Shroff</string-name>
        </contrib>
        <aff>TCS Research, New Delhi, India</aff>
      </contrib-group>
      <abstract>
        <p>Session-based recommendation (SR) approaches have extensively employed deep neural networks (DNNs) to provide high-quality recommendations based on a user's current interactions and item features. However, these approaches are black-box models whose recommendations are not understandable to users or system designers. To further trust and transparency in the recommendation system and to gain insights into customer behavior, it is essential to explain why an item is recommended to a certain user. In this paper, we propose a novel post-hoc explainability method that explains the recommendations of a benchmark recommendation system, NISER [1], at two levels: (i) local explanations, which explain why an item is recommended in the current session, and (ii) global explanations, which explain why an item is recommended in general across all sessions. Our method uses the item and session embeddings learned by the recommendation model to determine the most influential items for a recommendation. In contrast to proxy models such as LIME [2] or SHAP [3], using the same model embeddings that were used for recommendation ensures that the generated explanations are of high fidelity and reflect the model's true behavior. Through quantitative evaluation on two publicly available datasets, we demonstrate that our approach generates quality explanations in terms of salient items for a recommendation. To the best of our knowledge, our method is the first to provide a quantitative evaluation of explanations in terms of commonly used metrics for recommendation systems. We also demonstrate the value of verbalized explanations, generated with LLMs for various examples, in improving the readability of explanations.</p>
      </abstract>
      <kwd-group>
        <kwd>Session-based Recommendation</kwd>
        <kwd>Explainable Recommendation</kwd>
        <kwd>Post-hoc explanations</kwd>
        <kwd>Verbalized explanations</kwd>
        <kwd>LLMs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recommendation systems (RS) are an integral component of e-commerce, online advertising
and streaming applications allowing systems to provide relevant content, boost sales and
improve user experience. Our interest lies in SR systems [
        <xref ref-type="bibr" rid="ref1">4, 5, 6, 7, 1</xref>
        ] where the system has to
dynamically make recommendations based on current session interactions without any prior
user history. Nearest neighbour-based approaches like STAN[4] recommend items based on
similar prior sessions. Such methods are understandable and provide reasonable explanations.
However, there appears to be a trade-off between a model’s ability to learn complex user
behavior and its interpretability. Modern recommendation systems utilize high dimensional
latent features, i.e. item or session embeddings to achieve state-of-the-art performance [
        <xref ref-type="bibr" rid="ref1">5, 6,
7, 1</xref>
        ]. DNN-based SR approaches provide high-quality recommendations based on the user’s
current interactions and the items’ latent features, but at the cost of interpretability, trust,
and transparency. Generating explanations along with recommendations is essential to build
trust, and improve user satisfaction, while assisting system designers to rectify irrelevant
recommendations [8].
      </p>
      <p>
        To make DNNs interpretable, one prominent technique is to learn a less
complex proxy model that locally mimics, and thereby helps explain, a DNN’s behavior [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. However,
this requires additional training, and the explanations are not guaranteed to mimic the exact
pattern of reasoning in the SR model. Another approach [9] generates personalized
post-hoc explanations based on item-level causal rules to explain the behavior of a sequential
recommendation model. However, it compromises on recommendation accuracy by constraining
the model to rely on the causal rules. CGSR [10] provides explanations at the session and item
levels by generating a set of scores, i.e., causality and correlation scores. However, it cannot be
applied to other SR approaches because causality scores are unavailable for them. SSR [11] generates
explanations within a session by considering three factors: sequential patterns, repeated clicks,
and item similarities. However, it does not provide explanations at the aggregate level (i.e., global
explanations). In contrast to approaches based on a proxy model or rules, our approach neither requires
additional training nor compromises on recommendation accuracy. We propose using the
session and item representations learned by a DNN-based SR model to generate explanations
that are of high fidelity, are trustworthy, and reflect the model’s true behavior. To this end, we
propose X4SR: Post-hoc Explanations for Session-based Recommendations. We demonstrate
the ability of X4SR to generate quality explanations in terms of explaining items at
two levels: local explanations, which explain the recommended items for the current session,
and global explanations, which explain a recommended item at an aggregate level. Local
explanations are important for end users/customers to trust the system, and global explanations
are useful for a business user to understand aggregated customer behavior. The generated
explaining items, along with meta-information, i.e., the current session (in the case of local explanations)
and similar prior sessions (in the case of both local and global explanations), can be parsed and reasoned
over by LLMs to obtain verbalized explanations that are understandable to the user as well as the
system designer. While several explanation approaches have been proposed in the literature,
to the best of our knowledge none of them evaluates the generated explanations quantitatively. X4SR provides a quantitative
evaluation in terms of metrics commonly used in RS, such as Recall and MRR.
      </p>
      <p>
        In this work, we employ NISER [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a well-known session-based recommendation benchmark,
to validate our explainability approach, though it should be noted that X4SR is model-agnostic
and can be employed with any embedding-based SR model. We summarize the key contributions
as follows: (i) we propose a post-hoc method to generate explanations that reflect the model’s
true behavior at two levels, local and global; (ii) to the best of our knowledge, our approach
is the first to provide a quantitative evaluation of explanations in terms of commonly used metrics such as Recall
and MRR; and (iii) we provide verbalized explanations via LLMs to improve the readability of
explanations.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Proposed Approach</title>
      <sec id="sec-2-1">
        <title>2.1. Problem Setting</title>
        <p>
          Let 𝒮<sub>prior</sub> and 𝒮<sub>curr</sub> be the sets of prior (train) sessions and current (test) sessions, respectively. We
consider ℐ to be the set of m items observed in 𝒮<sub>prior</sub>. Given any current session s ∈ 𝒮<sub>curr</sub>,
which is a sequence of l item-click events, ℐ<sub>s</sub> = {i<sub>s,1</sub>, i<sub>s,2</sub>, . . . , i<sub>s,l</sub>}, where i<sub>s,j</sub> ∈ ℐ, the SR model
ℳ predicts a recommendation list of the top K items, ℐ<sub>r</sub> = {r<sub>1</sub>, r<sub>2</sub>, . . . , r<sub>K</sub>} ⊂ ℐ. From a trained SR
model ℳ, we obtain the learned item embedding <bold>i</bold> ∈ ℝ<sup>d</sup> for each item of ℐ and denote the item
embedding set as <bold>I</bold>. Similarly, we obtain the learned session embedding <bold>s</bold> ∈ ℝ<sup>d</sup> for all prior and
current sessions, denoted by <bold>S</bold><sub>prior</sub> and <bold>S</bold><sub>curr</sub>, respectively. In this work, we consider the benchmark
SR model NISER [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] as ℳ.
        </p>
        <p>The goal of explainable SR approaches is to explain each recommended item r ∈ ℐ<sub>r</sub> at the
session level (why item r is recommended in the current session s ∈ 𝒮<sub>curr</sub>), as well as at a global
level (why item r is recommended across all user sessions).</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. X4SR: Post-hoc Explanation</title>
        <p>X4SR generates explanations in terms of explaining items for each recommendation using the
learned latent embeddings of sessions and items, <bold>S</bold><sub>prior</sub>, <bold>S</bold><sub>curr</sub> and <bold>I</bold>. The explanations are given at
two levels:</p>
        <p>A) Local Explanations: To generate explanations for a recommended item r ∈ ℐ<sub>r</sub> for
session s, we first compute the cosine similarity between the current session embedding <bold>s</bold> and
the embeddings <bold>S</bold><sub>r</sub> of the prior sessions 𝒮<sub>r</sub> that contain r in their history. We thus obtain
the top-k most similar candidate prior sessions 𝒮<sub>s,r</sub>. All items in the candidate prior sessions are
added to the candidate items set ℐ<sub>s,r</sub>. Then, we obtain the relevant items X<sub>s,r</sub> from the candidate
items set based on: i) the pair-wise similarity between all items in the candidate items set, keeping item pairs
with similarity greater than or equal to a threshold τ, and ii) the items that occur most frequently in the
candidate items set. Finally, the explaining items from session s are the ones having maximum
similarity with the relevant items. We summarize the process in Algorithm 1.</p>
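        <p>The local procedure can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy-friendly list and dictionary inputs, and the defaults for k, tau and the number of most frequent items are all hypothetical, and the explaining item for each recommendation is taken as the session item with the highest cosine similarity to any relevant item or to the recommended item itself.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relevant_items(rec, session_emb, prior_sessions, prior_embs, item_embs,
                   k=5, tau=0.5, top_freq=2):
    """Relevant items for one recommended item `rec` in one current session."""
    # Prior sessions whose click history contains the recommended item.
    containing = [j for j, items in enumerate(prior_sessions) if rec in items]
    # Top-k of those sessions by cosine similarity to the current session.
    containing.sort(key=lambda j: cosine(prior_embs[j], session_emb), reverse=True)
    candidates = containing[:k]
    # Candidate items set: every item clicked in the candidate prior sessions.
    counts = Counter(i for j in candidates for i in prior_sessions[j])
    items = sorted(counts)
    # (ii) Most frequent candidate items (top_freq is an assumed cut-off).
    by_freq = {i for i, _ in counts.most_common(top_freq)}
    # (i) Items that belong to a pair with similarity >= tau.
    by_sim = set()
    for pos, a in enumerate(items):
        for b in items[pos + 1:]:
            if cosine(item_embs[a], item_embs[b]) >= tau:
                by_sim.update((a, b))
    return (by_freq | by_sim) - {rec}


def explaining_items(session_items, session_emb, rec_items, prior_sessions,
                     prior_embs, item_embs, top=2, **kwargs):
    """Session items that most often explain the recommended items."""
    votes = Counter()
    for rec in rec_items:
        relevant = relevant_items(rec, session_emb, prior_sessions,
                                  prior_embs, item_embs, **kwargs)
        targets = sorted(relevant) + [rec]
        # Session item with maximum similarity to a relevant item or to rec.
        best = max(session_items,
                   key=lambda i: max(cosine(item_embs[i], item_embs[t])
                                     for t in targets))
        votes[best] += 1
    # Keep the most frequently selected session items (top-2 by default).
    return [item for item, _ in votes.most_common(top)]
```
]]></preformat>
        <p>Here item_embs maps an item identifier to its embedding, and prior_sessions and prior_embs are parallel lists of click histories and session embeddings; in practice these would be read from the trained SR model.</p>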
        <p>B) Global Explanations: To generate generalized explanations for item r, we first obtain
all the prior sessions 𝒮<sub>r</sub> in which r is present in the session history, and cluster them using
DBSCAN [12]. We use density-based clustering (DBSCAN) instead of distance-based
clustering (e.g., k-Means) because it can learn clusters of arbitrary shape with no prior
knowledge of the number of clusters. We estimate the centroid of each cluster as the average of the embeddings
of the sessions in that cluster. For each cluster c, we obtain the top-k candidate prior
sessions 𝒮<sub>c,r</sub> that are most similar to the centroid of cluster c. Further, we consider the set of all
items that are present in the candidate prior sessions as the candidate items set ℐ<sub>c,r</sub>. Next, we obtain
explaining items based on the pair-wise similarity between all items in the candidate items set
(similarity ≥ τ) and the most frequent items in the candidate items set. The explaining items for each
cluster correspond to different user behavior patterns, which shows that different metapaths are
responsible for a positive interaction (e.g., click, buy, etc.) on item r. We summarize the process
in Algorithm 2.</p>
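        <p>The per-cluster part of the global procedure can be sketched as follows. This is an illustrative sketch under assumptions rather than the authors' code: the cluster labels are taken as given (for example from any DBSCAN implementation, with -1 assumed to mark noise points), the input sessions are assumed to be already restricted to those containing the recommended item, and the function name and default values are hypothetical.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def global_explanations(rec, sessions, session_embs, labels, item_embs,
                        k=5, tau=0.5, top_freq=2):
    """Explaining items per cluster of the prior sessions that contain `rec`."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        if label != -1:  # assumed convention: -1 marks DBSCAN noise points
            clusters[label].append(idx)
    explanations = {}
    for label, members in clusters.items():
        # Centroid = average of the embeddings of the sessions in the cluster.
        dims = zip(*(session_embs[j] for j in members))
        centroid = [sum(dim) / len(members) for dim in dims]
        # Top-k sessions of the cluster that are most similar to the centroid.
        nearest = sorted(members,
                         key=lambda j: cosine(session_embs[j], centroid),
                         reverse=True)[:k]
        # Candidate items set: all items in the candidate prior sessions.
        counts = Counter(i for j in nearest for i in sessions[j])
        items = sorted(counts)
        by_freq = {i for i, _ in counts.most_common(top_freq)}
        by_sim = set()
        for pos, a in enumerate(items):
            for b in items[pos + 1:]:
                if cosine(item_embs[a], item_embs[b]) >= tau:
                    by_sim.update((a, b))
        # Explaining items of the cluster, excluding the recommended item.
        explanations[label] = (by_freq | by_sim) - {rec}
    return explanations
```
]]></preformat>
        <p>Each key of the returned dictionary is one cluster, so differing item sets across keys correspond to the different behavior patterns that lead to the same recommended item.</p>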
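        <p>Algorithm 1 Local explanations</p>
        <preformat>Given recommended items ℐ<sub>r</sub>, item click history of session s as ℐ<sub>s</sub>, learned item embeddings <bold>I</bold>,
prior and current session embeddings <bold>S</bold><sub>prior</sub> and <bold>S</bold><sub>curr</sub>.
for each current session s in 𝒮<sub>curr</sub> do
  for each recommended item r ∈ ℐ<sub>r</sub> = {r<sub>1</sub>, r<sub>2</sub>, ..., r<sub>K</sub>} do
    𝒮<sub>r</sub> = {s′ | r ∈ ℐ<sub>s′</sub>}, ∀ s′ ∈ 𝒮<sub>prior</sub>
    Candidate prior sessions 𝒮<sub>s,r</sub> = top-k arg max cos(<bold>S</bold><sub>r</sub>, <bold>s</bold>)
    Candidate items set ℐ<sub>s,r</sub> = ∪ {i′ ∈ ℐ<sub>s′</sub>}, ∀ s′ ∈ 𝒮<sub>s,r</sub>
    X<sub>1</sub> = most frequent items across 𝒮<sub>s,r</sub>, X<sub>1</sub> ⊂ ℐ<sub>s,r</sub>
    X<sub>2</sub> = {i′ | cos(<bold>i′</bold>, <bold>I</bold><sub>s,r</sub>) ≥ τ}, ∀ i′ ∈ ℐ<sub>s,r</sub>
    Relevant items X<sub>s,r</sub> = X<sub>1</sub> ∪ X<sub>2</sub> − r
    sim<sub>s,r</sub> = cos(<bold>I</bold><sub>s</sub>, (<bold>X</bold><sub>s,r</sub> ⊕ <bold>i</bold><sub>r</sub>))
    E<sub>s,r</sub> = arg max<sub>i′ ∈ ℐ<sub>s</sub></sub> sim<sub>s,r</sub>
  end for
  Explaining items for session s, E<sub>s</sub> = most frequent items from E<sub>s,r</sub> ∀ r ∈ ℐ<sub>r</sub>
      ◁ Here, we consider top-2 frequency for selecting explaining items in session s.
end for
◁ ⊕ : concatenation</preformat>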
        <sec id="sec-2-2-1">
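          <title>Algorithm 2 Global explanations</title>
          <preformat>Given recommended items ℐ<sub>r</sub>, item click history of session s as ℐ<sub>s</sub>, learned item embeddings <bold>I</bold>,
prior and current session embeddings <bold>S</bold><sub>prior</sub> and <bold>S</bold><sub>curr</sub>.
for each item r = 1, 2, . . . , m ∈ ℐ do
  𝒮<sub>r</sub> = {s′ | r ∈ ℐ<sub>s′</sub>}, ∀ s′ ∈ 𝒮<sub>prior</sub>
  Clusters C<sub>r</sub> = DBSCAN(<bold>S</bold><sub>r</sub>, ε, min_samples)
  for c in clusters C<sub>r</sub> do
    Centroid <bold>c</bold> = mean({<bold>s′</bold> | s′ ∈ c})
    Candidate prior sessions 𝒮<sub>c,r</sub> = top-k arg max cos(<bold>S</bold><sub>r</sub>, <bold>c</bold>)
    Candidate items set ℐ<sub>c,r</sub> = ∪ {i′ ∈ ℐ<sub>s′</sub>}, ∀ s′ ∈ 𝒮<sub>c,r</sub>
    X<sub>1</sub> = most frequent items across 𝒮<sub>c,r</sub>, X<sub>1</sub> ⊂ ℐ<sub>c,r</sub>
    X<sub>2</sub> = {i′ | cos(<bold>i′</bold>, <bold>I</bold><sub>c,r</sub>) ≥ τ}, ∀ i′ ∈ ℐ<sub>c,r</sub>
    Explaining items E<sub>c,r</sub> = X<sub>1</sub> ∪ X<sub>2</sub> − r
  end for
end for</preformat>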
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Evaluation</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset Details</title>
        <p>We evaluate the efficacy of X4SR on two publicly available datasets, Diginetica (DN) and
Amazon Musical Instruments (AMI). The DN dataset (https://competitions.codalab.org/competitions/11161) is a large-scale real-world transactional
dataset from the CIKM Cup 2016 challenge. The AMI ratings dataset (https://nijianmo.github.io/amazon/index.html) is a public dataset from Amazon
that contains timestamped user-item interactions from May 1996 to Oct 2014; its metadata
contains items’ descriptions, categories, brands, etc. We follow [7] for dataset pre-processing.
For DN, we filter out items with frequency less than 5 and then remove sessions
of length 1. We consider sessions from the last week as test data. Finally, we consider 0.7M |
30,574 sessions for training | testing, with average session length 5.12 and 43,097 items. For
AMI, we consider most frequent 10 users' data and remove items with frequency less than 5. We
consider a user’s transactions (i.e., the user’s ratings) lying within 20 minutes as a session. In addition,
the sessions from the last day are used as the test data for AMI. Finally, we consider 18,128 |
6,126 sessions for training | testing, with average session length 6.45 and 2,451 items. For both
datasets, we split the remaining data chronologically into a training set and a validation set for
training and model selection purposes, respectively. We filter out all sessions of length less
than 3 from the test data used for explanation, so that at least 2 explaining items can be obtained in each
session.</p>
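        <p>The session construction and filtering steps can be sketched as below. This is a hedged illustration only: it assumes that "within 20 minutes" means a gap of at most 20 minutes between consecutive interactions of the same user (a fixed window from the session start is another possible reading), and the function names and the (user, timestamp, item) event layout are hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def sessionize(events, gap_seconds=20 * 60):
    """Group (user, timestamp_seconds, item) events into sessions.

    Assumption: a new session starts when the gap to the user's previous
    interaction exceeds `gap_seconds`.
    """
    sessions = []
    last_seen = {}  # user -> (last timestamp, session list being extended)
    for user, ts, item in sorted(events, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(user)
        if prev is None or ts - prev[0] > gap_seconds:
            current = []
            sessions.append(current)
        else:
            current = prev[1]
        current.append(item)
        last_seen[user] = (ts, current)
    return sessions


def filter_sessions(sessions, min_item_freq=5, min_len=2):
    """Drop infrequent items first, then drop sessions that became too short."""
    freq = Counter(item for s in sessions for item in s)
    kept = [[i for i in s if freq[i] >= min_item_freq] for s in sessions]
    return [s for s in kept if len(s) >= min_len]
```
]]></preformat>
        <p>Calling filter_sessions with min_len=3 on the test split would correspond to the additional length filter applied before generating explanations.</p>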
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Evaluation</title>
        <p>
          We consider two evaluation settings: quantitative and qualitative. For quantitative evaluation,
we remove or replace the explaining items generated by X4SR in the test sessions and observe the
performance of the SR model. The idea is to validate whether the explaining items are necessary for the
SR model to recommend the item that was actually clicked or bought in the original test session.
We compare performance on: i) original test sessions (OTS), ii) sessions after removing non-explaining
items (-NX), iii) sessions after removing items based on the popularity index (-P), where the popularity index
of an item is calculated by dividing the total sales/clicks of the item by the total sales/clicks of all
items, iv) sessions after removing the explaining items (-X) obtained from our approach, and v) sessions after replacing
the specific items with the items that are farthest from them in terms of cosine similarity, i.e.,
replacing explaining items (-X+F) and replacing popular items (-P+F). We use the standard
evaluation metrics Recall (R@K) and Mean Reciprocal Rank (MRR@K) as used in [
          <xref ref-type="bibr" rid="ref1">7, 1</xref>
          ]. R@K
represents the proportion of test instances that have the target item in the top-K items. MRR@K
is the average of the reciprocal ranks of the target item in the recommendation list (taken as 0 when
the target item is not in the top-K). For qualitative
evaluation, we input the explaining items along with metadata, the current test sessions, and the candidate
prior sessions to GPT-3 and obtain verbalized explanations that are understandable by business
users and system designers. We conducted a user study to validate the responses obtained
from GPT-3. We received feedback scores between 1 and 5 (the higher the score, the better the quality
of the generated text) from 20 users for 10 explanations each at both levels, i.e., local and
global. Note: Post-hoc approaches [
          <xref ref-type="bibr" rid="ref2 ref3">2, 3, 9, 11</xref>
          ] in the literature rely on item features to provide
explanations, and such features are not available in this work. Therefore, we are unable to compare the
proposed approach with these approaches and instead use the alternative procedures above as baselines.
        </p>
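        <p>The metrics and perturbations described above can be written down compactly. The sketch below is illustrative and assumes ranked recommendation lists are already available from the SR model; the function names are hypothetical, and the reciprocal rank is taken as 0 when the target is outside the top-K.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def recall_and_mrr_at_k(ranked_lists, targets, k=20):
    """Return (R@K, MRR@K) over paired ranked lists and target items."""
    hits, reciprocal_sum = 0, 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = list(ranked[:k])
        if target in top_k:
            hits += 1
            reciprocal_sum += 1.0 / (top_k.index(target) + 1)
    n = len(targets)
    return hits / n, reciprocal_sum / n


def popularity_index(sessions):
    """Clicks of each item divided by the total clicks of all items."""
    counts = Counter(item for s in sessions for item in s)
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()}


def remove_items(session, to_remove):
    """Perturbed session without the given items (e.g. the -X or -P setting)."""
    drop = set(to_remove)
    return [i for i in session if i not in drop]


def percent_drop(original, perturbed):
    """Relative drop of a metric with respect to the original test sessions."""
    return 100.0 * (original - perturbed) / original
```
]]></preformat>
        <p>The SR model is re-run on the perturbed sessions, and percent_drop compares the resulting metric against the value obtained on the original test sessions.</p>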
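        <p>
          Hyperparameter Setup: We use a hold-out validation set for model selection, using Recall
(R@20) as the performance metric for all experiments in Table 2. Following [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], we use d = 100
and a learning rate of 0.001 with the Adam optimizer. We employ grid search over the similarity threshold τ in {0.65, 0.5, 0.4, 0.3, 0.25}.
The best values on the validation set are τ = 0.5 and 0.25 for
DN and AMI, respectively. For explaining items at the global level, ε = 0.001, with
min_samples = 4 when |𝒮<sub>r</sub>| ≥ 20 and min_samples = 2 otherwise, performs best on the validation
set. We use k = 5 for obtaining candidate prior sessions.
        </p>
        <table-wrap>
          <label>Table 2</label>
          <caption>
            <p>Performance for local explanations under each evaluation setting; parentheses give the percentage drop relative to OTS. Only the DN MRR@20 and AMI R@20 columns are shown.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Setting</th><th>DN MRR@20 (% ↓)</th><th>AMI R@20 (% ↓)</th></tr>
            </thead>
            <tbody>
              <tr><td>OTS</td><td></td><td>26.64</td></tr>
              <tr><td>-NX</td><td>12.42 (-1%)</td><td>25.89 (-3%)</td></tr>
              <tr><td>-P</td><td>11.20 (-11%)</td><td>20.81 (-22%)</td></tr>
              <tr><td>-X</td><td>10.38 (-17%)</td><td>18.89 (-29%)</td></tr>
              <tr><td>-P+F</td><td>6.51 (-48%)</td><td>15.80 (-41%)</td></tr>
              <tr><td>-X+F</td><td>4.40 (-65%)</td><td>8.68 (-67%)</td></tr>
            </tbody>
          </table>
        </table-wrap>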
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Results and Observations</title>
        <p>Quantitative: Tables 2 and 1 show the performance for local and global explanations, respectively.
From Table 2, we observe that by removing non-explaining items from test sessions (-NX),
Recall@20 (R@20) and MRR@20 drop by 1% and 3% relative to OTS for DN and
AMI, respectively. These slight drops indicate that non-explaining items contribute little to
recommending the target item. Further, if popular items are removed (-P), we observe considerable
drops, i.e., 9% and 22% in R@20, and 11% and 35% in MRR@20, indicating that popular items are
relevant. However, we observe larger drops when explaining items are removed
(-X), i.e., R@20 drops by 16% and 29% and MRR@20 by 17% and 41% for DN and AMI, respectively.
This indicates that the explaining items generated by our approach are crucial to recommending the
target item. Moreover, we observe a further drop in R@20 by 64% and 67%, and in MRR@20
by 65% and 82%, if explaining items are replaced with the least similar items out of all the
items (-X+F), due to the additional noise. In comparison, R@20 drops by 48% and 41%, and MRR@20 by 48%
and 62%, if popular items are replaced instead of explaining items (-P+F); the drops are larger
when explaining items are replaced. This further validates the efficacy of our approach in
generating explaining items. Similarly, from Table 1, we observe significant percentage drops
when removing explaining items (-X): R@20 drops by 100%, 46%, 31% and 33%, 20%,
12% for Long-tail, Mid, and Head items for DN and AMI, respectively, which is significantly larger than the drops from
removing non-explaining items (-NX), i.e., 0%, 7%, 15% and 0%, 6%, 7%. Also, it is comparable
We also observed similar drops if popular items are replaced instead, i.e., drops in R@20 are
100%, 72%, 69%, and 100%, 47%, 30% for DN and AMI, respectively. Similar percentage drops</p>
        <p>Qualitative Analysis: Case Study on Amazon Musical Instruments Dataset: We consider the
Amazon dataset due to the availability of meta-information for the items, which is not the
case with Diginetica. First, we study why item ‘876: Thomastik-Infeld Accordion Accessory’ of
brand ‘Thomastik-Infeld’ is recommended in session 16. Figure 1a shows the pair-wise similarity
between the items in the candidate items set. The pairs with high similarity, i.e., 0.33 and 0.32, are [876, 1771]
and [717, 310], respectively. Hence, the relevant items based on similarity are ‘1771: Line 6 Relay
G50 Wireless Guitar System’, ‘717: Pedaltrain MINI With Soft Case, Instrument Cable; Stage &amp;
Studio Cables’, and ‘310: Fender F Neckplate Chrome’. The relevant items based on frequency
are as follows: ‘45: On-Stage Professional Grade Folding Orchestral Sheet Music Stand’, ‘310:
Fender F Neckplate Chrome’, ‘422: Classic Series Instrument Cable with Right Angle Plug’, and ‘875:
Behringer Ultimate Guitar-to-USB Audio Interface’. Further, the similarity between the relevant items
and the current session items is shown in Figure 1b. We observe that the explaining items in the current
session, ‘1214: Fender Precision Bass Pickups’ and ‘1631: Electric Guitar Bass Pickguard Screws’,
are close to item 45, item 310 and the recommended item 876. This is because they belong to the
guitar accessories category. From Figures 1a and 1b, we can conclude that item 876 is recommended
in session 16 because the explaining items and relevant items are related to musical accessories. We
obtained a similar verbalized explanation from GPT-3, as shown in Figure 2a.</p>
        <p>Further, we study why item ‘102: Phosphor Bronze Acoustic Guitar Strings, Custom Light’,
which belongs to the ‘Acoustic Guitar Strings’ category and the ‘D’Addario’ brand, is recommended
in general. Figure 1c shows that it is similar to ‘255: Planet Waves Acoustic Guitar Quick-Release
System’, ‘974: Martin M Acoustic Guitar Bridge Pins’, ‘283: Snark SN1 Guitar Tuner’, ‘696:
Ernie Ball Earthwood Light Phosphor Bronze Acoustic String Set’ and ‘103: D’Addario Phosphor
Bronze Acoustic Guitar Strings, Medium’ with similarities 0.35, 0.35, 0.26 and 0.24 respectively,
i.e., all explaining items are related to guitar accessories and all relevant prior sessions contain
items of the same brand, ‘D’Addario’. We observe a similar explanation from GPT-3, as shown in Figure
2b: item 102 is in the same category (Acoustic Guitar Strings) and of the same brand (D’Addario) as
the explaining item 103. Additionally, it is a lighter gauge string than the other items, which may
be more suitable for some customers.</p>
        <p>Further, we evaluate the verbalized explanations using feedback scores from the user study. 67% of
users gave scores of 4 or 5 for all global and local explanations, and around 20% of users gave a
score of 3 for these verbalized explanations. On average, the local and global explainability scores
(out of 5) are 3.85 and 4.13, respectively.</p>
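        <p>To make the verbalization step concrete, the sketch below assembles explaining items and meta-information into a plain-text prompt for an LLM. The prompt wording, the field labels and the function name are hypothetical and are not the prompt used in this work; only the item titles in the usage note are taken from the case study above.</p>
        <preformat><![CDATA[
```python
def build_prompt(recommended, explaining, session, prior_sessions, level="local"):
    """Assemble a plain-text prompt; every item is an (item_id, title) pair."""
    def fmt(items):
        return "; ".join(f"{item_id}: {title}" for item_id, title in items)

    lines = [
        f"Recommended item: {fmt([recommended])}",
        f"Explaining items: {fmt(explaining)}",
    ]
    # The current session is only meaningful for local explanations.
    if level == "local":
        lines.append(f"Current session: {fmt(session)}")
    for n, prior in enumerate(prior_sessions, start=1):
        lines.append(f"Similar prior session {n}: {fmt(prior)}")
    lines.append("In plain language, explain why the recommended item was "
                 f"suggested ({level} explanation), using only the items above.")
    return "\n".join(lines)
```
]]></preformat>
        <p>For the local case study, recommended would be (876, "Thomastik-Infeld Accordion Accessory") and explaining would hold items 1214 and 1631; the returned string is then sent to the LLM of choice.</p>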
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>
        We highlighted the issue of trust and transparency in benchmark DNN-based recommendation
systems such as NISER [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We showed that, in contrast to learning proxy models such as [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ], which
require additional training, X4SR uses the learned item and session embeddings from NISER to
generate high-fidelity, trustworthy explanations. We observed a significant drop in Recall (R@20)
and MRR@20 after removing explaining items from test sessions, which validates the quality of the
explanations. Further, we observed that verbalized explanations obtained from GPT-3 improved
the readability of explanations for users and system designers. In the future, we would like to
explore X4SR with other approaches such as NARM [13], GRU4Rec [5], STAMP [6], SASRec [14]
and CL4Rec [15].
      </p>
      <ref-list>
        <ref id="ref4">
          <mixed-citation>[4] D. Garg, P. Gupta, P. Malhotra, L. Vig, G. Shroff, Sequence and time aware neighborhood for session-based recommendations: STAN, in: Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval, 2019, pp. 1069–1072.</mixed-citation>
        </ref>
        <ref id="ref5">
          <mixed-citation>[5] B. Hidasi, A. Karatzoglou, L. Baltrunas, D. Tikk, Session-based recommendations with recurrent neural networks, arXiv preprint arXiv:1511.06939 (2015).</mixed-citation>
        </ref>
        <ref id="ref6">
          <mixed-citation>[6] Q. Liu, Y. Zeng, R. Mokhosi, H. Zhang, STAMP: short-term attention/memory priority model for session-based recommendation, in: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery &amp; data mining, 2018, pp. 1831–1839.</mixed-citation>
        </ref>
        <ref id="ref7">
          <mixed-citation>[7] S. Wu, Y. Tang, Y. Zhu, L. Wang, X. Xie, T. Tan, Session-based recommendation with graph neural networks, in: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019.</mixed-citation>
        </ref>
        <ref id="ref8">
          <mixed-citation>[8] Y. Zhang, X. Chen, et al., Explainable recommendation: A survey and new perspectives, Foundations and Trends® in Information Retrieval 14 (2020) 1–101.</mixed-citation>
        </ref>
        <ref id="ref9">
          <mixed-citation>[9] S. Xu, Y. Li, S. Liu, Z. Fu, X. Chen, Y. Zhang, Learning post-hoc causal explanations for recommendation, arXiv preprint arXiv:2006.16977 (2020).</mixed-citation>
        </ref>
        <ref id="ref10">
          <mixed-citation>[10] C. Geng, H. Wu, H. Fang, Causality and correlation graph modeling for effective and explainable session-based recommendation, arXiv preprint arXiv:2201.10782 (2022).</mixed-citation>
        </ref>
        <ref id="ref11">
          <mixed-citation>[11] J. Chen, W. Wu, W. Hu, W. Zheng, L. He, SSR: Explainable session-based recommendation, in: 2021 International Joint Conference on Neural Networks (IJCNN), IEEE, 2021, pp. 1–8.</mixed-citation>
        </ref>
        <ref id="ref12">
          <mixed-citation>[12] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al., A density-based algorithm for discovering clusters in large spatial databases with noise, in: KDD, volume 96, 1996, pp. 226–231.</mixed-citation>
        </ref>
        <ref id="ref13">
          <mixed-citation>[13] J. Li, P. Ren, Z. Chen, Z. Ren, T. Lian, J. Ma, Neural attentive session-based recommendation, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 1419–1428.</mixed-citation>
        </ref>
        <ref id="ref14">
          <mixed-citation>[14] W.-C. Kang, J. McAuley, Self-attentive sequential recommendation, in: 2018 IEEE international conference on data mining (ICDM), IEEE, 2018, pp. 197–206.</mixed-citation>
        </ref>
        <ref id="ref15">
          <mixed-citation>[15] X. Xie, F. Sun, Z. Liu, S. Wu, J. Gao, J. Zhang, B. Ding, B. Cui, Contrastive learning for sequential recommendation, in: 2022 IEEE 38th international conference on data engineering (ICDE), IEEE, 2022, pp. 1259–1273.</mixed-citation>
        </ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Malhotra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vig</surname>
          </string-name>
          , G. Shroff, NISER:
          <article-title>Normalized item and session representations with graph neural networks</article-title>
          , arXiv preprint arXiv:1909.04276 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>"Why should I trust you?": Explaining the predictions of any classifier</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A unified approach to interpreting model predictions</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>