<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.1145/3394486.3403226</article-id>
      <title-group>
        <article-title>Constancy in LIME-RS Explanations for Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vito Walter Anelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alejandro Bellogín</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tommaso Di Noia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Maria Donini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincenzo Paparella</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Pomo</string-name>
          <email>claudio.pomo@poliba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Knowledge-aware and Conversational Recommender Systems (KaRS) &amp; 5th Edition of Recommendation in Complex Environments (ComplexRec) Joint Workshop @ RecSys 2021</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Politecnico di Bari</institution>
          ,
          <addr-line>via Orabona, 4, 70125 Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidad Autónoma de Madrid</institution>
          ,
          <addr-line>Ciudad Universitaria de Cantoblanco, 28049 Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Università degli Studi della Tuscia</institution>
          ,
          <addr-line>via Santa Maria in Gradi, 4, 01100 Viterbo</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>23</fpage>
      <lpage>27</lpage>
      <abstract>
        <p>Explainable Recommendation has attracted a lot of attention due to a renewed interest in explainable artificial intelligence. In particular, post-hoc approaches have proved to be the most easily applicable ones to increasingly complex recommendation models, which are then treated as black boxes. The most recent literature has shown that for post-hoc explanations based on local surrogate models, there are problems related to the robustness of the approach itself. This consideration becomes even more relevant in human-related tasks like recommendation. The explanation also has the arduous task of enhancing increasingly relevant aspects of user experience such as transparency or trustworthiness. This paper aims to show how the characteristics of a classical post-hoc model based on surrogates are strongly model-dependent, and how such a model does not prove to be accountable for the explanations it generates.</p>
      </abstract>
      <kwd-group>
        <kwd>explainable recommendation</kwd>
        <kwd>post-hoc explanation</kwd>
        <kwd>local surrogate model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The explanation of a recommendation list plays an
increasingly important role in the interaction of a user
with a recommender system: the pervasiveness of
economic interest and the inscrutability of most Artificial
countability in the behavior of the systems they interact
with. Given the explanation that a system can provide
to a user we identify at least two characteristics that the
explanation part should enforce [
        <xref ref-type="bibr" rid="ref6">1, 2, 3</xref>
        ]:
• Adherence to reality: the explanation should mention
only features that really pertain to the recommended
item. For instance, if the system recommends the
dation by saying “because it is a War Movie” since it is
by no means an adherent description of that movie;
• Constancy in the behavior: when the explanation is
generated based on some sample, and such a sample is
drawn with a probability distribution, the entire
process should not exhibit a random behavior to the user.
      </p>
      <p>For instance, if the explanation for recommending the movie “The Matrix” to the same user is first “because it is a Dystopian Science Fiction”, and then a completely different one, the entire process appears random to the user. Given a particular Machine Learning task, the LIME approach [4] can also be applied to recommendation: LIME-RS [5] applies LIME to the recommendation task and can be considered in all respects as a black-box explainer. This means that it generates an explanation by drawing a huge number of (random) calls to the system, collecting the answers, building a model of the behavior of the system, and then constructing the explanation for the particular recommended item. While adopting a black-box approach lets LIME-RS be applicable to every recommender system, building a model by drawing a huge random sample of system behaviors makes it lose both adherence and constancy, as our experiments show later in this paper.</p>
      <p>This suggests that the direct application of LIME-RS to recommender systems is not advisable, and that further research is needed to assess the usefulness of LIME-RS in explaining recommendations.</p>
      <p>The paper is organized as follows: Section 2 reviews the state of the art on explanation in recommendation; Section 3 details LIME to make the paper self-contained. Section 4 shows the results of experiments with two mainstream recommendation models: Attribute Item-kNN and Vector Space Model. We discuss the outcomes of the experiments in Section 5, and conclude with Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <p>
          In recent years, the theme of Explanation in Artificial Intelligence has come to the foreground, capturing the
attention not only of the Machine Learning and related
communities – that deal more specifically with the
algorithmic part – but also of fields closer to Social Sciences,
such as Sociology or Cognitivism, which look with great
interest to this area of research [1]. The growing interest
in this area is also dictated by new regulations of both
Europe [6] and US [7] with respect to sensitive issues in
the field of personal data processing, and legal
responsibility. This trend has also touched the research field
of recommender systems [
          <xref ref-type="bibr" rid="ref2 ref31 ref42">8, 9, 10, 11</xref>
          ]. However, topics
such as explanation are by no means new to this field.
In fact, we can date back to 2014 the introduction of the
term “explainable recommendation” [12], although the
need to provide an explanation that accompanies the
recommendation is a need that emerged as early as 1999
by Schafer et al. [13], when people began trying to
explain a recommendation with other similar items familiar
to the user who received that recommendation.
        </p>
        <p>
          The surge of interest around the topic of the
explanation of recommendations coincides also with the
awareness achieved in considering metrics beyond accuracy
as fundamental in evaluating a recommendation
system [14, 15]. Indeed, all of the well-known metrics of
novelty, diversity, and serendipity are intended to
improve the user experience, and in this respect, a key role
is played by explanation [3, 16]. “Why are you
recommending that?”—this is the question that usually
accompanies the user when a suggestion is provided. Tintarev
and Masthoff [
          <xref ref-type="bibr" rid="ref6">2</xref>
          ] detailed in a scrupulous way the
aspects involved in the process of explanation when we
talk about recommendation. They identified 7 aspects:
user’s trust, satisfaction, persuasiveness, efficiency,
effectiveness, scrutability, and transparency.
        </p>
        <p>This is the starting point to define Explainable Recommendation as a task that aims to provide suggestions to the users and make them aware of the recommendation process, explaining also why that specific object has been suggested. Gedikli et al. [3] evaluated different types of explanations and drew a set of guidelines to decide what the best explanation to equip a recommendation system with is. This is due to the fact that popular recommendation systems are based on Matrix Factorization (MF) [17]; for this type of model, trying to provide an explanation opens the way to new challenges [1, 18, 19, 20]. There are two different approaches to address this type of issue.
• On the one hand, the model-intrinsic explanation strategy aims to create a user-friendly recommendation model or to encapsulate an explaining mechanism. However, as Lipton [21] points out, this strategy weighs on the trade-off between the transparency and the accuracy of the model. Indeed, if the goal becomes to justify recommendations, the purpose of the system is no longer to provide only personalized recommendations, resulting in a distortion of the recommendation process.
• On the other hand, we have the model-agnostic [22] approach, also known as post-hoc [23], which does not require intervening on the internal mechanisms of the recommendation model and therefore does not affect its performance in terms of accuracy.</p>
        <p>
          Most recommendation algorithms take an MF-approach,
and thus the entire recommendation process is based on
the interaction of latent factors that bring out the level of
liking for an item with respect to a user. Many post-hoc
explanation methods have been proposed for precisely
these types of recommendation models. It seems
evident that the most difficult challenge for this type of
approach lies in making these latent factors explicit and
understandable for the user [9]. Peake and Wang [23]
generate an explanation by exploiting the association
rules between features; Tao et al. [24] in their work, find
benefit from regression trees to drive learning, and then
explain the latent space; instead, Gao et al. [
          <xref ref-type="bibr" rid="ref1">25</xref>
          ] try a deep
model based on attention mechanisms to make relevant
features emerge. Along the same lines are Pan et al. [11],
who present a feature mapping approach that maps the
uninterpretable general features onto the interpretable
aspect features. Among other approaches to consider, [12]
proposes an explicit factor model that builds a mapping
between the interpretable features and the latent space.
On the same line we also find the work by Fusco et al.
[26]. In their work, they provide an approach to identify,
in a neural model, which features contribute most to the
recommendation. However, these post-hoc explanation
approaches turn out to be built for very specific
models. Purely model-agnostic approaches include the recent
work of Tsang et al. [27], who present GLIDER, an
approach to estimate interactions between features rather
than focusing on the significance of individual features as in the original
LIME [
          <xref ref-type="bibr" rid="ref9">4</xref>
          ] algorithm. This type of solution is constructed
regardless of the recommendation model.
        </p>
        <p>Our paper focuses on the operation of LIME, a model-agnostic method for surrogate-based local explanation.</p>
      </sec>
      <sec id="sec-2-2">
        <p>When a user-item pair is provided, this model returns as an outcome of the explanation a set of feature weights, for any recommender system. However, the recommendation task is very specific, so there is a version called LIME-RS [5] that applies the explanation model technique to the recommendation domain. In this way, any recommender is seen as a black box, so LIME-RS plays the role of a model-agnostic explainer whose result is a set of interpretable features and their relative importance.</p>
        <p>The goal of LIME-RS is to exploit the predictive power of the recommendation (black-box) model to generate an explanation about the suggestion of a particular item for a user. In this respect, it exploits a neighborhood drawn according to a generic distribution around the candidate item for the explanation. It seems obvious that the choice of the neighborhood plays a crucial role within the process of explanation generation by LIME-RS. We can compare this sample extraction action to a perturbation of the user-item pair we are using to generate the explanation. In the case of LIME-RS this perturbation must generate consistent samples with respect to the source dataset. We see that this choice represents a critical issue for all the post-hoc models which base their expressiveness on the locality of the instance to explain.</p>
        <p>This trend is confirmed in several papers addressing this issue of surrogate-based explanation systems such as LIME and SHAP [28]. In two recent papers, Alvarez-Melis and Jaakkola [29] have shown how the explanations generated with LIME are not very robust: their contribution aims to bring out how small variations or perturbations in the input data cause significant variations in the explanation of that specific input [30]. In their paper, a new strategy is introduced to strengthen these methods by exploiting local Lipschitz continuity. By deeply investigating this drawback, they introduced self-explaining models in stages, progressively generalizing linear classifiers to complex yet architecturally explicit models.</p>
        <p>Saito et al. [31] also explored this issue by turning their gaze to different types of sampling to make the result of an explanation generated through LIME more robust. In particular, in their work, they introduce the possibility of generating realistic samples produced with a Generative Adversarial Network. Finally, Slack et al. [32] adopt a similar solution in order to control the perturbation generating neighborhood data points, attempting to mitigate the generation of unreliable explanations while maintaining a stable black-box model of prediction.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Background Technology</title>
      <sec id="sec-3-1">
        <p>From a formal point of view, we can define a LIME-generated explanation for a generic instance x ∈ X produced by a model f as:</p>
        <p>ξ(x) = argmin g∈G ℒ(f, g, πx) + Ω(g)   (1)</p>
        <p>where ℒ represents the fidelity of the surrogate model g to the original f, and g represents a particular instance of the class G of all possible explainable models. Among all the possible models, the one most frequently chosen is based on a linear prediction. In this case, an explanation refers to the weights of the most important interpretable features, which, when combined, minimize the divergence from the black-box model. The function πx measures the distance between the instance to be explained x ∈ X and the samples x′ ∈ X extracted from the training set to train the model g. Finally, Ω(g) represents the complexity of the explanation model.</p>
        <p>Two pieces of evidence make the application of LIME possible: (i) the existence of a feature space Z on which to train the surrogate model of f, and (ii) the presence of a surjective function that maps the space mentioned above (Z) to the original space of instances (X). Going into more detail, we consider the fidelity function ℒ as the mean square deviation between the prediction for a generic instance x′ ∈ X of the black-box model and that generated for the counterpart z′ ∈ Z by the surrogate model. Starting from these considerations we can express ℒ with the following formula:</p>
        <p>ℒ(f, g, πx) = Σ x′∈X, z′∈Z πx(x′) ⋅ (f(x′) − g(z′))²   (2)</p>
        <p>In the formula above, πx plays a fundamental role as it expresses the distance between the instance to be explained and the sampled instance used to build the surrogate model. From a generic perspective, we can express this function as a kernel function like πx = exp(−D(x, x′)²/σ²), where D is any measure of distance. The full impact of this distance is captured when the fidelity function also considers the transformation of the surrogate sample in the original space. As mentioned earlier, we consider a surjective function h that maps the original space into the feature space, h : X → Z. We can also consider the function that allows us to move in the opposite direction, h⁻¹ : Z → X. At this point, Equation (2) becomes:</p>
        <p>ℒ(f, g, πx, h) = Σ z′∈Z πx(h⁻¹(z′)) ⋅ (f(h⁻¹(z′)) − g(z′))²   (3)</p>
      </sec>
      <sec id="sec-3-2">
        <p>From this last equation, we can grasp the criticality of the surjective mapping function. Indeed, the neighborhood in Z-space cannot be guaranteed after the transformation into X-space. Thus, some samples selected to train the surrogate model could not satisfy the neighborhood criterion for which they were chosen.</p>
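        <p>As an illustration of Equations (1)–(2), the local surrogate is a weighted least-squares fit in which each sampled point is weighted by the proximity kernel. The following is a minimal, self-contained sketch (the function names are ours, not part of LIME or LIME-RS, and a single interpretable feature is assumed):

```python
import math

def pi_x(d, sigma=1.0):
    """Proximity kernel pi_x = exp(-D(x, x')^2 / sigma^2) from Section 3."""
    return math.exp(-(d * d) / (sigma * sigma))

def fit_linear_surrogate(zs, ys, ds, sigma=1.0):
    """Fit g(z) = slope * z + intercept by weighted least squares,
    minimizing the fidelity loss of Eq. (2):
        sum_i pi_x(d_i) * (y_i - g(z_i))^2
    zs: interpretable (e.g. binary) features of the sampled points,
    ys: black-box predictions for those points,
    ds: distances of each sample from the instance to explain."""
    ws = [pi_x(d, sigma) for d in ds]
    sw = sum(ws)
    # weighted means of the feature and of the black-box response
    zbar = sum(w * z for w, z in zip(ws, zs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    # weighted covariance / variance give the closed-form solution
    cov = sum(w * (z - zbar) * (y - ybar) for w, z, y in zip(ws, zs, ys))
    var = sum(w * (z - zbar) ** 2 for w, z in zip(ws, zs))
    slope = cov / var
    return slope, ybar - slope * zbar
```

The returned slope is the weight that a linear LIME-style explanation would report for that interpretable feature.</p>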
        <p>We must therefore stress the centrality of the sampling function: how do we extract the neighborhood of the instance to be explained? If we look at the application of LIME to the recommendation domain, we can compare this sampling action to a local perturbation around our instance x; however, this perturbation generates samples x′ which might contain inconsistencies. As an example, suppose we want to explain James's feeling about the movie The Matrix. The original triple of the instance to be explained associates to the user-item pair the genre of the movie (representing the explainable space), and in this case it is of the type ⟨James, TheMatrix, Sci-Fi⟩.</p>
        <p>A perturbation around this instance could generate inconsistencies, i.e., triples of the type ⟨James, TheMatrix, g⟩ where the genre g does not pertain to the movie. For this reason, in LIME-RS the perturbation considers only real and not synthetic data. This choice is dictated by the avoidance of the out-of-sample (OOS) phenomenon. Closely related to this problem is whether the interpretation examples selected in LIME-RS are able to capture the locality through disturbance mechanisms effectively. One of the disadvantages of LIME-like methods is that they sometimes fail to estimate an appropriate local replacement model but instead generate a model that focuses on explaining the examples and is also affected by more general trends in the data.</p>
        <p>This issue is central to our work, and it involves two aspects: (i) the first one concerns precisely the sampling function. In the LIME-RS implementation, this function is driven by the popularity distribution of the items within the dataset. (ii) The second critical issue concerns the model's ability to wittily discriminate the user's taste from the neighborhood extracted to build the surrogate model. A model that squashes too much on bias, or is inaccurate, cannot bring out the peculiarities of user taste that are critical in building the explainable model, which are, in turn, useful in generating the explanation for the instance of interest.</p>
        <p>These observations dictate the two research questions that motivated our work:
RQ1 Can we consider the surrogate-based model on which LIME-RS is built to always generate the same explanations, or does the extraction of a different neighborhood severely impact the system's constancy?
RQ2 Are LIME-RS explanations adherent to item content, despite the fact that the sampling function is uncritical and based only on popularity?</p>
        <p>4. Experiments</p>
        <p>This section is devoted to illustrating how the experimental campaign was conducted. The datasets used for this phase of experimentation are Movielens 1M [33], Movielens Small [33], and Yahoo! Movies¹. Their characteristics are shown in Table 1.</p>
        <p>Table 1
Characteristics of the datasets involved in the experiments.
                 Users  Items  Transactions  Sparsity
Movielens 1M     6040   3675   797758        0,9640
Movielens Small  610    8990   80640         0,9853
Yahoo! Movies    7636   8429   160149        0,9975</p>
        <p>As for the choice of the models to be used in this work, we selected two well-known recommendation models that are able to exploit the information content of the items to produce a recommendation: Attribute Item kNN (Att-Item-kNN) and Vector Space Model (VSM). The two chosen models represent the simplest solution to address the recommendation problem by exploiting the content associated with the items in the catalog. Att-Item-kNN exploits the characteristics of neighborhood-based models but expresses the representation of the items in terms of their content and, based on this representation, it computes a similarity between users. Starting from this similarity and exploiting the collaborative contribution in terms of interactions between users and items, Att-Item-kNN tries to estimate the level of liking of the items in the catalog. VSM represents both users and items in a new space to link users and items to the considered information content. Once obtained this new representation, with an appropriate function of similarity, VSM estimates which are the most appealing items for a specific user. The implementations of both models are available in the ELLIOT [34] evaluation framework. This benchmarking framework was used to select the best configuration for the two recommendation models by exploiting the corresponding configuration file².</p>
        <p>Our experiments start by selecting the best configurations based on nDCG [35, 36] for the two models on the considered datasets. Then, we generate the top-10 list of recommendations for each user, and we take into account the first item i1 on these lists for each user u. Finally, each recommendation pair (u, i1) is explained with LIME-RS. The explanation consists of a weighted vector of pairs (g, w), where g is a genre of the movies in the dataset – i.e., a feature – and w is the weight associated to g by LIME-RS within the explanation. Then, this vector is sorted by descending weights. In this way, the genres of the movies which play a key role within the recommendation, as explained by LIME-RS, are highlighted at the first positions of the vector. These operations are then repeated n = 10 times, changing the seed each time, as 10 is likely to be a good choice to detect a general pattern in the behavior of LIME-RS.</p>
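        <p>The experimental protocol described above (explain each pair n = 10 times under different seeds and keep the weight-sorted genre vector) can be sketched as follows; toy_lime_rs_explain is a hypothetical stand-in for the real LIME-RS call, used only to make the loop runnable:

```python
import random

GENRES = ["Action", "Comedy", "Drama", "Sci-Fi", "War"]

def toy_lime_rs_explain(user, item, seed):
    """Hypothetical stand-in for LIME-RS: returns (genre, weight) pairs
    sorted by descending weight. The real system would fit a local
    surrogate on samples drawn from the dataset."""
    rng = random.Random(hash((user, item, seed)))
    return sorted(((g, rng.random()) for g in GENRES),
                  key=lambda gw: gw[1], reverse=True)

def collect_explanations(pairs, n=10):
    """Repeat the explanation n times, changing the seed each time."""
    runs = {}
    for user, item in pairs:
        runs[(user, item)] = [toy_lime_rs_explain(user, item, s)
                              for s in range(n)]
    return runs
```

The resulting groups of n sorted explanations per (u, i1) pair are the input of the constancy and adherence analyses.</p>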
      </sec>
      <sec id="sec-3-3">
        <p>¹ R4 - Yahoo! Movies User Ratings and Descriptive Content Information, v.1.0 http://webscope.sandbox.yahoo.com/.</p>
        <p>² https://tny.sh/basic_limers</p>
        <p>At this point, for each pair (u, i1), we have a group of 10 explanations ordered by descending values of w, which we exploit to answer our two research questions.</p>
        <p>RQ1. Empirically, since in a real scenario of recommendation a too verbose explanation is not useful, we consider only the first five features in the sorted vector representing the explanation of each recommendation. In order to verify the constancy of the behavior of LIME-RS, given a (u, i1) pair, we exploit the n previously generated explanations for this pair. Then, for p = 1, 2, …, 5, we define Gp as the multiset of genres that appear in the p-th position – for instance, if “Sci-Fi” occurs in the first position of 7 explanations, then “Sci-Fi” occurs 7 times in the multiset G1, and similarly for the other genres and multisets. Then, we compute the frequency of genres in each position as follows: given a position p, a genre g, and the number n of generated explanations for a given pair (u, i1), the frequency fgp of g in the p-th position is computed as:</p>
        <p>fgp = ||{g | g ∈ Gp}|| / n   (4)</p>
        <p>where ||⋅|| denotes the cardinality of a multiset. Then, all this information is collected for each user in five lists — one for each of the p positions — of pairs ⟨g, fgp⟩ sorted by frequency. One can observe that the computed frequency is an estimation of the probability that a given genre is put in that position within the explanation generated by LIME-RS sorted by w values. Hence, the pair ⟨g, maxg(fgp)⟩ describes the genre with the highest frequency in the p-th position of the explanation for a pair (u, i1). Finally, it makes sense to compute the mean Mp of the highest probability values in each position p of the explanations for each pair (u, i1). Formally, by setting a position p, the mean Mp is computed as:</p>
        <p>Mp = ( Σ u=1..|U| maxg(fgp) ) / |U|   (5)</p>
      </sec>
      <sec id="sec-3-4">
        <p>where U is the set of users for whom it was possible to generate a recommendation. Observing the value of Mp, we can state to what extent LIME-RS is constant in providing the explanations up to the p-th feature: the higher the value of Mp, the higher the constancy of LIME-RS concerning the p-th feature.</p>
      </sec>
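      <p>The position-wise frequency of Equation (4) and the mean constancy of Equation (5) reduce to counting genres per position. A minimal sketch (the data layout – one list of weight-sorted genre lists per user – is our assumption):

```python
from collections import Counter

def constancy(explanations_per_user, positions=5):
    """Mean highest genre frequency per position (Eqs. (4)-(5)).
    explanations_per_user: {user: [explanation, ...]}, where each
    explanation is a list of genres sorted by descending weight and
    is assumed to contain at least `positions` genres."""
    means = []
    for p in range(positions):
        best = []
        for expls in explanations_per_user.values():
            counts = Counter(e[p] for e in expls)   # multiset G_p
            n = sum(counts.values())                # explanations for the pair
            best.append(max(counts.values()) / n)   # max_g f_gp
        means.append(sum(best) / len(best))         # M_p, Eq. (5)
    return means
```

A value of 1.0 at position p means every repetition placed the same genre there; lower values indicate a more erratic behavior.</p>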
      <sec id="sec-3-5">
        <p>Table 2
Constancy of LIME-RS. A value equal to 0 means that the genre(s) provided by LIME-RS in the first p position(s) is always different (worst case: completely inconstant behavior); a value equal to 1 means that the genre(s) provided by LIME-RS in the first p position(s) is always the same (total constancy).
Att-Item-kNN: Movielens 1M, Movielens Small, Yahoo! Movies
VSM: Movielens 1M, Movielens Small, Yahoo! Movies</p>
        <p>RQ2. To verify the adherence of LIME-RS explanations to the item content, for each generated explanation we compared the set Gp of its first p genres with the set of genres Gi1 characterizing the first recommended item, for p = 1, 2, 3.</p>
        <p>Upon completion of this operation for all the n explanations generated for each (u, i1) pair, we computed the number of times we obtained an empty intersection of these sets, normalized by the total number of explanations n × |U|, in order to understand to what extent an explanation is (not) adherent to the item. Formally, for a given value of p, the value hp is computed as:</p>
        <p>hp = ( Σ [(Gp ∩ Gi1) = ∅] ) / (n × |U|)   (6)</p>
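        <p>The adherence error of Equation (6) can be sketched in the same setting (the data layout and the item_genres mapping are our naming, used for illustration):

```python
def adherence_error(explanations, item_genres, p=3):
    """h_p of Eq. (6): fraction of explanations whose first p genres
    share no genre with the explained item (1 = worst, 0 = best).
    explanations: {(user, item): [[(genre, weight), ...], ...]}."""
    total, empty = 0, 0
    for (user, item), runs in explanations.items():
        truth = set(item_genres[item])
        for run in runs:
            top = set(g for g, w in run[:p])  # first p explanatory genres
            total += 1
            if not top.intersection(truth):   # empty intersection with G_i1
                empty += 1
    return empty / total
```

Lower values mean that the explanation more often mentions at least one genre that really describes the item.</p>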
        <p>where U is the set of users of the dataset for whom it was possible to generate a recommendation, n is the number of generated explanations for each pair (u, i1), and by Σ[⋯] we mean that we sum 1 if the condition inside [⋯] is true, and 0 otherwise. One can note that hp ∈ [0, 1], where a value of 1 indicates the worst case, in which for none of the n explanations under consideration at least one genre of the item is in the first p features of the explanation. In contrast, the lower the value of hp, the higher the adherence of LIME-RS.</p>
        <p>By looking at Table 2, we can see that for Att-Item-kNN the LIME-RS explanation model is reliable as long as it considers at most three features in the weighted vector presented as an explanation of the recommendation. Extending the explanation to four features, we have a constancy that falls below 65%, while arriving at an explanation with five features is more likely to run into explanations that exhibit an unacceptably random behavior. On the other hand, we can see that VSM obtains (except for the first feature with Movielens 1M) better performance than Att-Item-kNN in terms of constancy, with peaks up to 97%. A straightforward consequence of these observations could be analyzed in terms of confidence or probability. If the constancy steadily decreases, it means that the probability that LIME-RS suggests the same explanatory feature decreases. In practical terms, we could say that LIME-RS is less confident about its explanation. In fact, this is the behavior of Att-Item-kNN. Conversely, VSM shows high values of constancy, resulting in a more ”deterministic” behavior. With VSM, LIME-RS is more confident of its explanations. This could increase the user's trustworthiness, since LIME-RS behavior is more reliable. However, these results could also be interpreted together with the ones from Table 3. They show how often at least one feature – out of the p features provided by LIME-RS – adheres to the features that describe the item being explained. In other words, they measure the probability that LIME-RS succeeds in reconstructing at least one feature of a specific item.</p>
        <p>Table 3
Adherence of LIME-RS. A value equal to 1 means that no genre provided by LIME-RS in the first p features is among the real genres of the movie (worst case); a value equal to 0 means that at least one genre provided by LIME-RS in the first p features is always among the real genres of the movie.
                 h1      h2      h3
Att-Item-kNN
Movielens 1M     0,2764  0,1151  0,0480
Movielens Small  0,2374  0,0605  0,0188
Yahoo! Movies    0,3597  0,1202  0,0476
VSM
Movielens 1M     0,5357  0,2539  0,1088
Movielens Small  0,4384  0,1674  0,0403
Yahoo! Movies    0,1013  0,01348 0,0021</p>
        <p>Observing the results from Table 3, Att-Item-kNN performs well in terms of adherence since, in approximately 75% of cases, even considering only the main feature of the explanation, it falls into the set of the item genres, as for the Movielens dataset family. This performance is about 10% lower for Yahoo! Movies. In contrast with this result, VSM shows poor performance on both datasets of the Movielens family, failing half the time on Movielens 1M as regards adherence. A surprising result is achieved for the Yahoo! Movies dataset because, enlarging the study to the first three features of the explanation, the error is almost completely absent. The reasons we found to explain this difference in the performances concern the characteristics and the quality of the dataset, as we highlight later on.</p>
        <p>5. Discussion</p>
        <p>This work investigates how well a post-hoc approach based on local surrogates – such as the LIME-RS algorithm – explains a recommendation. Instead of studying the impact of explanations on users (that is a well-studied topic in the literature and is beyond our scope), we focus on the objective evidence that could emerge. In this respect, we have designed specific experiments, which introduced two different metrics, to evaluate adherence and constancy for this kind of algorithm. For instance, Table 2 shows a different behavior for Att-Item-kNN and VSM. On the one hand, Att-Item-kNN seems to guarantee a good constancy in explanations up to the third feature. This suggests that an explanation that exploits the first three features of the list produced by LIME-RS could be barely considered as reliable (i.e., reaching a constancy of 0.69 on Movielens 1M). On the other hand, VSM exhibits a much more ”stable” behavior, demonstrating in all cases high values of constancy.</p>
        <p>Combining the results of Table 2 and those of Table 3, Att-Item-kNN, as already mentioned, shows good performance regarding adherence and identifies 3 times out of 4 the first fundamental feature of the explanation among those present in the set of features originally associated with the item. As expected, if the number p of LIME-RS-reconstructed features increases, the number of times such a set has a nonempty intersection (with the features belonging to the item) – i.e., adherence – increases. It could be noted that Att-Item-kNN on Yahoo! Movies shows the worst behavior in terms of adherence. VSM shows a different behavior. Despite the excellent performance regarding constancy, it could be observed that on both Movielens datasets the performance in terms of adherence is poor, and worse for Movielens 1M than for Movielens Small. Surprisingly, on Yahoo! Movies, VSM performs much better, and the errors are almost negligible.</p>
        <p>The difference between the two models could be due to many reasons. In the following we analyze possible relations between such behaviors and two of them: popularity bias in the dataset and characteristics of side information. On the one hand, if the dataset is affected by popularity bias, it would be a well-studied cause of confusion for LIME-RS. On the other hand, the characteristics of the side information associated with the datasets could dramatically influence the performance of the two recommendation models. To assess these hypotheses, we have evaluated (see Table 4) the recommendation lists produced by Att-Item-kNN and VSM considering nDCG, Hit Rate (HR), Mean Average Precision (MAP), and Mean Reciprocal Rank (MRR). Table 4 shows that the chosen datasets are strongly affected by popularity bias. Indeed, MostPop is the best performing approach, and the two ”personalized” models fail to produce accurate results. This triggers the second aspect, which concerns the quality of the side information.</p>
        <p>Table 4
Results of the experiments on the models involved in the experiments. Models are optimized according to the value of nDCG.
model  nDCG  Recall  HR  Precision  MAP  MRR</p>
        <p>We propose two different measures to understand how reliable an explanation based on LIME-RS is: (i) constancy was used to assess the impact of the random sampling phase of LIME-RS on the provided explanation – ideally the explanation should
        <p>Movielens 1m remain constant in spite of the sample used to obtain
RMaonsdtPomop 00,,00804551 00,,00307298 00,,40584689 00,0,100948 00,0,101954 00,,20220654 it; (ii) adherence was proposed to understand the
reconAtt-Item-kNN 0,0229 0,0165 0,2425 0,0383 0,0387 0,0888 structive power of LIME-RS with respect to the features
VSM 0,0173 0,0109 0,2106 0,0292 0,0306 0,0741 that belong to the item involved in the explanation –
ide</p>
        <p>Movielens Small ally, LIME-RS should provide an explanation that always
RMaonsdtPomop 00,,00701350 00,,00308193 00,,30940922 00,,00704489 00,,00901628 00,,10926015 adheres to the actual features of the recommended item.
Att-Item-kNN 0,0124 0,0068 0,1459 0,0197 0,0191 0,0484 To test both constancy and adherence, we trained and
VSM 0,0085 0,0056 0,1000 0,0111 0,0123 0.0350 optimized two content-based recommendation models:
Random 0,0005 0,00Y0a8hoo!0M,00o5v1ies 0,0005 0,0005 0,0015 Attribute Item-kNN (Att-Item-kNN), and a classical
VecMostPop 0,2188 0,2589 0,596 0,1067 0,1501 0,3447 tor Space Model. For each model, and for all datasets
AVtStM-Item-kNN 00,,00123115 00,,00127612 00,,01715948 00,,00018312 00,,00019525 00,,00246315 exploited in the study, we generated recommendation
lists for all users. We exploited the first item of these
top-10 lists to produce the explanations that were then
of the content. The results suggest that the side infor- the subject of our investigation. It turned out that for
mation is not good enough to boost the recommenda- models built with a large collaborative input such as
tion systems in producing meaningful recommendations. Att-Item-kNN, LIME-RS produces fairly constant
explaIn fact, the three datasets seem to have an informative nations up to a length of three features. Moreover, these
content that is not adequate to generate appealing rec- explanations turn out to be adherent with respect to the
ommendations. We observe that, from an informative item between 65% and 75% of the cases in which only
point of view, the Yahoo! Movies dataset is slightly more the first feature of the weighted vector of explanations
complete: 22 genres against the 18 genres available on is considered. VSM shows a diferent behavior where
Movielens. Although the VSM model does not show ex- explanations are much more constant, but sufer a lot in
cellent performance, in combination with LIME-RS, it terms of adherence, except for the Yahoo! Movies dataset
provides explanations that are very reliable in terms of for which the explanation model showed outstanding
constancy (see Table 2) and adherence (see Table 3) to performance despite the poor ability of VSM to provide
the actual content of the items being explained. sound recommendations to users.
From the designer perspective, there is also a prag- In our experiments, some evidence started to emerge
matic way to look at the experimental results. Suppose a highlighting that the adopted explanation model is
condideveloper needs an of-the-shelf way of generating expla- tioned not only by the accuracy of the black-box model it
nations for recommendations, and chooses LIME-RS to tries to explain but also by the quality of the side
informado that. Our results suggest that if the explainer employs tion used to train the model. The latter result deserves to
a Movielens dataset with Att-Item-kNN model, then it be adequately investigated to search for a link at a higher
is better to run the explainer several times. Indeed, the level of detail. We plan to apply our experiments also to
ifrst feature obtained for the explanation could change other recommendation models, to see whether the
probaround 1 time every 5 trials (first column of Table 2), lems with adherence and constancy that we found for the
and once such a feature is obtained, it is better to check two tested models show up also in other situations. We
whether this feature is really among the ones describing will also investigate what impact structured knowledge
the item, since 1 time out of 4 the feature can be wrong has on this performance by exploiting models capable of
(first column of Table 3). Moreover, if the explainer em- leveraging this type of content. In addition, it would also
ploys the Yahoo! Movies dataset with VSM model, then be the case to try diferent reference domains with richer
probably there is no need to run the explainer twice, since datasets of side information to understand what impact
its behavior is constant 97% of the times, while the fea- content quality has on this type of explainer.
ture is wrong only 10% of the times. However, the low
performance of such a model is to be taken into account.</p>
      </sec>
    </sec>
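      <p>To make the two measures concrete, the following is a minimal sketch of how constancy and adherence could be computed from repeated explainer runs. It is our illustration, not the paper's code: the function names and the majority-agreement reading of constancy are assumptions, and the explainer is stubbed by plain lists of ranked feature names.</p>

```python
from collections import Counter

def constancy(runs, k=1):
    """Fraction of runs whose top-k feature set matches the modal top-k set.

    `runs` is a list of ranked feature lists returned by repeated executions
    of an explainer (e.g., LIME-RS) on the same (user, item) pair; repeated
    runs may differ because of the random sampling phase.
    """
    top_sets = [frozenset(run[:k]) for run in runs]
    modal_count = Counter(top_sets).most_common(1)[0][1]
    return modal_count / len(top_sets)

def adherence(explanation, item_features, k=1):
    """True if the top-k explanation features intersect the item's real features."""
    return len(set(explanation[:k]).intersection(item_features)) > 0

# Example: three runs agree twice on the first feature, so constancy(k=1) is 2/3.
runs = [["Comedy", "Drama"], ["Comedy", "Action"], ["Drama", "Comedy"]]
agreement = constancy(runs, k=1)                       # 2/3
ok = adherence(runs[0], {"Comedy", "Romance"}, k=1)    # True: "Comedy" is a real genre
```

      <p>Under this sketch, the pragmatic advice above amounts to calling the explainer several times, keeping the modal first feature, and accepting it only if adherence holds for the recommended item.</p>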
    <sec id="sec-4">
      <title>Acknowledgments</title>
The authors acknowledge partial support of PID2019-108965GB-I00, PON ARS01_00876 BIO-D, Casa delle Tecnologie Emergenti della Città di Matera, PON ARS01_00821 FLET4.0, PIA Servizi Locali 2.0, H2020 Passapartout - Grant n. 101016956, PIA ERP4.0, and IPZS-PRJ4_IA_NORMATIVO.</p>
    </sec>
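      <p>The popularity-bias check behind Table 4 can be sketched as follows. This is an illustrative re-implementation under our own simplifying assumptions (binary relevance, one ranked list per user), not the evaluation code used in the experiments.</p>

```python
import math
from collections import Counter

def hit_rate(recommended, relevant):
    """HR for one user: 1 if the top-n list contains at least one relevant item."""
    return int(any(item in relevant for item in recommended))

def ndcg(recommended, relevant):
    """Binary-relevance nDCG for one user's ranked list."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended) if item in relevant)
    ideal_hits = min(len(relevant), len(recommended))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

def most_pop(train_interactions, n=10):
    """MostPop baseline: the n items with the most (user, item) training interactions."""
    counts = Counter(item for _, item in train_interactions)
    return [item for item, _ in counts.most_common(n)]
```

      <p>Averaging these scores over users and finding that the unpersonalized most_pop() list beats the personalized models, as in Table 4, is the symptom of popularity bias discussed above.</p>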
  </body>
  <back>
  </back>
</article>