Comparative Explanations for Recommendation: Research Directions

Meysam Varasteh1,*, Elizabeth McKinnie2, Amanda Aird2, Daniel Acuña1 and Robin Burke2
1 Department of Computer Science, University of Colorado, Boulder, USA
2 Department of Information Science, University of Colorado, Boulder, USA

Abstract
Explanations have a long history in recommender systems. Researchers have studied the different roles explanations can play, the value of explanations for users, and different techniques for generating explanations for a given output. To date, we have rarely seen recommender systems make use of comparative explanations, a technique that social scientists emphasize as important in human explanatory behavior. We believe that comparative explanation could be a very powerful tool to augment the explanations that recommender systems currently provide and to offer new types of transparency. In this paper, we provide a taxonomy of different types of comparative explanations for recommender systems, emphasizing in particular the potential value of comparative explanations for recommender system providers. We suggest directions for future research to realize this potential rather than providing solutions.

Keywords
recommender systems, explanation, multistakeholder recommendation

1. Introduction

In an extensive survey of the social science literature, the psychologist Tim Miller concluded that when people ask “Why P?” questions, they are typically asking “Why P rather than Q?,” where Q is often implicit in the context [1]. Miller terms an explanation that answers such a question contrastive. That term has taken on a specific meaning in fair machine learning and explainable AI (XAI), so we will use the synonym comparative to mean a general strategy of using comparison as a means of creating explanations. We believe that comparative explanations can provide greater insight into recommender system operation and help engender user trust. In this paper, we examine the little-studied realm of comparative explanations for recommender systems and provide a taxonomy of different types of such explanations. We also propose comparative explanations as a useful component of interfaces for providers, stakeholders who are rarely considered in designing recommender system interfaces.

Following Miller, we will define comparative explanation in recommendation as comparing two different (actual or possible) outcomes from a recommender system with the aim of providing greater transparency into the system’s operation. Note that this definition excludes comparing a recommended item to an item already rated by the consumer. We examine comparative explanations by considering two aspects of recommendation explanations:

• Audience: To whom is the explanation being delivered? Recommender systems are best understood as multistakeholder applications [2], and explanations delivered to different stakeholder audiences will require different approaches. Any recommender system has three main stakeholders [2]: consumer, provider, and system. We concentrate on consumer and provider, and note that very little attention has been paid to provider-side explanations in recommender systems research.
• Scope: Comparative explanations will differ in form depending on whether we are comparing recommendations of individual items or the general behavior of the recommender system over time or over some set of items.

Because this paper is interested in comparative explanation and because we are taking a multistakeholder approach, there are a wide variety of different scopes to be considered, most of which have not seen any research attention. For our purposes in this paper, we consider scope as a two-level construct. First, there is the question of what aspect of recommendation output is being compared. Possible answers include:

• The presence of an item in a recommendation list,
• The rank of an item in a recommendation list, or
• A pattern of recommendation that spans multiple items in a list or multiple lists.

The next aspect of scope for comparative explanation is the question of the source of the comparison. We can compare:

• Recommendations given to a single recommendation consumer,
• Recommendations given to two different consumers: cross-consumer comparison, or
• Recommendations of items from a given provider.¹

Finally, we have the question of audience. Not all types of explanation are appropriate for all audiences. There may be privacy or business reasons why not all types of explanations should be available to all audiences.²

Table 1 illustrates the space of comparative explanations using a letter code to uniquely identify each possibility. We use dashes to indicate those explanations that are unlikely to be used in real systems because of potential violations of users’ privacy or confidentiality expectations. For example, consider an item-oriented cross-consumer explanation being delivered to an individual consumer, the missing iccC cell in the table. Such an explanation would answer a question like “Why is User A being recommended Item X and I am not?” There are several reasons to doubt whether such an explanation could or should be part of an explainer’s output. First, this question assumes that the user knows what is being recommended to someone else; in addition, any answer to this question would by necessity divulge information about User A’s profile. Similar considerations exist about divulging user information to providers or cross-provider information, which might be considered confidential business data in an e-commerce setting. In this paper, we focus on explanation types that would not create such risks to confidentiality.

There are nine types of comparative explanations that we consider. Examples of the questions such explanations would respond to are shown here:

• Consumer-oriented questions
  – Why is Item A being recommended to me instead of Item B? (icC)
  – Why is Item A being ranked ahead of Item B in my recommendations? (rcC)
  – Why are items of type A being recommended to me more than items of type B? (tcC)
• Provider-oriented questions
  – Why is my Item A being recommended to users of type X and not users of type Y? (iccP)
  – Why is my Item A being recommended and my Item B is not being recommended? (ipP)
  – Why is my Item A being ranked higher for users of type X than for users of type Y? (rccP)
  – Why is my Item A being ranked higher than my Item B in recommendation lists? (rpP)
  – Why are my items of type A being recommended to users of type X and not users of type Y? (tccP)
  – Why are my items of type A being recommended more often than my items of type B? (tpP)

¹ It is possible to have comparative explanations across providers, but these would likely violate providers’ privacy and so would be accessible only to system stakeholders.
² As noted above, we are setting aside the System stakeholder for future work. Providing explanations to this set of users amounts to building a dashboard for examining all aspects of recommendation outputs. This is a worthy task, but we are more interested in users who would not be experts in the design and operation of the system.

Table 1: Different types of comparative explanations. Dashes indicate explanations that would be likely to violate privacy or confidentiality.

Scope        Source of Comparison   Audience: Consumer (C)   Audience: Provider (P)
Item (i)     Consumer (c)           icC                      —
             Cross-Consumer (cc)    —                        iccP
             Provider (p)           —                        ipP
Rank (r)     Consumer (c)           rcC                      —
             Cross-Consumer (cc)    —                        rccP
             Provider (p)           —                        rpP
Pattern (t)  Consumer (c)           tcC                      —
             Cross-Consumer (cc)    —                        tccP
             Provider (p)           —                        tpP

In the rest of this paper, we will examine in detail hypothetical scenarios that illustrate each type of comparative explanation, discuss the benefit of such explanations, and consider what would be required to generate appropriate explanations in response to these hypothetical questions. Our scenarios are non-exhaustive and created to illustrate our comparative explanation taxonomy.

2. Related Work

This paper brings together ideas from explanation in recommender systems, multistakeholder recommendation, and comparative / contrastive explanation from explainable AI.

2.1. Explanation in Recommender Systems

Explanation has a long history in recommender systems research, starting with the seminal work of Herlocker et al. [3] and summarized more recently in the survey in [4]. This research trajectory has concentrated entirely on consumers as the explanation audience and, with very limited exceptions, has concentrated on explanations for individual recommended items. As Tintarev and Masthoff discuss in [4], research has explored additional types of explanations meant to place recommendations in context or provide users with support in understanding their own goals and consumption behavior. Comparative explanations could serve these roles as well.

The comparative explanations in recommendation explored in [5, 6, 7] fall under a slightly different framework than the one explored here. These works aim to justify a single recommendation by comparing it to items the user has already rated. In these cases, comparison serves to provide a known anchor point against which a recommendation is evaluated, drawing on sentences extracted from reviews [5] or recipes [6, 7] as the underlying explanatory text. Our vision of comparative explanation involves comparing two possible recommender system outputs, in line with Miller’s concept, not comparing user input with user output.

2.2. Contrastive Explanation in XAI

An explanation can be considered an answer to the question “Why P?”, where P represents the explicit event that occurred and needs to be explained. However, studies in social science and philosophy show that “why” questions are often more complex than this straightforward approach [8, 9, 10].
These studies indicate that such questions go beyond the occurrence of event P and expect the explanation to address more than just a single event. This raises an important question in the field of Explainable AI (XAI): “What constitutes a good explanation?” To address this, Miller in [1] proposes three key criteria for effective explanations. First, explanations should be contrastive, meaning they should explain why a particular input yields a specific output rather than an alternative output. Second, explanations should be selective, presenting only the relevant information and avoiding the inclusion of all possible causes to reduce cognitive load for both the explainer and the explainee. Lastly, explanations should be social, recognizing that they are a form of communication between the explainer and the explainee.

Contrastive explanation has been studied in image classification and other areas of AI. For image classification, a contrastive explanation could be framed as follows: a classifier predicts label Y for input X because, if X were slightly modified to X_c (for example, by altering important object pixels in the image), the classifier would instead predict label Y_c [11, 12, 13]. The method proposed in [11] explains black-box image classifiers by identifying the minimal set of pixels that must be present in an image to justify its classification and the minimal set of pixels that must be absent to distinguish it from a similar input close to the original. Note that the contrastive aspect is between the actual image and a hypothetical alternative that would be classified differently. In text generation settings, where the output is a sequence of words rather than a single label, the explanation might be framed as follows: the LLM produces a particular reply to a given prompt because, had the prompt been slightly modified, the LLM would have produced a different response [14, 15, 16, 17]. Jacovi et al. [18] introduce a comparative explanation method for model interpretability by generating a contrastive latent representation. This method projects the latent representation of the input space into a new space that distinguishes between two different decisions made by the model. They use an interventionist approach [19] to determine the causality of a factor by intervening on it, thus generating a counterfactual. In all of these contrastive explanation studies, although the questions do not directly compare two types of events, they explain the event in a comparative manner for the purpose of justification.

2.3. Provider Perspectives

Although the multistakeholder perspective on recommender systems has achieved recognition in recent research [2], there is little research specifically on how recommender systems should interface with item providers. Recommender systems designers have at times considered providers’ perspectives as irrelevant to their efforts to find appropriate content for users, leading them to concentrate on ways to defeat provider manipulation of ranking systems [20]. The general problem of recommender system transparency for providers has been approached in qualitative work seeking provider perspectives. For example, content creators and dating app users identified transparency as an important part of fairness in [21], creating transparency metrics that expose matching mechanisms to users or that discuss the reasoning behind revoking content and “what factors into a successful post”.
Music artists mostly agreed that they wanted more transparency in [22] and [23], with one artist specifically stating a desire for the system to describe what artists should do to be recommended more often [23]. YouTube creators also discussed transparency, particularly how difficult it is to understand the relationship between creative choices and the impact of their videos, and wanted to know how the algorithm operates [24]. One example scenario generated by creators during a workshop was an algorithm that could explain that a video is underperforming because of its length [24].

3. Consumer-Oriented Comparative Explanations

As noted above, explanation in recommender systems has historically been directed towards consumers. Providing consumer-oriented explanations can build trust and satisfaction among users by helping them understand why certain recommendations are being made [25, 4]. This section focuses on comparative explanations for consumers, examining the different kinds of such explanations through hypothetical examples of questions and answers. As we note above, the set of comparisons that we expect users to seek will be focused on their own recommendation results. Other types of comparisons may violate privacy or confidentiality.

In demonstrating these explanation types, we will rely on hypothetical scenarios from two very different domains: music streaming recommendation (an application oriented towards a general audience) and the recommendation of scientific literature (an application targeted towards specialists). Note also that music streaming is more asymmetrical (consumers are less likely to also have provider roles), whereas the consumers of recommendations about scientific papers are likely to be individuals who also publish such papers.

3.1. Comparing Individual Items (icC)

Question template: Why is Item A being recommended to me instead of Item B?

The first type of comparative explanation for a single consumer involves comparing two different individual items when one item was explicitly recommended and the other item was not. In Miller’s terminology, the first item is referred to as the fact, and the second as the foil [1]. Consider the following hypothetical example:

Sarah is a frequent user of the Tunester platform, which she uses to listen to music. The platform uses collaborative filtering to recommend music to users by finding similar users (neighbors) and recommending songs that those neighbors like. Each time, it recommends 10 songs in descending order of the predicted score for each track. Sarah received a recommendation list from the platform, and the first song on the list, “Alice Abroad”, was from her favorite band, Wolf Law. Since she is a fan of this artist, she is familiar with other songs by them, especially “Because, Because”, which she really likes. She becomes curious about why the recommendation system recommended “Alice Abroad” to her instead of “Because, Because.”

In answer to this question, the comparative explainer might say, “85% of your close neighbors listened to ‘Alice Abroad’ while only 10% of your neighbors listened to ‘Because, Because’. That’s why ‘Alice Abroad’ was recommended.” This explanation makes it clear that a collaborative recommender is in use and that Sarah’s recommendations are a function of what her peers on the platform are listening to.
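To make the computation behind such an explanation concrete, the following sketch shows how a neighborhood-based explainer might derive statistics of the kind quoted above. This is our own minimal illustration, not a description of any existing platform: Tunester is the paper’s hypothetical service, and the data structures, function names, and numbers are assumptions.

```python
def neighbor_listen_rate(item, neighbors, listens):
    """Fraction of the target user's nearest neighbors who listened to `item`.

    neighbors: list of user ids the recommender judged similar to the target user.
    listens:   dict mapping user id -> set of item ids that user has listened to.
    """
    if not neighbors:
        return 0.0
    count = sum(1 for user in neighbors if item in listens.get(user, set()))
    return count / len(neighbors)


def explain_icc(fact_item, foil_item, neighbors, listens):
    """Comparative (fact vs. foil) explanation in the style of the Tunester example."""
    fact_rate = neighbor_listen_rate(fact_item, neighbors, listens)
    foil_rate = neighbor_listen_rate(foil_item, neighbors, listens)
    return (f"{fact_rate:.0%} of your close neighbors listened to '{fact_item}' "
            f"while only {foil_rate:.0%} listened to '{foil_item}'. "
            f"That's why '{fact_item}' was recommended.")


# Hypothetical neighborhood data in the spirit of Sarah's scenario.
listens = {
    "u1": {"Alice Abroad", "Because, Because"},
    "u2": {"Alice Abroad"},
    "u3": {"Alice Abroad"},
    "u4": {"Other Song"},
}
print(explain_icc("Alice Abroad", "Because, Because", ["u1", "u2", "u3", "u4"], listens))
```

The point of the sketch is simply that the explainer reuses intermediate products of the recommendation computation (here, the neighborhood) rather than rationalizing after the fact.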
This kind of explanation would be useful in any environment where the user is likely to have extensive knowledge of potential items for recommendation, such as popular culture or media, because it depends on the consumer having specific knowledge of items they might expect to be recommended to them. In other settings, it might be less useful; for example, for restaurant recommendations in an unfamiliar city, the consumer might be unlikely to have an alternative ‘foil’ restaurant about which they are seeking an explanation.

To generate such an explanation, the comparative explainer needs access to key steps in the recommendation computation: the selection of peers, the characteristics of these peers’ profiles, and the extrapolation of recommendations. Since the user is supplying the two entities to be contrasted, the explainer can focus on the difference between how these entities were treated in the original recommendation calculation, or it could run the recommendation calculation again, with these two entities as targets, and extract the differences.

3.2. Comparing Item Ranking (rcC)

Question template: Why is Item A being ranked ahead of Item B in my recommendations?

Items are typically recommended to users in a ranked list, with each item’s rank indicating its importance and priority relative to the others. Users expect that the first item is the most relevant and best aligned with their preferences. However, the ranking process is often not transparent, leaving users uncertain as to how rankings are derived.

We can examine this explanation style with reference to Sarah and the Tunester platform. Recently, Sarah noticed that in a list of recommendations, “All the Things You Are” (the first recommendation) and “Bolivar Blues” (the seventh recommendation) are both in the Jazz genre and by the same artist. Curious about the ranking, she asks why “All the Things You Are” is prioritized over “Bolivar Blues”. The comparative explainer responds by saying, “Our recommender likes to promote artists’ newer work. ‘All the Things You Are’ is recommended over ‘Bolivar Blues’ because it is a recently-released track.”

We see that the explanation in this case focuses attention on a particular feature that the recommender system takes into account in ranking, one that the user might not be aware of. From Sarah’s point of view, these tracks are very similar; the explanation provides an opportunity for the recommender to indicate what distinctions it is making. This type of explanation could also be useful in contexts where the user might not be aware of some of the key distinctions between items. For example, consider an e-commerce setting where a consumer is purchasing an electronics product, such as a laptop dock. These products come in many different configurations and capabilities, so inquiring about the recommender’s ranking may help the user understand the differences between the products.

As in our first example, the explainer needs access to the recommendation computation. To get the type of explanation we have envisioned in this case, we can imagine a causal model linking the songs’ features to their weight in the item representation and to the recommendation calculation. Since the songs are similar in other ways, the release date would be the key difference in the causal reasoning for each, and could be used as the basis for generating our (admittedly hypothetical) explanation.
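One simple way to operationalize this idea, assuming the explainer can query the ranking model’s scoring function, is a counterfactual feature swap: give the foil item the fact item’s value for each feature in turn and see which swap closes most of the score gap. The sketch below is a minimal illustration under that assumption; the scoring rule, feature names, and values are invented for this example and are not part of any real system.

```python
import copy


def rank_gap_attribution(score, fact_item, foil_item):
    """For each feature, counterfactually give the foil item the fact item's value
    and measure how much of the original score gap disappears.

    score: function mapping a dict of feature values to a ranking score.
    Returns the most influential feature and the full attribution table.
    """
    base_gap = score(fact_item) - score(foil_item)
    attribution = {}
    for feature in fact_item:
        modified_foil = copy.deepcopy(foil_item)
        modified_foil[feature] = fact_item[feature]
        remaining_gap = score(fact_item) - score(modified_foil)
        attribution[feature] = base_gap - remaining_gap  # gap explained by this feature
    top_feature = max(attribution, key=attribution.get)
    return top_feature, attribution


# A toy scoring rule that boosts recently released tracks.
def score(item):
    return 0.7 * item["relevance"] + 0.3 * item["recency"]


all_the_things = {"relevance": 0.80, "recency": 0.95}  # recent release
bolivar_blues = {"relevance": 0.82, "recency": 0.10}   # older track

top_feature, detail = rank_gap_attribution(score, all_the_things, bolivar_blues)
print(top_feature, detail)  # 'recency' accounts for essentially all of the gap
```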
3.3. Comparing Patterns of Items (tcC)

Question template: Why are items of type A being recommended to me more than items of type B?

A pattern of recommendation can span multiple items in a list or across multiple lists. There are many different types of patterns that users might ask about. Questions that are related to item features will have explanations similar in nature to those discussed above about comparative ranking. A different type of pattern is one that takes place over time, when the recommender is making certain recommendations at one time and different recommendations at another.

Such temporal changes in recommendation patterns can be categorized as either contextual shifts or preference drifts. Contextual shift refers to changes in the context or environment in which recommendations are made. This context can include observable factors such as the time of day, location, the user’s current mood, or even external factors that might not be observable to users. Preference drift, on the other hand, describes the gradual evolution in a user’s preferences over time. Unlike contextual shifts, which are temporary and context-dependent, preference drift reflects long-term changes in what a user likes or needs [26]. While users might not always be aware of these underlying changes, they may notice the changed nature of the content or products recommended to them. Providing explanations can help users recognize and understand these changes, such as why certain recommendations may have shifted or diversified over time. Consider the following hypothetical example:

Sanjay is a final-year PhD student in mathematics; the title of his dissertation is “Unconstrained Optimization: a New Faster Method for Solving Nonlinear Equations.” During his PhD studies, he has published several papers. He is a regular user of Moogle Scholar, a website that allows users to subscribe to a daily newsletter of recommended scientific papers according to their interests. He has found that he generally receives paper recommendations that are highly relevant to his research area. However, recently he has noticed a shift in the recommendations coming from the system, with a focus on papers about optimization methods for machine learning – an area in which he has not published and has little substantial knowledge. Sanjay is curious about why this change in recommendation patterns has occurred and asks: “Why are papers on machine learning-oriented optimization methods now being recommended more frequently than those on mathematical optimization that I used to get?”

The comparative explainer responds: “Sometimes we recommend papers based on patterns of citations coming from your work. There has been a recent increase in citations of your work from researchers in the machine learning topic area and we are recommending papers by some of those researchers to you.”

Without this comparative explanation, Sanjay might not understand the reasons for the changing patterns in the recommendations he sees. With the insights the explanation provides, he may want to learn more about the researchers who have picked up on his work and what they are doing. Without the transparency provided by the explanation, he might have been tempted to ignore the recommendations of these papers. We envision that temporally-oriented comparative explanations could be very helpful in explaining trends in any area. The recommender has access to the constantly changing item catalog and the evolution of user behavior; typical users will not.
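Detecting the pattern the user is asking about is a prerequisite for explaining it. As a minimal sketch, assuming the explainer can retrieve a log of the topics of papers recommended to a user, the comparison might start by contrasting topic shares in an earlier and a more recent time window; tracing the cause of the shift (here, the change in citation patterns) would still require access to the recommender’s internals, as discussed below. The topic labels and counts are invented for illustration.

```python
from collections import Counter


def topic_shift(earlier_recs, recent_recs):
    """Compare topic shares of recommended items in two time windows.

    earlier_recs, recent_recs: lists of topic labels, one per recommended paper.
    Returns topics sorted by how much their share of recommendations grew.
    """
    def shares(recs):
        counts = Counter(recs)
        total = sum(counts.values())
        return {topic: count / total for topic, count in counts.items()}

    earlier, recent = shares(earlier_recs), shares(recent_recs)
    growth = {topic: recent.get(topic, 0.0) - earlier.get(topic, 0.0)
              for topic in set(earlier) | set(recent)}
    return sorted(growth.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical recommendation logs for Sanjay's two time windows.
earlier = ["math-optimization"] * 18 + ["ml-optimization"] * 2
recent = ["math-optimization"] * 8 + ["ml-optimization"] * 12
print(topic_shift(earlier, recent))
# [('ml-optimization', 0.5), ('math-optimization', -0.5)]
```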
We will also see the value of these types of explanations for providers in the next section.

Generating such temporal explanations is not simple, however. First, it assumes that the explainer either has access to or can re-generate a trace of the recommender system’s behavior sufficient to derive an explanation for the phenomenon of interest. We cannot assume that the explainer is tracking every kind of change in anticipation of some user possibly asking about it. There will be too many possible changes and too many users, and most such computation would be wasted. With this capability in place, the system would still need the ability to identify the trend to which the user is referring. In this specific case, the system would need to be able to make enough sense of the question to detect that the user is asking about a change in the topic area of recommended papers. After that, the types of causal inference that we have referred to above would be needed to isolate the effect of the changes in citation patterns which cause the changes in recommendations that the user sees.

4. Provider-Oriented Comparative Explanations

A multistakeholder approach to recommendation requires that we consider the needs of stakeholders other than recommendation consumers, especially item providers [2]. As noted above, providers are acutely aware of the lack of transparency that they perceive in their interactions with recommender systems. Yet research on provider-oriented explanations in recommender systems is practically non-existent. There is much to address in this gap. For the purposes of this paper, we believe that providers can benefit from explanations that help them understand to whom their products are recommended (and to whom they are not). We note that in the music context, artists (as providers) have expressed a strong desire for more transparency [22, 23] in recommender systems. One artist specifically noted a desire for the system to explain what actions they could take to be recommended more frequently [23].

We noted above that comparative explanations are more informative for consumers and align with how humans naturally make decisions. For slightly different reasons, this approach is equally valuable for providers. Comparative questions, such as why two items from the same provider have different levels of recommendation activity, can reveal potential issues like algorithmic bias or intrinsic quality differences. This understanding can lead to improved marketing strategies, enabling providers to more effectively reach their desired audiences.

4.1. Single Provider

4.1.1. Comparing Individual Items (ipP)

Question template: Why is my Item A being recommended and my Item B is not being recommended?

Ideally, providers want all their items to have a fair chance of being recommended and purchased or otherwise being received by interested consumers or audiences. There is of course natural variation in the appeal of products, but it is very hard for a provider, especially one whose work is intrinsically tied to a recommendation-oriented platform (think YouTube or TikTok), to understand the interplay between the item’s properties and the relatively inscrutable behavior of the recommender system. This lack of transparency in existing algorithms often leads providers to try to develop their own folk theories from others’ experiences or ad-hoc experimentation [27, 28]. Such theories can be far afield from actual algorithm behavior.
Comparative explanations of the type that we are advocating for in this paper can help close this gap. Analogous to the consumer-oriented comparative explanations discussed above, we can imagine explanations through which providers can compare the treatment of different items at the hands of the recommender system. Consider the following hypothetical example:

Maria is a PhD student working on the evaluation of large language models. In the final years of her PhD, she writes a conference paper based on her results and then follows it up with a more comprehensive journal article. While she is developing her application materials in advance of a job search, she consults the provider-side interface of Moogle Scholar, which gives her information about her profile relative to the recommendations it generates. She notices that the conference paper is being recommended more often than the journal article, even though the journal article was published in a top journal and the conference was a relatively small one. Curious about the reasons behind this difference, she asks the system: “Why is my conference paper A being recommended more often than my journal article B?”

The system responds: “One factor in our recommendation algorithm is how often a paper is cited and by whom. Publication A has a higher citation count and higher quality score (citations * centrality of citing author) than Publication B, which outweighs Publication B’s higher venue centrality score, and so Publication A is more likely to be recommended in contexts where both items might be relevant.”

We see that this explanation provides a valuable bit of transparency to the researcher, who can use it to understand the difference in the reception of her work; it might also prompt her to investigate how her work has been cited. We imagine such explanations to be highly useful in other recommendation applications. The YouTube creators interviewed in [24] explicitly noted their inability to understand why some of their videos were viewed heavily and other, similar, videos received hardly any attention. ipP explanations could go a long way towards helping such individuals have a more productive relationship with the recommendation platform.

Our example here assumes a capability not present in today’s recommendation platforms: a provider-side interface that helps a provider understand when and in what contexts their items have been recommended. We believe that robust interfaces of this type are necessary for providers to be incorporated as first-class users of recommender systems; this is a corollary of adopting a multistakeholder view of recommendation in the first place. In this example and others in this section, we proceed under the assumption that such an interface exists and comparative explanation can become one of its features.

The types of causal explanation mechanisms that we have discussed earlier in this paper could come into play in generating explanations of this type. There is a key difference, however. Providers would not have access to individual recommendation lists to be able to ask questions about how individual lists were generated. So, provider-focused explanations are by necessity explaining a pattern that has developed over multiple recommendation interactions: item A was recommended over time, item B was not. This creates complexities similar to those discussed in relation to the pattern-oriented explanations of type tcC. The recommender would need to be able to revisit a history of recommendation decisions and the underlying processes that generated them.
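To suggest what the raw material for such an explanation might look like, the sketch below aggregates a hypothetical recommendation log and computes the score components named in the Moogle Scholar reply: citation count, a quality score in the spirit of “citations * centrality of citing author”, and venue centrality. The field names, score definition, and numbers are our own assumptions for illustration, not a real platform’s API.

```python
def quality_score(citing_centralities):
    """Hypothetical quality score: each citation contributes the (0-1) network
    centrality of its citing author, so the score grows with both citation count
    and the influence of who is citing."""
    return sum(citing_centralities)


def compare_publications(rec_log, pub_a, pub_b):
    """Summarize how two of a provider's items fared over many recommendation slates.

    rec_log: list of recommendation slates (lists of item names) delivered over time.
    pub_a, pub_b: dicts with 'name', 'citing_centralities', and 'venue_centrality'.
    """
    report = {}
    for pub in (pub_a, pub_b):
        name = pub["name"]
        report[name] = {
            "times_recommended": sum(1 for slate in rec_log if name in slate),
            "citation_count": len(pub["citing_centralities"]),
            "quality_score": quality_score(pub["citing_centralities"]),
            "venue_centrality": pub["venue_centrality"],
        }
    return report


# Hypothetical data for Maria's two publications.
conference_paper = {"name": "Conference paper A",
                    "citing_centralities": [0.9, 0.8, 0.7, 0.6, 0.5],
                    "venue_centrality": 0.4}
journal_article = {"name": "Journal article B",
                   "citing_centralities": [0.3, 0.2],
                   "venue_centrality": 0.9}
rec_log = [["Conference paper A", "Other"], ["Conference paper A"], ["Journal article B"]]
print(compare_publications(rec_log, conference_paper, journal_article))
```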
4.1.2. Comparing Item Ranking (rpP)

Question template: Why is my Item A being ranked higher than my Item B in recommendation lists?

As described earlier, items are usually recommended to users in a ranked list, where items higher in the list are expected to be the most relevant and the most aligned with user preferences. While providers likely do not have access to individual users’ recommendation lists, they may be told the average recommendation ranks of their own items across many users’ lists, and may wonder why there is a discrepancy in rank (on average) between two of their items. Consider the following hypothetical example:

Saylor Twift is an up-and-coming singer-songwriter who just released a new EP. The EP has five songs on it, and she and her manager thought the first song, “Orange” – a poppy, upbeat, fun song – would perform better than “Mosaic”, a slow, acoustic ballad. Twift and her team have been advertising the whole EP but have been focusing on the first song. Twift’s EP is on Tunester, the hypothetical music streaming platform we have seen before. The artist interface for Tunester shows that in recommendation lists, “Mosaic” is ranked higher, on average, than “Orange” on users’ daily playlists. Twift wants to know why this is the case.

The comparative explainer responds by saying, “Our recommender considers the music tastes of listeners when recommending songs. ‘Mosaic’ is recommended over ‘Orange’ on average because listeners in the Pop music category who have you and artists like you in their listening history tend to listen more to slow acoustic songs with guitar than uptempo songs with synthesizer.”

In this case, the comparative explainer reveals a glimpse of how the recommender algorithm works – clearly, some assigned categorization of the songs is considered, and playlists depend on engagement with artists. We can also see that the recommender characterizes music listeners and uses that to make decisions. Here, comparative explanation is particularly advantageous over simple factual explanation – asking “why does my song not have a higher ranking” may be difficult to explain by itself, since ranking is inherently rivalrous. The simple answer is that other things are ranked higher, but the comparative explanation can help the provider narrow in on what features of their items the recommendation algorithm is picking up on, which can help influence decisions on newer productions or marketing. Armed with this new knowledge, maybe Twift decides to focus more on the acoustic songs featured on her new EP. This type of explanation would be informative to any provider with multiple items being recommended and wanting greater transparency in recommender system operation.

As with the prior provider-oriented explanation, the explainer needs access to the history of recommendation slates delivered to users and the ability to generalize over multiple executions of the recommendation algorithm. A causal approach would be useful, as we have seen, to identify the key features distinguishing the treatment of one item from another. Our hypothetical explanation assumes the system performs certain kinds of categorization (“slow” vs “uptempo”, “artists like you”) and these can be included in the explanation.
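The aggregate that such an explanation starts from is straightforward to express, assuming the provider-side interface can query logged slates: average the position of each of the provider’s items across all lists in which it appeared, and then look for the features that separate the higher-ranked item from the lower-ranked one (for example, with the counterfactual swap sketched in Section 3.2). The minimal sketch below covers only the first, aggregation step; the slate data is invented.

```python
from statistics import mean


def average_ranks(slates, item_a, item_b):
    """Average position (1 = top) of two items across logged recommendation slates.

    slates: list of ranked lists of item names; items absent from a slate are
    skipped for that slate rather than penalized.
    """
    result = {}
    for item in (item_a, item_b):
        positions = [slate.index(item) + 1 for slate in slates if item in slate]
        result[item] = mean(positions) if positions else None
    return result


# Hypothetical daily-playlist logs from the Tunester artist dashboard.
slates = [
    ["Mosaic", "Other Song", "Orange"],
    ["Other Song", "Mosaic", "Orange"],
    ["Mosaic", "Orange", "Other Song"],
]
print(average_ranks(slates, "Mosaic", "Orange"))
# approximately {'Mosaic': 1.33, 'Orange': 2.67}
```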
4.1.3. Comparing Patterns of Items (tpP)

Question template: Why are my items of type A being recommended more often than my items of type B?

Although comparing individual items or individual items’ rankings may be useful for a provider, a provider might notice larger discrepancies in the recommendations of their items according to specific item types or across time. Instead of considering a single recommendation list, the provider would ask about the recommendations of their items across multiple users. Consider the following hypothetical example:

Jo has been a tenured professor of information science for 10 years. They have 600+ citations, and although much of their early work was in ethical AI and policy, recently they have been working more on research into ethical AI curriculum. Their papers are all available on Moogle Scholar. Recently, Jo noticed that their ethical AI and policy papers have been recommended significantly more often to users. Jo wants to know why their older papers on ethical AI and policy are being recommended more often than their more recent papers on ethical AI curricula and pedagogy.

The comparative explainer responds by saying, “We recommend papers that are highly relevant to our subscribers’ research interests. There are fewer scholars working on ethical AI curriculum and fewer opportunities to recommend these papers compared to the number of AI and policy researchers.”

Here the explainer is focusing on recommendation opportunities, based on the popularity of the research area. Jo might not have a good sense of the relative popularity of different research fields, especially if they are not currently doing research in that area. Similar to the provider-side explanations above, this kind of insight could be useful to any provider seeking to understand larger-scale dynamics that the recommender system, by virtue of its position, is aware of.

The explainer would need the same historical perspective that the other provider-oriented explanation types require. It would also need to understand what Jo is referring to in describing the classes of papers they wish to contrast (“ethical AI and policy” vs “ethical AI curriculum”). A causal model of how recommendations are delivered over time could highlight the difference between papers with many potential opportunities for recommendation and those with fewer.

4.2. Cross-Consumers

Providers often aim for equitable distribution of recommendations of their products across different consumer groups (e.g. geographic location, gender, age, race/ethnicity, or stated preferences), but achieving this can be challenging due to various factors including consumer behavior, marketing strategies, or potential biases. Therefore, providers may be particularly interested in understanding the rationale behind differing levels of engagement or utility among these consumer groups. To the best of our knowledge, this type of explanation has not yet been studied in recommender systems.

4.2.1. Comparing Individual Items (iccP)

Question template: Why is my Item A being recommended to users of type X and not users of type Y?

In previous examples, the key comparison occurred between two items, item rankings, or groups of items. For the cross-consumer source, however, the comparison is instead between groups of users. A provider may want to know why a specific item of theirs is recommended to one group of users and not another. We’ll return to the example of Jo, a tenured information science professor.
They have recently published a new paper, “Ethical AI Curriculum in U.S. Public High Schools: A Literature Review”. When looking at their Moogle Scholar dashboard, which shows analytics of the recommendations of their papers, Jo notices that this paper is only being recommended to professors and not to grad students. Jo wants to know why this is the case, as they have recently been talking to several grad students interested in this area who might benefit from reading such a paper.

The comparative explainer responds by saying, “We recommend papers that are similar in topic to papers that users have published and depend on a minimum of five papers to establish a user’s topic profile. Grad students have fewer published papers on average than professors and may not have established profiles in the system.”

The explanation reveals more about how the underlying recommender system works. Clearly, if one doesn’t have any published papers, their recommendations may not be the same as someone with published papers, even if they share common interests. This may not be desirable behavior, but at least Jo now knows why their paper is not being recommended in the way that they expect. The added benefit of comparative explanation in this context is that it can serve two purposes at once, allowing the provider to discover both why a certain group is being recommended their content and why another group is not. Each of these may be useful in different contexts. For example, a queer Instagram content creator might want to know why their content is being served to transphobic people, which they do not want, and why their content is not being served to LGBTQIA+ people, which they do want. Asking only one of these questions may not reveal the full picture.

On top of the capabilities already discussed for provider-oriented explanations, this type of explanation requires information about different user categories: “professor” vs “graduate student”. Such categorization is assumed in Jo’s ability even to formulate the question, so this capacity is similar to what we have articulated for the other explanation types.
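A minimal sketch of the kind of aggregate such an explainer might draw on is shown below: per user group, it tallies how many users received the item and how many even meet the (hypothetical) five-paper threshold mentioned in the explanation. The data structures, group labels, and threshold are our assumptions for illustration only.

```python
def group_exposure(item, rec_log, profile_sizes, group_of, min_profile_papers=5):
    """Per user group, count users, users eligible for topic-based recommendations,
    and users to whom `item` was actually recommended.

    rec_log:       dict user id -> set of item names recommended to that user.
    profile_sizes: dict user id -> number of published papers in the user's profile.
    group_of:      dict user id -> group label (e.g. 'professor', 'grad student').
    """
    summary = {}
    for user, n_papers in profile_sizes.items():
        group = group_of[user]
        stats = summary.setdefault(group, {"users": 0, "eligible": 0, "recommended": 0})
        stats["users"] += 1
        if n_papers >= min_profile_papers:
            stats["eligible"] += 1
        if item in rec_log.get(user, set()):
            stats["recommended"] += 1
    return summary


# Hypothetical Moogle Scholar data for Jo's new paper.
paper = "Ethical AI Curriculum in U.S. Public High Schools: A Literature Review"
profile_sizes = {"prof1": 40, "prof2": 25, "grad1": 1, "grad2": 3}
group_of = {"prof1": "professor", "prof2": "professor",
            "grad1": "grad student", "grad2": "grad student"}
rec_log = {"prof1": {paper}, "prof2": {paper}}
print(group_exposure(paper, rec_log, profile_sizes, group_of))
# {'professor': {'users': 2, 'eligible': 2, 'recommended': 2},
#  'grad student': {'users': 2, 'eligible': 0, 'recommended': 0}}
```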
4.2.2. Comparing Item Ranking (rccP)

Question template: Why is my Item A being recommended more highly to users of type X than users of type Y?

There are many factors that may cause recommendations to vary across demographic groups. There may be shared group preferences; consider teenage music listeners versus elderly ones. There may be cultural factors, such as the language in which lyrics are sung, and there may be third parties differentially marketing items to one group over another. Consider again our musical artist Saylor Twift.

From Tunester’s artist dashboard, she discovered that “Mosaic” was recommended to European listeners at a higher average rank compared to U.S. listeners, despite equal advertising efforts in both regions. Since a higher ranking increases the likelihood of consumption, she wants to understand why the “Mosaic” track is ranked higher for European listeners than for listeners in the U.S. The comparative explainer replies: “Our recommender considers the demographics of listeners when recommending songs. ‘Mosaic’ is recommended more highly in the European market on average because listeners in the Pop music category who have you and artists like you in their listening history tend to be older than their U.S. counterparts, and older listeners tend to prefer acoustic music.”

It is easy to imagine explanations of this type being of use to any type of creator interested in the audience for their work and wanting to understand how the recommender is (or is not) helping to reach those audiences or markets. Analysis of this type might also highlight undesired bias in a recommender system. For example, consider a company that discovers that its job listings for engineers were being recommended much more often to male applicants than to female applicants. If the job recommender exhibited the type of gender bias seen in [29], the hiring manager might get an explanation saying that “We recommended your engineering positions to male applicants because they were more similar in background to prior successful applicants for similar positions than the female applicants were.” This would be a sign of unacceptable gender bias in the recommender, and one would expect the manager to demand that the platform change how its algorithm operates.

4.2.3. Comparing Patterns of Items (tccP)

Question template: Why are my items of type A being recommended to users of type X and not users of type Y?

Besides asking why one specific item is recommended differently between groups of users, a provider may wonder why a group of their items is recommended differently. All of the benefits of comparing a single item across consumer groups hold in this case as well – namely, knowing whether your content is reaching your intended audience is extremely useful. The only difference here is that by looking at a pattern, group, or trend of items, a provider can potentially understand how more of their content is being recommended (or not) by the recommendation algorithm, which could lead to a deeper understanding of how particular content is affected.

Let’s return to the example of Saylor Twift, a singer-songwriter who just released a new EP. The artist interface for Tunester shows that the songs from the EP overall are recommended at much higher rates to listeners in Japan than to listeners in the U.S., even though Twift is American and is based in the U.S. Twift wants to know why this group of songs – her EP – is recommended more to Japanese listeners and not to U.S. listeners. The comparative explainer responds by saying, “We recommend items based on trends that we anticipate when there are new releases. The new EP was just released in Japan and so is getting more recommendations due to our new-release expectations, while in the U.S. it has been out for three months and is outside the new release window.”

This explanation is again similar to those we have already seen in terms of its demands on the explainer. This explanation in particular requires that the explainer have access to what might be considered business rules (the “release window”) about promotional activity that the platform engages in.

5. Conclusions and Future Work

We have presented a taxonomy of different types of comparative explanations for recommender systems, where a comparative explanation is one that compares two different outcomes from the recommender system with the aim of providing greater transparency into the system’s operation. We note that such comparisons are recognized as a common feature of explanations in interpersonal communication. Our taxonomy considers different types of explanations based on the kinds of comparisons that are being made, between which kinds of recommender output, and for which audiences / stakeholders.
We note in particular that not all comparative explanations are appropriate for all audiences, for privacy or confidentiality reasons. We emphasize the importance of explanations for multiple stakeholders, in particular providers, who are for the most part neglected as users in recommender systems writ large. As far as we know, this is the first work to examine the potential of comparative explanations broadly in recommender systems, and to suggest the possibility and potential impacts of explanations for providers.

This taxonomy is intended as a pointer to future work, as the title would suggest, since there is little or no work on the types of explanations that we describe here. Some explanation types require significant advances in how we represent and track system activity within recommender systems, and in how we interface with users to solicit their explanation-oriented questions and to produce acceptable answers. We note that comparative explanation, as a task, orients recommender systems explanation squarely in the direction of system transparency. A question like “Why is A ranked higher than B in my list?” cannot be legitimately answered in a post-hoc manner, as a rationalization. It can only be answered by reference to the process by which the ranking is actually generated, prioritizing transparency. Thus, we expect that new techniques may be needed to generate explanations for complex recommendation models, but as we discuss, we believe that causal techniques may offer some solutions.

Although some existing work has surveyed providers on their perspectives of the platforms they use, no studies have interviewed providers with the intent of designing explanations for them. A detailed survey of providers that gathers their perspectives on explanations, as well as the information they currently receive from a platform’s interface, could help inform explanation design and help articulate the value of comparative explanation. Such provider relationships would also be useful in future user studies of a fully developed comparative explanation system.

We opted to leave comparative explanations targeted at system stakeholders, such as platform owners and operators, out of our discussion. We expect that systems designers might want all of the types of explanations included here and then some, in order to understand how their systems are treating different kinds of content and different classes of users. Most recommender system designers and operators function within well-resourced organizations, but with the rise of Very Small Online Platforms (VSOPs) [30], the need for explanation interfaces to support recommender systems in these contexts may increase.

Acknowledgments

Author Burke was supported by the National Science Foundation under grant award IIS-2107577. Author McKinnie was supported by the National Science Foundation under grant award IIS-2232555.

References

[1] T. Miller, Explanation in artificial intelligence: Insights from the social sciences, Artificial Intelligence 267 (2019) 1–38.
[2] H. Abdollahpouri, G. Adomavicius, R. Burke, I. Guy, D. Jannach, T. Kamishima, J. Krasnodebski, L. Pizzato, Multistakeholder recommendation: Survey and research directions, User Modeling and User-Adapted Interaction 30 (2020). doi:10.1007/s11257-019-09256-1.
[3] J. L. Herlocker, J. A. Konstan, L. G. Terveen, J. T. Riedl, Evaluating collaborative filtering recommender systems, ACM Transactions on Information Systems 22 (2004) 5–53.
[4] N. Tintarev, J. Masthoff, Beyond explaining single item recommendations, in: Recommender Systems Handbook, Springer, 2022, pp. 711–756.
[5] A. Yang, N. Wang, R. Cai, H. Deng, H. Wang, Comparative explanations of recommendations, in: Proceedings of the ACM Web Conference 2022, 2022, pp. 3113–3123.
[6] A. D. Starke, C. Musto, A. Rapp, G. Semeraro, C. Trattner, “Tell me why”: Using natural language justifications in a recipe recommender system to support healthier food choices, User Modeling and User-Adapted Interaction 34 (2024) 407–440.
[7] C. Musto, A. D. Starke, C. Trattner, A. Rapp, G. Semeraro, Exploring the effects of natural language justifications in food recommender systems, in: Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, 2021, pp. 147–157.
[8] P. Lipton, Contrastive explanation, Royal Institute of Philosophy Supplements 27 (1990) 247–266.
[9] D. Temple, Discussion: The contrast theory of why-questions, Philosophy of Science 55 (1988) 141–151.
[10] P. Ylikoski, The idea of contrastive explanandum, in: Rethinking Explanation, Springer, 2007, pp. 27–42.
[11] A. Dhurandhar, P.-Y. Chen, R. Luss, C.-C. Tu, P. Ting, K. Shanmugam, P. Das, Explanations based on the missing: Towards contrastive explanations with pertinent negatives, Advances in Neural Information Processing Systems 31 (2018).
[12] Y. Wang, X. Wang, “Why not other classes?”: Towards class-contrastive back-propagation explanations, Advances in Neural Information Processing Systems 35 (2022) 9085–9097.
[13] Y. Lei, Z. Li, Y. Li, J. Zhang, H. Shan, LICO: Explainable models with language-image consistency, Advances in Neural Information Processing Systems 36 (2024).
[14] R. Luss, E. Miehling, A. Dhurandhar, CELL your model: Contrastive explanation methods for large language models, arXiv abs/2406.11785 (2024). URL: https://api.semanticscholar.org/CorpusID:270560761.
[15] S. A. Chemmengath, A. Azad, R. Luss, A. Dhurandhar, Let the CAT out of the bag: Contrastive attributed explanations for text, arXiv abs/2109.07983 (2021). URL: https://api.semanticscholar.org/CorpusID:237532193.
[16] N. Madaan, I. Padhi, N. Panwar, D. Saha, Generate your counterfactuals: Towards controlled counterfactual generation for text, in: AAAI Conference on Artificial Intelligence, 2020. URL: https://api.semanticscholar.org/CorpusID:228063841.
[17] T. S. Wu, M. T. Ribeiro, J. Heer, D. S. Weld, Polyjuice: Generating counterfactuals for explaining, evaluating, and improving models, in: Annual Meeting of the Association for Computational Linguistics, 2021. URL: https://api.semanticscholar.org/CorpusID:235266322.
[18] A. Jacovi, S. Swayamdipta, S. Ravfogel, Y. Elazar, Y. Choi, Y. Goldberg, Contrastive explanations for model interpretability, in: Conference on Empirical Methods in Natural Language Processing, 2021. URL: https://api.semanticscholar.org/CorpusID:232092617.
[19] J. Woodward, Making Things Happen: A Theory of Causal Explanation, Oxford University Press, 2005.
[20] Y. Deldjoo, T. D. Noia, F. A. Merra, A survey on adversarial recommender systems: From attack/defense strategies to generative adversarial networks, ACM Computing Surveys (CSUR) 54 (2021) 1–38.
[21] J. J. Smith, A. Satwani, R. Burke, C. Fiesler, Recommend me? Designing fairness metrics with providers, in: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’24, Association for Computing Machinery, New York, NY, USA, 2024, pp. 2389–2399. URL: https://doi.org/10.1145/3630106.3659044. doi:10.1145/3630106.3659044.
[22] K. Dinnissen, C. Bauer, Amplifying artists’ voices: Item provider perspectives on influence and fairness of music streaming platforms, in: Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization, UMAP ’23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 238–249. URL: https://doi.org/10.1145/3565472.3592960. doi:10.1145/3565472.3592960.
[23] A. Ferraro, X. Serra, C. Bauer, What is fair? Exploring the artists’ perspective on the fairness of music streaming platforms, in: C. Ardito, R. Lanzilotti, A. Malizia, H. Petrie, A. Piccinno, G. Desolda, K. Inkpen (Eds.), Human-Computer Interaction – INTERACT 2021, Springer International Publishing, Cham, 2021, pp. 562–584.
[24] Y. Choi, E. Kang, M. Lee, J. Kim, Creator-friendly algorithms: Behaviors, challenges, and design opportunities in algorithmic platforms, in: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23), Association for Computing Machinery, 2023. doi:10.1145/3544548.3581386.
[25] N. Tintarev, J. Masthoff, A survey of explanations in recommender systems, in: 2007 IEEE 23rd International Conference on Data Engineering Workshop, IEEE, 2007, pp. 801–810.
[26] N. Hariri, B. Mobasher, R. Burke, Adapting to user preference changes in interactive recommendation, in: Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
[27] M. A. DeVito, Adaptive folk theorization as a path to algorithmic literacy on changing platforms, Proceedings of the ACM on Human-Computer Interaction 5 (2021) 1–38.
[28] Y. Choi, E. J. Kang, M. K. Lee, J. Kim, Creator-friendly algorithms: Behaviors, challenges, and design opportunities in algorithmic platforms, in: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, pp. 1–22.
[29] J. Dastin, Amazon scraps secret AI recruiting tool that showed bias against women, in: Ethics of Data and Analytics, Auerbach Publications, 2022, pp. 296–299.
[30] C. Rajendra-Nicolucci, M. Sugarman, E. Zuckerman, The Three-Legged Stool: A Manifesto for a Smaller, Denser Internet, Technical Report, Initiative for Digital Public Infrastructure, 2023. URL: https://publicinfrastructure.org/2023/03/29/the-three-legged-stool/.