Causality based explanations in multi-stakeholder recommendations

Willème Verdeaux, Clément Moreau, Nicolas Labroche, Patrick Marcel
University of Tours, France
firstname.lastname@etu.univ-tours.fr, firstname.lastname@univ-tours.fr

ABSTRACT
This paper introduces two novel contributions in the context of multi-stakeholder recommender systems. First, we present a simple and intuitive method for multi-objective and multi-stakeholder recommendation that relies on the preferences of each stakeholder. The second contribution is the adaptation of Halpern and Pearl's definition of causality to introduce definitions of causes and explanations for multi-stakeholder recommender systems. Experiments conducted on real data study the tractability of such an explanation approach in the context of recommender systems.

© 2020 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2020 Joint Conference (March 30-April 2, 2020, Copenhagen, Denmark) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
Recommender systems solve the problem of predicting the value of a user-item relationship to propose the most appropriate list of items to each user, based on personalization and contextualization mechanisms [3, 19]. Traditional approaches like content-based systems, k-NN systems or latent factor models can generate recommendations for which simple, limited explanations can be derived (e.g., "other customers also bought that item"). However, modern multi-objective, multi-stakeholder recommender systems [18, 21] make the generation of explanations more challenging.

In this paper, we introduce two novel contributions in this regard. First, we introduce a simple and intuitive method for multi-objective and multi-stakeholder recommendation that relies on the preferences of each stakeholder. Following previous works on multi-stakeholder recommenders [1, 9, 15], we express the multi-stakeholder recommendation problem with an additive gain-based formulation. We compute a profile for each stakeholder, and define the gain brought by the recommendation using the classical NDCG measure, to confront the profile with the ranks of the recommended items.

The second contribution is the adaptation of Halpern and Pearl's definition of causality [12], and more precisely its adaptation in the context of databases as developed in [14], to introduce definitions of causes and explanations for multi-stakeholder recommender systems. This framework allows us to explain both the recommendations given and each individual stakeholder profile, in terms of the items rated by this stakeholder.

The outline of the paper is as follows. The next section presents related work and Section 3 motivates our approach with an example. We present our first contribution, the multi-stakeholder recommender, in Section 4. Section 5 introduces our second contribution, namely explanation generation. Section 6 presents preliminary tests and Section 7 concludes by discussing future work.

2 RELATED WORK
2.1 Multi-stakeholder recommender systems
As stated by [1], "multi-stakeholder recommendation has emerged as a unifying framework for describing and understanding recommendation settings where the end user is not the sole focus". Practically, a multi-stakeholder recommender produces an aggregated recommendation that maximizes the utility of several stakeholders at once, with possibly conflicting objectives. Burke et al. [9] describe the utility space to explore as a tuple where each component is the utility of a stakeholder, computed as the sum of item utilities for this stakeholder. Other works consider more complex models with multiple objectives per stakeholder [15].

This extension of the traditional recommendation problem makes it possible to model different stakeholder types, such as consumers, providers or the system [2, 15], and to take into account new evaluation metrics such as novelty or fairness among stakeholders (to counter the popularity bias, as in [9]). Several approaches have been proposed in recent years [1-3], and [1] introduces a first effort to structure this novel domain by proposing a classification of multi-stakeholder systems depending on: (1) the types of stakeholders involved, (2) whether they are active or passive, and (3) whether the recommendation is neutral or personalized for each stakeholder.

As recalled in [15], and similarly to business rules, multi-stakeholder approaches [9, 18] are most of the time a posteriori methods applied on top of existing traditional recommender systems. For example, [18] proposes an approach that takes as input a user × item matrix provided by a traditional recommender system, and then finds an optimal binary assignment of items to providers to enforce provider-related constraints and enhance the distribution of recommendations across retailers. Similarly to these approaches, in our proposal we also rely on a traditional recommender system to compute a profile for each stakeholder.

Finally, in [15] the adaptation to the multi-objective, multi-stakeholder setting is done as a post-processing step, after a first ranking of items is already obtained for each consumer. The proposed approach aims to re-rank it such that the new ranking remains close to the initial one and optimizes the commission perceived by the "system" stakeholder. Interestingly, the proposed approach computes a ranking of items, instead of predicting scores, to produce the recommendations. To combine all conflicting stakeholder objectives, [15] introduces a new learning-to-(re-)rank optimization approach based on the kernel version of the Kendall tau correlation metric. A similar approach that learns to rank is also used in [21] to determine ratings that would be unknown to the multi-stakeholder recommender.

In our proposal, we consider a more straightforward rank-based approach that involves Normalized Discounted Cumulative Gain (NDCG) to measure to which extent the ranking preferences of a stakeholder are respected. The solution space is explored with a simple simulated annealing. Similarly to [21], we use a recommender system to predict the user-item scores that are missing in the data, to construct a complete ranking over the set of items as our stakeholder profile.

2.2 Explanations
Explaining recommendations. Explainable recommendation refers to personalized recommendation algorithms that "not only provide the user with recommendations, but also make the user aware why such items are recommended" [20]. Gedikli et al. [10] evaluate different explanation types and propose a set of guidelines for designing and selecting suitable explanations for recommender systems.

Various forms of explanations have been explored. For instance, in [7], the authors introduce the problem of meaningful rating interpretation (MRI) in the context of recommendation systems, and as such provide two main heuristics for explaining collaborative ratings.

Figure 1: Overview of a user's profile explanation
Explanations are here provided as a subset of reviewers presented to the user and described by metadata attributes shared among all reviewers in the group. Groups are built under constraints to minimize a description error while maximizing the coverage of ratings. Similarly, meaningful differences are provided by determining the most unbalanced subsets of reviewers in terms of positive/negative ratings, and are meant to highlight controversial items, i.e., items on which groups of users consistently disagree.

Interestingly, it turns out that, to our knowledge, explanations based on the notion of causality have so far not attracted attention in the recommender system community.

Explanation and causality. Causality has been studied and defined algorithmically by Judea Pearl, in a highly influential body of work primarily using the concepts of counterfactuals and degree of responsibility [16]. At its core, causality is defined in terms of intervention: an input is said to be a cause if we can affect the output by changing just the value of that input, while keeping all others unchanged. Importantly, causality can only be established in a controlled physical experiment and cannot be derived solely from data, while an explanation only requires that a change in the input affects the output: the more it affects the output, the better the explanation [16].

Recently, many authors in the database community (e.g., [14, 17]) adapted these notions to explain the results of database queries. In this context, a major challenge in finding explanations is the difficulty of defining and computing interventions in real time for complex datasets and queries. Noticeably, in both [7] and [17], data cube operations are used to compute explanations, hence benefiting from built-in database optimizations. We leave as future work an in-depth study of optimization mechanisms for computing explanations in our context of multi-stakeholder recommendations.

The approach we present in this paper, depicted in Figure 1, is particularly inspired by that of Meliou et al. [14], who borrowed the following important concepts from Pearl and Halpern's theory of causality:
• Partitioning of variables into exogenous and endogenous: exogenous variables define a context determined by external, unconcerned factors, deemed not to be possible causes, while endogenous variables are the ones judged to affect the outcome and are thus potential causes. In their work, Meliou et al. propose to partition a database instance into exogenous and endogenous tuples [14], and to look for explanations among the endogenous tuples only. We follow a similar approach in this paper, partitioning the set of events into exogenous and endogenous events.
• Contingencies: a piece of information (a tuple in [14]) t is a cause for the observed outcome only if there is a hypothetical setting of the other endogenous variables under which the addition/removal of t causes the observed outcome to change. Therefore, in order to check that t is a cause of an outcome based on a given dataset, one has to find a set of data (called a contingency) to remove from (or add to) the dataset, such that t immediately affects the outcome in the new state of the dataset. In theory, in order to compute the contingency one has to iterate over the subsets of the dataset, meaning that checking causality is NP-complete in general [8]. In the context of this paper, the considered outcome is a set of recommended items and the dataset is the set of ratings used by the recommendation algorithm.
• Responsibility measures the degree of causality as a function of the size of the smallest contingency set. In applications involving large datasets, it is critical to rank the candidate causes by their responsibility, because the outcome may have large lineages and large numbers of candidate causes. In theory, in order to compute the responsibility one has to iterate over all contingency sets, meaning that computing responsibility is hard in general [5].

3 MOTIVATING EXAMPLE

Table 1: Toy example of user purchases

      i1    i2     i3     i4    i5
u1    1     2, 2   5, 6
u2    11           3            7
u3          2             5

Figure 1 gives the intuition of our approach. Assume a set of events relating users to items (e.g., purchases), expressing implicit user preferences.
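Before detailing the example, the notions of contingency and responsibility recalled above can be made concrete with a brute-force sketch over toy purchase events. This is our own illustration, not the paper's implementation: the event encoding, the outcome function and the helper names are assumptions, and the subset enumeration is exponential, as noted above.

```python
from itertools import combinations

def min_contingency_size(t, events, outcome):
    """Size of the smallest contingency set Gamma for event t, or None if t
    is not an actual cause: after removing Gamma (which alone must leave the
    outcome unchanged), additionally removing t changes the outcome."""
    all_events = frozenset(events)
    baseline = outcome(all_events)
    others = [x for x in all_events if x != t]
    for size in range(len(others) + 1):          # exponential in |events|
        for gamma in combinations(others, size):
            kept = all_events - frozenset(gamma)
            if outcome(kept) == baseline and outcome(kept - {t}) != baseline:
                return size
    return None

def responsibility(t, events, outcome):
    """1 / (1 + min |Gamma|), and 0 when t is not a cause at all."""
    size = min_contingency_size(t, events, outcome)
    return 0.0 if size is None else 1.0 / (1.0 + size)

# Toy outcome: rank two items by total purchased quantity, ties broken
# alphabetically; an event is a (user, item, quantity) triple.
ITEMS = ("a", "b")

def outcome(events):
    totals = {i: 0 for i in ITEMS}
    for _, item, qty in events:
        totals[item] += qty
    return tuple(sorted(ITEMS, key=lambda i: (-totals[i], i)))

E = frozenset({("u1", "a", 2), ("u1", "b", 3), ("u2", "b", 3)})
# Removing one b-purchase alone leaves the ranking (b, a) unchanged; it only
# becomes counterfactual once the other b-purchase (a contingency of size 1)
# is removed as well, hence a responsibility of 1 / (1 + 1) = 0.5.
print(responsibility(("u1", "b", 3), E, outcome))  # 0.5
```

In this sketch, an event with responsibility 1 is a counterfactual cause (empty contingency), while events that never affect the outcome get responsibility 0.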
From this set, a profile σu1 can be inferred for a given user u1, e.g., by predicting scores for unseen items. To explain this profile to u1, one could look for the events that are the most responsible for it. To do so, we compute interventions on the set of events, by removing some events from it. Each intervention leads to another profile σ′u1 for u1. If the two profiles σu1 and σ′u1 are different, this means that the removed events have, to a certain degree, a responsibility in σu1. As there can be many such events, instead of presenting them as explanations to u1, we summarize them using pattern extraction, and present the extracted patterns. We next illustrate this approach with a detailed example.

Consider Table 1, which details the consumption events of three users. An event corresponds to a quantity purchased, and a user can purchase different quantities of the same item at different moments. For instance, user u1 initially purchased 5 units of item i3 and then another 6 units of i3 subsequently. Assume we have two categories of stakeholders: that of the users ui, and that of a provider p. Each purchased item ij is described by two properties, price and popularity (see Table 2).

Table 2: Item description for the toy example

            i1    i2   i3    i4    i5
price       med   med  med   high  high
popularity  high  med  high  high  high

Stakeholder preferences can be modeled following a strategy particular to each stakeholder category, and computed using the set of events. For instance, users' preferences should be such that the most purchased items are preferred. The provider's preferences could be that the same number of each item is purchased. For instance, according to Table 1, the preferences of user u1 are that i3 is preferred over i2, which in turn is preferred over i1, which in turn is preferred over both i4 and i5. The preferences of stakeholder p are that i4 is preferred (since it sold less) over i2, in turn preferred over i5, in turn preferred over i1 and then over i3.

Assume now that we have a way of constructing a user profile from each user's preferences, so that the profile corresponds to a total order of the items consistent with that user's preferences. For instance, a simple baseline approach, where the prediction is the sum of consumptions by item, would allow us to deduce that the profile σu1 of u1 ranks all items as follows: σu1 = [i3, i2, i1, i5, i4].

Now, assume that we want to issue a recommendation achieving a compromise between all stakeholders. A multi-stakeholder, multi-objective recommendation algorithm can be used to compute this recommendation from the profiles. This recommendation can take the form of a total ranking of the items, proposed to all stakeholders, noted σ∗. In our example, suppose this recommendation is σ∗ = [i4, i3, i1, i2, i5].

As stakeholders' profiles and the recommendation σ∗ have the same form, both can be explained in the same way. We illustrate here how profiles can be explained. Suppose we would like to explain σu1, considered as the outcome of the process that computes the profile. We then intervene on the set of events by removing those of the events responsible for the current profile. For instance, we can see that removing any of the events in entries (u1, i1), (u1, i2) or (u1, i3) would leave σu1 unchanged, while removing the event in entry (u2, i5) would alter σu1. Therefore, the event in entry (u2, i5) can be said to be a direct cause, or counterfactual cause, for σu1. Note that both purchases made by u1 and purchases not made by u1 can be counterfactual causes for u1's profile; not every event is one, though: event (u3, i4), for instance, is not a counterfactual cause.

Note also that events pertaining to preferences can also be counterfactual causes. Consider indeed user u2, whose profile is σu2 = [i1, i5, i3, i2, i4]. It can be seen that events (u2, i1) and (u2, i5) are counterfactual causes for σu2, as is (u3, i2).

Understanding the responsibility of non-counterfactual causes in the profile can be done by trying to remove more than one event, and measuring the responsibility as a function of the number of events to remove (called the contingency set) before the non-counterfactual cause becomes a counterfactual cause. For instance, removing both events in entry (u1, i3) changes σu1, meaning that both events in that entry have a responsibility in the profile, but with a lesser degree than that of (u2, i5) (which does not need another event to be removed to be counterfactual).

Explaining a profile σui means presenting to ui the events most responsible for their profile. Consider for instance user u2. The events most responsible for u2's profile are the counterfactual causes (u2, i1), (u2, i5) and (u3, i2). However, since the number of causes can be large, presenting the events themselves may not be user-friendly. A better way is to summarize them using patterns extracted from the properties of the items concerned by the events. Indeed, consistently with earlier literature on deriving explanations using causality theory [17], we propose to present explanations under the form of predicates P = v, where P is a property and v is a value. In our example, suppose we use frequent itemset mining [11] to extract properties appearing more than two thirds of the time in items i1, i2, i5. Two frequent itemsets can be extracted from these counterfactual events' properties: (i) price=med and (ii) popularity=high.

4 MULTI-STAKEHOLDER RECOMMENDATION
4.1 Stakeholders and their profiles
We formalize the problem as follows. We assume a set O of n objects, where each object is described by a list of properties. Let F be a set of property names and V be a set of property values; the function properties : O × F → V denotes the value of a given property for a given object. We assume a set S of stakeholders and a set R of stakeholder roles, like "user", "provider", "system owner", etc. The role of a stakeholder is given by a surjective function role : S → R. Finally, we have a set E of events, where an event is a 4-tuple ⟨s, o, r, t⟩ with s a stakeholder in S, o an object in O, r ∈ ℝ a numerical value representing e.g. a rating or an amount, and t a timestamp.

Preferences. Each stakeholder s is associated with their preferences, expressed as a weak order (a partial order that is negatively transitive) over the set O, noted ⪯s. This preference relation induces a partition of the set O.

Example 4.1. Consider the example of the previous section. The preferences of u1 can be deduced from Table 1. They correspond to the following weak order over the set of consumption events: i3 ⪯u1 i2 ⪯u1 i1 ⪯u1 {i4, i5}.

Profile. The profile of a stakeholder s ∈ S consists of a total order σs over the objects of O that is consistent with the preference relation ⪯s, i.e., σs is obtained by composing ⪯s with a total order ≺ using priority composition [6], where the total order ≺ is expressed in a quantitative way through a function utility : S × O → ℝ. These preferences can be noted by a vector ⟨utility(s, o1), ..., utility(s, on)⟩, where the oi's are the objects of O and the utilities are pairwise different, or alternatively by a permutation σs of the objects in O. We note P the set of permutations of size n. Given a permutation σs, we note σs(o) = i the rank i given by s to object o, and σs[i] = o the object ranked i by s.

Example 4.2. Assume that it is predicted, based on u1's preferences, that u1 is more likely to prefer i5 to i4.
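The baseline profile of the running example (sum of consumptions per item) can be sketched as follows. This is our own illustration: breaking ties by overall consumption is a hypothetical stand-in for the predicted preference of i5 over i4 in Example 4.2, not necessarily the predictor used in the paper.

```python
# Events from Table 1 as (stakeholder, item, quantity) triples
# (timestamps omitted for brevity).
EVENTS = [
    ("u1", "i1", 1), ("u1", "i2", 2), ("u1", "i2", 2),
    ("u1", "i3", 5), ("u1", "i3", 6),
    ("u2", "i1", 11), ("u2", "i3", 3), ("u2", "i5", 7),
    ("u3", "i2", 2), ("u3", "i4", 5),
]
ITEMS = ["i1", "i2", "i3", "i4", "i5"]

def totals(events, user=None):
    """Sum of purchased quantities per item, restricted to one user if given."""
    t = {i: 0 for i in ITEMS}
    for s, o, r in events:
        if user is None or s == user:
            t[o] += r
    return t

def baseline_profile(events, user):
    """Rank items by the user's own consumption; break ties by overall
    consumption (our stand-in for the prediction step of Example 4.2)."""
    own, overall = totals(events, user), totals(events)
    return sorted(ITEMS, key=lambda i: (-own[i], -overall[i]))

print(baseline_profile(EVENTS, "u1"))  # ['i3', 'i2', 'i1', 'i5', 'i4']
```

With this tie-break, the sketch recovers both profiles used in the example: σu1 = [i3, i2, i1, i5, i4] and σu2 = [i1, i5, i3, i2, i4].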
The profile of u1 therefore consists of the total order σu1 = [i3, i2, i1, i5, i4].

4.2 Recommendation objective
Our recommendation objective is to find the optimal permutation σ∗ in the sense of a compromise between a subset S of all the stakeholders of S. Given a subset S ⊆ S, the recommendation σ∗ for S is given by the function R : 2^S → P:

    R(S) = σ∗ = argmax_{σ ∈ P} Σ_{s ∈ S} αs · Q(σ, s)    (1)

where αs is the weight of stakeholder s, such that Σ_{s ∈ S} αs = 1, and Q : P × S → ℝ+ is a quality function measuring how good a permutation σ is for stakeholder s.

In this work, we have chosen the classical NDCG to measure the quality of the solution. In other words, the quality is measured in terms of the gain brought by σ∗ for the stakeholder s with respect to their preference. Equation (1) becomes:

    σ∗ = argmax_{σ ∈ P} Σ_{s ∈ S} αs · DCG(s, σ) / IDCG(s)    (2)

where:

    DCG(s, σ) = Σ_{r=1}^{n} utility(s, σ[r]) / log2(1 + r)    (3)

    IDCG(s) = DCG(s, σs)    (4)

This formulation makes the recommendation problem an instance of the well-studied rank aggregation problem, which aims at aggregating the total order preferences of multiple agents [13]. In particular, our problem is similar to the Kemeny optimal aggregation problem, which determines the consensus ranking minimizing the number of pairwise disagreements (the Kendall tau distance) with the rankings of each agent. This problem is known to be NP-hard even when there are only a few rankings to aggregate. Because of this complexity, stochastic or evolutionary algorithms are often proposed as an efficient way to explore the space of possible rankings and thus solve what is known as the Mallows problem [4]. In this work, we follow a similar idea and use simulated annealing to find the consensus ranking based on all stakeholders' rankings.

5 EXPLAINING PROFILES AND RECOMMENDATIONS
Our framework is inspired by that of Meliou et al. for explaining query answers in relational databases [14].

5.1 Profile explanations
Following Meliou et al. [14], it makes sense to partition the set of all events E into endogenous and exogenous events. However, in this preliminary work, and without loss of generality, we consider all events as endogenous, and in what follows only the set of all events E is used in the definitions. Let E be a set of (endogenous) events and s a stakeholder with preference ⪯s; we note E |= σs if the profile σs is obtained using E and ⪯s.

Counterfactual cause. Given a stakeholder s with preference relation ⪯s and profile σs, and a set of events E, an event e = ⟨s, o, r, t⟩ of E is a counterfactual cause in E for σs if E |= σs and E − {e} |= σ′s with σ′s ≠ σs. In other words, removing event e from E causes the profile of s to change.

Actual cause. Given a stakeholder s with preference relation ⪯s and profile σs, and a set of events E, an event e = ⟨s, o, r, t⟩ of E is an actual cause in E for σs if there exists a set Γ ⊆ E, called a contingency for e, such that e is a counterfactual cause for σs in E − Γ. An event is an actual cause if one can find a contingency under which it becomes a counterfactual cause.

Responsibility. If an event e is a cause, the responsibility of e for σs is ρe = 1 / (1 + min |Γ|), where Γ ranges over all the contingency sets for e. The responsibility is a function of the minimal number of events to remove from E before the event becomes counterfactual.

Causality problem. Compute the set C ⊆ E of actual causes for σs.

Responsibility problem. For each actual cause e ∈ C, compute its responsibility ρe.

Explanation. Given a set C of causes for σs, consider the set OC = {o ∈ O | ∃e = ⟨s, o, r, t⟩ ∈ C}. An explanation for σs is a set of frequent patterns extracted from {⟨properties(o, f1), ..., properties(o, f|F|)⟩ | o ∈ OC, fi ∈ F}.

5.2 Explanations for σ∗
In this case, we look for what causes the differences between σ∗ and σs in the profiles (and hence the preferences) of the stakeholders other than s. Intuitively, we want to explain to stakeholder s why the compromise σ∗ is not more favorable to them. Precisely, an event e is a counterfactual cause for σ∗ if, when we remove e from E, σs does not change but there is some change in σs′ for some stakeholder s′ ≠ s, and σ∗ improves the score Q for s. Formally:
• E |= σ∗
• for all stakeholders (including s) si ∈ S, E |= σsi
• E − {e} |= σs
• E − {e} ̸|= σ∗, but E − {e} |= σ′∗ with σ′∗ ≠ σ∗
• Q_{E−{e}}(σ′∗, s) > Q_E(σ∗, s)

Actual causes, responsibility and explanations are defined accordingly.

6 WORK COUNCIL USE CASE
The recommender system and the explanation framework are implemented in Python 3.6. User profiles are computed from raw data using the Surprise scikit¹. The simulated annealing used to explore the space of permutations is an in-house development.

We experimented on a real use case, with purchase data from Kalidea-Up, a company that provides a service platform to works councils. We considered two stakeholder roles: "workers", who can purchase discounted services from works councils, and "system", representing the Kalidea-Up system. The available purchase data cover the 2014-2018 period, gathering over 5,840 workers, 540 services and 168,965 transactions, where each transaction corresponds to an event as defined in Section 4.

¹http://surpriselib.com/

6.1 Testing the recommender system
These purchases are used to build profiles for the workers, using Surprise's off-the-shelf k-NN prediction algorithm, with baseline predictions to ensure a total ranking of the services. The profile for the system is a ranking computed based on the subsidies collected from the works councils for the services. As such, system preferences may not coincide with those of the users, who may prefer discounts on other types of services.
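The recommendation objective of Equation (2), optimized in our experiments with simulated annealing, can be sketched as follows. This is a minimal illustration under assumptions: the random swap mutation and the geometric cooling with ratio 0.996 follow the setup described in this section, while the utilities, weights and initial temperature are made up.

```python
import math
import random

def quality(sigma, utility):
    """NDCG of permutation sigma for one stakeholder (Eqs. 3-4): utility
    maps each object to this stakeholder's utility; the ideal ranking
    orders objects by decreasing utility."""
    def dcg(order):
        return sum(utility[o] / math.log2(1 + r)
                   for r, o in enumerate(order, start=1))
    ideal = sorted(utility, key=utility.get, reverse=True)
    return dcg(sigma) / dcg(ideal)

def objective(sigma, utilities, weights):
    """Weighted sum of per-stakeholder NDCG (Eqs. 1-2)."""
    return sum(weights[s] * quality(sigma, utilities[s]) for s in utilities)

def simulated_annealing(utilities, weights, iters=500, ratio=0.996, seed=0):
    """Search the space of permutations with random pairwise swaps."""
    rng = random.Random(seed)
    sigma = list(next(iter(utilities.values())))  # some initial permutation
    q = objective(sigma, utilities, weights)
    best, best_q, temp = sigma[:], q, 1.0
    for _ in range(iters):
        cand = sigma[:]
        i, j = rng.sample(range(len(cand)), 2)
        cand[i], cand[j] = cand[j], cand[i]  # mutation: swap two elements
        cq = objective(cand, utilities, weights)
        # Accept improvements always, and degradations with a probability
        # that shrinks as the temperature decays geometrically.
        if cq >= q or rng.random() < math.exp((cq - q) / temp):
            sigma, q = cand, cq
            if q > best_q:
                best, best_q = sigma[:], q
        temp *= ratio
    return best, best_q

# Two stakeholders with opposite tastes and equal weights: the consensus
# ranking trades off their two NDCG scores.
utilities = {"user": {"a": 3, "b": 2, "c": 1},
             "system": {"a": 1, "b": 2, "c": 3}}
sigma_star, score = simulated_annealing(utilities, {"user": 0.5, "system": 0.5})
```

Since each per-stakeholder NDCG is at most 1, the aggregated score is bounded by 1, which is why the experiments below read a global score close to 1 as a good consensus.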
Simulated annealing. The objective function optimized by the simulated annealing is given in Equation (2). In order to crawl the space of possible permutations to determine the best σ∗, each new candidate permutation is generated by a random mutation that swaps 2 elements of the previous permutation at random. A temperature parameter allows retaining a solution that is not optimal after a mutation. In order to force convergence, a geometric progression with common ratio 0.996 is applied to the temperature. The maximum number of iterations for simulated annealing was set to 500 in our experiments.

We consider the following settings in our experiments:
• as in [18], we first observe to which extent the addition of new stakeholders decreases the quality of the recommendation for each stakeholder: we vary the number of stakeholders from 2 (1 recipient and the system, as represented by Kalidea-Up) to 10 (9 recipients and the system);
• for each of the previous settings, we vary the distribution of the weights αs of Equation (2) over the stakeholders s ∈ S. The settings for αs are: a uniform distribution, meaning that all stakeholders have the same weight in the final decision; an 80% weight given to all workers, meaning that the system only accounts for 20%; or, finally, the symmetric situation where 80% of the weight is set on the system.

For these settings, we expect the following results:
• if our approach performs well, it should be able to produce a good-quality consensus aggregation of all stakeholders' preferences. In this case, the overall quality, as measured by our objective function, should be as close to 1 as possible. This would reflect that our choice of a simple simulated annealing algorithm is a fair enough solution in our context of multi-objective optimization;
• we should observe that the decrease for each stakeholder is moderate and comparable to the one observed in other works such as [18], indicating that our approach manages to accommodate all stakeholders simultaneously and with approximately the same effectiveness for each of them.

Table 3: Mean NDCG over 3 runs for the setting with 2 stakeholders (1 user and the system). "Uniform", "80% system" and "80% user" refer to the 3 settings for the weights α.

             NDCG user     NDCG system   NDCG global
Uniform      0.79 ± 0.05   0.85 ± 0.03   0.82 ± 0.01
80% system   0.63 ± 0.07   0.89 ± 0.04   0.76 ± 0.04
80% user     0.95 ± 0.01   0.56 ± 0.04   0.75 ± 0.02

Figure 2: Average mean NDCG scores for all users, for the system, and globally, for 3 distinct settings with 2, 5 and 10 stakeholders.

Results. As can be seen in Table 3, our approach obtains satisfactory optimization results in terms of NDCG, with approximately 0.82 as the global optimization score when the weights α are uniform. As expected, our approach proves efficient when biasing the convergence towards one stakeholder or the other by adjusting the weights accordingly. Interestingly, when favoring one stakeholder, the global NDCG decreases, reflecting that both stakeholders had contradictory profiles.

To provide the reader with a range of acceptable NDCG values, it can be noticed that [18] achieved very high NDCG scores (above 0.95) on a tweaked MovieLens dataset where providers have been set manually for each movie. We reproduced this dataset, and on the aforementioned 2-stakeholder settings, by tuning the system preferences based on fairness, we achieved an NDCG of 0.96, which shows that simulated annealing paired with NDCG is a fair choice for the problem of multi-stakeholder optimization.

6.2 Testing explanation generation
For these tests, we used Surprise's off-the-shelf baseline-only predictor to compute profiles. The advantage of this predictor is to be simple, fast and yet quite accurate. Our experiments aim to answer 2 main questions:
(1) Is the approach presented in Section 5 feasible in the context of multi-stakeholder recommendation algorithms? In this case, there are possibly 2^|E| contingency sets to evaluate to determine the responsibility of each event of E. For this reason, we propose to first limit our experiments to small sets E ranging from 8 to 14 events, where all workers (resp. services) are pairwise different, resulting in a very sparse workers × services matrix. Computation times are reported in Figure 3.
(2) How is the responsibility distributed, and can this measure be used in the context of a recommender system to discriminate between events to explain a recommendation? Are all events equally discriminant regarding this measure? This paper answers this question by computing, for each event, the minimum size of its contingency set min |Γ| as defined in Section 5.1. Then, for all events, and depending on the size of |E|, a distribution of min |Γ| is computed. The results are presented in Figure 4.

Figure 3: Execution and training times for different sizes of the events set |E|. For |E| = 8 to 13, measured execution times were approximately 0.88, 1.84, 3.86, 8.25, 17.68 and 37.65 seconds, and training times approximately 0.23, 0.68, 1.59, 3.40, 7.08 and 14.44 seconds.

Figure 4: Distribution of min |Γ| for events sets |E| = 8 (left) and |E| = 14 (right).

The machine configuration for the tests is a Core i7-6500U CPU @ 2.50GHz, with 8 GB of RAM, running Ubuntu 18.04.1 LTS. As can be seen from Figure 3, the causal explanation approach expectedly exhibits an exponential time complexity behavior, which opens a new research direction: avoiding the exploration of all contingency sets to make this approach tractable. Interestingly, when the size of |E| increases, the training time of the multi-stakeholder recommender system tends to be negligible compared to the explanation of the recommendation. This clearly indicates that causal explanation is even more challenging in terms of optimization than our quite resource-intensive recommender system.

Then, considering Figure 4, it can be seen that in both scenarios a majority of events are (close to) counterfactual (min |Γ| ≈ 0) and directly impact the recommendation process, while a minority of events are noticeably less "responsible", as shown by the two-mode distribution. This was expected, as the event test sets have been built in a way that makes the workers × services matrix very sparse.

7 DISCUSSION
This paper introduces ongoing work on explaining recommendations in a multi-stakeholder context. Our immediate future work includes the optimization of the computation of responsibilities and the evaluation of the approach on the explanation of σ∗. Our short-term goal is mainly a complete validation of the current framework, including a user study to validate the usefulness of explanations to the different stakeholders. Our long-term goal is to define a thorough framework for explanation generation in a multi-stakeholder recommendation context. Such a framework should cover: (i) The clear distinction between endogenous and exogenous causes. In Halpern and Pearl's framework, exogenous variables define a context determined by external, unconcerned factors, deemed not to be possible causes, while endogenous variables are the ones judged to affect the outcome and are thus potential causes. In the present work, we considered all events as endogenous causes. (ii) The modeling of different types of causes. In the present work, we addressed the problem of modeling why-so causes, i.e., explaining why the recommendation is what it is. Other forms of explanations can be addressed, like for instance why-not-so, i.e., why the recommendation is not like this. (iii) The distinction between explanations of profiles and of recommendations. In the present work, all explanations are generated through interventions over the set of events. Recommendation explanations may be more intuitively understood if we allow interventions directly on the stakeholders' profiles (e.g., permuting the ranks of two objects). (iv) The modification of the responsibility measure to include the assessment of the perturbations on the ranking brought by causes.

REFERENCES
[1] H. Abdollahpouri, G. Adomavicius, R. Burke, I. Guy, D. Jannach, T. Kamishima, J. Krasnodebski, and L. Pizzato. Beyond personalization: Research directions in multistakeholder recommendation. CoRR, abs/1905.01986, 2019.
[2] H. Abdollahpouri, R. Burke, and B. Mobasher. Recommender systems as multistakeholder environments. In UMAP, pages 347–348, 2017.
[3] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng., 17(6):734–749, 2005.
[4] J. A. Aledo, J. A. Gámez, and D. Molina. Tackling the rank aggregation problem with evolutionary algorithms. Applied Mathematics and Computation, 222:632–644, 2013.
[5] H. Chockler and J. Y. Halpern. Responsibility and blame: A structural-model approach. J. Artif. Intell. Res., 22:93–115, 2004.
[6] J. Chomicki. Preference formulas in relational queries. ACM Trans. Database Syst., 28(4):427–466, 2003.
[7] M. Das, S. Amer-Yahia, G. Das, and C. Yu. MRI: Meaningful interpretations of collaborative ratings. PVLDB, 4(11):1063–1074, 2011.
[8] T. Eiter and T. Lukasiewicz. Complexity results for structure-based causality. In IJCAI, pages 35–42, 2001.
[9] R. D. Burke et al. Towards multi-stakeholder utility evaluation of recommender systems. In UMAP, 2016.
[10] F. Gedikli, D. Jannach, and M. Ge. How should I explain? A comparison of different explanation types for recommender systems. Int. J. Hum.-Comput. Stud., 72(4):367–382, 2014.
[11] A. Giacometti, D. H. Li, P. Marcel, and A. Soulet. 20 years of pattern mining: a bibliometric survey. SIGKDD Explorations, 15(1):41–50, 2013.
[12] J. Y. Halpern and J. Pearl. Causes and explanations: A structural-model approach: Part 1: Causes. In UAI, pages 194–202, 2001.
[13] S. Lin. Rank aggregation methods. WIREs Computational Statistics, 2(5):555–570, 2010.
[14] A. Meliou, W. Gatterbauer, K. F. Moore, and D. Suciu. The complexity of causality and responsibility for query answers and non-answers. PVLDB, 4(1):34–45, 2010.
[15] P. Nguyen, J. Dines, and J. Krasnodebski. A multi-objective learning to re-rank approach to optimize online marketplaces for multiple stakeholders. CoRR, abs/1708.00651, 2017.
[16] J. Pearl. Causal inference in statistics: An overview. Statist. Surv., 3:96–146, 2009.
[17] S. Roy and D. Suciu. A formal approach to finding explanations for database queries. In SIGMOD, pages 1579–1590, 2014.
[18] Ö. Sürer, R. Burke, and E. C. Malthouse. Multistakeholder recommendation with provider constraints. In RecSys, pages 54–62, 2018.
[19] R. W. White. Interactions with Search Systems. Cambridge University Press, 2016.
[20] Y. Zhang and X. Chen. Explainable recommendation: A survey and new perspectives. Under revision, available on arXiv.org, pages 1–88, 2019.
[21] Y. Zheng and A. Pu. Utility-based multi-stakeholder recommendations by multi-objective optimization. In WI, pages 128–135, 2018.
A challenge is then to investigate how to map such interventions to interventions over the set of objects.
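To make the global optimization score of Section 6 concrete, the following sketch aggregates per-stakeholder NDCG values with weights α as a weighted sum. It is a minimal illustration, not the paper's implementation: the relevance vectors, the two-stakeholder setup, and the names `global_score`, `consumer`, and `provider` are assumptions introduced here.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))

def ndcg(ranked_rels):
    """NDCG: DCG of the proposed ranking divided by the DCG of the ideal one."""
    ideal = dcg(sorted(ranked_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal > 0 else 0.0

def global_score(per_stakeholder_rels, alphas):
    """Weighted sum of per-stakeholder NDCG values (alphas sum to 1)."""
    return sum(a * ndcg(r) for a, r in zip(alphas, per_stakeholder_rels))

# One shared ranking, rated from each stakeholder's point of view
# (hypothetical relevance values):
consumer = [3, 2, 3, 0, 1]  # relevance of the ranked items for the consumer
provider = [1, 3, 0, 2, 3]  # relevance of the same items for the provider

uniform = global_score([consumer, provider], [0.5, 0.5])  # uniform weights
biased = global_score([consumer, provider], [0.9, 0.1])   # favor the consumer
```

With conflicting profiles such as these, shifting weight onto one stakeholder raises the score towards that stakeholder's own NDCG, mirroring the behavior reported for Table 3.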
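The BaselineOnly predictor used in Section 6.2 estimates a rating as r̂(u, i) = μ + b_u + b_i, i.e., the global mean plus user and item biases. Here is a minimal sketch of that model, fitting the biases by simple averaging (Surprise itself fits them by ALS or SGD); the toy ratings and the function names are assumptions for illustration.

```python
from collections import defaultdict

def fit_baseline(ratings):
    """ratings: {(user, item): r}. Returns (mu, b_u, b_i) fit by averaging."""
    mu = sum(ratings.values()) / len(ratings)
    by_user, by_item = defaultdict(list), defaultdict(list)
    for (u, _), r in ratings.items():
        by_user[u].append(r - mu)            # user deviation from the mean
    b_u = {u: sum(v) / len(v) for u, v in by_user.items()}
    for (u, i), r in ratings.items():
        by_item[i].append(r - mu - b_u[u])   # residual item deviation
    b_i = {i: sum(v) / len(v) for i, v in by_item.items()}
    return mu, b_u, b_i

def predict(mu, b_u, b_i, u, i):
    """Baseline estimate; unknown users or items fall back to a zero bias."""
    return mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)

# Toy worker x service ratings (hypothetical):
ratings = {("u1", "s1"): 5.0, ("u1", "s2"): 3.0, ("u2", "s1"): 4.0}
mu, b_u, b_i = fit_baseline(ratings)
```

Such a bias-only predictor is cheap to train, which is consistent with the observation that training time becomes negligible next to explanation time.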
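To illustrate the counterfactual machinery behind Figure 4, the sketch below brute-forces a minimal contingency set Γ and the resulting degree of responsibility 1/(1 + min |Γ|) in the style of Chockler and Halpern [5]. The event encoding and the majority-vote outcome function are toy assumptions, and the subset enumeration is precisely the exponential exploration of all contingency sets that the paper identifies as the obstacle to tractability.

```python
from itertools import combinations

def min_contingency(events, event, outcome):
    """Smallest Gamma (excluding `event`) such that removing Gamma alone keeps
    the outcome, while removing Gamma and `event` changes it. Brute force over
    all subsets: exponential in |E|, as observed in Figure 3."""
    others = [e for e in events if e != event]
    baseline = outcome(set(events))
    for size in range(len(others) + 1):           # smallest subsets first
        for gamma in combinations(others, size):
            kept = set(events) - set(gamma)
            if outcome(kept) == baseline and outcome(kept - {event}) != baseline:
                return set(gamma)
    return None  # `event` is not a cause of the outcome

def responsibility(events, event, outcome):
    """Degree of responsibility 1 / (1 + min |Gamma|), 0 for non-causes."""
    gamma = min_contingency(events, event, outcome)
    return 0.0 if gamma is None else 1.0 / (1.0 + len(gamma))

# Toy setting: each event supports an item; the "recommendation" is the item
# supported by most events (ties broken alphabetically, deterministically).
events = [("e1", "A"), ("e2", "A"), ("e3", "B")]

def outcome(kept):
    counts = {}
    for _, item in kept:
        counts[item] = counts.get(item, 0) + 1
    return max(sorted(counts), key=counts.get) if counts else None
```

Here ("e1", "A") is a cause of recommending A with contingency set {("e2", "A")}, hence responsibility 1/2, whereas ("e3", "B") is not a cause at all; events with min |Γ| ≈ 0 are the (close to) counterfactual ones that dominate the distributions in Figure 4.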