=Paper= {{Paper |id=Vol-1893/Paper2 |storemode=property |title=Explanation Systems for Influence Maximization Algorithms |pdfUrl=https://ceur-ws.org/Vol-1893/Paper2.pdf |volume=Vol-1893 |authors=Amulya Yadav,Aida Rahmattalabi,Ece Kamar,Phebe Vayanos,Milind Tambe,Venil Loyd Noronha |dblpUrl=https://dblp.org/rec/conf/ijcai/YadavRKVTN17 }} ==Explanation Systems for Influence Maximization Algorithms== https://ceur-ws.org/Vol-1893/Paper2.pdf
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia




                        Explanation Systems for Influence Maximization
                                         Algorithms

                Amulya Yadav1 , Aida Rahmattalabi1 , Ece Kamar2 , Phebe Vayanos1 , Milind Tambe1 ,
                                             Venil Loyd Noronha1
               1 Center for Artificial Intelligence in Society, University of Southern California, LA, CA 90089
               2 Microsoft Research, Redmond, WA 98052
               1 {amulyaya, rahmatta, phebe.vayanos, tambe, vnoronha}@usc.edu
               2 {eckamar}@microsoft.com



                         Abstract. The field of influence maximization (IM) has made rapid advances,
                         resulting in many sophisticated algorithms for identifying “influential” members
                         in social networks. However, in order to engender trust in IM algorithms, the
                         rationale behind their choice of “influential” nodes needs to be explained to their
                         users. This is a challenging open problem that needs to be solved before these al-
                         gorithms can be deployed on a large scale. This paper attempts to tackle this open
                         problem via four major contributions: (i) we propose a general paradigm for de-
                         signing explanation systems for IM algorithms by exploiting the tradeoff between
                         explanation accuracy and interpretability; our paradigm treats IM algorithms as
                         black boxes, and is flexible enough to be used with any algorithm; (ii) we utilize
                         this paradigm to build XplainIM, a suite of explanation systems; (iii) we illus-
                         trate the usability of XplainIM by explaining solutions of HEALER (a recent IM
                         algorithm) among ∼200 human subjects on Amazon Mechanical Turk (AMT);
                         and (iv) we provide extensive evaluation of our AMT results, which shows the
                         effectiveness of XplainIM.


               1      Introduction

               The field of influence maximization (IM) deals with finding a set of “influential” nodes
               in a social network, which can optimally spread influence throughout the network. Sig-
               nificant progress in this field over the last few years has led to a variety of sophisticated
               algorithms [1] for finding influential nodes. Recently, [10] evidenced the real-world
               usability of IM algorithms by deploying three such algorithms in the real world.
                   As with any technology, evidence of real-world usability of IM algorithms brings
               to the fore other important questions that need to be tackled before their widespread
               adoption. One of the most important open questions for IM algorithms (which are used
               as decision support tools by domain experts such as viral marketers, social workers,
               etc.) is how to engender trust by explaining their recommendations to domain experts
               [7]. This is important because explaining the rationale behind these algorithms' decisions
               helps the domain experts' decision-making process: they can then choose to either use the
                    Copyright © 2017 for the individual papers by the papers’ authors. Copying permitted for
                    private and academic purposes. This volume is published and copyrighted by its editors.








               algorithm’s recommendation or discard it. Indeed, [9] noted the challenge of explaining
               the solutions of HEALER (an IM algorithm) to homeless shelter officials, who wanted
               explanations for why HEALER’s solutions were better than their preferred solutions.
               This raises the following important question: how can solutions generated by IM
               algorithms be explained to their intended users? To the best of our knowledge, this problem has not
               received any attention in the literature so far.
                   Designing such an explanation system for IM algorithms is a challenging task for
               two reasons. First, instead of building different explanation systems for different IM
               algorithms, we want to build a single explanation system which is generalizable, i.e.,
               it should be flexible enough to explain the solutions of every possible IM algorithm.
               This makes our task challenging because we can no longer rely on “explaining the
               algorithm to explain the solution” (since there is no fixed algorithm). Second, most
               human end-users have cognitive limitations and therefore, in order to avoid overloading
               them with information, we need to ensure that our system’s explanations are relatively
               interpretable. However, this involves trading off explanation accuracy (i.e., correctness
               of explanation) with its interpretability (as we explain later).
                   This paper attempts to address this open problem via four major contributions. First,
               we propose a machine learning based paradigm for designing generalizable explanation
               systems for IM algorithms. Our paradigm achieves generalizability by using an algo-
               rithm independent classification dataset to build a machine learning classifier, which
               generates natural language explanations (in terms of this dataset’s features) for IM so-
               lutions. In order to aid interpretability of the generated explanations, our classification
               dataset does not contain features for un-interpretable complicated terms (e.g., marginal
               gain). This transforms the problem of explaining IM algorithm solutions into a problem
               of explaining the predictions made by that machine learning classifier (as we explain
               later). Further, our paradigm exploits a fundamental tradeoff between explanation accu-
               racy and interpretability (i.e., more accurate explanations are less interpretable, and vice
               versa) to generate a Pareto frontier of machine learning classifiers, which is then used
               to generate natural language explanations of varying accuracy and interpretability. Sec-
               ond, we utilize our paradigm to build XplainIM, a suite of explanation systems. Third,
               we illustrate the usability of XplainIM by explaining the solutions of HEALER (a recent
               IM algorithm) to ∼200 human subjects on Amazon Mechanical Turk (AMT). Fourth,
               we provide extensive evaluation of our AMT results, which shows the effectiveness of
               XplainIM.
                    Motivating IM Algorithm Our proposed explanation system is flexible enough to
               explain the solutions of any IM algorithm. However, we motivate our discussion in
               the rest of the paper by focusing on HEALER [9], a recent IM algorithm. We choose
               HEALER primarily because its paper [9] was the first to pose the question of explain-
               ing the solutions of an IM algorithm. HEALER was designed to select multiple sets
               of “influential” nodes sequentially from a homeless youth social network, who could
               optimally spread awareness about HIV in their social network. HEALER solves this
               problem by modeling it as a Partially Observable Markov Decision Process (POMDP)
               [5] and uses hierarchical ensembling techniques to scale up to real-world network sizes
               [9].








                   Thus, HEALER is a specialized algorithm for solving a more general problem than
               standard influence maximization (defined below). However, in order to ensure the gen-
               eralizability of our explanation system to other IM algorithms, we ignore the POMDP
               style sequential planning aspects of HEALER, and use it as an algorithm for solving the
               standard single-shot influence maximization problem: Given a social network graph G,
               select a set of K “influential” nodes which optimally spread influence in the network
               after T time steps.
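HEALER itself is a POMDP-based planner; as a point of reference, the single-shot problem stated above can be sketched with a simple Monte-Carlo greedy heuristic. This is an illustration only (not HEALER's method): the graph is a plain adjacency dict, edge influence probabilities follow the independent cascade convention, and all function names are hypothetical.

```python
import random

def simulate_cascade(adj, probs, seeds, T, rng):
    """One independent-cascade rollout for T time steps.
    adj: node -> set of neighbours; probs: (u, v) -> influence
    probability (the paper samples these from U[0, 1])."""
    influenced, frontier = set(seeds), set(seeds)
    for _ in range(T):
        new = set()
        for u in frontier:
            for v in adj[u]:
                if v not in influenced and rng.random() < probs[(u, v)]:
                    new.add(v)
        influenced |= new
        frontier = new
        if not frontier:
            break
    return len(influenced)

def greedy_im(adj, probs, K, T, n_sims=200, seed=0):
    """Greedy baseline for the single-shot problem: repeatedly add
    the node with the largest estimated marginal spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(K):
        best, best_val = None, -1.0
        for v in adj:
            if v in chosen:
                continue
            est = sum(simulate_cascade(adj, probs, chosen + [v], T, rng)
                      for _ in range(n_sims)) / n_sims
            if est > best_val:
                best, best_val = v, est
        chosen.append(best)
    return chosen
```

Any IM algorithm with this input/output signature (graph in, K nodes out) can be plugged into the explanation paradigm described next.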


               2   Related Work

               The problem of explaining solutions of IM algorithms was first posed by [9] in the con-
               text of HEALER, but their main focus was on justifying the need for such explanations,
               as opposed to providing any practical solutions to this problem. While explanation sys-
               tems for POMDPs have been proposed [8], they cannot be applied to explain solutions
               of every IM algorithm. To the best of our knowledge, no other previous work has looked
               at this problem so far.


               3   Machine Learning based Paradigm for Explanation Systems

               We now define an explanation system formally. Given a social network graph G as
               input, an IM algorithm outputs an optimal set of K nodes OG . Moreover, the end-user
               of the IM algorithm has a preferred set of K nodes UG , which he/she wrongly thinks is
               the optimal solution. Then, an explanation system is defined as follows:

               Definition 1. An explanation system for IM algorithms is a function E : G × OG ×
               UG → S which takes a graph G, an IM algorithm’s solution on G (i.e., OG ) , and an
               end-user’s preferred solution on G (i.e., UG ) as input, and produces as output a natural
               language string S, which explains why OG is a better solution than UG on graph G.

                   We believe that Definition 1 accounts for most practical use cases of explanation
               systems. For example, [9] noted that homeless shelter officials (i.e., end-users) wanted
               explanations for why HEALER’s solutions were better than their preferred solutions.
               Next, we explain our novel paradigm for designing explanation systems (for IM algo-
               rithms) in two stages.
                   First, we transform our original problem of explaining IM solutions into an alternate
               problem: “how to explain predictions made by ML classifiers?” Second, having made
               this transformation, we provide novel solutions for explaining predictions made by ML
               classifiers.
                   Step 1: Transformation Consider the following two problems for any given IM
               algorithm ∆:

               Problem 1. Given as input an IM algorithm ∆, output an explanation system E^∆ which
               explains why ∆'s solutions (O^∆_G) are better than UG (for all possible G and UG ).








               Problem 2. Given as input (i) an IM algorithm ∆, and (ii) a binary classifier M , which
               perfectly (i.e., without any misclassifications) differentiates ∆'s solutions O^∆_G from all
               other possible solutions UG (for all possible UG and G), output an explanation system
               E^M which explains how classifier M differentiates O^∆_G from UG .

                   We claim that solving Problem 2 is sufficient to solve Problem 1, as formally stated
               below (proof in appendix1 ):

               Theorem 1. Problem 2 solution =⇒ Problem 1 solution

                   As a result, we transform the problem of explaining IM solutions (Problem 1) into
               the alternate problem: “how to explain predictions made by ML classifiers?” (Problem
               2). This completes the transformation step of our approach. Next, we provide a novel
               solution method for solving Problem 2.
                   Step 2: Solution Method First, we specify the input to Problem 2: an ML classifier
               M which can differentiate between OG and UG . Second, we need to use this classifier
               M to generate output for Problem 2: the process of explaining the predictions made by
               classifier M needs to be specified. We focus on the input specification in this section
               (output generation is explained in Section 5).
                   To train classifier M , each possible IM solution (i.e., set of K nodes) is mapped to
               a set of network-centric feature values (e.g., degree, mutual connections, etc.), and is
               labeled as either being “optimal” (if it is the IM algorithm’s solution) or “sub-optimal”
               (for any other solution). These network-centric features generalize across all IM algo-
               rithms, as they only depend on the network based characteristics of any possible IM
               solution. This creates a labeled dataset (details in Section 6) which is then used to train
               the ML classifier M which differentiates an optimal solution from a sub-optimal solu-
               tion.
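The dataset construction described above can be sketched as follows, assuming networkx for the network-centric features. This sketch realizes 6 illustrative features (the paper uses 20, only some of which are listed in Figure 1); the aggregation over the K chosen nodes (sums, plus mutual connections for K = 2) is our assumption.

```python
import random
import networkx as nx

def solution_features(G, nodes):
    """Map one candidate IM solution (a list of K nodes) to
    network-centric feature values."""
    deg = dict(G.degree())
    btw = nx.betweenness_centrality(G)
    clo = nx.closeness_centrality(G)
    pr = nx.pagerank(G)
    clu = nx.clustering(G)
    mutual = len(set(G[nodes[0]]) & set(G[nodes[1]])) if len(nodes) == 2 else 0
    return [sum(deg[v] for v in nodes), sum(btw[v] for v in nodes),
            sum(clo[v] for v in nodes), sum(pr[v] for v in nodes),
            sum(clu[v] for v in nodes), mutual]

def build_dataset(G, optimal, n_random, K, rng):
    """Label the IM algorithm's solution 1 ('optimal') and randomly
    drawn K-node sets 0 ('sub-optimal')."""
    X, y = [solution_features(G, list(optimal))], [1]
    for _ in range(n_random):
        cand = rng.sample(list(G.nodes()), K)
        if set(cand) != set(optimal):
            X.append(solution_features(G, cand))
            y.append(0)
    return X, y
```

Because the features depend only on the network, the same `build_dataset` routine works unchanged for any IM algorithm's output.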
                   We use decision trees [6] as our ML classifier of choice. The key advantage of
               decision trees over other classifiers is their interpretability, as their predictions can be
               explained to end-users as a conjunction of decision rules, where each rule involves a
               single feature in the dataset. In many applications, including our own, this interpretabil-
               ity is preferred over other methods that may have higher accuracy but are relatively
               un-interpretable [9].
                   Unfortunately, while decision trees are more interpretable than other ML classifiers
               (e.g., neural nets), the explanations for their predictions (i.e., a conjunction of decision
               rules) can overload human users with too much information to process (as we show
               in our experiments). Thus, there is a need to ensure interpretability of the generated
               explanations.
                   We address this challenge by introducing “interpretable decision trees”, which al-
               lows us to trade off the prediction accuracy of a decision tree with its interpretability
               in a quantifiable manner. Further, we exploit the competing objectives of prediction ac-
               curacy and interpretability to generate a Pareto frontier of decision trees. This Pareto
               frontier allows us to correctly differentiate between OG and UG , while maintaining
               different levels of interpretability. We then utilize trees from this Pareto frontier to gen-
               erate explanations for the IM algorithm’s solutions (see Section 4). The novelty of our
                1
                    See http://bit.ly/2kXDhE4 for proofs








               work lies in the versatility of our explanation system, which is able to provide explana-
               tions of varying interpretability upon demand. Specifically, our explanation system is
               flexible enough to dynamically increase the interpretability of the generated explanation
               (by using more “interpretable” decision trees to generate the explanation), if the end-
               user is unconvinced about previously supplied explanations. We now formally define
               interpretable decision trees.


               4     Interpretable Decision Trees
               Decision trees take as input a labeled dataset with N datapoints (X^i, Y^i) ∀i ∈ {1, N },
               where X^i is a p-dimensional feature vector for the ith datapoint, and Y^i is the label
               associated with that datapoint. In our domain, X^i represents the vector of network-
               centric feature values corresponding to the ith IM solution (i.e., set of K nodes) in
               our dataset. Similarly, Y^i = 1 if the ith solution in our dataset is the IM algorithm's
               solution, and Y^i = 0 otherwise.
                   While a decision tree's prediction for X^i can be explained as a conjunction of all
               the feature splits that X^i traverses on its way to a leaf node, this explanation is
               uninterpretable, as it overloads end-users with too much information (we validate this in
               our experiments). Thus, in order to ensure interpretability of the generated explanations,
               we introduce the notion of interpretable decision trees, which allows us to tradeoff the
               explanation accuracy with its interpretability in a quantifiable manner. Next, we define
               the interpretability score, which quantifies the interpretability of a decision tree.
                   Interpretability Score We associate with each feature t ∈ {1, p} an interpretabil-
               ity weight w_t ∈ [0, 1] which quantifies the interpretability of feature t. We estimate
               w_t values ∀t ∈ {1, p} by surveying AMT workers (see Section 6). Further, let x^D_t ∀t ∈
               {1, p} denote a binary variable which indicates whether a split on feature t is used
               at any branch node of the decision tree D. Then, the interpretability score of
               decision tree D is defined as: I_D = (Σ_{t=1}^{p} x^D_t w_t) / (Σ_{t=1}^{p} x^D_t).
                   Thus, trees which split on fewer features (the denominator in I_D's definition)
               have higher I_D scores, as it is easier to explain the tree's predictions with
               fewer features. Moreover, trees which split on features with higher interpretability
               weights have higher I_D scores. Next, we define the A&I score of a decision tree,
               which captures the tradeoff between its prediction accuracy and interpretability.
                   A&I Score For decision trees (and most other classification algorithms), inter-
               pretability and prediction accuracy (measured by I_D and the F1 score², respectively) are
               two competing objectives: increasing the prediction accuracy of a decision tree leads to
               a loss of interpretability, and vice versa.
                   The A&I score of decision tree D captures the tradeoff between these competing
               objectives and is defined as follows: A&I_D(α) = F1^D + α·I_D. Here, α is a parameter
               which controls the tradeoff between the F1 and I_D scores. Higher α values lead to
               highly interpretable decision trees with lower prediction accuracy, and vice versa. Next,
               we explain an algorithm which finds the tree D_α that maximizes the A&I(α) score (i.e.,
               it optimally trades off between the F1 and I_D scores at the tradeoff level specified by α).
                2
                    F1 score = 2TP/(2TP + FP + FN)
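Both scores are straightforward to compute from a fitted tree. A minimal sketch (the convention for a leaf-only tree that splits on no features is our assumption; for an sklearn-style tree, the set of used features would be `{f for f in tree.tree_.feature if f >= 0}`):

```python
def interpretability_score(used_features, weights):
    """I_D = (sum_t x_t^D * w_t) / (sum_t x_t^D): the average
    interpretability weight over the distinct features the tree
    splits on. used_features: feature indices appearing at branch
    nodes; weights: per-feature interpretability weights w_t."""
    used = set(used_features)
    if not used:
        return 1.0  # leaf-only tree: convention chosen here, not from the paper
    return sum(weights[t] for t in used) / len(used)

def ai_score(f1, i_d, alpha):
    """A&I_D(alpha) = F1^D + alpha * I_D."""
    return f1 + alpha * i_d
```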








                  Algorithm 1: Maximize A&I score
                    Input: α, Dataset (X^i, Y^i) ∀i ∈ {1, N}
                    Output: D_α, the A&I-score-maximizing decision tree
                  1 (Train, Validation, Test) = Split(Dataset);
                  2 Tree = CART(Train);
                  3 for level ∈ PruningLevels(Tree) do
                  4      PrunedTree = Prune(Tree, level);
                  5      r_level = A&I(α, PrunedTree, Validation);
                  6 OptLevel = argmax_j r_j;
                  7 D_α = Prune(Tree, OptLevel);
                  8 return D_α;


                    Maximizing A&I score For each α value, we use Algorithm 1 to build a decision
                tree (on the dataset from Section 3) which maximizes A&I(α). First, we randomly
                split our dataset into a training, validation, and test set in Step 1 (in our experiments,
                we use an 80:10:10 split). Next, we use CART, a state-of-the-art decision tree algorithm
                [4], to learn a decision tree on our training set in Step 2. Normally, CART outputs
                deep decision trees, which have high prediction accuracies on the training set. However,
                deeper trees have lower I_D scores, as they use more features. As a result,
                we use our validation set to find the optimal pruning level of CART's decision tree,
                which maximizes A&I(α). Specifically, for each possible pruning level, we prune
                CART's decision tree to that level in Step 4. We use the pruned tree's F1 score
                on the validation set to calculate its A&I(α) score (Step 5). Finally, we select the
                pruning level which maximizes A&I(α) on the validation set (Step 6), and prune CART's
                decision tree to this level (Step 7). To account for randomness in splitting the dataset
                into the training, validation, and test sets, we average over 50 such splits.
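Algorithm 1 can be sketched with sklearn's CART implementation. Two assumptions differ from the paper's setup: pruning levels are realized via the cost-complexity pruning path (the paper does not name its pruning mechanism), and a single 80:20 train/validation split stands in for the averaged 80:10:10 splits.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def maximize_ai(X, y, weights, alpha, seed=0):
    """Return the pruned CART tree maximizing A&I(alpha) = F1 + alpha * I_D
    on a held-out validation split."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        np.asarray(X), np.asarray(y), test_size=0.2, random_state=seed)
    # candidate pruning levels, from the unpruned tree down to the root
    path = DecisionTreeClassifier(random_state=seed).cost_complexity_pruning_path(X_tr, y_tr)
    best_tree, best_score = None, -np.inf
    for ccp in path.ccp_alphas:
        tree = DecisionTreeClassifier(
            random_state=seed, ccp_alpha=max(ccp, 0.0)).fit(X_tr, y_tr)
        used = {f for f in tree.tree_.feature if f >= 0}  # features split on
        i_d = sum(weights[t] for t in used) / len(used) if used else 1.0
        score = f1_score(y_val, tree.predict(X_val), zero_division=0) + alpha * i_d
        if score > best_score:
            best_tree, best_score = tree, score
    return best_tree, best_score
```

Sweeping `alpha` over a range of values and calling `maximize_ai` for each yields the set Θ from which the Pareto frontier is extracted.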
                   Decision Tree Pareto Frontier Instead of generating a single decision tree (using
               Algorithm 1), we exploit the accuracy-interpretability tradeoff by varying the value of
               α to generate multiple decision trees. We do this because of two reasons. First, end-
               users may find explanations generated from one decision tree hard to understand, and
               thus, we may need a more interpretable tree (measured by ID ) to generate the explana-
               tions. Second, explanations can only be generated using decision trees which correctly
               differentiate between the IM algorithm’s optimal solution (OG ) and the end-user’s sub-
               optimal solution (UG ) (else we don’t have a valid classifier M in Problem 2). Thus, if
               more interpretable (less accurate) trees fail to differentiate between OG and UG , we
               need more accurate (less interpretable) trees to generate explanations.
                     Therefore, for each α value, we find the optimal D_α tree using Algorithm 1. Thus,
                we get a set of trees Θ, each of which has different F1 and I_D scores. Next, we remove
                all trees D ∈ Θ which are dominated by some other tree D′ ∈ Θ in terms of both F1 and I_D
                scores (i.e., there exists a tree D′ ∈ Θ with F1^{D′} ≥ F1^D and I_{D′} ≥ I_D, where
                at least one of the two inequalities is strict). This gives us our Pareto frontier of trees.
                   Our strategy is to search this Pareto-frontier for a decision tree D∗ , which can cor-
               rectly differentiate between OG and UG , and then use D∗ to generate explanations.
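The Pareto filtering step is a small computation over (F1, I_D) pairs; a minimal sketch (the triple-based representation is illustrative):

```python
def pareto_frontier(trees):
    """Filter a list of (tree, f1, i_d) triples down to the
    non-dominated ones: a tree is dominated when some other tree is
    at least as good on both scores and strictly better on one."""
    frontier = []
    for i, (tree, f1, i_d) in enumerate(trees):
        dominated = any(
            f1b >= f1 and idb >= i_d and (f1b > f1 or idb > i_d)
            for j, (_, f1b, idb) in enumerate(trees) if j != i)
        if not dominated:
            frontier.append((tree, f1, i_d))
    return frontier
```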








               Next, we complete the discussion of our explanation system by proposing XplainIM,
               a solution method for Problem 2 which utilizes our Pareto frontier (recall that solving
               Problem 2 is sufficient to create an explanation system for IM algorithms).


               5   Solution for Problem 2: XplainIM
               Problem 2 takes as input a machine learning classifier M (which can differentiate be-
               tween OG and UG ), and produces as output an explanation system which explains how
               classifier M differentiates OG from UG . We propose XplainIM as a solver for Problem
               2. We begin by discussing XplainIM’s design.
                    XplainIM builds an explanation system for a specific IM algorithm in two stages. In
               the offline stage, it takes as input a classification dataset for that algorithm, and uses it
               to store a Pareto-frontier of decision trees. This input dataset is created using a fixed set
               of network-centric features (Figure 1), irrespective of the specific IM algorithm under
               consideration. In the online stage, it takes as input a network G and user solution UG
               and uses them to choose classifier M . We now explain how XplainIM chooses this
               classifier M .
                    Choosing Classifier Given a choice of OG and UG , XplainIM finds the set F of
               “feasible trees” in its Pareto frontier which can successfully classify OG as optimal, and
               UG as sub-optimal. Each tree in F is a valid classifier for Problem 2, as it can differen-
               tiate between OG and UG . To choose M , XplainIM simply uses the most interpretable
                tree in F, i.e., M = argmax_{D∈F} I_D.
                    Note that it is possible that for some choice of OG and UG , no tree in our Pareto
               frontier is feasible. Then, solving Problem 2 is impossible, as we wouldn’t have an
               appropriate classifier M . However, this case rarely arises in practice (see Section 6).
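The classifier-selection rule above reduces to a feasibility filter followed by an argmax over I_D; a minimal sketch (the `(tree, f1, i_d)` triples and sklearn-style `predict()` interface are assumptions):

```python
def choose_classifier(frontier, x_opt, x_user):
    """Return the most interpretable feasible tree on the Pareto
    frontier: one classifying the algorithm's solution O_G as optimal
    (label 1) and the user's U_G as sub-optimal (label 0). Returns
    None in the rare infeasible case discussed in the text."""
    feasible = [(tree, i_d) for tree, _, i_d in frontier
                if tree.predict([x_opt])[0] == 1
                and tree.predict([x_user])[0] == 0]
    return max(feasible, key=lambda p: p[1])[0] if feasible else None
```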
                    Explanation System Next, having chosen a feasible decision tree as the classifier
               M , we now discuss XplainIM’s explanation system, which generates explanations for
                “how M differentiates OG from UG ”. XplainIM relies on a natural language template
               (described below) to generate explanations, which is populated at runtime (depending
               on OG and UG ).

                Template 1 Your solution UG had a ft feature value of UG (ft ). On the other hand,
                the optimal solution OG had a ft feature value of OG (ft ). For solutions to be optimal,
                the value of the ft feature should be {≤, >} bt .

                    Given OG and UG , we propose three different methods for populating Template 1.
               Each of these methods relies on the observation that the paths taken by OG and UG
               (to leaf nodes) in tree M diverge at some branch node t∗ ∈ TB (because tree M is
               feasible). We use features ft and corresponding thresholds bt (recall Section 4) starting
               from this point of divergence t∗ to populate Template 1 to generate explanations for the
               end-user. Our three methods are explained below:
                    1. XplainIM-A Template 1 is populated using only ft∗ and bt∗ for branch node t∗ ,
               which is the first point of divergence between paths of OG and UG in tree M . This
               populated template is presented as explanation to the end-user.
                    2. XplainIM-B Template 1 is populated using ft∗ and bt∗ for branch node t∗ (simi-
               lar to XplainIM-A). However, the choice of classifier M used to generate explanations








               is different. Instead of using the feasible tree with the highest ID score as our classifier
               M , we use the feasible tree in which the feature ft∗ (present at the branch node t∗ )
               has the highest interpretability weight wft∗ . This is because only feature ft∗ is used to
               populate Template 1, and a high value of wft∗ would ensure the interpretability of the
               generated explanation.
                   3. XplainIM-Full Instead of providing explanations in terms of a single feature,
               XplainIM-Full uses multiple features in its explanation. Starting from branch node t∗ ,
               XplainIM-Full populates Template 1 for all features along the two divergent paths of
               OG and UG . This method produces explanations which are identical in structure to
               ones generated by decision sets [2].


               6   Evaluation

               We evaluated XplainIM by using it to explain HEALER’s solutions to human work-
               ers on AMT. We provide results from these experiments in three parts. First, we de-
               scribe the HEALER specific dataset (which XplainIM uses to build its Pareto-frontier
               of trees). Second, we provide results from AMT surveys which were used to estimate
               interpretability weights wt for features in our dataset. Third, we provide AMT game re-
               sults to evaluate XplainIM’s effectiveness in explaining HEALER’s solutions to AMT
               workers.
                   Dataset We generate 100 different Watts-Strogatz (WS) networks (p = 0.25, k =
               7), each of size 20 nodes. The influence probabilities for network edges were ran-
               domly sampled from a uniform distribution U [0, 1]. We set K = 2, i.e., every set of
               two network nodes is a possible IM solution. For each of these 100 networks, we add
               HEALER’s solution, along with 20 randomly chosen IM solutions to our dataset with
               the labels “optimal” and “suboptimal”, respectively. Next, we map the ith IM solution
               in our dataset to a unique feature vector X i containing the network-centric feature val-
               ues (see Figure 1) for that IM solution. This gives us a labeled dataset (with 20 features
               and 2100 datapoints), which serves as XplainIM’s input in our experiments.
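The experimental instances described above can be generated as follows (a sketch using networkx; note that networkx's Watts-Strogatz generator wires each node to its k // 2 nearest neighbours on each side, so the paper's k = 7 effectively behaves like k = 6 there):

```python
import itertools
import random
import networkx as nx

def make_instance(n=20, k=7, p=0.25, seed=0):
    """One Watts-Strogatz network with U[0, 1] influence
    probabilities stored on its edges."""
    rng = random.Random(seed)
    G = nx.watts_strogatz_graph(n, k, p, seed=seed)
    for u, v in G.edges():
        G[u][v]["p"] = rng.random()
    return G

def candidate_solutions(G, K=2):
    """With K = 2, every pair of nodes is a candidate IM solution."""
    return list(itertools.combinations(G.nodes(), K))
```

With 100 such networks and 1 + 20 labeled solutions each, this yields the 2100-datapoint dataset used in the experiments.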


                                               Feature Name            wt
                                               Betweenness Centrality 0.0
                                               Clustering Coefficient 0.25
                                               Pagerank               0.28
                                               Closeness Centrality   0.71
                                               Degree                  1.0


                                     Fig. 1: Interpretability Weights for Features


                   Interpretability Weights We deployed a survey among 35 AMT workers to esti-
               mate interpretability weights wt for all 20 features in our dataset. Each worker was (i)
               provided the definition of each feature; and (ii) given an example of how that feature’s
               value is calculated for a sample IM solution. Then, the worker was asked to calculate
               each feature’s value for a new IM solution on an unseen network.
                   For each feature t, we compare the user response distribution Mt (i.e., the histogram
               of feature values calculated by the 21 workers who passed the validation game) with
               the actual value distribution Pt in order to calculate wt . We assume that a feature’s
               interpretability correlates strongly with the ease of calculating that feature’s value. Thus,
               if Mt is very similar to Pt , AMT workers find feature t to be highly interpretable,
               whereas significant differences between Mt and Pt imply that feature t is hard to
               interpret. Specifically, we set wt = 1 − EarthMover(Mt , Pt ), where
               EarthMover(Mt , Pt ) measures the distance between the distributions Mt and Pt [3].
                   Due to a lack of space, Figure 1 shows normalized wt values for 5 out of the 20
               features. This figure shows that “Degree” and “Betweenness Centrality” are the most
               and least interpretable features, respectively.
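For one-dimensional histograms over shared, unit-width bins, the earth mover’s distance reduces to a cumulative-difference sum, so the weight computation can be sketched as below (the paper uses the EMD algorithm of [3] and normalizes the resulting weights; the function names are illustrative):

```python
def earth_mover(hist_m, hist_p):
    """1-D EMD between two histograms defined over the same unit-width
    bins; assumes each histogram sums to 1."""
    emd, cum = 0.0, 0.0
    for m, p in zip(hist_m, hist_p):
        cum += m - p       # running difference of the two CDFs
        emd += abs(cum)    # mass that must cross this bin boundary
    return emd

def interpretability_weight(hist_m, hist_p):
    """w_t = 1 - EMD(M_t, P_t): identical distributions give w_t = 1."""
    return 1.0 - earth_mover(hist_m, hist_p)
```

If workers’ responses exactly match the true distribution, `interpretability_weight` returns 1; if all the probability mass sits one bin apart, it returns 0.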
                   AMT Game Results We now present results from two different game types that
               we deployed on AMT to evaluate XplainIM’s effectiveness at explaining HEALER’s
               solutions. Both games begin by asking AMT workers to pick their preferred IM solution
               UG on a randomly chosen WS network (p = 0.25, k = 7, n = 20). We begin by
               explaining the design and motivations of both games.
                   Game 1 This game has three different versions, one for each variant of XplainIM.
               This game presents explanations generated by XplainIM to AMT workers, and evalu-
               ates the effect of these explanations in improving the worker’s IM solutions UG . The
               game proceeds as follows. After the worker picks solution UG , he/she is given an expla-
               nation for why HEALER’s solution OG is better than UG (this explanation is generated
               by XplainIM-{A, B or Full} depending on the game version). Then, the worker is asked
               to improve his/her solution UG based on the explanation shown.
                   Game 2 This game presents explanations generated by all three XplainIM variants
               to the same AMT worker, and compares the self-assessed interpretabilities of these
               three explanation schemes. The game proceeds as follows. After the worker picks so-
               lution UG , he/she is given three different explanations (generated using XplainIM-{A,
               B & Full}) for why HEALER’s solution OG is better than UG . Then, the worker is
               asked to rate these three explanations (each on a scale of 1-10) in terms of their in-
               terpretability. We also deploy a “No Explanation” game as a baseline, in which
               workers are asked to improve their suboptimal solutions UG without being given any
               explanation for UG ’s suboptimality. Each of these game versions was deployed with 50
               unique AMT workers (to avoid learning bias), each of whom plays on five different WS
               networks sequentially. We now discuss evaluation metrics used in this section.
                   Evaluation Metrics For Game 1, we compare the interpretability of XplainIM-A,
               B & Full’s explanations by measuring the effect of these explanations in improving the
               worker’s IM solutions UG . Specifically, we measure the percentage of workers who (in
               response to the provided explanation) modified UG to either (i) match OG (Optimal
               group); OR (ii) correctly follow the provided explanation (Correct group); OR (iii)
               move in the right direction (Close group).
                   For example, assume that “Degree” (Figure 1) of UG is 4, and the provided ex-
               planation is “For solutions to be optimal, the Degree should be greater than 9”. If the
               worker modifies UG to match OG , he/she is counted in the Optimal group. Otherwise,
               if his/her modified solution’s Degree is greater than 9, he/she is counted in the Correct
               group. Finally, if his/her modified solution’s Degree is greater than 4 (UG ’s original
               Degree), but is less than 9, then he/she moved UG in the right direction and is counted
               in the Close group.

               [Figure: two scatter plots of F1 score (y-axis, 0.31-0.71) against interpretability score
               (x-axis, 0.95-1.25); (a) set of all decision trees; (b) trees forming the Pareto frontier]

                        Fig. 2: Plotting the Pareto Frontier of Decision Trees
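The grouping rule illustrated by this Degree example can be sketched as a small helper (the function name and the “greater than threshold” form of the explanation are illustrative):

```python
def classify_response(matches_og, new_value, old_value, threshold):
    """Assign a worker's modified solution to a Game-1 group, given an
    explanation of the form 'feature should be greater than threshold'."""
    if matches_og:                 # modified solution equals O_G
        return "Optimal"
    if new_value > threshold:      # explanation followed correctly
        return "Correct"
    if new_value > old_value:      # moved in the right direction
        return "Close"
    return "None"
```

With the example above (original Degree 4, threshold 9), a modified Degree of 10 lands in the Correct group and a Degree of 6 in the Close group.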
                   Also, we use the failure rate, which is defined as the fraction of times an explanation
               system fails to produce an explanation. This happens when there are no feasible decision
               trees for a given choice of OG and UG . We calculate an explanation system’s failure
               rate on network G by measuring the fraction of possible user solutions UG for which no
               tree is feasible in the Pareto frontier. We now present results comparing XplainIM-A, B
               & Full with the “No Explanation” treatment. All our comparison results are statistically
               significant under a t-test.
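Before turning to those results, the failure-rate computation described above can be sketched as follows, with a hypothetical `frontier_feasible` oracle standing in for the feasibility check against the Pareto frontier:

```python
def failure_rate(frontier_feasible, user_solutions):
    """Fraction of possible user solutions U_G for which no tree in the
    Pareto frontier is feasible (i.e., no explanation can be produced)."""
    failures = sum(1 for ug in user_solutions if not frontier_feasible(ug))
    return failures / len(user_solutions)
```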
                   Pareto Frontier First, we illustrate the accuracy-interpretability tradeoff exploited
               by all XplainIM variants to find Pareto-optimal trees. Figure 2a shows all decision trees
               (depicted by individual points) obtained by varying α values. The X and Y axes show
               the ID and F1 scores, respectively. We prune this set of decision trees to find the set
               of Pareto-optimal trees (shown in Figure 2b). These figures clearly depict the inverse
               correlation between the two competing objectives of prediction accuracy (F1 ) and in-
               terpretability (ID ).
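The pruning step that produces Figure 2b from Figure 2a can be sketched as follows, assuming each tree is summarized by an (interpretability, F1) pair and that higher values of both scores are preferred:

```python
def pareto_frontier(trees):
    """Keep only trees not dominated on both interpretability and F1."""
    frontier, best_f1 = [], float("-inf")
    # Scan in order of decreasing interpretability; a tree survives only
    # if its F1 beats every tree with higher (or equal) interpretability.
    for interp, f1 in sorted(trees, key=lambda t: (-t[0], -t[1])):
        if f1 > best_f1:
            frontier.append((interp, f1))
            best_f1 = f1
    return frontier
```

For example, `pareto_frontier([(1.2, 0.35), (1.1, 0.5), (1.0, 0.45), (0.95, 0.7)])` drops (1.0, 0.45), which (1.1, 0.5) dominates on both objectives.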
                   Next, we validate our decision of maintaining this Pareto frontier of trees inside
               XplainIM (instead of maintaining just a single Pareto-optimal tree). Figure 4a compares
               the failure rate of XplainIM-A, B & Full to that of single Pareto-optimal trees. The X-
               axis shows trees of increasing prediction accuracy. The Y-axis shows the average failure
               rate (across 100 WS networks). This figure shows that our Pareto-frontier based explanation
               systems (XplainIM-A, B & Full) have an average failure rate of 0.01, i.e., XplainIM
               generates valid explanations in ∼99% of situations. In contrast, the minimum failure rate
               achieved by any single Pareto-optimal tree is 0.27, i.e., using a single tree as the
               classifier for Problem 2 (instead of maintaining a Pareto frontier) results in failures 27%
               of the time. This underscores the importance of the Pareto frontier in making XplainIM
               work.
                   XplainIM Results Figure 3a shows Game 2 results. The X-axis shows the three
               XplainIM variants. The Y-axis shows the average rating (on a scale of 1-10) given by
               workers to each explanation scheme. XplainIM-A & B’s explanations received an
               average rating of 6.04, compared to 4.84 for XplainIM-Full (p=0.034). This shows
               that workers found XplainIM-A & B’s explanations to be more interpretable than those
               generated by XplainIM-Full.








               [Figure: two bar charts. (a) Average subjective rating (scale of 1-10) for XplainIM-A,
               XplainIM-B, and XplainIM-Full; (b) percentage of users in the Optimal, Correct, and
               Close groups under “No Explanation” vs. XplainIM-Full]

                        Fig. 3: Evaluating interpretability of XplainIM-Full

               [Figure: (a) average failure rate vs. F1 score for single decision trees and XplainIM;
               (b) percentage of users in the Optimal, Correct, and Close groups under “No
               Explanation”, XplainIM-A, and XplainIM-B]

                        Fig. 4: Frontier Validation & Evaluating XplainIM-A&B
                   Figure 3b shows Game 1 results for XplainIM-Full and the “No Explanation” treat-
               ment. The X-axis shows the Optimal, Correct, and Close groups. The Y-axis shows the
               percentage of workers belonging to each group. Only ∼4.6% of workers are in the
               Correct group after seeing XplainIM-Full’s explanations. Surprisingly, even without
               any explanations, 24.6% of workers are in the Correct group (p=3e−9 ), which shows
               that users perform better without any explanation than with XplainIM-Full’s. This
               suggests that XplainIM-Full’s longer explanations (which are based on [2]) are largely
               uninterpretable to AMT workers, possibly because they overload workers with too
               much information. As a result, we now focus our attention on XplainIM-A & B.
                   Figure 4b shows Game 1 results for XplainIM-A & B and the “No Explanation”
               treatment. With XplainIM-A & B’s explanations, ∼8.5% of users are in the Optimal
               group (vs. ∼4% with no explanation), 36% are in the Correct group (vs. 24%), and
               ∼60% are in the Close group (vs. 46%) (p=0.05). This establishes that XplainIM-A &
               B’s explanations are far more interpretable, as significantly more AMT workers follow
               them to improve their solutions. Further, it underscores the importance of exploiting the
               accuracy-interpretability tradeoff (as XplainIM does) when generating explanations for
               the solutions of IM algorithms.


                 7                                 Conclusion

               This paper addresses the open problem of explaining the solutions of IM algorithms to
               their users. We show that the problem of designing such an explanation system can be
               viewed, equivalently, as the problem of explaining an ML classifier’s predictions. We
               propose XplainIM, a suite of ML-based explanation systems which exploit the accuracy-
               interpretability tradeoff to generate natural-language explanations. Extensive human
               subject experiments show the effectiveness of XplainIM in explaining IM algorithm
               solutions to human users.


               8    Acknowledgements

               This research was supported by MURI Grant W911NF-11-1-0332.


               References
                1. Borgs, C., Brautbar, M., Chayes, J., Lucier, B.: Maximizing Social Influence in Nearly Opti-
                   mal Time. In: Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete
                   Algorithms. pp. 946–957. SODA ’14, SIAM (2014)
                2. Lakkaraju, H., Bach, S.H., Leskovec, J.: Interpretable decision sets: A joint framework for
                   description and prediction. In: Proceedings of the 22nd ACM SIGKDD International Con-
                   ference on Knowledge Discovery and Data Mining. pp. 1675–1684. ACM (2016)
                3. Ling, H., Okada, K.: An efficient earth mover’s distance algorithm for robust histogram com-
                   parison. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(5) (2007)
                4. Loh, W.Y.: Classification and regression trees. Wiley Interdisciplinary Reviews: Data Mining
                   and Knowledge Discovery 1(1), 14–23 (2011)
                5. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming.
                   John Wiley & Sons (2009)
                6. Quinlan, J.R.: Induction of decision trees. Machine learning 1(1), 81–106 (1986)
                7. Schneier, B.: Trust in man/machine security systems. IEEE Security & Privacy 11(5), 96–96
                   (Sept 2013)
                8. Wang, N., Pynadath, D.V., Hill, S.G.: The impact of POMDP-generated explanations on trust
                   and performance in human-robot teams. In: Proceedings of the 2016 International Confer-
                   ence on Autonomous Agents & Multiagent Systems. pp. 997–1005. International Foundation
                   for Autonomous Agents and Multiagent Systems (2016)
                9. Yadav, A., Chan, H., Jiang, A.X., Rice, E., Kamar, E., Grosz, B., Tambe, M.: POMDPs for
                   Assisting Homeless Shelters: Computational and Deployment Challenges. In: International
                   Workshop on Issues with Deployment of Emerging Agent-based Systems (IDEAS) (2016)
               10. Yadav, A., Wilder, B., Petering, R., Rice, E., Tambe, M.: Influence maximization in the field:
                   The arduous journey from emerging to deployed application. In: Proceedings of the In-
                   ternational Conference on Autonomous Agents and Multi-agent Systems (AAMAS) (2017)



