<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>March</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Learning Confidence Intervals for Feature Importance: A Fast Shapley-based Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Napolitano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorenzo Vaiani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Cagliero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Torino</institution>
          ,
          <addr-line>Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>31</volume>
      <issue>2023</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Inferring feature importance is a well-known machine learning problem. Assigning importance scores to the input data features is particularly helpful for explaining black-box models. Existing approaches rely on either statistical or neural network-based methods. Among them, Shapley Value estimates are among the most widely used scores to explain individual classification models or ensemble methods. As a drawback, state-of-the-art neural network-based approaches neglect the uncertainty of the input predictions while computing the confidence intervals of the feature importance scores. The paper extends a state-of-the-art neural method for Shapley Value estimation to handle uncertain predictions made by ensemble methods and to estimate a confidence interval for the feature importances. The results show that (1) the estimated confidence intervals are coherent with the expectation and more reliable than baseline methods; (2) the efficiency of the Shapley value estimator is comparable to that of traditional models; (3) the level of uncertainty of the Shapley value estimates decreases as the number of predictors in the ensemble grows.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Machine learning and deep learning have achieved remarkable results in various classification tasks. However, due to their inherent complexity, end-users often treat them as black boxes, as these models do not provide the necessary insights into the reasons behind the generated predictions [<xref ref-type="bibr" rid="ref1">1</xref>]. Understanding feature importance is a relevant branch of Explainable AI. The main goal is to estimate the predictive power of a feature for a response variable [<xref ref-type="bibr" rid="ref2">2</xref>]. To successfully cope with arbitrary predictive models, especially the non-interpretable ones such as neural networks or ensemble methods (e.g., random forests or Gradient Boosting [<xref ref-type="bibr" rid="ref3">3</xref>]), a particular research interest has been devoted to studying model-agnostic methods. They compute the feature importance scores disentangling the approximations from the underlying model characteristics. Within this field, statistics-based approaches to feature importance have two major issues [<xref ref-type="bibr" rid="ref4">4</xref>]: (1) they make arbitrary distributional assumptions, which are often hard to verify in practice on real data; (2) they neglect, in most cases, the uncertainty of model estimates, thus providing end-users with unreliable feature importance scores. This paper addresses the above-mentioned issues as follows: (1) it adopts a state-of-the-art neural network model learning the underlying data distribution at training time; (2) it quantifies the uncertainty of the feature importance scores by learning the corresponding confidence scores.</p>
      <p>Shapley Values [<xref ref-type="bibr" rid="ref5">5</xref>] are known concepts from cooperative game theory that have become established for AI model explanation. Specifically, they quantify the contribution of a given feature to the prediction of a particular instance. Thanks to the additive property, they can also be used to estimate the global contribution of a feature to an AI model [<xref ref-type="bibr" rid="ref6">6</xref>]. Since the exact Shapley Value estimate is computationally intractable on most real datasets, different approximation methods have been proposed. They can be classified as stochastic approaches (e.g., [<xref ref-type="bibr" rid="ref6">7, 6, 8</xref>]) or model-based ones (e.g., [9, 10]). Among the latter ones, recent approaches based on Neural Network models [10] have shown to be particularly efficient, as they allow real-time Shapley Value estimation in a single forward pass using a learned explainer model.</p>
      <p>The main paper contributions are outlined below.</p>
      <list list-type="bullet">
        <list-item><p><bold>Conceptualization.</bold> We propose to extend existing Shapley Value approximation methods to cope with uncertain predictors by leveraging the concepts of Coalition Interval Game [11] and Interval Shapley Value [12].</p></list-item>
        <list-item><p><bold>Design and Implementation.</bold> We introduce Interval FastSHAP, a novel and efficient methodology for the approximation of Shapley values, which builds upon the existing state-of-the-art model, FastSHAP [10]. Given an ensemble of predictors associated with the corresponding confidence intervals, it returns the Shapley Values enriched with the corresponding confidence intervals.</p></list-item>
        <list-item><p><bold>Comparative study.</bold> To compare Interval FastSHAP with baseline methods, we also extend a statistical approach based on Montecarlo sampling [13] and a recently proposed regression-based model, namely Biased KernelSHAP [<xref ref-type="bibr" rid="ref6">6</xref>], to handle confidence intervals.</p></list-item>
        <list-item><p><bold>Empirical evaluation.</bold> We report a selection of empirical outcomes achieved on benchmark tabular datasets [14]. The estimated intervals have shown to be more reliable than Biased KernelSHAP; Interval FastSHAP has a complexity that is comparable to FastSHAP (and superior to Biased KernelSHAP); the uncertainty of the Shapley Value approximations decreases as the number of predictors increases.</p></list-item>
      </list>
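      <p>The rest of the paper is organized as follows. Section 2 provides an overview of related works in the field of explainability and feature importance estimation. Section 3 presents some preliminary notions about Shapley Values and the FastSHAP architecture. Section 4 introduces Interval FastSHAP, the proposed methodology for estimating Shapley values with associated confidence intervals. Section 5 presents the empirical evaluation of Interval FastSHAP and compares it with the baseline methods. Finally, Section 6 concludes the paper and outlines possible future works.</p>
    </sec>
    <sec>
      <title>2. Related works</title>
      <p>In recent years, various model-agnostic feature importance scores have already been proposed in the literature. They can be classified as follows:</p>
      <list list-type="alpha-lower">
        <list-item><p>Feature exclusion/occlusion methods (e.g., [15, 16]), which investigate the impact of excluding part of the input features;</p></list-item>
        <list-item><p>Feature permutation (e.g., [17, 18]), which ensemble predictive models by combining different feature sets;</p></list-item>
        <list-item><p>Shapley value-based feature importance (e.g., [<xref ref-type="bibr" rid="ref6">6, 19</xref>]), which quantifies the relevance score of a feature by approximating the per-class global Shapley values [<xref ref-type="bibr" rid="ref6">6</xref>].</p></list-item>
      </list>
      <p>This work belongs to category (c). Few works have focused on quantifying the reliability or uncertainty of the feature importance based on statistical models. For example, [16] performs leave-one-covariate-out inference, [20] adopts Montecarlo feature sampling, whereas [<xref ref-type="bibr" rid="ref4">4</xref>] uses minipatch ensembles. However, these approaches are computationally expensive and may not scale well to high-dimensional feature spaces. We believe that our approach provides a robust and scalable solution to the problem of quantifying feature importance and uncertainty in high-dimensional feature spaces. Unlike [<xref ref-type="bibr" rid="ref4">16, 4, 20</xref>], the approach proposed in the present work relies on neural network learning for efficient Shapley Value approximation. A regression-based approach to estimate Shapley Value residuals has been proposed in [21]. The goal is to warn practitioners against overestimating the extent to which Shapley-value-based explanations give them insights into a model. Unlike [21], the approach presented in the current paper also considers the uncertainty of black-box models consisting of predictor ensembles, focusing on quantifying the uncertainty of feature importance rather than the accuracy of Shapley values.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Preliminaries</title>
      <sec id="sec-2-1">
        <title>Shapley Value</title>
        <p>Introduced in 1951, the Shapley Value assigns a value <inline-formula><tex-math>\phi_i</tex-math></inline-formula> to each player <inline-formula><tex-math>i</tex-math></inline-formula> in a cooperative game based on the contribution to the total payoff of the group [<xref ref-type="bibr" rid="ref5">5</xref>].</p>
        <p>Formally speaking, the Shapley Value for a player <inline-formula><tex-math>i</tex-math></inline-formula> in a cooperative game with a set of players <inline-formula><tex-math>N</tex-math></inline-formula> and a characteristic function <inline-formula><tex-math>v</tex-math></inline-formula> is defined as follows:</p>
        <disp-formula><tex-math>\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left[ v(S \cup \{i\}) - v(S) \right]</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>S</tex-math></inline-formula> is a player coalition, <inline-formula><tex-math>v : 2^{N} \to \mathbb{R}</tex-math></inline-formula> is a characteristic function, the sum is taken over all subsets <inline-formula><tex-math>S \subseteq N \setminus \{i\}</tex-math></inline-formula> of players in <inline-formula><tex-math>N</tex-math></inline-formula> excluding <inline-formula><tex-math>i</tex-math></inline-formula>, <inline-formula><tex-math>|S|</tex-math></inline-formula> is the cardinality of set <inline-formula><tex-math>S</tex-math></inline-formula>, and <inline-formula><tex-math>|N|</tex-math></inline-formula> is the total number of players. The Shapley Value is computed as the sum over all possible coalitions that do not contain player <inline-formula><tex-math>i</tex-math></inline-formula>. The term <inline-formula><tex-math>v(S \cup \{i\}) - v(S)</tex-math></inline-formula> is the marginal contribution of player <inline-formula><tex-math>i</tex-math></inline-formula> to the coalition <inline-formula><tex-math>S</tex-math></inline-formula>.</p>
        <p>The Shapley value satisfies the axioms of efficiency, symmetry, linearity, and dummy player [<xref ref-type="bibr" rid="ref5">5</xref>]. Efficiency indicates that the sum of the Shapley values for all players is equal to the total payoff of the game; symmetry indicates that if two players have the same marginal contributions to all possible coalitions, their Shapley values are equal; linearity holds because the total payoff of the game can be decomposed into two independent parts and the Shapley value of each player can be obtained by summing their individual Shapley values for each part; dummy player indicates that the Shapley value of a player having no marginal contribution to the total payoff is zero. Notably, linearity allows us to sum the instance-level contributions of a feature for global explainability [<xref ref-type="bibr" rid="ref6">6</xref>].</p>
        <p><bold>Coalition Interval Game.</bold> For every coalition <inline-formula><tex-math>S</tex-math></inline-formula> in a cooperative game the achieved outcomes have a certain level of uncertainty. We assume that the prediction <inline-formula><tex-math>y_x</tex-math></inline-formula> of a model <inline-formula><tex-math>M</tex-math></inline-formula> on instance <inline-formula><tex-math>x</tex-math></inline-formula> has a confidence interval <inline-formula><tex-math>[y_{x,l},\, y_{x,u}]</tex-math></inline-formula>. The aforesaid range is bounded from below by the pessimistic prediction obtained using the lower value of the associated zero-sum game and it is bounded from above by the optimistic prediction obtained using the upper value of that game. In compliance with [11], we associate with each strategic game a coalitional interval game consisting of a pair <inline-formula><tex-math>(N, W)</tex-math></inline-formula>, where <inline-formula><tex-math>N</tex-math></inline-formula> is the set of players, and <inline-formula><tex-math>W</tex-math></inline-formula> is a correspondence that associates with every coalition <inline-formula><tex-math>S \subseteq N</tex-math></inline-formula> an interval <inline-formula><tex-math>W(S)</tex-math></inline-formula> indicating that the worth of the coalition will be somewhere in this range.</p>
        <sec id="sec-2-1-1">
          <title>Shapley Value with confidence interval</title>
          <p>Analogously to coalition interval games, the estimate of the Shapley Value <inline-formula><tex-math>\phi_i</tex-math></inline-formula> of feature <inline-formula><tex-math>i</tex-math></inline-formula> can be extended to define the corresponding confidence interval <inline-formula><tex-math>[\phi_{i,l},\, \phi_{i,u}]</tex-math></inline-formula> on a given instance <inline-formula><tex-math>x</tex-math></inline-formula> [20]. The confidence interval quantifies the range of uncertainty of the importance of feature <inline-formula><tex-math>i</tex-math></inline-formula>. The traditional Shapley Value <inline-formula><tex-math>\phi_i</tex-math></inline-formula> is expected to be the mean interval value [12].</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>FastSHAP</title>
        <p>FastSHAP [10] trains an explainer <inline-formula><tex-math>\phi_{\mathrm{fast}}(x, y; \theta)</tex-math></inline-formula> by minimizing the loss</p>
        <disp-formula><tex-math>\mathcal{L}(\theta) = \mathbb{E}_{p(x)}\, \mathbb{E}_{\mathrm{Unif}(y)}\, \mathbb{E}_{p(s)} \left[ \left( v_{x,y}(s) - v_{x,y}(\mathbf{0}) - s^{\top} \phi_{\mathrm{fast}}(x, y; \theta) \right)^{2} \right]</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>x</tex-math></inline-formula> is the feature representation vector corresponding to a sample, <inline-formula><tex-math>y</tex-math></inline-formula> is the response variable for a classification problem, <inline-formula><tex-math>\mathrm{Unif}(y)</tex-math></inline-formula> represents the Uniform distribution over the classes, <inline-formula><tex-math>s</tex-math></inline-formula> represents a subset of features to be considered to infer the label of a data sample, drawn from the Shapley kernel distribution <inline-formula><tex-math>p(s)</tex-math></inline-formula> of [10], <inline-formula><tex-math>v_{x,y}(s)</tex-math></inline-formula> is the expected value of the model’s prediction when considering only features in <inline-formula><tex-math>s</tex-math></inline-formula>, <inline-formula><tex-math>v_{x,y}(\mathbf{0})</tex-math></inline-formula> is the expected value of the model’s prediction when all features are absent, and <inline-formula><tex-math>\phi_{\mathrm{fast}}(x, y; \theta)</tex-math></inline-formula> is the learned parametric function that should output exact Shapley values.</p>
        <p>To meet the efficiency constraint, FastSHAP applies a normalization factor to all predictions, namely the additive efficient normalization:</p>
        <disp-formula><tex-math>\phi_{\mathrm{fast}}^{\mathrm{eff}}(x, y; \theta) = \phi_{\mathrm{fast}}(x, y; \theta) + \frac{1}{d} \left( v_{x,y}(\mathbf{1}) - v_{x,y}(\mathbf{0}) - \mathbf{1}^{\top} \phi_{\mathrm{fast}}(x, y; \theta) \right)</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>v_{x,y}(\mathbf{1})</tex-math></inline-formula> is the expected value of the model’s prediction when all features are present in the sample, and <inline-formula><tex-math>d</tex-math></inline-formula> is the total number of features. By incorporating additive efficient normalization into the loss function, the FastSHAP model ensures that the resulting feature attributions are consistent with the Shapley value, providing a theoretically grounded and transparent method for interpreting the model’s predictions.</p>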
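        <p>The two ingredients recalled in this section, the exact Shapley Value and the additive efficient normalization, can be checked on a tiny game. The following is a minimal sketch rather than the FastSHAP implementation: the three-player game <monospace>toy_game</monospace>, the perturbed vector <monospace>raw</monospace> standing in for an unnormalized explainer output, and all function names are assumptions introduced for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations
from math import factorial


def shapley_values(n_players, v):
    """Exact Shapley values by enumerating every coalition S that excludes player i."""
    players = range(n_players)
    phi = [0.0] * n_players
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Weight |S|! (|N| - |S| - 1)! / |N|! from the definition.
            weight = (factorial(size) * factorial(n_players - size - 1)
                      / factorial(n_players))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of player i to coalition S.
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi


def additive_efficient_normalization(phi, v_full, v_empty):
    """Spread the efficiency gap v(1) - v(0) - sum(phi) evenly over the d features."""
    d = len(phi)
    gap = v_full - v_empty - sum(phi)
    return [p + gap / d for p in phi]


def toy_game(coalition):
    # Hypothetical characteristic function: worth is the squared coalition size.
    return float(len(coalition) ** 2)


exact = shapley_values(3, toy_game)

# Deliberately perturbed estimate, standing in for a raw explainer output.
raw = [exact[0] + 0.5, exact[1] - 0.2, exact[2] + 0.3]
normalized = additive_efficient_normalization(
    raw, toy_game(frozenset(range(3))), toy_game(frozenset())
)
```
]]></preformat>
        <p>In this symmetric game every player receives the same exact value, and after normalization the perturbed estimates again sum to the worth of the full coalition minus the worth of the empty one, which is the efficiency axiom. Exhaustive enumeration is exponential in the number of players, which is the reason approximations such as FastSHAP are needed in practice.</p>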
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Interval FastSHAP</title>
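      <p>The goal is to explain a black-box prediction model <inline-formula><tex-math>M</tex-math></inline-formula> consisting of an ensemble of <inline-formula><tex-math>n</tex-math></inline-formula> predictors <inline-formula><tex-math>m_1, m_2, \ldots, m_n</tex-math></inline-formula>. Without any loss of generality, hereafter we will address the problem of explaining an ensemble of binary classifiers predicting either the positive (<inline-formula><tex-math>+</tex-math></inline-formula>) or the negative (<inline-formula><tex-math>-</tex-math></inline-formula>) class.</p>
      <sec id="sec-3-1">
        <title>Black-box model</title>
        <p>Given a dataset <inline-formula><tex-math>D</tex-math></inline-formula>, for each instance <inline-formula><tex-math>x</tex-math></inline-formula> in <inline-formula><tex-math>D</tex-math></inline-formula> let <inline-formula><tex-math>y_x</tex-math></inline-formula> be the prediction of model <inline-formula><tex-math>M</tex-math></inline-formula> for instance <inline-formula><tex-math>x</tex-math></inline-formula>. Let</p>
        <disp-formula><label>(1)</label><tex-math>\mathrm{CI}^{+}_{x} = [y^{+}_{x,l},\; y^{+}_{x,u}]</tex-math></disp-formula>
        <disp-formula><label>(2)</label><tex-math>\mathrm{CI}^{-}_{x} = [y^{-}_{x,l},\; y^{-}_{x,u}]</tex-math></disp-formula>
        <p>be the confidence intervals of predictor <inline-formula><tex-math>M</tex-math></inline-formula> associated to instance <inline-formula><tex-math>x</tex-math></inline-formula> for the positive and negative classes, respectively, where the subscripts <inline-formula><tex-math>l</tex-math></inline-formula> and <inline-formula><tex-math>u</tex-math></inline-formula> denote the lower and upper bounds. For instance, if <inline-formula><tex-math>M</tex-math></inline-formula> is an ensemble of decision trees then the per-class confidence levels can be defined by the range of variation of the <inline-formula><tex-math>n</tex-math></inline-formula> trees’ predictions. Since the per-class probabilities <inline-formula><tex-math>P(x, -)</tex-math></inline-formula> and <inline-formula><tex-math>P(x, +)</tex-math></inline-formula> of a given instance <inline-formula><tex-math>x</tex-math></inline-formula> are linearly dependent, i.e., <inline-formula><tex-math>P(x, -) = 1 - P(x, +)</tex-math></inline-formula>, we simplify the model output by setting as targets the crossed bounds</p>
        <disp-formula><label>(3)</label><tex-math>\begin{gathered} y_1 = [y^{-}_{x,l},\; y^{+}_{x,u}] \\ y_2 = [y^{+}_{x,l},\; y^{-}_{x,u}] \end{gathered}</tex-math></disp-formula>
        <p>In detail, since the variance on each class is the same, i.e., <inline-formula><tex-math>\sigma^{-} = \sigma^{+}</tex-math></inline-formula>, rather than considering four vectors, one for each bound,</p>
        <disp-formula><label>(4)</label><tex-math>\begin{gathered} y_1 = [y^{-}_{x,l},\; 1 - y^{-}_{x,l}] \\ y_2 = [y^{+}_{x,l},\; 1 - y^{+}_{x,l}] \\ y_3 = [1 - y^{-}_{x,u},\; y^{-}_{x,u}] \\ y_4 = [1 - y^{+}_{x,u},\; y^{+}_{x,u}] \end{gathered}</tex-math></disp-formula>
        <p>we use the identity <inline-formula><tex-math>P(x, -) = 1 - P(x, +)</tex-math></inline-formula> to rewrite them as</p>
        <disp-formula><label>(5)</label><tex-math>\begin{gathered} y_1 = [y^{-}_{x,l},\; y^{+}_{x,u}] \\ y_2 = [y^{+}_{x,l},\; y^{-}_{x,u}] \\ y_3 = [y^{+}_{x,l},\; y^{-}_{x,u}] \\ y_4 = [y^{-}_{x,l},\; y^{+}_{x,u}] \end{gathered}</tex-math></disp-formula>
        <p>and simply consider two of them:</p>
        <disp-formula><label>(6)</label><tex-math>\begin{gathered} y_1 = y_4 = [y^{-}_{x,l},\; y^{+}_{x,u}] \\ y_2 = y_3 = [y^{+}_{x,l},\; y^{-}_{x,u}] \end{gathered}</tex-math></disp-formula>
      </sec>
      <sec id="sec-3-2">
        <title>Surrogate model</title>
        <p>The surrogate model is trained to predict the outcome of the random forest approach which is employed as the black-box model. It takes as input the original data points and the vectors <inline-formula><tex-math>y_1</tex-math></inline-formula> and <inline-formula><tex-math>y_2</tex-math></inline-formula> as target variables, and outputs two vectors</p>
        <disp-formula><label>(7)</label><tex-math>\begin{gathered} \hat{y}_1 = (\hat{y}^{-}_{x,l},\; \hat{y}^{+}_{x,u}) \\ \hat{y}_2 = (\hat{y}^{+}_{x,l},\; \hat{y}^{-}_{x,u}) \end{gathered}</tex-math></disp-formula>
        <sec id="sec-3-2-1">
          <title>Combining FastSHAP explainers</title>
          <p>Two FastSHAP explainers are trained in parallel to infer the interval Shapley values <inline-formula><tex-math>\phi^{+}_{x}</tex-math></inline-formula> and <inline-formula><tex-math>\phi^{-}_{x}</tex-math></inline-formula>. Given an arbitrary instance <inline-formula><tex-math>x</tex-math></inline-formula>, Interval FastSHAP aims at estimating the Interval Shapley Values consisting of the two vector pairs <inline-formula><tex-math>\phi^{+} = (\phi^{+}_{l}, \phi^{+}_{u})</tex-math></inline-formula> and <inline-formula><tex-math>\phi^{-} = (\phi^{-}_{l}, \phi^{-}_{u})</tex-math></inline-formula>, where <inline-formula><tex-math>\phi^{+/-}</tex-math></inline-formula> are the interval Shapley values associated to instance <inline-formula><tex-math>x</tex-math></inline-formula> for the positive and negative classes, respectively. Due to the linear dependency of per-class probabilities, the interval boundaries are predicted by the FastSHAP network in a crossed fashion, according to the previous explanation:</p>
          <disp-formula><label>(8)</label><tex-math>\begin{gathered} \phi_1 = [\phi^{-}_{l},\; \phi^{+}_{u}] \\ \phi_2 = [\phi^{+}_{l},\; \phi^{-}_{u}] \end{gathered}</tex-math></disp-formula>
          <p>Specifically, vectors <inline-formula><tex-math>\phi^{+}_{l}</tex-math></inline-formula> and <inline-formula><tex-math>\phi^{-}_{l}</tex-math></inline-formula> contain the lower bound estimates of the positive/negative confidence intervals of instance <inline-formula><tex-math>x</tex-math></inline-formula>, where the <inline-formula><tex-math>j</tex-math></inline-formula>-th vector dimension corresponds to the feature with index <inline-formula><tex-math>j</tex-math></inline-formula> in the input dataset <inline-formula><tex-math>D</tex-math></inline-formula>. The same holds for <inline-formula><tex-math>\phi^{+}_{u}</tex-math></inline-formula> and <inline-formula><tex-math>\phi^{-}_{u}</tex-math></inline-formula> in the context of upper bounds.</p>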
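          <p>The crossed targets used by the surrogate model can be illustrated with a few lines of code. This is only a sketch under stated assumptions: the per-class interval is taken as the minimum and maximum of the predictors' outputs (one possible reading of the range of variation of the trees' predictions), the five probabilities in <monospace>tree_outputs</monospace> are invented, and the function names are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def class_intervals(pos_probs):
    """Per-class (lower, upper) bounds taken as the range of the predictors' outputs."""
    # For a binary classifier each predictor outputs P(x,-) = 1 - P(x,+).
    neg_probs = [1.0 - p for p in pos_probs]
    pos = (min(pos_probs), max(pos_probs))
    neg = (min(neg_probs), max(neg_probs))
    return pos, neg


def crossed_targets(pos, neg):
    """Pair the lower bound of one class with the upper bound of the other."""
    y1 = (neg[0], pos[1])  # [y^-_{x,l}, y^+_{x,u}]
    y2 = (pos[0], neg[1])  # [y^+_{x,l}, y^-_{x,u}]
    return y1, y2


# Hypothetical positive-class probabilities returned by five trees for one instance.
tree_outputs = [0.62, 0.70, 0.55, 0.81, 0.66]

pos, neg = class_intervals(tree_outputs)
y1, y2 = crossed_targets(pos, neg)
```
]]></preformat>
          <p>Under this assumption each crossed vector sums to one, because the lower bound of one class is the complement of the upper bound of the other. This is what makes two target vectors sufficient in place of four.</p>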
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Preliminary results</title>
      <p><bold>Data.</bold> We perform experiments on four benchmark tabular datasets belonging to the UCI repository [14], i.e., Monks, Heart, Census, and WBC.</p>
      <p><bold>Models and settings.</bold> To implement the Random Forest classifier, we employed the implementation provided by the widely used scikit-learn library [23]. For all experiments, the number of trees was set to 100, except in the studies exploring the impact of varying the number of trees.</p>
      <p>The surrogate model SM is designed to approximate the behavior of the black-box model <inline-formula><tex-math>M</tex-math></inline-formula>. Its objective is to predict the class label <inline-formula><tex-math>\hat{y}^{+/-}_{x}</tex-math></inline-formula> of instance <inline-formula><tex-math>x</tex-math></inline-formula> as determined by the black-box model. To achieve this, the surrogate model may employ any suitable prediction algorithm that is computationally more efficient and able to accommodate varying subsets of features. In this study, a multi-layer perceptron is used as the surrogate model, trained to predict the outcome of the black-box model. As a surrogate model, we implemented a Multi-Layer Perceptron (MLP), which is a commonly used neural network architecture. The MLP consisted of three linear layers of hidden size = 512, interspersed with Rectified Linear Unit (ReLU) activation functions, and two classification heads, one for each target vector. The surrogate model has been trained for a maximum of 200 epochs using the Kullback-Leibler divergence loss, the AdamW optimizer [24], learning rate = 10<sup>-4</sup>, batch size = 8 and weight decay = 10<sup>-2</sup>.</p>
      <p>For the explainer we use the implementation provided by the FastSHAP authors [10]. It is built as an MLP of 3 linear layers of hidden size = 128, interspersed with ReLU activation functions. It has been trained for a maximum of 200 epochs using the custom loss described in Section 3 together with the additive efficient normalization, the AdamW optimizer [24], learning rate = 10<sup>-2</sup>, batch size = 8 and weight decay = 5 × 10<sup>-2</sup>.</p>
      <p><bold>Ground truth.</bold> To generate the ground truth, we compute the Shapley Value estimates using Unbiased KernelSHAP [22] (with paired sampling), as the model is known to converge to the true Shapley Values given infinite samples. The confidence interval of the true Shapley values is computed using a modified version of Unbiased KernelSHAP estimating both interval boundaries at the same time.</p>
      <p><bold>Baselines.</bold> We extend the following baseline methods to estimate the lower and upper bounds of the Shapley Value confidence intervals:</p>
      <list list-type="bullet">
        <list-item><p>A statistical approach based on Montecarlo sampling [25], hereafter denoted by Montecarlo.</p></list-item>
        <list-item><p>A recently proposed regression-based model [<xref ref-type="bibr" rid="ref6">6</xref>], i.e., Biased KernelSHAP<fn id="fn1"><label>1</label><p>Biased KernelSHAP is the predecessor of Unbiased KernelSHAP, from which we derive the true Shapley Values.</p></fn>.</p></list-item>
      </list>
      <p><bold>5.1. Accuracy of the explanations</bold></p>
      <p>We test how close the Interval FastSHAP estimates are to the ground truth Interval Shapley values. To this end, we compute the proximity of the Interval FastSHAP estimates to the ground truth in terms of mean <inline-formula><tex-math>L_1</tex-math></inline-formula> and <inline-formula><tex-math>L_2</tex-math></inline-formula> norms. The obtained results are reported in Table 1. The Interval FastSHAP approach demonstrates improved performance in terms of the distance between the interval mean points on three out of four datasets, i.e., Monks, WBC and Census, whereas it achieves particularly close results on the Heart dataset, approaching the performance of the best-performing competitor, i.e., Biased KernelSHAP. In terms of interval width prediction, the Montecarlo approach outperforms the other tested methods, providing reasonable interval ranges while centering the interval away from the target mean point. FastSHAP achieves slightly worse results, while, in contrast, Biased KernelSHAP consistently exhibits wider interval predictions.</p>
      <table-wrap id="tab-2">
        <label>Table 2</label>
        <caption><p>Confidence Interval width.</p></caption>
        <table>
          <thead>
            <tr><th>Dataset</th><th>True S.V.</th><th>FastSHAP</th><th>Biased KernelSHAP</th><th>Montecarlo</th></tr>
          </thead>
          <tbody>
            <tr><td>Monks</td><td>0.0054</td><td>0.0056</td><td>0.0099</td><td>0.0059</td></tr>
            <tr><td>WBC</td><td>0.0040</td><td>0.0073</td><td>0.0241</td><td>0.0034</td></tr>
            <tr><td>Heart</td><td>0.0029</td><td>0.0063</td><td>0.0100</td><td>0.0026</td></tr>
            <tr><td>Census</td><td>0.0037</td><td>0.0050</td><td>0.0112</td><td>0.0040</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Table 3 (inference time per sample; only partially recoverable): the surviving entries are 0.02s, 0.01s, 0.04s and 0.01s, next to the column header Biased KernelSHAP.</p>
      <p>To quantify the reliability of the mean Shapley Value estimate we also compare the widths of the confidence intervals of the estimated and true Shapley Values. Table 2 reports the confidence interval width for both the ground truth and the tested approaches. Montecarlo produces intervals with minimal width even though the reliability of its mean Shapley Value is low on average (see Table 1). KernelSHAP significantly overestimates the width of the confidence interval, showing low reliability of the generated feature importance scores. Conversely, FastSHAP achieves a good trade-off between mean accuracy and confidence of the estimate, with a slight overestimation of the actual confidence interval width.</p>
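      <p>The two quantities compared in this subsection, the distance between interval mean points and the interval width, could be computed along the following lines. This is a sketch only: the text does not give the exact formulas, so the averaging over instances, the function names and the small two-instance, two-feature example are all assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from math import sqrt


def midpoints(lower, upper):
    """Per-instance, per-feature mean point of each [lower, upper] interval."""
    return [[(l + u) / 2.0 for l, u in zip(lo, up)] for lo, up in zip(lower, upper)]


def mean_l1_l2(est_mid, true_mid):
    """Average over instances of the L1 and L2 norms of the midpoint error."""
    l1_total, l2_total = 0.0, 0.0
    for est, true in zip(est_mid, true_mid):
        diffs = [e - t for e, t in zip(est, true)]
        l1_total += sum(abs(d) for d in diffs)
        l2_total += sqrt(sum(d * d for d in diffs))
    n = len(est_mid)
    return l1_total / n, l2_total / n


def mean_width(lower, upper):
    """Average interval width over all instances and features."""
    widths = [u - l for lo, up in zip(lower, upper) for l, u in zip(lo, up)]
    return sum(widths) / len(widths)


# Hypothetical estimated interval bounds for two instances and two features.
est_lower = [[0.1, -0.2], [0.0, 0.3]]
est_upper = [[0.3, 0.0], [0.2, 0.5]]
# Hypothetical ground-truth interval mean points.
true_mid = [[0.2, -0.1], [0.1, 0.3]]

l1, l2 = mean_l1_l2(midpoints(est_lower, est_upper), true_mid)
width = mean_width(est_lower, est_upper)
```
]]></preformat>
      <p>Reading the two numbers together mirrors the discussion above: a method can return narrow intervals and still be centered far from the true mean point, so width alone does not indicate reliability.</p>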
      <p>In Figure 3 we plot the Global Shapley Values [26] as an estimate of the global measure of feature importance. They are computed as the mean of the per-instance Shapley Value estimates. The results confirm the bias of Montecarlo sampling-based approaches and the comparable performance of FastSHAP and KernelSHAP estimates on the majority of the input features.</p>
      <p><bold>5.2. Execution times</bold></p>
      <p>Table 3 compares the inference times per sample spent by all analyzed approaches separately for each tested dataset. The inference step has been performed on an 18-core Intel Xeon Gold 6140. The reported statistics show that Interval FastSHAP is more efficient than Montecarlo and competitive against Biased KernelSHAP. Notably, Interval FastSHAP also requires a training time overhead. However, similar to FastSHAP [10] (and unlike Montecarlo and Biased KernelSHAP) it can be used for real-time Interval Shapley Value estimation.</p>
      <p><bold>5.3. Effect of the number of predictors</bold></p>
      <p>The results of our study indicate that the uncertainty inherent in the predictions of black-box models can have a substantial effect on the reliability of Shapley Value estimates. In particular, the size of the Confidence Interval, which provides an estimate of the degree of uncertainty in the predictions, has shown to be dependent on the number of predictors used in the ensemble method (see Figures 4 and 5 for the Heart and Monks datasets). As the number of predictors increases, the mean Shapley Value estimate converges to a steady state and the Confidence Interval gets smaller, indicating a decrease in uncertainty. This underscores the importance of utilizing a sufficient number of predictors in order to ensure reliable estimates of the Shapley Values.</p>
      <p>[Figure panels: (a) Monks dataset, (b) WBC dataset, (c) Census dataset, (d) Heart dataset]</p>
      <p>However, it is important to consider that increasing the number of predictors may also result in overfitting, which could lead to a decrease in the overall performance of the model. As such, it is necessary to balance the need for a sufficient number of predictors with the need to avoid overfitting.</p>
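      <p>The shrinking of the interval with the ensemble size can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that the interval is a normal-approximation interval around the ensemble mean; the source does not state the formula it uses. The simulated predictors (a shared signal of 0.7 plus Gaussian noise), the seed and the function names are likewise hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random
from statistics import stdev


def interval_width(predictions, z=1.96):
    """Width of a normal-approximation interval around the ensemble mean."""
    return 2.0 * z * stdev(predictions) / len(predictions) ** 0.5


rng = random.Random(0)


def simulate(n_predictors):
    # Hypothetical predictors: a shared signal of 0.7 plus independent noise,
    # clipped to the [0, 1] probability range.
    return [min(1.0, max(0.0, rng.gauss(0.7, 0.1))) for _ in range(n_predictors)]


# Interval width for ensembles of increasing size.
widths = {n: interval_width(simulate(n)) for n in (10, 100, 1000)}
```
]]></preformat>
      <p>Under this assumption the width scales roughly with the inverse square root of the number of predictors, so each tenfold increase in ensemble size narrows the interval by about a factor of three.</p>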
    </sec>
    <sec id="sec-5">
      <title>6. Conclusions and future work</title>
      <p>The paper presented a Shapley-based approach to learn confidence intervals for feature importance. It is suited to explain ensemble methods, whose predictors return uncertain outcomes. We leverage the concepts of Coalition Interval Game and Interval Shapley Value to adapt the real-time neural network-based approach to handle uncertain input and produce as output confidence intervals in conjunction with the Shapley Value estimates.</p>
      <p>The main takeaways can be summarized as follows:</p>
      <list list-type="bullet">
        <list-item><p><bold>Explanation accuracy:</bold> Interval FastSHAP turns out to be significantly more reliable than Montecarlo in estimating the mean Shapley Value and less susceptible to uncertainty than Biased KernelSHAP.</p></list-item>
        <list-item><p><bold>Real-time confidence interval estimation:</bold> Interval FastSHAP is comparable to existing approaches in terms of inference time. Although requiring a computational time overhead for model training, Interval FastSHAP leverages the capability of the original FastSHAP to perform real-time Shapley Value estimates. Notably, the estimation of the confidence interval does not invalidate the efficiency of the original model.</p></list-item>
        <list-item><p><bold>Number of predictors:</bold> The results of this study highlight the need for careful consideration of the number of predictors used when estimating the Shapley Values of ensemble black-box models, as the uncertainty inherent in the predictions can have a significant impact on the reliability of the estimates.</p></list-item>
      </list>
      <p>As future work, we plan to explore the applicability of the proposed approach to real application scenarios related to predictive maintenance, finance, and user profiling. We also aim to explore the use of different black-box and surrogate models.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This study was carried out within the FAIR - Future Artificial Intelligence Research and received funding from the European Union Next-GenerationEU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR) – MISSIONE 4 COMPONENTE 2, INVESTIMENTO 1.3 – D.D. 1555 11/10/2022, PE00000013). This manuscript reflects only the authors’ views and opinions, neither the European Union nor the European Commission can be considered responsible for them.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] <string-name><given-names>S.</given-names> <surname>Mohseni</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Zarei</surname></string-name>, <string-name><given-names>E. D.</given-names> <surname>Ragan</surname></string-name>, <article-title>A multidisciplinary survey and framework for design and evaluation of explainable AI systems</article-title>, <source>ACM Trans. Interact. Intell. Syst.</source> <volume>11</volume> (<year>2021</year>).</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] <string-name><given-names>I.</given-names> <surname>Batal</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Hauskrecht</surname></string-name>, <article-title>Constructing classification features using minimal predictive patterns</article-title>, in: <source>Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM '10</source>, <publisher-name>ACM</publisher-name>, New York, NY, USA, <year>2010</year>, pp. <fpage>869</fpage>–<lpage>878</lpage>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] <string-name><given-names>J.</given-names> <surname>Han</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Kamber</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Pei</surname></string-name>, <source>Data Mining: Concepts and Techniques</source>, 3rd edition, <publisher-name>Morgan Kaufmann</publisher-name>, <year>2011</year>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] <string-name><given-names>L.</given-names> <surname>Gan</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Zheng</surname></string-name>, <string-name><given-names>G. I.</given-names> <surname>Allen</surname></string-name>, <article-title>Model-agnostic confidence intervals for feature importance: A fast and powerful approach using minipatch ensembles</article-title>, <year>2022</year>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] <string-name><given-names>L. S.</given-names> <surname>Shapley</surname></string-name>, <article-title>Notes on the N-Person Game II: The Value of an N-Person Game</article-title>, <publisher-name>RAND Corporation</publisher-name>, Santa Monica, CA, <year>1951</year>.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] <string-name><given-names>S. M.</given-names> <surname>Lundberg</surname></string-name>, <string-name><given-names>S.-I.</given-names> <surname>Lee</surname></string-name>, <article-title>A unified approach to interpreting model predictions</article-title>, in: <source>Advances in Neural Information Processing Systems</source>, Curran Associates, Inc., <year>2017</year>, pp. <fpage>4765</fpage>–<lpage>4774</lpage>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] J. Castro, D. Gómez, J. Tejada, <article-title>Polynomial calculation of the Shapley value based on sampling</article-title>, <source>Computers &amp; Operations Research</source> 36 (<year>2009</year>) 1726–1730.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] I. Covert, S. M. Lundberg, S.-I. Lee, <article-title>Understanding global feature contributions with additive importance measures</article-title>, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), <source>Advances in Neural Information Processing Systems</source>, volume 33, Curran Associates, Inc., <year>2020</year>, pp. 17212–17223.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] S. M. Lundberg, G. G. Erion, H. Chen, A. J. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, S. Lee, <article-title>Explainable AI for trees: From local explanations to global understanding</article-title>, <source>CoRR</source> abs/1905.04610 (<year>2019</year>).</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] N. Jethani, M. Sudarshan, I. C. Covert, S. Lee, R. Ranganath, <article-title>FastSHAP: Real-time Shapley value estimation</article-title>, in: <source>The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022</source>, OpenReview.net, <year>2022</year>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] L. Carpente, B. Casas-Méndez, I. García-Jurado, A. van den Nouweland, <article-title>Coalitional interval games for strategic games in which players cooperate</article-title>, <source>Theory and Decision</source> (<year>2008</year>) 253–269.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] S. Z. A. Gök, R. Branzei, S. Tijs, <article-title>The interval Shapley value: an axiomatization</article-title>, <source>Central Eur. J. Oper. Res.</source> 18 (<year>2010</year>) 131–140.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] K. Aas, M. Jullum, A. Løland, <article-title>Explaining individual predictions when features are dependent: More accurate approximations to Shapley values</article-title>, <year>2019</year>.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] D. Dua, C. Graff, <article-title>UCI machine learning repository</article-title>, <year>2017</year>.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] B. D. Williamson, P. B. Gilbert, M. Carone, N. Simon, <article-title>Nonparametric variable importance assessment using machine learning techniques</article-title>, <source>Biometrics</source> 77 (<year>2021</year>) 9–22.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] J. Lei, M. G'Sell, A. Rinaldo, R. J. Tibshirani, L. Wasserman, <article-title>Distribution-free predictive inference for regression</article-title>, <source>Journal of the American Statistical Association</source> 113 (<year>2018</year>) 1094–1111.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] G. Smith, R. Mansilla, J. Goulding, <article-title>Model class reliance for random forests</article-title>, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), <source>Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual</source>, <year>2020</year>.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] A. Fisher, C. Rudin, F. Dominici, <article-title>All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously</article-title>, <source>J. Mach. Learn. Res.</source> (<year>2019</year>).</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] B. Williamson, J. Feng, <article-title>Efficient nonparametric statistical inference on population feature importance using Shapley values</article-title>, in: H. D. III, A. Singh (Eds.), <source>Proc. of the 37th Int. Conf. on Machine Learning</source>, volume 119, PMLR, <year>2020</year>, pp. 10282–10291.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] D. V. Fryer, I. Strümke, H. D. Nguyen, <article-title>Shapley value confidence intervals for attributing variance explained</article-title>, <source>Frontiers Appl. Math. Stat.</source> 6 (<year>2020</year>) 587199.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] I. Kumar, C. Scheidegger, S. Venkatasubramanian, S. Friedler, <article-title>Shapley residuals: Quantifying the limits of the Shapley value for explanations</article-title>, in: M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, J. W. Vaughan (Eds.), <source>Advances in Neural Information Processing Systems</source>, volume 34, Curran Associates, Inc., <year>2021</year>, pp. 26598–26608.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] I. Covert, S. Lee, <article-title>Improving KernelSHAP: Practical Shapley value estimation via linear regression</article-title>, <source>CoRR</source> abs/2012.01536 (<year>2020</year>).</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, <article-title>Scikit-learn: Machine learning in Python</article-title>, <source>Journal of Machine Learning Research</source> 12 (<year>2011</year>).</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] I. Loshchilov, F. Hutter, <article-title>Decoupled weight decay regularization</article-title>, <source>arXiv preprint arXiv:1711.05101</source> (<year>2017</year>).</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] R. Y. Rubinstein, D. P. Kroese, <source>Simulation and the Monte Carlo method</source>, John Wiley &amp; Sons, <year>2016</year>.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] C. Frye, C. Rowat, I. Feige, <article-title>Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability</article-title>, <year>2019</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>