=Paper=
{{Paper
|id=Vol-1917/paper02
|storemode=property
|title=Shapley Curves: A New Concept for Modelling Feature Importance
|pdfUrl=https://ceur-ws.org/Vol-1917/paper02.pdf
|volume=Vol-1917
}}
==Shapley Curves: A New Concept for Modelling Feature Importance==
Shapley Curves: A New Concept for Modelling Feature Importance
Farjad Adnan, Karlson Pfannschmidt, Eyke Hüllermeier
Intelligent Systems Group
Paderborn University
We propose a novel method for measuring the importance and usefulness of
predictor variables (features) in supervised machine learning, which makes use
of concepts from cooperative game theory. The basic idea of our approach is to
consider subsets of variables as coalitions, and their predictive performance as a
payoff. This approach acknowledges that the usefulness of a feature in a
learning context depends strongly not only on the learning method used,
but also on the other features available.
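To make the coalition view concrete, here is a minimal, purely illustrative sketch (not the paper's experimental setup): the payoff of a feature subset S is taken as the validation accuracy of a model trained only on the features in S. The toy dataset and the 1-nearest-neighbour model are hypothetical stand-ins for whatever learner and data are actually used.

```python
# Toy "coalition payoff": v(S) = validation accuracy of a model trained
# only on the features in S. Dataset and model are hypothetical stand-ins.
# Feature 2 is deliberately uninformative for the class label.
TRAIN = [([1.0, 0.2, 5.0], 0), ([0.9, 0.1, 4.0], 0),
         ([0.1, 0.9, 5.0], 1), ([0.2, 1.0, 4.0], 1)]
VALID = [([0.8, 0.3, 4.5], 0), ([0.3, 0.8, 4.5], 1)]

def payoff(S):
    """v(S): validation accuracy of 1-nearest-neighbour restricted to S."""
    if not S:
        return 0.5  # no features at all: guess at chance level
    def predict(x):
        # squared Euclidean distance computed on the features in S only
        dist = lambda a, b: sum((a[j] - b[j]) ** 2 for j in S)
        return min(TRAIN, key=lambda t: dist(t[0], x))[1]
    hits = sum(predict(x) == y for x, y in VALID)
    return hits / len(VALID)
```

On this toy data, the informative coalition {0, 1} reaches full validation accuracy, while the uninformative feature 2 on its own stays at chance level, which is exactly the kind of context-dependent usefulness the game-theoretic view captures.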
A theoretically appealing measure of the importance of an individual feature
is the Shapley value [3]. Computationally, however, this measure is challenging.
First, the exact computation of the Shapley values requires determining the
performance of all possible subsets of features, which is in general #P-hard [2].
Furthermore, in the context of machine learning, even the training of a single
predictor on one subset of features can take a considerable amount of time.
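The exact Shapley value of feature i averages its marginal contribution v(S ∪ {i}) − v(S) over all subsets S not containing i, weighted by |S|!(n−|S|−1)!/n!. The sketch below spells this out for a hypothetical three-feature game whose payoff table (standing in for measured predictive performance) is invented for illustration; it also makes visible why exact computation is expensive, since all 2^n subsets must be evaluated.

```python
# Exact Shapley values for a toy three-player game. Players are features;
# the (hypothetical) payoff v(S) stands in for the predictive performance
# of a model trained on feature subset S.
from itertools import combinations
from math import factorial

FEATURES = (0, 1, 2)

PAYOFF = {  # hypothetical "accuracy" reached with each subset
    frozenset():          0.50,  # empty model: baseline accuracy
    frozenset({0}):       0.65,
    frozenset({1}):       0.70,
    frozenset({2}):       0.55,
    frozenset({0, 1}):    0.80,
    frozenset({0, 2}):    0.68,
    frozenset({1, 2}):    0.72,
    frozenset({0, 1, 2}): 0.82,
}

def shapley_value(i, features=FEATURES, v=PAYOFF):
    """phi_i = sum over S not containing i of
       |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S))."""
    n = len(features)
    others = [f for f in features if f != i]
    phi = 0.0
    for k in range(n):
        for S in combinations(others, k):
            S = frozenset(S)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            phi += weight * (v[S | {i}] - v[S])
    return phi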
As another aspect specific to machine learning, note that the Shapley
value of each feature can change with the sample size, due to effects such as
overfitting. Motivated by this observation, we introduce the concept of a Shapley
curve, which depicts the (weighted average) contribution of a feature to the
learning curve (expected performance as a function of the sample size).
We develop an approximation technique for estimating Shapley values, which
is efficient in the number of models that need to be trained and validated.
Moreover, to estimate Shapley curves, we propose a hierarchical Bayes approach
that does not require an evaluation of all possible subsets of features on different
sample sizes. Last but not least, leveraging related techniques for extrapolating
learning curves [1], we are able to estimate the Shapley values in the limit when
the sample size goes to infinity. We evaluate our approach on synthetic and
real-world datasets.
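The paper's own approximation technique and hierarchical Bayes estimator are not spelled out in this abstract. As a generic point of comparison, the widely used permutation-sampling Monte Carlo estimator is sketched below on a hypothetical payoff function; it estimates each feature's Shapley value as its average marginal contribution over random feature orderings, avoiding the enumeration of all subsets.

```python
# Permutation-sampling Monte Carlo estimate of Shapley values.
# The payoff function v is hypothetical: a baseline plus per-feature
# gains, with a small interaction bonus when features 0 and 1 co-occur.
import random
from statistics import mean

WEIGHTS = {0: 3, 1: 4, 2: 1}  # hypothetical per-feature usefulness

def v(S):
    """Hypothetical payoff of a feature coalition S."""
    payoff = 0.5 + 0.05 * sum(WEIGHTS[i] for i in S)
    if {0, 1} <= set(S):
        payoff += 0.02  # interaction bonus
    return payoff

def shapley_monte_carlo(features, v, n_perms=2000, seed=0):
    """Average each feature's marginal contribution v(S + {i}) - v(S)
    over random orderings of the features."""
    rng = random.Random(seed)
    contrib = {i: [] for i in features}
    for _ in range(n_perms):
        perm = list(features)
        rng.shuffle(perm)
        S = set()
        for i in perm:
            contrib[i].append(v(S | {i}) - v(S))
            S.add(i)
    return {i: mean(c) for i, c in contrib.items()}
```

For this additive-plus-interaction toy game the exact Shapley values are 0.16, 0.21 and 0.05 (the 0.02 bonus splits evenly between features 0 and 1), and the estimator converges to them; by construction the estimates also sum to v(N) − v(∅), mirroring the efficiency property of the exact values.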
References
1. C. Cortes, L.D. Jackel, S.A. Solla, V. Vapnik, and J.S. Denker. Learning curves:
Asymptotic values and rate of convergence. In Proc. NIPS, Advances in Neural
Information Processing Systems, Denver, USA, 1993.
2. X. Deng and C.H. Papadimitriou. On the complexity of cooperative solution concepts.
Math. Oper. Res., 19(2):257–266, 1994.
3. K. Pfannschmidt, E. Hüllermeier, S. Held, and R. Neiger. Evaluating tests in medical
diagnosis: Combining machine learning with game-theoretical concepts. In Proc.
IPMU, International Conference on Information Processing and Management of
Uncertainty in Knowledge-Based Systems, Eindhoven, The Netherlands, 2016.
Copyright © 2017 by the paper’s authors. Copying permitted only for private and academic purposes.
In: M. Leyer (Ed.): Proceedings of the LWDA 2017 Workshops: KDML, FGWM, IR, and FGDB.
Rostock, Germany, 11.-13. September 2017, published at http://ceur-ws.org