<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Shapley Curves: A New Concept for Modelling Feature Importance</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Farjad Adnan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Karlson Pfannschmidt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eyke Hullermeier</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Intelligent Systems Group Paderborn University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Abstract</title>
      <p>We propose a novel method for measuring the importance and usefulness of
predictor variables (features) in supervised machine learning, which makes use
of concepts from cooperative game theory. The basic idea of our approach is to
consider subsets of variables as coalitions, and their predictive performance as a
payoff. This approach acknowledges the fact that the usefulness of a feature in a
learning context strongly depends not only on the learning method being used,
but also on the other features available.</p>
      <p>
        A theoretically appealing measure of the importance of an individual feature
is the Shapley value [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Computationally, however, this measure is challenging.
First, the exact computation of the Shapley values requires determining the
performance of all possible subsets of features, which is in general #P-hard [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Furthermore, in the context of machine learning, even the training of a single
predictor on one subset of features can take a considerable amount of time.
      </p>
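As a toy illustration of why the exact computation is expensive (this is not the authors' implementation; the feature names and accuracy numbers are hypothetical), the following sketch evaluates the Shapley formula over all 2^n feature subsets, with the payoff of a coalition standing in for the validation performance of a model trained on that subset:

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values for `features` under the payoff `value`.

    `value` maps a frozenset of features to a payoff (here: the
    validation performance of a model trained on that subset).
    Every one of the 2^n subsets is evaluated, which is what makes
    the exact computation intractable for many features.
    """
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(S | {i}) - value(S))
        phi[i] = total
    return phi

# Hypothetical validation accuracies for every feature subset.
scores = {frozenset(): 0.5, frozenset({"a"}): 0.7,
          frozenset({"b"}): 0.6, frozenset({"a", "b"}): 0.9}
phi = shapley_values(["a", "b"], scores.__getitem__)
# phi["a"] = 0.25, phi["b"] = 0.15; they sum to v(N) - v(empty) = 0.4
```

Note the efficiency property visible in the last comment: the individual values always add up to the total gain of the full feature set over the empty one.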
      <p>As another aspect specific to machine learning, let us note that the Shapley
value of a feature can change with varying sample size, due to effects such as
overfitting. Motivated by this observation, we introduce the concept of a Shapley
curve, which depicts the (weighted average) contribution of a feature to the
learning curve (expected performance as a function of the sample size).</p>
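A minimal sketch of this idea, assuming hypothetical learning curves of the classic asymptotic form v_m(S) = a_S - b_S/m for each feature subset (all names and numbers below are invented for illustration, not taken from the paper):

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values for a coalition payoff function."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        phi[i] = sum(
            factorial(k) * factorial(n - k - 1) / factorial(n)
            * (value(frozenset(S) | {i}) - value(frozenset(S)))
            for k in range(n) for S in combinations(others, k)
        )
    return phi

# Hypothetical learning curve per subset: performance at sample size m
# modelled as  v_m(S) = a_S - b_S / m, where a_S is the asymptote and
# b_S says how strongly the subset suffers at small sample sizes.
asymptote = {frozenset(): 0.5, frozenset({"a"}): 0.8,
             frozenset({"b"}): 0.6, frozenset({"a", "b"}): 0.9}
rate = {frozenset(): 0.0, frozenset({"a"}): 3.0,
        frozenset({"b"}): 1.0, frozenset({"a", "b"}): 5.0}

def shapley_curve(feature, sample_sizes):
    """Shapley value of `feature` as a function of the sample size."""
    return {m: shapley_values(["a", "b"],
                              lambda S: asymptote[S] - rate[S] / m)[feature]
            for m in sample_sizes}

curve_a = shapley_curve("a", (10, 100, 1000))
# curve_a[10] is negative: with little data, adding "a" hurts more
# (overfitting) than its asymptotic contribution helps; the curve
# then rises toward the large-sample Shapley value.
```

The same mechanism explains the limit discussed below: plugging the asymptotes a_S in directly gives the Shapley values for infinite sample size.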
      <p>
        We develop an approximation technique for estimating Shapley values that
is efficient in the number of models that need to be trained and validated.
Moreover, to estimate Shapley curves, we propose a hierarchical Bayes approach
that does not require an evaluation of all possible subsets of features at different
sample sizes. Last but not least, leveraging related techniques for extrapolating
learning curves [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we are able to estimate the Shapley values in the limit as
the sample size goes to infinity. We evaluate our approach on synthetic and
real-world datasets.
      </p>
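One standard way to cut the number of payoff evaluations (and thus trained models) is Monte Carlo sampling over random feature orderings. This is a generic estimator, sketched here for illustration; it is not necessarily the approximation or the hierarchical Bayes scheme proposed in the paper, and the toy scores are hypothetical:

```python
import random

def shapley_monte_carlo(features, value, n_samples=500, seed=0):
    """Monte Carlo Shapley estimate via random feature orderings.

    In each sampled permutation, every feature is credited with its
    marginal contribution when it joins the coalition of features
    preceding it.  Each sample costs n evaluations of `value`
    instead of the 2^n needed for the exact computation.
    """
    rng = random.Random(seed)
    phi = {f: 0.0 for f in features}
    for _ in range(n_samples):
        perm = list(features)
        rng.shuffle(perm)
        coalition, v_prev = frozenset(), value(frozenset())
        for f in perm:
            coalition = coalition | {f}
            v_cur = value(coalition)
            phi[f] += v_cur - v_prev
            v_prev = v_cur
    return {f: s / n_samples for f, s in phi.items()}

# Toy payoff: hypothetical validation accuracy per feature subset.
scores = {frozenset(): 0.5, frozenset({"a"}): 0.7,
          frozenset({"b"}): 0.6, frozenset({"a", "b"}): 0.9}
est = shapley_monte_carlo(["a", "b"], scores.__getitem__)
# est approximates the exact values (0.25 for "a", 0.15 for "b");
# the estimates sum exactly to v(N) - v(empty) in every sample.
```

The estimator's error shrinks with the number of sampled orderings, so the accuracy/cost trade-off can be tuned to the model-training budget.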
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>C.</given-names>
            <surname>Cortes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.D.</given-names>
            <surname>Jackel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.A.</given-names>
            <surname>Solla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.S.</given-names>
            <surname>Denker</surname>
          </string-name>
          .
          <article-title>Learning curves: Asymptotic values and rate of convergence</article-title>
          .
          <source>Advances in Neural Information Processing Systems (NIPS)</source>
          , Denver, USA,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>X.</given-names>
            <surname>Deng</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.H.</given-names>
            <surname>Papadimitriou</surname>
          </string-name>
          .
          <article-title>On the complexity of cooperative solution concepts</article-title>
          .
          <source>Math. Oper. Res.</source>
          ,
          <volume>19</volume>
          (
          <issue>2</issue>
          ):
          <fpage>257</fpage>
          –
          <lpage>266</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>K.</given-names>
            <surname>Pfannschmidt</surname>
          </string-name>
          , E. Hullermeier, S. Held, and
          <string-name>
            <given-names>R.</given-names>
            <surname>Neiger</surname>
          </string-name>
          .
          <article-title>Evaluating tests in medical diagnosis: Combining machine learning with game-theoretical concepts</article-title>
          .
          <source>Proc. IPMU, International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems</source>
          , Eindhoven, The Netherlands,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>