<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Shapley Curves: A New Concept for Modelling Feature Importance</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Farjad</forename><surname>Adnan</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Intelligent Systems Group</orgName>
								<orgName type="institution">Paderborn University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Karlson</forename><surname>Pfannschmidt</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Intelligent Systems Group</orgName>
								<orgName type="institution">Paderborn University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Eyke</forename><surname>Hüllermeier</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Intelligent Systems Group</orgName>
								<orgName type="institution">Paderborn University</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Shapley Curves: A New Concept for Modelling Feature Importance</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">DAFDB71B63CD7F032FEFE3C4D373FEAB</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:40+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract/>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We propose a novel method for measuring the importance and usefulness of predictor variables (features) in supervised machine learning, which makes use of concepts from cooperative game theory. The basic idea of our approach is to consider subsets of variables as coalitions, and their predictive performance as a payoff. This approach acknowledges the fact that the usefulness of a feature in a learning context depends strongly not only on the learning method used but also on the other features available.</p><p>A theoretically appealing measure of the importance of an individual feature is the Shapley value <ref type="bibr" target="#b2">[3]</ref>. Computationally, however, this measure is challenging. First, the exact computation of Shapley values requires determining the performance of all possible subsets of features (2^n subsets for n features), and the problem is #P-hard in general <ref type="bibr" target="#b1">[2]</ref>. Furthermore, in the context of machine learning, even the training of a single predictor on one subset of features can take a considerable amount of time.</p><p>As another aspect specific to machine learning, let us note that the Shapley value of each feature can change with varying sample size, due to effects such as overfitting. Motivated by this observation, we introduce the concept of a Shapley curve, which depicts the (weighted average) contribution of a feature to the learning curve (expected performance as a function of the sample size).</p><p>We develop an approximation technique for estimating Shapley values, which is efficient in the number of models that need to be trained and validated. Moreover, to estimate Shapley curves, we propose a hierarchical Bayes approach that does not require an evaluation of all possible subsets of features at different sample sizes. 
Last but not least, leveraging related techniques for extrapolating learning curves <ref type="bibr" target="#b0">[1]</ref>, we are able to estimate the Shapley values in the limit when the sample size goes to infinity. We evaluate our approach on synthetic and real-world datasets.</p></div>
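<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the two central notions concrete, the following is a minimal sketch, not the approximation technique or the hierarchical Bayes approach proposed above. It computes exact Shapley values by enumerating all coalitions of features, and reads a Shapley curve as the Shapley value of each feature evaluated separately at each sample size. The function names shapley_values and shapley_curve, the table TOY_PERFORMANCE, and all accuracy numbers in it are hypothetical and chosen only for illustration.</p><p><code lang="python">
```python
from itertools import combinations
from math import factorial


def shapley_values(features, payoff):
    """Exact Shapley values; enumerates every coalition, so cost is exponential."""
    n = len(features)
    values = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal gain in performance from adding feature i.
                total += weight * (payoff(coalition | {i}) - payoff(coalition))
        values[i] = total
    return values


def shapley_curve(features, payoff_by_size):
    """Shapley values of every feature at each sample size (assumed reading)."""
    return {
        m: shapley_values(features, table.__getitem__)
        for m, table in sorted(payoff_by_size.items())
    }


# Hypothetical accuracies per feature subset at two sample sizes (made-up numbers).
TOY_PERFORMANCE = {
    100: {
        frozenset(): 0.50,
        frozenset({"a"}): 0.60,
        frozenset({"b"}): 0.58,
        frozenset({"a", "b"}): 0.62,
    },
    1000: {
        frozenset(): 0.50,
        frozenset({"a"}): 0.70,
        frozenset({"b"}): 0.60,
        frozenset({"a", "b"}): 0.90,
    },
}

curve = shapley_curve(["a", "b"], TOY_PERFORMANCE)
for m, values in curve.items():
    print(m, {f: round(v, 3) for f, v in values.items()})
```
</code></p><p>By the efficiency property of the Shapley value, the values at each sample size sum to the difference between the performance with all features and the performance with none. The enumeration visits every subset of the remaining features for each feature, which is exactly the exponential cost that motivates an approximation.</p></div>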
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Learning curves: Asymptotic values and rate of convergence</title>
		<author>
			<persName><forename type="first">C</forename><surname>Cortes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">D</forename><surname>Jackel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Solla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vapnik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Denker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. NIPS, Advances in Neural Information Processing Systems</title>
				<meeting>NIPS, Advances in Neural Information Processing Systems<address><addrLine>Denver, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">On the complexity of cooperative solution concepts</title>
		<author>
			<persName><forename type="first">X</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Papadimitriou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Math. Oper. Res</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="257" to="266" />
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Evaluating tests in medical diagnosis: Combining machine learning with game-theoretical concepts</title>
		<author>
			<persName><forename type="first">K</forename><surname>Pfannschmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hüllermeier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Held</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Neiger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. IPMU, International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems</title>
				<meeting>IPMU, International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems<address><addrLine>Eindhoven, The Netherlands</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m">Proceedings of the LWDA 2017 Workshops: KDML, FGWM, IR, and FGDB</title>
		<editor>
			<persName><forename type="first">M</forename><surname>Leyer</surname></persName>
		</editor>
		<ptr target="http://ceur-ws.org" />
		<meeting>LWDA 2017 Workshops: KDML, FGWM, IR, and FGDB<address><addrLine>Rostock, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017-09">11.-13. September 2017</date>
		</imprint>
	</monogr>
	<note>Copyright 2017 by the paper&apos;s authors. Copying permitted only for private and academic purposes.</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
