<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">What Can We Expect from Active Class Selection?</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Mirko</forename><surname>Bunse</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">AI Group</orgName>
								<orgName type="institution">TU Dortmund</orgName>
								<address>
									<postCode>44221</postCode>
									<settlement>Dortmund</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Katharina</forename><surname>Morik</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">AI Group</orgName>
								<orgName type="institution">TU Dortmund</orgName>
								<address>
									<postCode>44221</postCode>
									<settlement>Dortmund</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">What Can We Expect from Active Class Selection?</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">487A63695A89116EA1F8EA0FBCBB7C66</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T18:29+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Active class selection</term>
					<term>Active learning</term>
					<term>Classification</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The promise of active class selection is that the proportions of classes can be optimized in newly acquired data. In this short paper, we take a step towards the identification of properties that data sets must meet in order to make active class selection (potentially) successful. Also, we compare the conceivable benefit of active class selection to that of active learning and we identify open research issues. It becomes apparent that active class selection is a tough task, in which informed strategies often exhibit only minor improvements over random sampling.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Active class selection (ACS) <ref type="bibr" target="#b7">[8]</ref> seeks to optimize the proportions of classes in newly acquired data. This process is carried out sequentially: In each iteration, the most promising class proportions are selected and instances are generated according to these proportions. Due to this iterative collection of training data, there is a certain similarity between ACS and active learning (AL) <ref type="bibr" target="#b9">[10]</ref>. However, the data acquisition is fundamentally different between these paradigms: Where AL selects unlabeled instances to be labeled, ACS selects classes for which new instances are to be generated. This distinction reveals the contrasting assumptions which underlie AL and ACS with regard to the data-generating process: AL assumes an external oracle, e.g. a human annotator, which is able to assign labels to observations. ACS assumes a data generator which produces observations from label queries. One prominent example of such a generator is the artificial nose experiment, where a vapor (the label) must be selected before a sensor array can record data <ref type="bibr" target="#b7">[8]</ref>. In both cases, each query is assumed to be costly. Therefore, ACS and AL try to minimize the amount of training data by selecting only the most promising examples. Let us thus narrow the question raised in the title: Given that new training data can be generated from label queries, can we expect ACS to make optimal use of a limited data generation budget? Which preconditions must hold to make ACS a success? Our contribution with respect to these questions is three-fold:</p><p>• We identify common properties of the data used in ACS publications.</p><p>• We compare the potential benefit of ACS to that of AL.</p><p>• We recognize open issues in ACS research.</p><p>The first one of these contributions is detailed in Sec. 2. 
The second and third ones are presented in Sec. 3 and in Sec. 4. Finally, Sec. 5 concludes our findings.</p></div>
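The iterative procedure described above can be sketched as a generic loop. This is a minimal sketch, not the authors' implementation: `generate`, `score_classes`, and `fit` are hypothetical stand-ins for a concrete data generator, ACS strategy, and classifier.

```python
import numpy as np

def acs_loop(generate, score_classes, fit, n_classes, batch_size, n_iterations, rng=None):
    """Generic active class selection loop (illustrative sketch).

    generate(c)          -- data generator: returns a feature vector for class c
    score_classes(X, y)  -- ACS strategy: returns one non-negative score per class
    fit(X, y)            -- trains a classifier on the data acquired so far
    """
    if rng is None:
        rng = np.random.default_rng()
    X, y = [], []
    for _ in range(n_iterations):
        if len(y) == 0:
            # no data yet: start with uniform class proportions
            p = np.full(n_classes, 1.0 / n_classes)
        else:
            # informed step: turn the per-class scores into proportions
            s = np.asarray(score_classes(np.asarray(X), np.asarray(y)), dtype=float)
            p = s / s.sum()
        # query the generator according to the chosen proportions
        for c in rng.choice(n_classes, size=batch_size, p=p):
            X.append(generate(c))
            y.append(int(c))
    X, y = np.asarray(X), np.asarray(y)
    return X, y, fit(X, y)
```

With a uniform `score_classes`, this loop degenerates to the "uniform" random baseline; an informed strategy only differs in how it maps the data acquired so far to class scores.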
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Data Used in ACS</head><p>Despite the potential relevance of ACS, we could make out only two papers which suggest algorithms for this task. Lomasky et al. <ref type="bibr" target="#b7">[8]</ref>, who also introduced ACS, present five approaches, the most successful of which seek to stabilize the empirical error of each class. Kottke et al. <ref type="bibr" target="#b6">[7]</ref> compare these approaches to a framework with which AL methods can be adapted to ACS. Namely, they use AL to score pseudo instances and aggregate the scores for each class. Both papers use random sampling (proportional and uniform) as a baseline.</p><p>Tab. 1 summarizes the results that have been reported for these methods. The columns with upright names indicate whether a method clearly outperforms its competitors (✓) or not (✗). Missing values denote that a method has not been evaluated. Please consider that the qualification of a "winner" must remain somewhat subjective. We therefore declare multiple methods as winners wherever a single winner cannot be made out from the published plots and tables.</p><p>One observation to make is that the random strategies "proportional" and "uniform" are highly competitive. In this overview, they win on five out of eight data sets. Moreover, they come for free, whereas the informed (i.e. non-random) strategies imply a certain computational overhead which needs to be justified with the data acquisition cost. Also, one may be concerned about the applicability of (informed) active sampling in general <ref type="bibr" target="#b0">[1]</ref>. Note that proportional sampling assumes that the correct label proportions of the test set are known at training time, which may not hold in some use cases.</p><p>All of the data sets used so far distinguish between at least three classes. Moreover, we see that the predictability differs among their classes. The synthetic data sets, for instance, are modeled so that one class can easily be distinguished from the other two classes, which in turn are hard to distinguish from each other. For the UCI data sets <ref type="bibr" target="#b3">[4]</ref>, we provide the confusion matrices in Tab. 2. Displayed are the mean values over 50 trials, using proportional sampling and the classifier from the ACS experiments. Each row is scaled to unit sum to account for class imbalance. We see that the yeast data exhibits large differences among class difficulties (78.7% vs 41.4% class-wise accuracy). The differences on the vertebral data are smaller, yet considerable (74.0% vs 56.1%).</p></div>
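The row normalization used for Tab. 2 can be reproduced in a few lines of NumPy; the count matrix below is a made-up 3-class example, not data from the paper.

```python
import numpy as np

def normalize_rows(confusion):
    """Scale each row of a confusion matrix to unit sum, so that
    cell (i, j) estimates P(predicted class j | true class i)."""
    confusion = np.asarray(confusion, dtype=float)
    row_sums = confusion.sum(axis=1, keepdims=True)
    # guard against empty rows (classes without any test examples)
    return confusion / np.where(row_sums == 0, 1.0, row_sums)

# hypothetical raw counts: one abundant, one medium, one scarce class
counts = np.array([[50, 10,  0],
                   [ 5, 20,  5],
                   [ 1,  2,  2]])
normalized = normalize_rows(counts)
```

After normalization, the diagonal entries are the class-wise accuracies, which makes classes of very different sizes directly comparable.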
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">The Potential Benefits of ACS and AL</head><p>Given a data set consisting of at least three classes of varying difficulty, what is the improvement that we can expect from ACS? How does it relate to the improvement AL methods achieve? To answer these questions, we reproduce some of the experiments described in <ref type="bibr" target="#b6">[7]</ref>. We add one strategy to these experiments that is optimal for the spirals data: it uses only a single example from the easy class and randomly samples from the difficult classes. It is "optimal" with regard to the overall accuracy because a single example is already enough to achieve 100% accuracy on the easy class. Even though this strategy does not adapt to any other data set, it shows how well ACS could potentially perform. Moreover, we extend the experiments by evaluating an AL strategy, namely probabilistic active learning (PAL) <ref type="bibr" target="#b5">[6]</ref>, which is also used inside of PAL-ACS. Fig. <ref type="figure" target="#fig_0">1</ref> presents the results of these extensions, specifically the mean error over 500 trials. The optimal strategy indicates that there is still room for improving ACS methods. In particular, knowing the difficulty of the classes in advance allows us to outperform the other strategies on the spirals data set. However, PAL is even better than that: knowing which examples are available allows us to improve even further. These observations cannot be made on the two UCI data sets. On both of them, neither uniform sampling nor PAL-ACS is a clear winner, a finding we deem consistent with the original experiments. What is perhaps surprising is that the AL strategy performs worse than ACS. We conjecture that the identification of relevant examples is not necessarily easier than, but considerably different from, the identification of relevant classes.</p></div>
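The "optimal" spirals strategy described above amounts to a fixed budget allocation, which might be sketched as follows. Treating class index 0 as the easy class is an assumption of this sketch, not a convention from the experiments.

```python
import numpy as np

def optimal_spirals_allocation(budget, n_classes=3, easy_class=0, rng=None):
    """Fixed allocation that is optimal for the spirals data:
    query the easy class exactly once and split the remaining
    budget uniformly at random among the difficult classes."""
    if rng is None:
        rng = np.random.default_rng()
    hard = [c for c in range(n_classes) if c != easy_class]
    queries = [easy_class]  # one example already yields 100% accuracy here
    queries.extend(int(c) for c in rng.choice(hard, size=budget - 1))
    return queries
```

By construction, this allocation ignores everything about the data except which class is easy, which is exactly why it cannot adapt to other data sets.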
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Open Issues in ACS</head><p>It remains open whether the current limits of (informed) ACS stem from the problem itself, i.e. from sequentially optimizing only the class proportions, or from the methods proposed to date. We suggest approaching this question by studying relaxations of "pure" ACS. Indeed, example generators are often controlled not only by class proportions but also by auxiliary parameters. In the artificial nose experiment, for instance, not only must a vapor (the label) be selected before data can be recorded, but also the vapor's concentration <ref type="bibr" target="#b8">[9]</ref>. Optimizing the data generation only with respect to the class proportions means to limit the actual task artificially, and maybe even detrimentally.</p><p>An issue that has been neglected in ACS so far is the problem of imbalanced data <ref type="bibr" target="#b4">[5]</ref>. This problem refers to situations in which one class is abundant and another one is scarce, typically leading to the degradation of classifiers and evaluation metrics. It has also been argued that within-class imbalances, i.e. abundant and scarce sub-groups within single classes, can hinder learning <ref type="bibr" target="#b10">[11]</ref>. In ACS, we are free to choose how balanced the data is, but only with respect to the label. Methods for imbalanced learning could therefore guide ACS by constraining the class proportions for between-class balance, and they may also correct the effects of within-class imbalances.</p></div>
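One conceivable way to constrain class proportions for between-class balance, as suggested above, is to shrink the proportions chosen by an ACS strategy towards the uniform distribution. This is a hypothetical sketch; the mixing parameter `alpha` is not something proposed in the cited literature.

```python
import numpy as np

def constrain_proportions(p, alpha=0.5):
    """Mix ACS class proportions p with the uniform distribution.
    With mixing weight alpha, no class proportion can drop below
    alpha / n_classes, bounding how imbalanced the data can become."""
    p = np.asarray(p, dtype=float)
    uniform = np.full(p.shape, 1.0 / len(p))
    return (1.0 - alpha) * p + alpha * uniform
```

Setting `alpha = 0` recovers the unconstrained ACS proportions and `alpha = 1` recovers uniform sampling, so the parameter trades off informedness against a balance guarantee.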
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>ACS addresses use cases which distinguish between at least three classes of varying predictability. However, this precondition does not necessarily lead to a successful application of ACS. Experiments suggest that a random sampling of classes is hard to beat with informed strategies. We expect future advances to be made by (i) queries which combine the label with auxiliary parameters that control the data generator and by (ii) accounting for data imbalances.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. The learning curves of ACS strategies and the AL strategy PAL in comparison.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>The winning ACS strategies for each evaluated data set. * Proportional ≡ Uniform, due to a uniform class distribution in the test set.</figDesc><table><row><cell>data set</cell><cell>no. classes</cell><cell>no. examples</cell><cell>PAL-ACS [7]</cell><cell>Redistricting [8]</cell><cell>Improvement [8]</cell><cell>Inverse [8]</cell><cell>Proportional [8]</cell><cell>Uniform [8]</cell></row><row><cell>3clusters [7]</cell><cell>3</cell><cell>60 / ∞ *</cell></row><row><cell>spirals [7]</cell><cell>3</cell><cell>120 / ∞ *</cell></row><row><cell>bars [7]</cell><cell>3</cell><cell>120 / ∞ *</cell></row><row><cell>vehicle [4]</cell><cell>4</cell><cell>80 / 946 *</cell></row><row><cell>vertebral [4]</cell><cell>3</cell><cell>60 / 310 *</cell></row><row><cell>yeast [4]</cell><cell>5 / 8</cell><cell>60 / 1150 *</cell></row><row><cell>land cover [2]</cell><cell>11</cell><cell>≈ 28000</cell></row><row><cell>artificial nose [8]</cell><cell>8</cell><cell>≈ 1250</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Confusion matrices of the Parzen window classifier<ref type="bibr" target="#b2">[3]</ref>. Rows are the true classes, columns the predicted classes; each row is scaled to unit sum.</figDesc><table><row><cell>yeast data set [4]</cell></row><row><cell>0.526</cell><cell>0.348</cell><cell>0.007</cell><cell>0.052</cell><cell>0.066</cell></row><row><cell>0.431</cell><cell>0.414</cell><cell>0.006</cell><cell>0.051</cell><cell>0.098</cell></row><row><cell>0.002</cell><cell>0.002</cell><cell>0.787</cell><cell>0.200</cell><cell>0.008</cell></row><row><cell>0.054</cell><cell>0.051</cell><cell>0.306</cell><cell>0.463</cell><cell>0.126</cell></row><row><cell>0.077</cell><cell>0.130</cell><cell>0.028</cell><cell>0.105</cell><cell>0.660</cell></row><row><cell>vertebral data set [4]</cell></row><row><cell>0.561</cell><cell>0.308</cell><cell>0.131</cell></row><row><cell>0.268</cell><cell>0.691</cell><cell>0.041</cell></row><row><cell>0.161</cell><cell>0.098</cell><cell>0.740</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We thank Daniel Kottke for the discussions we had and for his great support in reproducing the experiments on PAL-ACS. We also thank our reviewers for their valuable comments, in particular for pointing us to imbalanced learning.</p></div>
			</div>


			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This work has been supported by Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876 "Providing Information by Resource-Constrained Data Analysis", project C3. http://sfb876.tu-dortmund.de</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Inactive learning? Difficulties employing active learning in practice</title>
		<author>
			<persName><forename type="first">J</forename><surname>Attenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">J</forename><surname>Provost</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGKDD Explorations</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="36" to="41" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Improving automated land cover mapping by identifying and eliminating mislabeled observations from training data</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">E</forename><surname>Brodley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Friedl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Int. Geoscience and Remote Sensing Symp</title>
				<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="1996">1996</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="1382" to="1384" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Active learning for Parzen window classifier</title>
		<author>
			<persName><forename type="first">O</forename><surname>Chapelle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the AISTATS 2005. Society for Artificial Intelligence and Statistics</title>
				<meeting>of the AISTATS 2005. Society for Artificial Intelligence and Statistics</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Dua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Graff</surname></persName>
		</author>
		<ptr target="http://archive.ics.uci.edu/ml" />
		<title level="m">UCI machine learning repository</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Learning from Imbalanced Data Sets</title>
		<author>
			<persName><forename type="first">A</forename><surname>Fernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>García</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Galar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Prati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Krawczyk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Herrera</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Multi-class probabilistic active learning</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kottke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Krempl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Teschner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Spiliopoulou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the ECAI 2016. Frontiers in Artificial Intelligence and Applications</title>
				<meeting>of the ECAI 2016. Frontiers in Artificial Intelligence and Applications</meeting>
		<imprint>
			<publisher>IOS Press</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">285</biblScope>
			<biblScope unit="page" from="586" to="594" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Probabilistic active learning for active class selection</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kottke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Krempl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stecklina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Rekowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Sabsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">P</forename><surname>Minh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Deliano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Spiliopoulou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sick</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the NIPS Workshop on the Future of Interactive Learning Machines</title>
				<meeting>of the NIPS Workshop on the Future of Interactive Learning Machines</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Active class selection</title>
		<author>
			<persName><forename type="first">R</forename><surname>Lomasky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">E</forename><surname>Brodley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Aernecke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Walt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Friedl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the ECML 2007. LNCS</title>
				<meeting>of the ECML 2007. LNCS</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">4701</biblScope>
			<biblScope unit="page" from="640" to="647" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">On the calibration of sensor arrays for pattern recognition using the minimal number of experiments</title>
		<author>
			<persName><forename type="first">I</forename><surname>Rodriguez-Lujan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fonollosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vergara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Homer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Huerta</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Chemometrics and Intelligent Laboratory Systems</title>
		<imprint>
			<biblScope unit="volume">130</biblScope>
			<biblScope unit="page" from="123" to="134" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Active Learning</title>
		<author>
			<persName><forename type="first">B</forename><surname>Settles</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Synthesis Lectures on Artificial Intelligence and Machine Learning</title>
				<imprint>
			<publisher>Morgan &amp; Claypool Publishers</publisher>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Mining with rarity: A unifying framework</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">M</forename><surname>Weiss</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGKDD Explorations</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="7" to="19" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
