<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Activist: A New Framework for Dataset Labelling</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jack</forename><surname>O'Neill</surname></persName>
							<email>jack.oneill1@mydit.ie</email>
							<affiliation key="aff0">
								<orgName type="institution">Dublin Institute of Technology</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sarah</forename><forename type="middle">Jane</forename><surname>Delany</surname></persName>
							<email>sarahjane.delany@dit.ie</email>
							<affiliation key="aff0">
								<orgName type="institution">Dublin Institute of Technology</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Brian</forename><surname>Mac Namee</surname></persName>
							<email>brian.macnamee@ucd.ie</email>
							<affiliation key="aff1">
								<orgName type="institution">University College Dublin</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Activist: A New Framework for Dataset Labelling</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">32BB2DBF8DE864053CCA84014574BB19</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:14+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acquiring labels for large datasets can be a costly and time-consuming process. This has motivated the development of the semi-supervised learning problem domain, which makes use of unlabelled data, in conjunction with a small amount of labelled data, to infer the correct labels of a partially labelled dataset. Active learning is one of the most successful approaches to semi-supervised learning, and has been shown to reduce the cost and time taken to produce a fully labelled dataset. In this paper we present Activist: a free, online, state-of-the-art platform which leverages active learning techniques to improve the efficiency of dataset labelling. Using a simulated crowd-sourced label-gathering scenario on a number of datasets, we show that the Activist software can speed up, and ultimately reduce the cost of, label acquisition.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The availability of a large corpus of labelled training data is a key component in developing effective machine learning models. In many cases, such as speech recognition systems and sentiment analysis, labels are time-consuming or expensive to obtain, and must be provided by human annotators, constituting a bottleneck in the predictive model development life-cycle. Recent trends have seen an increased interest in using crowd-sourcing platforms such as CrowdFlower<ref type="foot" target="#foot_0">3</ref> and Amazon Mechanical Turk<ref type="foot" target="#foot_1">4</ref> to distribute the task of dataset labelling over a large number of anonymous oracles <ref type="bibr" target="#b20">[21]</ref>. While crowd-sourced labels may reduce both the cost and time required to obtain a fully labelled dataset, further reductions may be realized by employing active learning to reduce the number of labels required.</p><p>The key insight behind active learning is that "a machine learning algorithm can perform better with less training if it is allowed to choose the data from which it learns" <ref type="bibr" target="#b15">[16]</ref>. By allowing the active learning system to select the most informative data, and pose queries for labels for this data to the label provider, or oracle, the cost and time required to train an effective machine learning model can be greatly reduced.</p><p>Although the actual utility of a label may not be known in advance, an active learning system may employ one or more heuristics to predict the utility of querying for a particular label. This decision-making process, or selection strategy, is a key component of the active learning process. An active learning system begins with a small amount of pre-labelled, or seed, data, and proceeds in iterations. Through its selection strategy, the system generates a query for a batch of labels from the unlabelled data. 
These labels are provided by the oracle, and the data is added to the labelled set. The process continues until a pre-determined stopping criterion is reached. A stopping criterion may be a straightforward label budget, or a more complex prediction of the marginal utility of each new label. Once this stopping criterion is met, a predictive model is trained using the set of labelled data. While active learning is primarily used in the context of predictive model generation, these same principles may be applied to a dataset labelling task. The process is carried out as above, but the resulting model is used to predict the labels of the remaining unlabelled data. The output of an active labelling task is, then, a fully labelled, approximately correct dataset.</p><p>A dataset labelling task may be seen as an instance of active learning in a pool-based setting, i.e. a setting in which the learner has access to a large, static pool of unlabelled instances from which to generate label requests. By submitting some, but not all, data to oracles for labelling, the goal of the active learning system in this context is to reduce the cost accrued and time spent per correct label acquired, while maintaining accuracy. This paper presents Activist, an extensible framework which assists users in all aspects of the data labelling process. As well as allowing users to configure an active labelling task, Activist provides a front-end UI through which labels are supplied to the active learning system. The system covers the dataset labelling process end-to-end, from loading and pre-processing the data to creating a fully labelled output dataset once the process is complete. 
In addition to assisting users in producing fully labelled datasets, Activist allows multiple active learning strategies to be compared on simulated dataset labelling tasks, creating a detailed performance analysis for each approach under examination.</p><p>In this paper we describe the Activist system, and show how it can be used in an evaluation investigating the cost-benefit of applying active learning to a number of dataset labelling tasks. We show that while the impact of active labelling varies depending on the task, an active labelling approach consistently outperforms full dataset labelling.</p><p>The rest of the paper is structured as follows: Section 2 discusses related research in the areas of active learning and cost-sensitive labelling; Section 3 describes the Activist framework, and how it can be used to support the active learning process; Section 4 evaluates the use of Activist on a number of datasets, exploring the cost-benefits of applying active learning to a dataset labelling task; finally, Section 5 discusses the findings, suggesting avenues for future research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>This paper examines the use of active learning in a pool-based setting, in which the learner has access to a large, static pool of unlabelled instances from which to generate label requests. The problem of pool-based active learning was introduced by Lewis and Gale <ref type="bibr" target="#b9">[10]</ref> in response to the need to develop text classification models for document retrieval. One of the key components which differentiates approaches to active learning is the selection strategy: the heuristic used to predict the informativeness of a particular label. Initial approaches to selection strategies favoured some measure of uncertainty sampling <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>, selecting those instances for labelling which are closest to the decision boundary of the model, i.e. 
those which the model is most likely to classify incorrectly.</p><p>An alternative selection strategy to uncertainty sampling is the Query-By-Committee (QBC) approach, introduced by Seung et al. <ref type="bibr" target="#b16">[17]</ref>. QBC describes a general approach in which a number of diverse classifiers are trained on the currently labelled data, such that the classifiers can be expected to produce slightly different results for each unseen instance. The learner then measures the level of disagreement between the classifiers for each unlabelled instance, and selects those instances which induce the highest level of disagreement within the committee. Variations on the QBC algorithm continue to be popular in the literature <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b10">11]</ref>.</p><p>Although measures of diversity have often been incorporated into other active learning selection strategies <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b7">8]</ref>, diversity was first proposed as the sole metric in a selection strategy by Baram et al. <ref type="bibr" target="#b0">[1]</ref>. Their Kernel-Farthest-First diversity algorithm seeks to label those instances which are least similar to the currently labelled data. Diversity, as a selection strategy, has been shown to work well in text classification <ref type="bibr" target="#b6">[7]</ref> and in regression problems <ref type="bibr" target="#b12">[13]</ref>.</p><p>Research has shown that the labelling of text classification datasets may be made more efficient by using visualisations to assist the annotator <ref type="bibr" target="#b18">[19]</ref>, or by using machine learning techniques to reduce the number of labels required of the annotator <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b8">9]</ref>. 
Active learning has also been shown to improve the efficiency of dataset labelling for image classification <ref type="bibr" target="#b11">[12]</ref>, while the availability of commercial platforms such as CrowdFlower attests to the viability of active learning as a dataset labelling tool.</p><p>For a more in-depth discussion of the components comprising an active learning system (e.g. selection strategies, stopping criteria, etc.) see <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b7">8]</ref>.</p></div>
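The batch-mode, pool-based active labelling loop described above can be sketched in a few lines of Python. This is an illustrative sketch only (function names and the committee interface are our own, not part of Activist), using vote entropy as a simple QBC disagreement measure:

```python
import numpy as np
from collections import Counter

def vote_entropy(votes):
    """Disagreement among committee votes for one instance (higher = more disagreement)."""
    counts = np.array(list(Counter(votes).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p + 1e-12)))

def active_labelling_loop(X, oracle, committee_factory, batch_size, budget, seed_idx):
    """Repeatedly query the oracle for the most disagreed-upon instances until the label budget is spent."""
    y = {i: oracle(i) for i in seed_idx}                      # seed labels from the oracle
    while len(y) < budget:                                    # stopping criterion: label budget
        committee = committee_factory([(X[i], y[i]) for i in y])
        pool = [i for i in range(len(X)) if i not in y]
        # rank unlabelled instances by committee disagreement, take the top batch
        batch = sorted(pool,
                       key=lambda i: vote_entropy([m(X[i]) for m in committee]),
                       reverse=True)[:batch_size]
        for i in batch:                                       # query the oracle for the batch
            y[i] = oracle(i)
    return y                                                  # labelled subset; a model trained on it labels the rest
```

In a real task the `committee_factory` would retrain diverse classifiers on the currently labelled data each iteration; here it is an abstract callable so the loop structure stays visible.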
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Activist</head><p>The Activist Framework provides an end-to-end solution for dataset labelling tasks. Using Activist, the dataset labelling process consists of four stages: loading, pre-processing, labelling and output. The life-cycle of an Activist task is illustrated in Figure <ref type="figure" target="#fig_0">1</ref>.</p><p>The simplest data format understood by Activist is the comma-separated values (csv) file, though richer formats are needed for many real-world problems such as image or document classification. When comparing strategies, the system runs against a fully labelled dataset. Labels are hidden from the system until requested. After each batch of label requests is issued, the chosen predictive model is trained and used to predict the labels of the remaining data. Accuracy and execution times are recorded and returned to the researcher as a csv file when the process is complete, allowing for direct comparison of multiple approaches.</p></div>
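The simulated-labelling workflow above can be pictured with a short sketch. All names here are hypothetical stand-ins (the paper does not show Activist's implementation): true labels are hidden, revealed batch by batch on request, and accuracy is logged to a csv file after each batch.

```python
import csv

def run_simulation(X, true_labels, select, fit_predict, batch_size, out_path):
    """Hide the true labels, reveal them batch by batch, and log accuracy per batch to a csv file."""
    revealed = {}                                    # index -> label revealed by the simulated oracle
    pool = list(range(len(X)))
    rows = [("labels_requested", "accuracy")]
    while pool:
        for i in select(pool, revealed, X)[:batch_size]:
            revealed[i] = true_labels[i]             # simulated oracle call: label becomes visible
            pool.remove(i)
        # accuracy = (labels from oracle + correctly predicted labels) / dataset size
        preds = fit_predict(X, revealed, pool)       # train on revealed labels, predict the rest
        correct = sum(preds[i] == true_labels[i] for i in pool)
        rows.append((len(revealed), (len(revealed) + correct) / len(X)))
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

The `select` and `fit_predict` callables stand in for the configured selection strategy and predictive model; swapping them is what makes direct comparison of approaches possible.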
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Evaluation</head><p>The aim of the evaluation is to explore the potential of the Activist framework to reduce the number of manually provided labels needed to produce a fully labelled dataset. This section describes the data and methodology used in the experiment, and reports the findings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Datasets Used</head><p>Three datasets were used in this experiment: the MNIST handwriting recognition dataset, the CIFAR-10 image classification dataset and the 20 Newsgroups document classification dataset. The MNIST dataset<ref type="foot" target="#foot_2">5</ref> consists of 50,000 28x28 pixel gray-scale images of hand-written digits between 0 and 9. Each image is represented as a pixel map containing the value of each pixel as an unsigned byte. Another image classification dataset, CIFAR-10<ref type="foot" target="#foot_3">6</ref>, consists of 60,000 32x32 colour images in 10 equally distributed classes indicating the content of the image, all subcategories of vehicles and animals: airplane, automobile, bird, cat, dog, etc. Images are represented as a pixel map containing RGB values for each pixel as unsigned bytes. Rather than using the raw pixel values directly, individual pixels were aggregated into row and column totals for each colour channel, resulting in a vector of 192 features. The 20 Newsgroups<ref type="foot" target="#foot_4">7</ref> dataset is a freely available document classification dataset, consisting of approximately 20,000 documents partitioned approximately evenly across 20 different newsgroups. Each document was represented as a bag of words. The data was stemmed; stop words and words occurring in fewer than 3 separate documents were removed as part of the data pre-processing stage. In order to reduce the dataset size and problem complexity, a subset of the data containing 5 of the 20 newsgroups (alt.atheism, comp.windows.x, rec.autos, sci.space, talk.politics.guns) was chosen.</p></div>
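The CIFAR-10 feature aggregation described above can be sketched directly (the function name is ours; the arithmetic follows the text: 32 row totals + 32 column totals, per colour channel, gives 192 features):

```python
import numpy as np

def aggregate_cifar_image(img):
    """(32, 32, 3) RGB pixel array -> 192-element feature vector of per-channel row/column totals."""
    row_totals = img.sum(axis=1)   # shape (32, 3): one total per row, per colour channel
    col_totals = img.sum(axis=0)   # shape (32, 3): one total per column, per colour channel
    return np.concatenate([row_totals.ravel(), col_totals.ravel()])  # 32*3 + 32*3 = 192
```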
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Experimental Methodology</head><p>The active learning approach used in these experiments was set up using the Activist framework. As part of the task configuration, choices need to be made for the active learning components used in the task: seed data, a batch size, a selection strategy, a stopping criterion and a predictive model algorithm. The following configuration was used for each of the datasets under consideration.</p><p>Seed Data: 50 initial labels were randomly selected and provided to the active learning system as seed data. Batch Size: To keep the batch sizes roughly proportional to the size of the datasets, the MNIST dataset used a batch size of 10, while the CIFAR-10 and 20 Newsgroups datasets were evaluated with a batch size of 50.</p><p>Stopping Criterion: The active learning loop was run until no unlabelled data remained, with performance recorded after each batch was complete.</p><p>Selection Strategies: A Query-by-Committee algorithm was used, with a committee of 5 k-nearest neighbour models (k=5), each committee member trained on a subset consisting of 80% of the data, selected randomly with replacement. An alternative, diversity-based selection strategy was also employed, using cosine distance as its distance metric. Finally, a random selection strategy, which makes no effort to select the best labels for querying, was evaluated as a baseline.</p><p>Predictive Model: A k-nearest neighbour predictive model with k=5 was used to classify the remaining unlabelled data after each iteration.</p><p>After each new batch of labels was added to the labelled dataset, a predictive model was trained using the currently labelled data, and used to predict the labels of the remaining unlabelled data. The number of correct labels (labels provided by the oracle + correctly predicted labels) was recorded at each step.</p></div>
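The diversity-based selection strategy with cosine distance can be sketched as a greedy farthest-first pass in the spirit of Kernel-Farthest-First <ref type="bibr" target="#b0">[1]</ref>. This is an illustrative sketch under our own function names, not Activist's exact implementation:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity; treat zero vectors as maximally distant."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 1.0 - float(np.dot(a, b)) / denom if denom else 1.0

def diversity_batch(X, labelled_idx, pool_idx, batch_size):
    """Greedily pick the pool instances farthest (by cosine distance) from anything labelled or already chosen."""
    chosen, anchors = [], list(labelled_idx)
    for _ in range(batch_size):
        candidates = [i for i in pool_idx if i not in chosen]
        # a candidate's score is its distance to the *nearest* anchor; pick the farthest such candidate
        best = max(candidates,
                   key=lambda i: min(cosine_distance(X[i], X[j]) for j in anchors))
        chosen.append(best)
        anchors.append(best)       # future picks must also differ from this one
    return chosen
```

Appending each pick to the anchor set is what keeps a batch internally diverse, rather than selecting `batch_size` near-duplicates of the single farthest point.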
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Findings</head><p>Figure <ref type="figure" target="#fig_1">3</ref> shows the results of the experiment. After each batch of labels was requested, a predictive model was trained using the currently labelled data, and used to predict labels for the as-yet unlabelled data. The overall accuracy is recorded on the y-axis, while the number of labels provided by the oracle is recorded on the x-axis. The black dashed line represents the accuracy obtained in the absence of an active labelling system, i.e. where the only correct labels are those provided by the oracle. The difference on the y-axis between the dashed and solid lines represents the accuracy gain provided by the active labelling framework.</p><p>The MNIST dataset demonstrates that Activist can significantly improve the labelling rate of some datasets. Although the effect is less pronounced, the CIFAR-10 and 20 Newsgroups datasets also benefit from employing active labelling techniques. These results show that the benefit gained from active labelling depends on the characteristics of the dataset being used and the related prediction problem. The results also show that in all cases, even a random selection strategy yields demonstrable performance benefits over manual labelling, represented by the x=y baseline. This indicates that, although the performance of the Activist system differs depending on the selection strategy chosen, applying active learning techniques to dataset labelling yields a visible performance improvement irrespective of the particular selection strategy used.</p></div>
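The quantity plotted on the y-axis counts both kinds of correct label. As a one-function sketch (the helper name is ours):

```python
def correct_label_count(oracle_labelled, predictions, truth):
    """Correct labels = labels supplied directly by the oracle + correctly predicted labels.

    oracle_labelled: indices labelled by the oracle; predictions/truth: mappings for the remaining instances.
    """
    predicted_correct = sum(predictions[i] == truth[i] for i in predictions)
    return len(oracle_labelled) + predicted_correct
```

Dividing this count by the dataset size gives the accuracy curve; on the x=y baseline, `predictions` contributes nothing and only the oracle-supplied labels count.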
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions and Future Work</head><p>This paper presented Activist, a platform for applying active learning techniques to the problem of dataset labelling. Activist reduces the amount of manual dataset labelling required to produce a fully labelled, approximately correct dataset. The Activist platform is under active development and is available for download online<ref type="foot" target="#foot_5">8</ref>.</p><p>This evaluation has demonstrated the potential benefits of applying active learning to dataset labelling. Future work will expand the capabilities of the framework to further facilitate labelling large datasets. In order to take advantage of the benefits of crowd-sourced labelling, future work will incorporate an API to allow users to obtain labels from on-line crowdsourcing platforms.</p><p>The Activist framework will be expanded to include a wider variety of active learning components, particularly predictive models. Convolutional neural networks have been shown to be effective at classifying the CIFAR-10 dataset <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b5">6]</ref>, while SVMs have been shown to work well classifying the 20 Newsgroups dataset <ref type="bibr" target="#b14">[15]</ref>. The inclusion of a wider range of predictive models is anticipated to yield a greater benefit for a larger number of datasets.</p><p>In its current format, the Activist system relies on a single label per instance. This approach is known to be problematic due to errors or subjectivity in the labelling process. Strategies for coping with this problem have been discussed in further detail by Tarasov <ref type="bibr" target="#b19">[20]</ref>. 
Future work will aim to allow the Activist system to handle multiple responses per instance in an effort to mitigate the impact of subjectivity and rater unreliability on the labelling process.</p><p>The experiment has shown that the performance of active labelling depends to some extent on the selection strategies used. This suggests that a deeper investigation of the relative impact of all active learning components may prove promising. In addition to adding a wider range of components to the Activist platform, we hope to develop heuristics which will guide users in tailoring an active learning task to the problem at hand.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Flow diagram illustrating the life-cycle of an Activist task</figDesc><graphic coords="4,145.72,121.61,323.92,156.13" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Graphs showing the accuracy achieved per labels requested on each of the datasets examined. The dashed black line represents the number of correct labels in the absence of an active labelling system.</figDesc><graphic coords="7,309.73,521.81,172.91,96.53" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="5,145.72,121.61,323.91,175.75" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">https://www.crowdflower.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">https://www.mturk.com/mturk/welcome</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2">http://yann.lecun.com/exdb/mnist/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_3">https://www.cs.toronto.edu/~kriz/cifar.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_4">http://qwone.com/~jason/20Newsgroups/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_5">https://github.com/joneill87/Activist</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Online choice of active learning algorithms</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Baram</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>El-Yaniv</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Luz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="255" to="291" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Incorporating diversity in active learning with support vector machines</title>
		<author>
			<persName><forename type="first">K</forename><surname>Brinker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ICML</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="59" to="66" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Active learning with statistical models</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Cohn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Ghahramani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">I</forename><surname>Jordan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of artificial intelligence research</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="129" to="145" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Minimizing manual annotation cost in supervised training from corpora</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">P</forename><surname>Engelson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Dagan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 34th annual meeting on Association for Computational Linguistics</title>
				<meeting>the 34th annual meeting on Association for Computational Linguistics</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="1996">1996</date>
			<biblScope unit="page" from="319" to="326" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Query by committee made real</title>
		<author>
			<persName><forename type="first">R</forename><surname>Gilad-Bachrach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Navot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tishby</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="443" to="450" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Spatially-sparse convolutional neural networks</title>
		<author>
			<persName><forename type="first">B</forename><surname>Graham</surname></persName>
		</author>
		<idno>CoRR abs/1409.6070</idno>
		<ptr target="http://arxiv.org/abs/1409.6070" />
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Active learning for text classification</title>
		<author>
			<persName><forename type="first">R</forename><surname>Hu</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
		<respStmt>
			<orgName>Dublin Institute of Technology</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Egal: Exploration guided active learning for tcbr</title>
		<author>
			<persName><forename type="first">R</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Delany</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mac Namee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Case-Based Reasoning</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="156" to="170" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Sweetening the dataset: Using active learning to label unlabelled datasets</title>
		<author>
			<persName><forename type="first">R</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mac Namee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Delany</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A sequential algorithm for training text classifiers</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">D</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">A</forename><surname>Gale</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval</title>
				<meeting>the 17th annual international ACM SIGIR conference on Research and development in information retrieval</meeting>
		<imprint>
			<publisher>Springer-Verlag New York, Inc</publisher>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="3" to="12" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Active learning for cross-domain sentiment classification</title>
		<author>
			<persName><forename type="first">S</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IJCAI</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Multilabel svm active learning for image classification</title>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sung</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICIP&apos;04. 2004 International Conference on</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2004">2004. 2004</date>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="2207" to="2210" />
		</imprint>
	</monogr>
	<note>Image Processing</note>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">An evaluation of selection strategies for active learning with regression</title>
		<author>
			<persName><forename type="first">J</forename><surname>O'neill</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Semi-automated annotation and active learning for language documentation</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Palmer</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Less is more: Active learning with support vector machines</title>
		<author>
			<persName><forename type="first">G</forename><surname>Schohn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cohn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICML</title>
				<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="839" to="846" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Active learning</title>
		<author>
			<persName><forename type="first">B</forename><surname>Settles</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Synthesis Lectures on Artificial Intelligence and Machine Learning</title>
				<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="1" to="114" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Query by committee</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">S</forename><surname>Seung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Opper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sompolinsky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the fifth annual workshop on Computational learning theory</title>
				<meeting>the fifth annual workshop on Computational learning theory</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="1992">1992</date>
			<biblScope unit="page" from="287" to="294" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Springenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dosovitskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Brox</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Riedmiller</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6806</idno>
		<title level="m">Striving for simplicity: The all convolutional net</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Articulate: A semi-automated model for translating natural language queries into meaningful visualizations</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leigh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Symposium on Smart Graphics</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="184" to="195" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Dynamic estimation of rater reliability using multi-armed bandits</title>
		<author>
			<persName><forename type="first">A</forename><surname>Tarasov</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">A survey of crowdsourcing systems</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Yuen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>King</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">S</forename><surname>Leung</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2011">2011. 2011</date>
			<biblScope unit="page" from="766" to="773" />
		</imprint>
	</monogr>
	<note>IEEE Third International Conference on</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
