<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Ensemble of Neural Networks for Multi-label Document Classification</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Ladislav</forename><surname>Lenc</surname></persName>
							<email>llenc@kiv.zcu.cz</email>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Department of Computer Science and Engineering</orgName>
								<orgName type="department" key="dep2">Faculty of Applied Sciences</orgName>
								<orgName type="institution">University of West Bohemia</orgName>
								<address>
									<addrLine>Univerzitní 8</addrLine>
									<postCode>306 14</postCode>
									<settlement>Plzeň</settlement>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Faculty of Applied Sciences</orgName>
								<orgName type="laboratory">NTIS-New Technologies for the Information Society</orgName>
								<orgName type="institution">University of West Bohemia</orgName>
								<address>
									<addrLine>Technická 8</addrLine>
									<postCode>306 14</postCode>
									<settlement>Plzeň</settlement>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pavel</forename><surname>Král</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Department of Computer Science and Engineering</orgName>
								<orgName type="department" key="dep2">Faculty of Applied Sciences</orgName>
								<orgName type="institution">University of West Bohemia</orgName>
								<address>
									<addrLine>Univerzitní 8</addrLine>
									<postCode>306 14</postCode>
									<settlement>Plzeň</settlement>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Faculty of Applied Sciences</orgName>
								<orgName type="laboratory">NTIS-New Technologies for the Information Society</orgName>
								<orgName type="institution">University of West Bohemia</orgName>
								<address>
									<addrLine>Technická 8</addrLine>
									<postCode>306 14</postCode>
									<settlement>Plzeň</settlement>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Ensemble of Neural Networks for Multi-label Document Classification</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">2339569E4CDA2E995341E8D1D54BACA3</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T10:11+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Czech</term>
					<term>deep neural networks</term>
					<term>document classification</term>
					<term>multi-label</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper deals with multi-label document classification using an ensemble of neural networks. The assumption is that different network types can keep complementary information and that the combination of more neural classifiers will bring higher accuracy. We verify this hypothesis by an error analysis of the individual networks. One contribution of this work is thus evaluation of several network combinations that improve performance over one single network. Another contribution is a detailed analysis of the achieved results and a proposition of possible directions of further improvement. We evaluate the approaches on a Czech ČTK corpus and also compare the results with state-of-the-art approaches on the English Reuters-21578 dataset. We show that the ensemble of neural classifiers achieves competitive results using only very simple features.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>This paper deals with multi-label document classification by neural networks. Formally, this task can be seen as the problem of finding a model M which assigns to a document d ∈ D a set of appropriate labels (categories) c ⊆ C, i.e. M : d → c, where D is the set of all documents and C is the set of all possible document labels. Multi-label classification with neural networks is often handled by thresholding the output layer <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. It has been shown that both standard feed-forward networks (FNNs) and convolutional neural networks (CNNs) achieve state-of-the-art results on the standard corpora <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>.</p><p>However, we believe that there is still some room for further improvement. A combination of classifiers is a natural step forward. Therefore, in this work we combine a CNN and an FNN to gain further improvement in terms of precision and recall. We support the claim that a combination may bring better results by studying the errors of the individual networks. The main contribution of this paper thus consists in the analysis of errors in the prediction results of the individual networks. We then present the results of several combination methods and illustrate that the ensemble of neural networks brings a significant improvement over the individual networks.</p><p>The methods are evaluated on documents in the Czech language, a representative of the highly inflectional Slavic languages with free word order. These properties decrease the performance of the usual methods. We further compare the results of our methods with other state-of-the-art approaches on the English Reuters-21578 <ref type="foot" target="#foot_0">1</ref> dataset in order to show their robustness across languages. 
Additionally, we analyze the final F-measure on document sets divided according to the number of assigned labels in order to improve the accuracy of the presented approach.</p><p>The rest of the paper is organized as follows. Section 2 is a short review of document classification methods with a particular focus on neural networks. Section 3 describes our neural network models and the combination methods. Section 4 deals with experiments carried out on the ČTK and Reuters corpora and then analyzes and discusses the obtained results. In the last section, we summarize the experimental results and propose some future research directions.</p></div>
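The multi-label setting introduced above can be illustrated with a minimal sketch: each document maps to a subset of the label set C, commonly encoded as a binary indicator vector. The label names and the document below are hypothetical examples, not the paper's data.

```python
# Minimal illustration of the multi-label setting M : d -> c, c ⊆ C.
# The label set C and the assigned labels are hypothetical examples.
C = ["politics", "sport", "weather", "agriculture"]

def encode_labels(assigned, label_set):
    """Encode a set of assigned labels as a binary indicator vector over label_set."""
    return [1 if label in assigned else 0 for label in label_set]

# A document annotated with two labels (multi-label):
y = encode_labels({"politics", "weather"}, C)
print(y)  # [1, 0, 1, 0]
```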
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Document classification is usually based on supervised machine learning. A classifier is trained on an annotated corpus and then assigns class labels to unlabelled documents. Most works use the vector space model (VSM), which generally represents each document as a vector of all word occurrences, usually weighted by their tf-idf.</p><p>Several classification methods have been used successfully <ref type="bibr" target="#b2">[3]</ref>, for instance Bayesian classifiers, maximum entropy and support vector machines. However, the main issue of this task is that the feature space is highly dimensional, which degrades the classification results. Feature selection/reduction <ref type="bibr" target="#b3">[4]</ref> or a better document representation <ref type="bibr" target="#b4">[5]</ref> can be used to alleviate this problem.</p><p>Nowadays, "deep" neural nets outperform the majority of state-of-the-art natural language processing (NLP) methods on several tasks with only very simple features. These include for instance POS tagging, chunking, named entity recognition and semantic role labelling <ref type="bibr" target="#b5">[6]</ref>. Several different topologies and learning algorithms have been proposed. For instance, Zhang et al. <ref type="bibr" target="#b6">[7]</ref> propose two convolutional neural nets (CNNs) for ontology classification, sentiment analysis and single-label document classification. They show that the proposed method significantly outperforms the baseline approach (bag of words) on English and Chinese corpora. Another interesting work <ref type="bibr" target="#b7">[8]</ref> uses pre-trained vectors from word2vec <ref type="bibr" target="#b8">[9]</ref> in its first layer. The authors show that the proposed models outperform the state of the art on 4 out of 7 tasks, including sentiment analysis and question classification. 
Recurrent convolutional neural nets are used for text classification in <ref type="bibr" target="#b9">[10]</ref>. The authors demonstrate that their approach outperforms standard convolutional networks on four corpora in a single-label document classification task.</p><p>On the other hand, traditional feed-forward neural net architectures are used for multi-label document classification rather rarely. These models were more popular in the past, as shown for instance in <ref type="bibr" target="#b10">[11]</ref>. The authors build a simple multi-layer perceptron with three layers (20 inputs, 6 neurons in the hidden layer and 10 neurons in the output layer, i.e. the number of classes) which achieves an F-measure of about 78% on the standard Reuters dataset. Feed-forward neural networks were used for multi-label document classification in <ref type="bibr" target="#b11">[12]</ref>. The authors modified the standard backpropagation algorithm for multi-label learning (BP-MLL), which employs a novel error function. This approach is evaluated on functional genomics and text categorization.</p><p>A recent study on multi-label text classification was presented by Nam et al. in <ref type="bibr" target="#b0">[1]</ref>. The authors build on the assumption that neural networks can model label dependencies in the output layer. They investigate the limitations of multi-label learning and propose a simple neural network approach. The authors use a cross-entropy loss instead of a ranking loss for training, and further employ recent advances in the deep learning field, e.g. rectified linear unit activations and AdaGrad learning with dropout <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref>. A TF-IDF representation of the documents is used as the network input. The multi-label classification is handled by thresholding the output layer: each possible label has its own output node, and the final decision is made based on the node's value. 
The approach is evaluated on several multi-label datasets and reaches results comparable to the state of the art.</p><p>Another method <ref type="bibr" target="#b14">[15]</ref> based on neural networks leverages the co-occurrence of labels in multi-label classification. Some neurons in the output layer capture the patterns of label co-occurrences, which improves the classification accuracy. The architecture is basically a convolutional network and utilizes word embeddings for the initialization of the embedding layer. The method is evaluated on natural language query classification in a document retrieval system.</p><p>An alternative approach to handling multi-label classification is proposed by Yang and Gopal in <ref type="bibr" target="#b15">[16]</ref>. The conventional representations of texts and categories are transformed into meta-level features. These features are then utilized in a learning-to-rank algorithm. Experiments on six benchmark datasets demonstrate the capabilities of this approach in comparison with other methods.</p><p>Another recent work proposes novel features based on unsupervised machine learning <ref type="bibr" target="#b16">[17]</ref>.</p><p>A significant amount of prior work addresses the combination of classifiers. Our approaches are motivated by the review of Tulyakov et al. <ref type="bibr" target="#b17">[18]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Neural Networks and Combination</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Individual Nets</head><p>We use two types of individual neural nets, each with two different activation functions (sigmoid and softmax) in the output layer. Their topologies are briefly presented in the following two sections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Feed-forward Deep Neural Network (FDNN)</head><p>We use a Multi-Layer Perceptron (MLP) with two hidden layers <ref type="foot" target="#foot_1">2</ref>. As the input of our network we use a simple bag of words (BoW): a binary vector in which the value 1 means that the word with the given index is present in the document. The size of this vector is given by the size of the dictionary, which is limited to the N most frequent words and thus defines the size of the input layer. The first hidden layer has 1024 nodes, while the second one has 512. This configuration was chosen based on experimental results. The output layer has a size equal to the number of categories |C|. To handle the multi-label classification, we threshold the values of the nodes in the output layer: only the labels with values larger than a given threshold are assigned to the document.</p></div>
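The FDNN's input encoding and output thresholding can be sketched as follows. The dictionary, token list and output scores are toy examples (not the paper's actual dictionary or a trained network); only the mechanics match the description above.

```python
# Sketch of the FDNN input/output handling described above (toy data).
def bag_of_words(tokens, dictionary):
    """Binary BoW: 1 at index i iff dictionary word i occurs in the document."""
    token_set = set(tokens)
    return [1 if word in token_set else 0 for word in dictionary]

def assign_labels(output_scores, labels, threshold):
    """Multi-label decision: keep every label whose output exceeds the threshold."""
    return [lab for lab, s in zip(labels, output_scores) if s > threshold]

dictionary = ["vláda", "počasí", "fotbal", "sníh"]   # the N most frequent words
x = bag_of_words(["počasí", "sníh", "mráz"], dictionary)
print(x)  # [0, 1, 0, 1] -- out-of-dictionary words are simply ignored

# Hypothetical sigmoid outputs for |C| = 3 categories:
print(assign_labels([0.91, 0.12, 0.55], ["weather", "sport", "politics"], 0.5))
```

Note that the binary BoW discards word frequency and order; the CNN described next keeps the word order instead.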
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Convolutional Neural Network (CNN)</head><p>The input is the sequence of words in the document. We use the same dictionary as in the previous approach, and the words are represented by their indexes into the dictionary. The architecture of our network (see Figure <ref type="figure" target="#fig_0">1</ref>) is motivated by Kim <ref type="bibr" target="#b7">[8]</ref>. However, based on our preliminary experiments, we use only one-dimensional (1D) convolutional kernels instead of a combination of several sizes of 2D kernels. The input of our network is a vector of word indexes of length L, where L is the number of words used for the document representation. The issue of variable document size is solved by fixing L (longer documents are shortened and shorter ones padded). The second layer is an embedding layer, which represents each input word as a vector of a given length. The document is thus represented as a matrix with L rows and EMB columns, where EMB is the length of the embedding vectors. The third layer is the convolutional one. We use N C convolution kernels of size K × 1, which means we perform a 1D convolution over one position in the embedding vector across K input words. The following layer performs max-pooling over the length L − K + 1, resulting in N C vectors of size 1 × EMB. The output of this layer is then flattened and connected to the output layer containing |C| nodes. The final result is, as in the previous case, obtained by thresholding the network outputs.</p></div>
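The shape bookkeeping of this architecture can be traced with a short sketch. The concrete values of L, EMB, K and N C below are illustrative assumptions (the paper does not fix them here); the formulas follow the description above.

```python
# Shape bookkeeping for the CNN described above (parameter values illustrative).
# Input: L word indexes -> embedding: L x EMB -> N_C 1D convolutions with
# K x 1 kernels -> L - K + 1 valid positions -> max-pooling over all positions.
L, EMB, K, N_C, num_classes = 100, 300, 16, 40, 37

embedded = (L, EMB)          # document matrix after the embedding layer
conv_positions = L - K + 1   # valid positions of a K x 1 kernel along the text
pooled = (N_C, 1, EMB)       # one 1 x EMB vector per kernel after max-pooling
flattened = N_C * EMB        # input size of the final dense (output) layer

print(conv_positions)  # 85
print(flattened)       # 12000
```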
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Combination</head><p>We assume that the different nets retain some complementary information which can compensate for recognition errors. We also assume that a similar network topology with a different activation function can bring some different information, and thus that each net should have its particular impact on the final classification. Therefore, we treat all the nets as different classifiers to be further combined.</p><p>Two types of combination will be evaluated and compared. The first group does not need any training phase, while the second one learns a classifier.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Unsupervised Combination</head><p>The first combination method compensates for the errors of the individual classifiers by computing the average of their outputs. This value is subsequently thresholded to obtain the final classification result. This method is hereafter called Averaged thresholding.</p><p>The second combination approach first thresholds the scores of all individual classifiers. Then, the final classification output is given by the agreement of the majority of the classifiers. We call this method Majority voting with thresholding.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Supervised Combination</head><p>We use another neural network, a multi-layer perceptron, to combine the results. This network has three layers: n × |C| inputs, a hidden layer with 512 nodes and an output layer composed of |C| neurons (the number of categories to classify), where n is the number of nets to combine. This configuration was set experimentally. As in the case of the individual classifiers, we also evaluate and compare two different activation functions: sigmoid and softmax. These combination approaches are hereafter called FNN with sigmoid and FNN with softmax. Based on the previous experiments with neural nets on multi-label classification, we expect better results from this net with the sigmoid activation (see the first part of Table <ref type="table" target="#tab_0">1</ref>).</p></div>
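The two unsupervised combination rules described above can be sketched on per-label scores from n classifiers. The score matrix below is a toy example; the logic (average-then-threshold vs. threshold-then-vote with agreement of at least half) follows the text.

```python
# Sketches of the two unsupervised combination rules (toy numbers).
def averaged_thresholding(score_lists, threshold):
    """Average the classifiers' per-label scores, then threshold the mean."""
    n = len(score_lists)
    means = [sum(scores) / n for scores in zip(*score_lists)]
    return [1 if m > threshold else 0 for m in means]

def majority_voting(score_lists, threshold):
    """Threshold each classifier first, then require agreement of at least half."""
    votes = [[1 if s > threshold else 0 for s in scores] for scores in score_lists]
    n = len(score_lists)
    return [1 if sum(col) * 2 >= n else 0 for col in zip(*votes)]

# Three hypothetical classifiers scoring |C| = 3 labels:
scores = [[0.9, 0.2, 0.6],
          [0.7, 0.1, 0.4],
          [0.8, 0.6, 0.3]]
print(averaged_thresholding(scores, 0.5))  # [1, 0, 0]
print(majority_voting(scores, 0.5))        # [1, 0, 0]
```

The supervised variant would instead feed the concatenated n × |C| score vector into a small trained MLP; it is not sketched here since it requires training data.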
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head><p>In this section we first describe the corpora that we used for evaluation of our methods. Then, we describe the performed experiments and the final results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Tools and Corpora</head><p>For the implementation of all neural nets we used the Keras toolkit <ref type="bibr" target="#b18">[19]</ref>, which is based on the Theano deep learning library <ref type="bibr" target="#b19">[20]</ref>. It was chosen mainly because of its good performance and our previous experience with this tool. All experiments were run on a GPU to achieve reasonable computation times.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Czech ČTK Corpus</head><p>For the following experiments we first used the Czech ČTK corpus. This corpus contains 2,974,040 words belonging to 11,955 documents. The documents are annotated with labels from a set of 60 categories, such as agriculture, weather, politics or sport, out of which we used the 37 most frequent ones. The category reduction was done to allow a comparison with previously reported results on this corpus, where the same set of 37 categories was used. We further created a development set composed of 500 randomly chosen samples removed from the entire corpus. Figure <ref type="figure">2</ref> illustrates the distribution of the documents depending on the number of labels. Figure <ref type="figure">3</ref> shows the distribution of the document lengths (in word tokens). This corpus is freely available for research purposes at http://home.zcu.cz/~pkral/sw/. We use a five-fold cross-validation procedure for all experiments on this corpus. The optimal value of the threshold is determined on the development set. For the evaluation of the multi-label document classification results, we use the standard recall, precision and F-measure (F1) metrics <ref type="bibr" target="#b20">[21]</ref>. The values are micro-averaged.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Reuters-21578 English Corpus</head><p>The Reuters-21578 corpus (http://www.daviddlewis.com/resources/testcollections/reuters21578/) is a collection of 21,578 documents. This corpus is used to compare our approaches with the state of the art. As suggested by many authors, the training part is composed of 7,769 documents, while 3,019 documents are reserved for testing. The number of possible categories is 90 and the average number of labels per document is 1.23.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Results of the Individual Nets</head><p>The first experiment (see Table <ref type="table" target="#tab_0">1</ref>) shows the results of the individual neural nets with sigmoid and softmax activation functions against the baseline approach proposed by Brychcín et al. <ref type="bibr" target="#b16">[17]</ref>. These nets will be further referenced by their method numbers.</p><p>This table demonstrates the very good classification performance of both individual nets; the classification results are very close to each other and comparable. The table also shows that the softmax activation function is slightly better for the FDNN, while the sigmoid activation function gives significantly better results for the CNN.</p><p>Another interesting fact regarding these results is that the approaches no. 1-3 have comparable precision and recall.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Error Analysis</head><p>To confirm the potential benefits of the combination, we analyze the errors of the individual nets. As already stated, we assume that different classifiers retain different information and thus should make different types of errors, which could be compensated for by a combination. The following analysis shows the numbers of incorrectly identified documents for two categories. We present the numbers of errors for all individual classifiers and compare them with the combination of all classifiers.</p><p>The upper part of Figure <ref type="figure" target="#fig_2">4</ref> focuses on the most frequent class, politics. The graph shows that the numbers of errors produced by the individual nets are comparable. However, the networks make errors on different documents and only a few of them (384 of 2,221) are common to all the nets.</p><p>The lower part of Figure <ref type="figure" target="#fig_2">4</ref> concentrates on a less frequent class, chemical industry. 
This analysis demonstrates that the performances of the different nets differ significantly: the sigmoid activation function is substantially better than the softmax, and the different nets also produce different types of errors. The number of common errors is 49 (out of 232 in total).</p><p>To conclude, both analyses clearly confirm our assumption that a combination should be beneficial for improving the results of the individual nets.</p></div>
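The error-overlap computation underlying this analysis reduces to set operations over the misclassified document IDs of each net. A minimal sketch (the IDs below are hypothetical, not the corpus counts quoted above):

```python
# Sketch of the error-overlap analysis: documents misclassified by every net
# for a given category. Document IDs are hypothetical.
errors_by_net = {
    "FDNN/softmax": {1, 2, 5, 9},
    "FDNN/sigmoid": {2, 3, 5, 8},
    "CNN/softmax":  {2, 4, 5, 7},
    "CNN/sigmoid":  {2, 5, 6},
}

common = set.intersection(*errors_by_net.values())  # errors shared by all nets
total = set.union(*errors_by_net.values())          # errors made by any net
print(sorted(common))            # [2, 5]
print(len(common), len(total))   # 2 9
```

A small `common` relative to `total`, as observed for both classes above, is exactly the situation in which a combination can correct individual mistakes.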
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Results of Unsupervised Combinations</head><p>The second experiment (see Table <ref type="table" target="#tab_1">2</ref>) shows the results of the Averaged thresholding method. These results confirm our assumption that the different nets keep complementary information and that it is useful to combine them. This experiment further shows that the combination of nets with lower scores (particularly with net no. 2) can degrade the final classification score (e.g. combination 1 &amp; 2 vs. individual net no. 1).</p><p>Another interesting, somewhat surprising, observation is that the CNN with the lowest classification accuracy can have a positive impact on the final classification (e.g. combination 1 &amp; 3). However, the FDNN no. 2 (with significantly better results) brings only a very small positive impact to any combination. The next experiment, depicted in Table <ref type="table" target="#tab_2">3</ref>, deals with the results of the second unsupervised combination method, Majority voting with thresholding. Note that we require an agreement of at least one half of the classifiers to obtain unambiguous results. Therefore, we evaluated combinations of at least three networks.</p><p>This table shows that this combination approach also has a positive impact on document classification and that the results of both methods are comparable. However, from the point of view of the contribution of the individual nets, net no. 2 contributes more to the final results than in the previous case. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Results of Supervised Combinations</head><p>The following experiments show the results of the supervised combination method with an FNN (see Sec. 3.2). We evaluated and compared the nets with both the sigmoid (see Table <ref type="table" target="#tab_3">4</ref>) and the softmax (see Table <ref type="table" target="#tab_4">5</ref>) activation functions. These tables show that these combinations also have a positive impact on the classification and that the sigmoid activation function brings better results than the softmax. This behaviour is similar to that of the individual nets. Moreover, as expected, this supervised combination slightly outperforms both previously described unsupervised methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.6">Final Results Analysis</head><p>Finally, we analyze the results for different document types. The main criterion is the number of document labels. We assume that this number plays an important role in classification; intuitively, documents with fewer labels should be easier to classify. We thus divided the documents into five distinct classes according to the number of labels (i.e. documents with one, two, three and four labels, and the remaining documents). Then, we determined an optimal threshold for every class and report the F-measure. This value is compared to the results obtained with the global threshold identified previously (one threshold for all documents). The results of this analysis are shown in Figure <ref type="figure" target="#fig_3">5</ref>. We chose two representative cases to analyze: the individual FDNN with softmax (left side) and the combination by the Averaged thresholding method (right side). The adaptive threshold means that the threshold is optimized for each group of documents separately. The fixed threshold is the one that was optimized on the development set. This figure confirms our assumption. The best classification results are obtained for documents with one label, and the results then decrease. Moreover, this analysis shows that this number plays a crucial role in document classification in all cases. Hypothetically, if we could determine the number of labels for a particular document before thresholding, we could improve the final F-measure by 1.5%.</p></div>
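The per-group threshold optimization described above can be sketched as a simple search over candidate thresholds, maximizing F1 on held-out data within each label-count group. The scores, labels and candidate grid below are toy assumptions, not the paper's development set.

```python
# Sketch of the adaptive-threshold analysis: for one group of documents
# (grouped by true label count), pick the threshold maximizing F1. Toy data.
def f1_at(threshold, docs):
    """Micro F1 of thresholded predictions; docs = [(scores, true_label_set)]."""
    tp = fp = fn = 0
    for scores, true in docs:
        pred = {i for i, s in enumerate(scores) if s > threshold}
        tp += len(pred & true)
        fp += len(pred - true)
        fn += len(true - pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(docs, candidates):
    """Return the candidate threshold with the highest F1 on docs."""
    return max(candidates, key=lambda t: f1_at(t, docs))

# Hypothetical group of single-label documents: (output scores, true labels)
group = [([0.8, 0.3, 0.1], {0}), ([0.2, 0.6, 0.4], {1})]
t = best_threshold(group, [i / 10 for i in range(1, 10)])
print(round(f1_at(t, group), 3))
```

Repeating this per group (one, two, three, four, five-plus labels) yields the adaptive thresholds compared against the single fixed threshold in Figure 5.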
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.7">Results on English Corpus</head><p>This experiment shows the results of our methods on the frequently used Reuters-21578 corpus. We present the results on the English dataset mainly for comparison with other state-of-the-art methods, since we cannot provide such a comparison on the Czech data. Table <ref type="table" target="#tab_5">6</ref> shows the performance of the proposed models on the benchmark Reuters-21578 dataset. The bottom part of the table provides a comparison with other state-of-the-art methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions and Future Work</head><p>In this paper, we have used several combination methods to improve the results of individual neural nets for multi-label document classification of Czech text documents. We have also presented the results of our methods on a standard English corpus. We have compared several popular (unsupervised as well as supervised) combination methods.</p><p>The experimental results have confirmed our assumption that the different nets keep different information. Therefore, it is useful to combine them to improve upon the classification scores of the individual nets. We have also shown that thresholding is a good method for assigning document labels in multi-label classification. We have further shown that the results of all the approaches are comparable. However, the best combination method is the supervised one, which uses an FNN with a sigmoid activation function. The F-measure on Czech is 85.3%, while the best result for English is 87.6%. The results on both languages are thus at least comparable with the state of the art.</p><p>One perspective for further work is to improve the combination methods, since the error analysis has shown that there is still some room for improvement. We have also shown that knowing the number of labels could improve the results. Another perspective is thus to build a classifier with thresholds dependent on the number of labels.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: CNN architecture</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :Figure 3 :</head><label>23</label><figDesc>Figure 2: Distribution of documents depending on the number of labels assigned to the documents</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Error analysis of the individual nets for the most frequent (top, politics) and for the less frequent (bottom, chemical industry) classes, numbers of incorrectly identified documents in brackets</figDesc><graphic coords="5,95.89,163.58,162.26,118.62" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: F-measure according to the number of labels for adaptive and fixed thresholds, the upper graph shows the results for MLP with softmax while the lower one is for the combination of all nets</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Results of the individual nets with sigmoid and softmax activation functions against the baseline approach</figDesc><table><row><cell cols="6">No. Network/activation Prec. Recall F1 [%]</cell></row><row><cell>1.</cell><cell>FDNN</cell><cell>softmax</cell><cell>84.4</cell><cell>82.1</cell><cell>83.3</cell></row><row><cell>2.</cell><cell></cell><cell>sigmoid</cell><cell>83.0</cell><cell>81.2</cell><cell>82.1</cell></row><row><cell>3.</cell><cell>CNN</cell><cell>softmax</cell><cell>80.6</cell><cell>80.8</cell><cell>80.7</cell></row><row><cell>4.</cell><cell></cell><cell>sigmoid</cell><cell>86.3</cell><cell>81.9</cell><cell>84.1</cell></row><row><cell></cell><cell cols="2">Baseline [17]</cell><cell>89.0</cell><cell>75.6</cell><cell>81.7</cell></row><row><cell cols="6">recall, while the best performing method no. 4 has significantly better precision than recall (∆ ∼ 4%). This table further shows that three individual neural networks outperform the baseline approach.</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Combinations of nets by Averaged thresholding</figDesc><table><row><cell>Net combi.</cell><cell>Precision</cell><cell>Recall</cell><cell>F1 [%]</cell></row><row><cell>1 &amp; 2</cell><cell>83.0</cell><cell>82.4</cell><cell>82.7</cell></row><row><cell>1 &amp; 3</cell><cell>83.2</cell><cell>84.6</cell><cell>83.9</cell></row><row><cell>1 &amp; 4</cell><cell>85.7</cell><cell>84.3</cell><cell>85.0</cell></row><row><cell>2 &amp; 3</cell><cell>86.2</cell><cell>79.6</cell><cell>82.8</cell></row><row><cell>2 &amp; 4</cell><cell>84.9</cell><cell>83.5</cell><cell>84.2</cell></row><row><cell>3 &amp; 4</cell><cell>87.3</cell><cell>81.7</cell><cell>84.4</cell></row><row><cell>1 &amp; 2 &amp; 3</cell><cell>84.8</cell><cell>81.9</cell><cell>83.3</cell></row><row><cell>1 &amp; 2 &amp; 4</cell><cell>90.1</cell><cell>79.6</cell><cell>84.5</cell></row><row><cell>1 &amp; 3 &amp; 4</cell><cell>86.7</cell><cell>83.5</cell><cell>85.1</cell></row><row><cell>2 &amp; 3 &amp; 4</cell><cell>89.3</cell><cell>80.5</cell><cell>84.6</cell></row><row><cell>1 &amp; 2 &amp; 3 &amp; 4</cell><cell>89.7</cell><cell>80.5</cell><cell>84.9</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Combinations of the nets by Majority voting with thresholding</figDesc><table><row><cell>Net combi.</cell><cell>Precision</cell><cell>Recall</cell><cell>F1 [%]</cell></row><row><cell>1 &amp; 2 &amp; 3</cell><cell>86.1</cell><cell>82.9</cell><cell>84.6</cell></row><row><cell>1 &amp; 2 &amp; 4</cell><cell>87.5</cell><cell>82.6</cell><cell>85.0</cell></row><row><cell>1 &amp; 3 &amp; 4</cell><cell>86.5</cell><cell>82.9</cell><cell>84.6</cell></row><row><cell>2 &amp; 3 &amp; 4</cell><cell>86.9</cell><cell>82.7</cell><cell>84.8</cell></row><row><cell>1 &amp; 2 &amp; 3 &amp; 4</cell><cell>84.1</cell><cell>85.7</cell><cell>84.9</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>Combinations of the nets by FNN with sigmoid</figDesc><table><row><cell>Net combi.</cell><cell>Precision</cell><cell>Recall</cell><cell>F1 [%]</cell></row><row><cell>1 &amp; 2</cell><cell>86.1</cell><cell>82.1</cell><cell>84.1</cell></row><row><cell>1 &amp; 3</cell><cell>87.1</cell><cell>81.5</cell><cell>84.2</cell></row><row><cell>1 &amp; 4</cell><cell>88.4</cell><cell>81.9</cell><cell>85.0</cell></row><row><cell>2 &amp; 3</cell><cell>86.6</cell><cell>81.4</cell><cell>83.9</cell></row><row><cell>2 &amp; 4</cell><cell>87.7</cell><cell>82.0</cell><cell>84.7</cell></row><row><cell>3 &amp; 4</cell><cell>89.3</cell><cell>80.0</cell><cell>84.4</cell></row><row><cell>1 &amp; 2 &amp; 3</cell><cell>86.9</cell><cell>82.4</cell><cell>84.6</cell></row><row><cell>1 &amp; 2 &amp; 4</cell><cell>87.9</cell><cell>82.8</cell><cell>85.3</cell></row><row><cell>1 &amp; 3 &amp; 4</cell><cell>88.2</cell><cell>82.5</cell><cell>85.2</cell></row><row><cell>2 &amp; 3 &amp; 4</cell><cell>87.9</cell><cell>82.2</cell><cell>85.0</cell></row><row><cell>1 &amp; 2 &amp; 3 &amp; 4</cell><cell>88.0</cell><cell>82.8</cell><cell>85.3</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 :</head><label>5</label><figDesc>Combinations of the nets by FNN with softmax</figDesc><table><row><cell>Net combi.</cell><cell>Precision</cell><cell>Recall</cell><cell>F1 [%]</cell></row><row><cell>1 &amp; 2</cell><cell>85.3</cell><cell>81.6</cell><cell>83.4</cell></row><row><cell>1 &amp; 3</cell><cell>85.4</cell><cell>81.8</cell><cell>83.6</cell></row><row><cell>1 &amp; 4</cell><cell>86.3</cell><cell>82.6</cell><cell>84.4</cell></row><row><cell>2 &amp; 3</cell><cell>85.4</cell><cell>80.9</cell><cell>83.1</cell></row><row><cell>2 &amp; 4</cell><cell>86.1</cell><cell>82.0</cell><cell>84.0</cell></row><row><cell>3 &amp; 4</cell><cell>86.7</cell><cell>81.3</cell><cell>83.9</cell></row><row><cell>1 &amp; 2 &amp; 3</cell><cell>85.0</cell><cell>82.7</cell><cell>83.9</cell></row><row><cell>1 &amp; 2 &amp; 4</cell><cell>85.7</cell><cell>83.2</cell><cell>84.4</cell></row><row><cell>1 &amp; 3 &amp; 4</cell><cell>85.8</cell><cell>83.3</cell><cell>84.5</cell></row><row><cell>2 &amp; 3 &amp; 4</cell><cell>85.6</cell><cell>82.9</cell><cell>84.3</cell></row><row><cell>1 &amp; 2 &amp; 3 &amp; 4</cell><cell>85.7</cell><cell>83.6</cell><cell>84.6</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6 :</head><label>6</label><figDesc>Results on the Reuters-21578 dataset</figDesc><table><row><cell>Method</cell><cell></cell><cell>Precision</cell><cell>Recall</cell><cell>F1 [%]</cell></row><row><cell>MLP/softmax</cell><cell></cell><cell>89.08</cell><cell>80.6</cell><cell>85.0</cell></row><row><cell>MLP/sigmoid</cell><cell></cell><cell>89.6</cell><cell>82.7</cell><cell>86.0</cell></row><row><cell>CNN/softmax</cell><cell></cell><cell>87.8</cell><cell>84.1</cell><cell>85.9</cell></row><row><cell>CNN/sigmoid</cell><cell></cell><cell>89.4</cell><cell>81.3</cell><cell>85.2</cell></row><row><cell cols="2">Supervised combination</cell><cell>91.4</cell><cell>84.1</cell><cell>87.6</cell></row><row><cell>NN AD [1]</cell><cell></cell><cell>90.4</cell><cell>83.4</cell><cell>86.8</cell></row><row><cell>BP-MLL TAD BR R</cell><cell>1</cell><cell>84.2</cell><cell>84.2</cell><cell>84.2</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.daviddlewis.com/resources/testcollections/reuters21578/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">We also experimented with an MLP with one hidden layer, which yielded lower accuracy.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>This work has been supported by the project LO1506 of the Czech Ministry of Education, Youth and Sports. We would also like to thank the Czech News Agency (ČTK) for support and for providing the data.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Large-scale multi-label text classification - revisiting neural networks</title>
		<author>
			<persName><forename type="first">J</forename><surname>Nam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">L</forename><surname>Mencía</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fürnkranz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Joint European Conference on Machine Learning and Knowledge Discovery in Databases</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="437" to="452" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Deep neural networks for Czech multi-label document classification</title>
		<author>
			<persName><forename type="first">L</forename><surname>Lenc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Král</surname></persName>
		</author>
		<idno>CoRR abs/1701.03849</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Inducing features of random fields</title>
		<author>
			<persName><forename type="first">S</forename><surname>Della Pietra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Della Pietra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lafferty</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="380" to="393" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A comparative study on feature selection in text categorization</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">O</forename><surname>Pedersen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fourteenth International Conference on Machine Learning. ICML &apos;97</title>
				<meeting>the Fourteenth International Conference on Machine Learning. ICML &apos;97<address><addrLine>San Francisco, CA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Morgan Kaufmann Publishers Inc</publisher>
			<date type="published" when="1997">1997</date>
			<biblScope unit="page" from="412" to="420" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora</title>
		<author>
			<persName><forename type="first">D</forename><surname>Ramage</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Nallapati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1</title>
		<meeting>the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1<address><addrLine>Stroudsburg, PA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="248" to="256" />
		</imprint>
	</monogr>
	<note>EMNLP &apos;09</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Natural language processing (almost) from scratch</title>
		<author>
			<persName><forename type="first">R</forename><surname>Collobert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bottou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Karlen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kavukcuoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kuksa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2493" to="2537" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Text understanding from scratch</title>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>LeCun</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1502.01710</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Convolutional neural networks for sentence classification</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Kim</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1408.5882</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Efficient estimation of word representations in vector space</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Workshop at ICLR</title>
				<meeting>Workshop at ICLR</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Recurrent convolutional neural networks for text classification</title>
		<author>
			<persName><forename type="first">S</forename><surname>Lai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhao</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">One-class document classification via neural networks</title>
		<author>
			<persName><forename type="first">L</forename><surname>Manevitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yousef</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">70</biblScope>
			<biblScope unit="issue">7-9</biblScope>
			<biblScope unit="page" from="1466" to="1481" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Multilabel neural networks with applications to functional genomics and text categorization</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">H</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page" from="1338" to="1351" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Rectified linear units improve restricted Boltzmann machines</title>
		<author>
			<persName><forename type="first">V</forename><surname>Nair</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th international conference on machine learning (ICML-10)</title>
				<meeting>the 27th international conference on machine learning (ICML-10)</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="807" to="814" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Dropout: a simple way to prevent neural networks from overfitting</title>
		<author>
			<persName><forename type="first">N</forename><surname>Srivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Krizhevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1929" to="1958" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Improved neural network-based multi-label classification with better initialization leveraging label co-occurrence</title>
		<author>
			<persName><forename type="first">G</forename><surname>Kurata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Xiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of NAACL-HLT</title>
				<meeting>NAACL-HLT</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="521" to="526" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Multilabel classification with meta-level features in a learning-to-rank framework</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gopal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">88</biblScope>
			<biblScope unit="issue">1-2</biblScope>
			<biblScope unit="page" from="47" to="68" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Novel unsupervised features for Czech multi-label document classification</title>
		<author>
			<persName><forename type="first">T</forename><surname>Brychcín</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Král</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">13th Mexican International Conference on Artificial Intelligence (MICAI 2014)</title>
				<meeting><address><addrLine>Tuxtla Gutiérrez, Chiapas, Mexico</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014-11-22">16-22 November 2014</date>
			<biblScope unit="page" from="70" to="79" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Review of classifier combination methods</title>
		<author>
			<persName><forename type="first">S</forename><surname>Tulyakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jaeger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Govindaraju</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Doermann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Machine Learning in Document Analysis and Recognition</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="361" to="386" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Chollet</surname></persName>
		</author>
		<ptr target="https://github.com/fchollet/keras" />
		<title level="m">Keras</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Theano: a CPU and GPU math expression compiler</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bergstra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Breuleux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bastien</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lamblin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pascanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Desjardins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Turian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Warde-Farley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Python for scientific computing conference (SciPy)</title>
				<meeting>the Python for scientific computing conference (SciPy)<address><addrLine>Austin, TX</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page">3</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Evaluation: from precision, recall and F-measure to ROC, informedness, markedness &amp; correlation</title>
		<author>
			<persName><forename type="first">D</forename><surname>Powers</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Technologies</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="37" to="63" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Statistical topic models for multi-label document classification</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">N</forename><surname>Rubin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chambers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Smyth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Steyvers</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine learning</title>
		<imprint>
			<biblScope unit="volume">88</biblScope>
			<biblScope unit="issue">1-2</biblScope>
			<biblScope unit="page" from="157" to="208" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
