<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Benchmarking Multi-label Classification Algorithms</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Arjun</forename><surname>Pakrashi</surname></persName>
							<email>arjun.pakrashi@insight-centre.org</email>
							<affiliation key="aff0">
								<orgName type="department">Insight Centre for Data Analytics</orgName>
								<orgName type="institution">University College Dublin</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Derek</forename><surname>Greene</surname></persName>
							<email>derek.greene@ucd.ie</email>
							<affiliation key="aff0">
								<orgName type="department">Insight Centre for Data Analytics</orgName>
								<orgName type="institution">University College Dublin</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Brian</forename><forename type="middle">Mac</forename><surname>Namee</surname></persName>
							<email>brian.macnamee@ucd.ie</email>
							<affiliation key="aff0">
								<orgName type="department">Insight Centre for Data Analytics</orgName>
								<orgName type="institution">University College Dublin</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Benchmarking Multi-label Classification Algorithms</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">241AE2AB5D4C98C65251F57424564C2B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:13+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Multi-label classification is an approach to classification problems that allows each data point to be assigned to more than one class at the same time. Real-life machine learning problems are often multi-label in nature: for example, image labelling, topic identification in texts, and gene expression prediction. Many multi-label classification algorithms have been proposed in the literature and, although there have been some benchmarking experiments, many questions still remain about which approaches perform best for certain kinds of multi-label datasets. This paper presents a comprehensive benchmark experiment of eleven multi-label classification algorithms on eleven different datasets. Unlike many existing studies, we perform detailed parameter tuning for each algorithm-dataset pair so as to allow a fair comparative analysis of the algorithms. Also, we report on a preliminary experiment which seeks to understand how the performance of different multi-label classification algorithms changes as the characteristics of multi-label datasets are adjusted.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>There are many important real-life classification problems in which a data point can be a member of more than one class simultaneously <ref type="bibr" target="#b8">[9]</ref>. For example, a gene sequence can be a member of multiple functional classes, or a piece of music can be tagged with multiple genres. These types of problems are known as multi-label classification problems <ref type="bibr" target="#b22">[23]</ref>. In multi-label problems there is typically a finite set of potential labels that can be applied to data points. The labels that are applicable to a specific data point are known as its relevant labels, while those that are not applicable are known as its irrelevant labels.</p><p>Early, naïve approaches to the multi-label problem (e.g. <ref type="bibr" target="#b0">[1]</ref>) consider each label independently, using a one-versus-all binary classification approach to predict the relevance of an individual label to a data point. The outputs of a set of these individual classifiers are then aggregated into a set of relevant labels. Although these approaches can work well <ref type="bibr" target="#b10">[11]</ref>, their performance tends to degrade significantly as the number of potential labels increases. Predicting a group of relevant labels effectively involves finding a point in a multi-dimensional label space, and as the number of labels increases this becomes more challenging because the space becomes increasingly sparse. An added challenge is that multi-label problems can suffer from a very high degree of label imbalance. To address these challenges, more sophisticated multi-label classification algorithms <ref type="bibr" target="#b8">[9]</ref> attempt to exploit the associations between labels, and use ensemble approaches to break the problem into a series of less complex problems (e.g. 
<ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b13">14]</ref>).</p><p>We describe an experiment to benchmark the performance of eleven of the most widely-cited approaches to multi-label classification on a set of eleven multi-label classification datasets. While there are existing benchmarks of this type (e.g. <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>), they do not sufficiently tune the hyper-parameters for each algorithm, and so do not compare approaches in a fair way. In the experiment presented here, extensive hyper-parameter tuning is performed. The paper also presents the results of an initial experiment to investigate how the performance of different multi-label classification algorithms changes as the characteristics of datasets (e.g. the size of the set of potential labels) change.</p><p>The remainder of the paper is structured as follows. Section 2 provides a brief survey of existing multi-label classification algorithms and previous benchmark studies. Section 3 describes the benchmark experiment, along with an analysis of its results. Section 4 describes the experiment performed to explore the performance of multi-label classification algorithms as the characteristics of the dataset change. Section 5 draws conclusions from the experimental results and outlines a path for future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Multi-Label Classification Algorithms</head><p>Multi-label classification algorithms can be divided into two categories: problem transformation and algorithm adaptation <ref type="bibr" target="#b22">[23]</ref>. The problem transformation approach transforms the multi-label dataset so that existing multi-class algorithms can be used to solve the transformed problem. Algorithm adaptation methods extend multi-class algorithms to directly work with multi-label datasets. In this section the most widely used approaches in each category will be described (including those used in the experiment described in Section 3). The section will end with a review of existing benchmark experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Problem Transformation</head><p>The most trivial approach to multi-label classification is the binary relevance method <ref type="bibr" target="#b0">[1]</ref>. Binary relevance adopts a one-vs-all ensemble approach, training independent binary classifiers to predict the relevance of each label to a data point. The independent predictions are then aggregated to form a set of relevant labels. Although binary relevance is a simple approach, Luaces et al. <ref type="bibr" target="#b10">[11]</ref> show that a properly implemented binary relevance model, with a carefully selected base classifier, can achieve good results.</p><p>Classifier chains <ref type="bibr" target="#b13">[14]</ref> take a similar approach to binary relevance but explicitly take the associations between labels into account. Again a one-vs-all classifier is built for each label, but these classifiers are chained together in order such that the outputs of classifiers early in the chain (the relevance of specific labels) are used as inputs into subsequent classifiers.</p><p>Rather than trying to transform the multi-label classification problem into multiple binary classification problems, the label powerset method <ref type="bibr" target="#b0">[1]</ref> transforms the multi-label problem into a single multi-class classification problem. Each unique combination of relevant labels is mapped to a class to create a transformed multi-class dataset which can be used to train a classification model using any multi-class learning algorithm. Although the label powerset method can perform well, as the number of labels increases the number of possible unique label combinations grows exponentially giving rise to a very sparse and imbalanced equivalent multi-class dataset.</p><p>The random k-label set (RAkEL) approach <ref type="bibr" target="#b19">[20]</ref> attempts to strike a balance between the binary relevance and label powerset approaches. 
RAkEL divides the full set of potential labels in a multi-label problem into a series of label subsets, and for each subset builds a label powerset model. By creating multiple multilabel problems with small numbers of labels, RAkEL reduces the sparseness and imbalance that affects the label powerset method, but still takes advantage of the associations that exist between labels.</p><p>Hierarchy of multi-label classifiers (HOMER) <ref type="bibr" target="#b16">[17]</ref> also divides the multi-label dataset into smaller subsets of labels, but in a hierarchical manner. Calibrated label ranking (CLR) <ref type="bibr" target="#b7">[8]</ref> takes a paired approach by training an ensemble of classifiers for each possible pair of labels in the dataset using only the data points which have either of the labels in the pair assigned to them.</p></div>
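The transformation strategies above can be sketched in a few lines of code. The following is a minimal illustrative Python sketch of binary relevance and classifier chains (these are not the MULAN implementations used in this paper, and the tiny 1-nearest-neighbour base learner, `OneNN`, is a stand-in for the SVM base classifiers used later; all function names here are our own):

```python
import numpy as np

class OneNN:
    """Tiny 1-nearest-neighbour base classifier; a stand-in for the
    SVM base classifiers used in the paper (illustrative only)."""
    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        d = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=-1)
        return self.y[d.argmin(axis=1)]

def binary_relevance_fit(X, Y, base=OneNN):
    # One independent binary model per label column.
    return [base().fit(X, Y[:, j]) for j in range(Y.shape[1])]

def binary_relevance_predict(models, X):
    return np.column_stack([m.predict(X) for m in models])

def classifier_chain_fit(X, Y, base=OneNN):
    # Like binary relevance, but classifier j also sees labels 0..j-1
    # as extra input features, capturing label associations.
    X = np.asarray(X, float)
    return [base().fit(np.hstack([X, Y[:, :j]]), Y[:, j])
            for j in range(Y.shape[1])]

def classifier_chain_predict(models, X):
    # Predict labels in chain order, feeding each prediction forward.
    X = np.asarray(X, float)
    preds = np.empty((X.shape[0], 0))
    for m in models:
        y = m.predict(np.hstack([X, preds]))
        preds = np.hstack([preds, y[:, None]])
    return preds.astype(int)
```

The label powerset transformation differs only in the target: each unique row of `Y` is mapped to one multi-class label before fitting a single model.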
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Algorithm Adaptation</head><p>Multi-label k-nearest neighbour (MLkNN ) <ref type="bibr" target="#b24">[25]</ref> is one of the most widely cited algorithm adaptation approaches. MLkNN is essentially a binary relevance algorithm, which acts on the labels individually, but instead of applying the standard k-nearest neighbour algorithm directly, it combines it with the maximum a posteriori principle. Dependent MLkNN (DMLkNN ) <ref type="bibr" target="#b21">[22]</ref> follows the same principle as MLkNN but incorporates all of the labels when deciding the probability for each label, therefore taking label associations into account. IBLR-ML <ref type="bibr" target="#b3">[4]</ref> is another modification of the k-nearest neighbour algorithm. It finds the nearest neighbours of the data point to be labelled, and trains a logistic regression model for each label using the labels of these neighbouring points as features, thus taking the label associations into account. An algorithmic performance improvement of binary relevance combined with standard k-nearest neighbour, BRkNN, has also been proposed <ref type="bibr" target="#b15">[16]</ref>.</p><p>Multi-label decision tree (ML-DT) <ref type="bibr" target="#b4">[5]</ref> extends the C4.5 decision tree algorithm to allow multiple labels in the leaves, and chooses node splits based on a re-defined multi-label entropy function. Rank-SVM <ref type="bibr" target="#b6">[7]</ref> is a support vector machine based approach that defines one-vs-all SVM classifiers for each label, but uses a cost function across all of these models that captures incorrect predictions of pairs of relevant and irrelevant labels. Backpropagation for multi-label learning (BPMLL) <ref type="bibr" target="#b23">[24]</ref> is a neural network adaptation that trains on multi-label datasets using a single-hidden-layer feed-forward architecture and the backpropagation algorithm.</p></div>
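To illustrate the algorithm adaptation idea, the following is a compact, simplified Python reading of the core of MLkNN (not the MULAN implementation benchmarked here): for each label, a smoothed prior and neighbour-count likelihoods are estimated from the training set, and the maximum a posteriori rule decides relevance. The class name and smoothing convention are our own:

```python
import numpy as np

class SimpleMLkNN:
    """Simplified MLkNN sketch: per-label MAP decision based on how many
    of a point's k nearest neighbours carry that label."""
    def __init__(self, k=3, s=1.0):
        self.k, self.s = k, s   # s is the Laplace smoothing constant

    def fit(self, X, Y):
        X, Y = np.asarray(X, float), np.asarray(Y, int)
        n, L = Y.shape
        self.X, self.Y = X, Y
        # Prior P(label j relevant), smoothed.
        self.p1 = (self.s + Y.sum(axis=0)) / (2 * self.s + n)
        # Neighbour label counts for every training point (self excluded).
        d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d, np.inf)
        nbrs = d.argsort(axis=1)[:, :self.k]
        C = Y[nbrs].sum(axis=1)                      # shape (n, L)
        # Likelihoods P(count = c | label relevant / irrelevant).
        self.like1 = np.zeros((self.k + 1, L))
        self.like0 = np.zeros((self.k + 1, L))
        for c in range(self.k + 1):
            self.like1[c] = (self.s + ((C == c) & (Y == 1)).sum(0)) / \
                            (self.s * (self.k + 1) + (Y == 1).sum(0))
            self.like0[c] = (self.s + ((C == c) & (Y == 0)).sum(0)) / \
                            (self.s * (self.k + 1) + (Y == 0).sum(0))
        return self

    def predict(self, X):
        out = []
        for x in np.asarray(X, float):
            d = ((self.X - x) ** 2).sum(-1)
            c = self.Y[d.argsort()[:self.k]].sum(axis=0)
            j = np.arange(self.Y.shape[1])
            post1 = self.p1 * self.like1[c, j]
            post0 = (1 - self.p1) * self.like0[c, j]
            out.append((post1 > post0).astype(int))
        return np.array(out)
```

On a toy two-cluster dataset where each cluster carries one label, a query near a cluster inherits that cluster's label via the MAP rule.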
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Multi-label Classification Benchmark Studies</head><p>A number of papers that describe new multi-label classification approaches <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15]</ref> benchmark different multi-label classification algorithms against their newly proposed method. One of the limitations of these studies, however, is a lack of hyper-parameter tuning, and a reliance on default hyper-parameter settings. Rather than proposing a new algorithm, Madjarov et al. <ref type="bibr" target="#b12">[13]</ref> describe a benchmark study of several multi-label classification algorithms over several datasets. Hyper-parameter tuning is performed in this study. There is, however, a mismatch between the Hamming loss measure used to select hyper-parameters and the measures used to evaluate performance in the benchmark. The study identifies HOMER, binary relevance, and classifier chains as promising approaches.</p><p>To perform a fair comparison of algorithms, the benchmark experiment described in this paper uses extensive parameter tuning. For consistency, the measure used to guide this parameter tuning, the label based macro averaged F-Score (see Section 3.2), is the same as the measure used to compare algorithms in the benchmark. The set of algorithms used overlaps with, but is different from, those in Madjarov et al. <ref type="bibr" target="#b12">[13]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Multi-label Classification Algorithm Benchmark</head><p>This section describes a benchmark experiment performed to evaluate the performance of a collection of multi-label classification algorithms across several datasets. This section introduces the datasets and performance measure used in the experiment as well as the experimental methodology. Finally, the results of the experiment are presented and discussed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Datasets</head><p>Table <ref type="table" target="#tab_0">1</ref> describes the eleven datasets used in this experiment. The datasets chosen are widely used in the multi-label literature, and have a diverse set of properties, listed in Table <ref type="table" target="#tab_0">1</ref>. Instances, inputs and labels indicate the total number of data points, the number of predictor variables, and the number of potential labels, respectively. Total labelsets indicates the number of unique combinations of relevant labels in the dataset, where each such unique label combination is a labelset. Single labelsets indicates the number of data points whose combination of relevant labels occurs only once in the dataset. Cardinality indicates the average number of labels assigned per data point. Density is a normalised, dimensionless version of cardinality, computed by dividing cardinality by the number of labels. MeanIR <ref type="bibr" target="#b1">[2]</ref> indicates the average degree of label imbalance in the multi-label dataset: a higher value indicates more imbalance. Together these parameters describe the properties of the datasets that may influence the performance of the algorithms; collectively, they will be referred to as label complexity in the remainder of this text. All datasets were acquired from <ref type="bibr" target="#b17">[18]</ref>. In the birds dataset, several data points have no assigned labels. To avoid problems when computing performance scores, we added an extra other label to this dataset, which is assigned to a data point when it has no other labels.</p></div>
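All of these label complexity properties can be computed directly from the binary label matrix. A minimal sketch (the function name is ours; MeanIR follows the per-label imbalance ratio definition of Charte et al. [2], assuming every label has at least one positive instance):

```python
import numpy as np

def label_properties(Y):
    """Label complexity measures of a binary label matrix Y
    (rows = data points, columns = labels)."""
    Y = np.asarray(Y, int)
    n, L = Y.shape
    keys = [tuple(r) for r in Y]           # labelset of each data point
    counts = Y.sum(axis=0)                 # positive instances per label
    cardinality = Y.sum() / n              # mean labels per data point
    return {
        "total_labelsets": len(set(keys)),
        "single_labelsets": sum(1 for k in set(keys) if keys.count(k) == 1),
        "cardinality": cardinality,
        "density": cardinality / L,        # cardinality normalised by L
        # IRLbl(l) = count of most frequent label / count of label l,
        # averaged over labels.
        "mean_ir": (counts.max() / counts).mean(),
    }
```

For example, a three-point dataset with labelsets {1,0}, {1,1}, {1,0} has two total labelsets, one single labelset, cardinality 4/3, density 2/3 and MeanIR 2.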
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Experimental Methodology</head><p>In this study we use the label based macro averaged F-measure <ref type="bibr" target="#b22">[23]</ref> for both hyper-parameter selection and performance comparison. Higher values indicate better performance. This measure was selected because it captures the performance of algorithms on minority labels and balances precision and recall for each label <ref type="bibr" target="#b9">[10]</ref>.</p><p>The algorithms used in this experiment are: binary relevance (BR) <ref type="bibr" target="#b0">[1]</ref>, classifier chains (CC) <ref type="bibr" target="#b13">[14]</ref>, label powerset (LP) <ref type="bibr" target="#b0">[1]</ref>, RAkEL-d <ref type="bibr" target="#b19">[20]</ref>, HOMER <ref type="bibr" target="#b16">[17]</ref>, CLR <ref type="bibr" target="#b7">[8]</ref>, BRkNN <ref type="bibr" target="#b15">[16]</ref>, MLkNN <ref type="bibr" target="#b24">[25]</ref>, DMLkNN <ref type="bibr" target="#b21">[22]</ref>, IBLR-ML <ref type="bibr" target="#b3">[4]</ref> and BPMLL <ref type="bibr" target="#b23">[24]</ref>. All algorithm implementations come from the Java library MULAN <ref type="bibr" target="#b18">[19]</ref>. For each algorithm-dataset pair, a grid search over different parameter combinations was performed: for each parameter combination in the grid, a 2 × 5-fold cross-validation run was performed and the F-measure recorded. Once the grid search was complete, the parameter combination with the highest F-measure was selected. These selected scores are shown in Table <ref type="table" target="#tab_1">2</ref> and used to compare the classifiers.</p><p>For each problem transformation method (CC, BR, LP and CLR), a support vector machine with a radial basis kernel (SVM-RBK) was used as the base classifier. 
The SVM models were tuned over 12 combinations of the regularisation parameter (from the set {1, 10, 100}) and the kernel spread parameter (from the set {0.01, 0.05, 0.001, 0.005}). For RAkEL-d the subset size was varied between 3 and 6, and for HOMER the cluster size was varied between 3 and 6. For both RAkEL-d and HOMER, the base classifiers were label powerset models using SVM-RBK models tuned as outlined above. The BRkNN, MLkNN, DMLkNN and IBLR-ML models were tuned over 4 to 26 nearest neighbours, with a step size of 2. For BPMLL, tuning was performed in two steps to keep it computationally feasible. First, a grid of 120 different parameter combinations covering the regularisation weight, learning rate, number of iterations and number of hidden units was created, and the best combination was found using only the yeast dataset. Next, using this best combination of hyper-parameters, BPMLL was tuned on the other datasets over hidden layers containing a number of units equal to 20%, 40%, 60%, 80% and 100% of the number of inputs of each dataset, as recommended by Zhang et al. <ref type="bibr" target="#b23">[24]</ref>.</p></div>
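The selection measure can be stated precisely: for each label, precision and recall are computed from that label's binary predictions, combined into an F-score, and the scores are averaged with equal weight per label, so minority labels count fully. A minimal sketch (the function name and the zero-division convention of F = 0 for a label with no positives are our assumptions):

```python
import numpy as np

def macro_f_measure(Y_true, Y_pred):
    """Label based macro averaged F-measure over binary label matrices."""
    Y_true, Y_pred = np.asarray(Y_true, int), np.asarray(Y_pred, int)
    f_scores = []
    for j in range(Y_true.shape[1]):
        t, p = Y_true[:, j], Y_pred[:, j]
        tp = int(((t == 1) & (p == 1)).sum())   # true positives
        fp = int(((t == 0) & (p == 1)).sum())   # false positives
        fn = int(((t == 1) & (p == 0)).sum())   # false negatives
        denom = 2 * tp + fp + fn
        f_scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f_scores))
```

In the grid search described above, this quantity would be averaged over the 2 × 5 cross-validation folds for each parameter combination, and the combination maximising it selected.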
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Benchmark results</head><p>The results of the benchmark experiment described in Section 3.2 are summarised in Table <ref type="table" target="#tab_1">2</ref>. The columns of the table are ordered in increasing order of the average rank of the algorithms over all the datasets (a lower average rank is better). The best performance per dataset is highlighted in bold-face. Direct inspection of Table <ref type="table" target="#tab_1">2</ref> shows that CC achieved the top score on 4 of the datasets, whereas BPMLL achieved the top score 3 times, RAkEL-d twice, and LP and HOMER once each. It is also interesting to note that the k-nearest neighbour based algorithms (IBLR-ML, MLkNN, BRkNN and DMLkNN) are ranked in that order and close to each other. DNF appears in Table <ref type="table" target="#tab_1">2</ref> for the CLR algorithm on the corel5k dataset as the experiment did not finish, due to the huge number of label pairs generated for the 374 labels in this dataset (this is a common outcome for this dataset, e.g. <ref type="bibr" target="#b11">[12]</ref>).</p><p>To further explore these results, as recommended by Demšar <ref type="bibr" target="#b5">[6]</ref>, a Friedman test was first performed, which indicated that a significant difference between the performance of the algorithms over the datasets did exist; a pairwise Nemenyi test with a significance level of α = 0.05 was then performed. The results indicate that the performance of most pairs of algorithms does not differ significantly across the datasets. Figure <ref type="figure" target="#fig_0">1</ref> shows the critical difference plot for the pairwise Nemenyi test. The algorithms indicated on the line are ordered by average rank over the datasets. 
Algorithms that are not significantly different from each other over the datasets, according to the Nemenyi test at significance level α = 0.05, are connected with bold horizontal lines.</p><p>Overall, Figure <ref type="figure" target="#fig_0">1</ref> indicates that CC, RAkEL-d, BPMLL and LP performed well, whereas the nearest neighbour based algorithms performed relatively poorly. Among the nearest neighbour based algorithms, IBLR-ML performs better than the others over the datasets, but all of them perform significantly worse than CC. Hence, although none of the algorithms decisively outperforms the others over the different datasets, CC, RAkEL-d, BPMLL and LP perform well, and the nearest neighbour based algorithms perform poorly in general.</p></div>
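The comparison procedure from Demšar [6] can be sketched as follows: rank the algorithms on each dataset, average the ranks, then declare a pair significantly different if their average ranks differ by more than the Nemenyi critical difference CD = q<sub>α</sub>·√(k(k+1)/6N) for k algorithms on N datasets. This is an illustrative re-implementation, not the code used in the paper; the q values are the standard two-tailed α = 0.05 constants from Demšar's paper, listed here only up to k = 6:

```python
import numpy as np

# Two-tailed Nemenyi q values at alpha = 0.05 (Demšar, 2006), indexed
# by the number of algorithms k (truncated at k = 6 for brevity).
Q_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}

def average_ranks(scores):
    """scores: (datasets x algorithms) matrix, higher is better.
    Returns the average rank of each algorithm (rank 1 = best),
    with tied scores given the mean of their ranks."""
    scores = np.asarray(scores, float)
    ranks = np.zeros_like(scores)
    for i, row in enumerate(scores):
        order = (-row).argsort()
        r = np.empty(len(row))
        r[order] = np.arange(1, len(row) + 1)
        for v in np.unique(row):          # average tied ranks
            r[row == v] = r[row == v].mean()
        ranks[i] = r
    return ranks.mean(axis=0)

def nemenyi_cd(k, n, q=Q_05):
    """Nemenyi critical difference for k algorithms on n datasets."""
    return q[k] * np.sqrt(k * (k + 1) / (6.0 * n))
```

Two algorithms whose average ranks differ by less than `nemenyi_cd(k, n)` would be joined by a bold line in a critical difference plot such as Figure 1.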
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Label Analysis</head><p>A preliminary experiment was also performed to understand how multi-label classification approaches perform when the number of labels is increased, while the input space is kept the same. Section 4.1 describes the experimental setup and Section 4.2 discusses the results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Experimental Setup</head><p>The corel5k dataset has more than 50 times as many potential labels as the scene dataset. There are also significant differences in their MeanIR values: 1.254 for scene and 189.568 for corel5k (see Table <ref type="table" target="#tab_0">1</ref>). Table <ref type="table" target="#tab_1">2</ref> indicates that all of the multi-label classification approaches perform much better on scene than on corel5k. It is tempting to conclude that this is because of the complexity of the labelsets, but this would probably be a mistake. One multi-label classification problem can be inherently more difficult than another: the prediction performance of an algorithm on a multi-label dataset depends not only on the label properties, but also on the predictor variables in the input space. Therefore, attempting to establish a relationship between the performance of algorithms on different datasets with varying label properties can be misleading.</p><p>To assess the impact of changing label complexity on the performance of multi-label classification algorithms, a group of datasets was generated synthetically that vary in label complexity but keep all input variables the same. Thirteen synthetic datasets were formed from the yeast dataset. The input space of these 13 datasets is kept identical, with the k th dataset having the first k labels of the original dataset in the original order, where 2 ≤ k ≤ 14. Similarly, the emotions dataset was used to generate 5 such synthetic datasets. The yeast and emotions datasets were selected for this preliminary study for two reasons. First, these are widely used datasets that are somewhat typical of multi-label classification problems: they have medium cardinality and the frequencies of the different labels are relatively well balanced. 
Second, this experiment is computationally quite expensive (multiple days are required for each run), and the sizes of these datasets make repeated runs feasible for this preliminary study.</p><p>Following the experimental methodology explained in Section 3.2, the performance of BR, CC, LP, RAkEL, IBLR-ML, BRkNN, CLR and BPMLL was assessed on the 13 synthetic datasets based on the yeast data and the 5 synthetic datasets based on the emotions data. The results of this experiment are discussed in the following section.</p></div>
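The synthetic dataset construction described above is simply a truncation of the label matrix: the inputs X are untouched, and the k-th dataset keeps the first k label columns in their original order. A sketch (the helper name is ours):

```python
import numpy as np

def label_truncated_datasets(X, Y, k_min=2):
    """Yield (X, Y[:, :k]) for k = k_min .. L, keeping the input space
    identical while label complexity grows, as in the yeast/emotions
    experiment described above."""
    Y = np.asarray(Y, int)
    for k in range(k_min, Y.shape[1] + 1):
        yield X, Y[:, :k]
```

For yeast (14 labels) this yields the 13 datasets with 2 ≤ k ≤ 14; for emotions (6 labels) it yields the 5 datasets with 2 ≤ k ≤ 6.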
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Label Analysis Results</head><p>In Figures <ref type="figure">2a and 2b</ref> the number of labels used in the dataset (either yeast or emotions) is shown on the x-axis and the label based macro averaged F-measure is shown on the y-axis (note that the graphs do not use a zero baseline for F-measure, so as to emphasise the differences between approaches). These plots indicate that all the algorithms respond similarly with respect to F-measure as the number of labels varies. Figures <ref type="figure">2c and 2d</ref>, however, show how the relative ranking of the performance of the different algorithms changes as label complexity increases, and here interesting patterns are observed.</p><p>Figure <ref type="figure">2c</ref>, relating to the yeast dataset, indicates that the performance of BR starts at a high rank, but falls as the number of labels increases. CLR does better in rank than BR, but also keeps decreasing as the number of labels increases. For LP and CC, the relative performance increases as the number of labels increases, ending at the first and second positions respectively. BPMLL starts with the lowest rank, but quickly improves, maintaining the best rank most of the time. RAkEL-d stays in the middle, while BRkNN and IBLR-ML stay at the bottom ranks.</p><p>Fig. <ref type="figure">2</ref>: Number of labels selected from the yeast and emotions datasets, compared against classifier performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>[Figure 2 panels: (a) macro averaged F-measure for increasing numbers of yeast labels; (b) macro averaged F-measure for increasing numbers of emotions labels; (c) relative rank changes, yeast; (d) relative rank changes, emotions. Legend: BPMLL, CLR, IBLR-ML, RAkEL-d, LP, CC, BR, BRkNN.]</p><p>This preliminary study indicates that LP, CC and BPMLL were able to perform comparatively better than the others, while BR showed a consistent decrease in rank. To establish a definite relationship, a more detailed study should be performed.</p><p>Figure <ref type="figure">3</ref> shows how the label complexity parameters (cardinality and density) for the yeast and emotions datasets change as the number of labels is varied in the synthetically generated datasets. Although it appears that there is some relationship between the change in density in Figure <ref type="figure">3</ref> and the change in performance in Figures <ref type="figure">2a and 2b</ref>, such a conclusion from this experiment may be misleading, and hence requires further study.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Discussion and Future Work</head><p>This paper has focused on two aspects: first, the benchmarking of several multi-label classification algorithms over a diverse collection of datasets; and second, a preliminary study of the performance of the algorithms when the input space is kept identical while the label complexity is varied. For the benchmark experiment, the hyper-parameters for each algorithm-dataset pair were tuned based on the label based macro averaged F-measure to provide the fairest possible comparison between approaches. The algorithms DMLkNN, BRkNN and MLkNN performed poorly overall. On the other hand, CC, RAkEL-d and BPMLL were the top three algorithms, in that order. The pairwise Nemenyi test, however, indicates that overall there is no statistically significant difference between the performance of most pairs of algorithms. This is perhaps unsurprising, and reinforces the no free lunch theorem <ref type="bibr" target="#b20">[21]</ref> in the context of multi-label classification.</p><p>The preliminary label analysis provides some interesting results. The performance of BPMLL, LP and CC improves as the number of labels increases, whereas the performance of BR decreases in comparison. IBLR-ML appears to achieve consistently better ranks than BRkNN.</p><p>The level of research in the multi-label classification field continues to increase, with new methods being proposed and existing methods being improved. Further investigations could examine the performance of additional algorithms over even more datasets to understand their overall effectiveness. Our label analysis experiment was limited to two datasets. 
Given the preliminary observations from this study, it would be interesting to further investigate whether any consistent relationship exists between algorithm performance and the label properties of the dataset under consideration, which could provide guidelines for the suitable application of multi-label algorithms.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 :</head><label>1</label><figDesc>Fig. 1: Comparison of algorithms based on the pairwise Nemenyi test. Groups connected with a bold line are not significantly different at significance level α = 0.05</figDesc><graphic coords="7,220.80,460.86,173.76,156.23" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>(a) Macro average F-Measure performance changes, yeast.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head></head><label></label><figDesc>Relative rank changes, emotions.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head></head><label></label><figDesc>though IBLR-ML was able to get a better rank than BRkNN most of the time. In Figure 2d, relating to the emotions dataset, BPMLL and CC continued to rise, CLR and BR drifted down, and IBLR-ML and BRkNN were relatively flat, with IBLR-ML achieving the better ranking most of the time.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head></head><label></label><figDesc>Fig. 3: Change of a few label complexity parameters (cardinality and density) as the number of labels changes, for the yeast and emotions datasets.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Datasets</figDesc><table><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>Total</cell><cell>Single</cell><cell></cell><cell></cell></row><row><cell cols="8">Dataset Instances Inputs Labels Labelsets Labelsets Cardinality Density MeanIR</cell></row><row><cell>yeast</cell><cell>2417</cell><cell>103</cell><cell>14</cell><cell>198</cell><cell>77</cell><cell>4.237 0.303</cell><cell>7.197</cell></row><row><cell>scene</cell><cell>2407</cell><cell>294</cell><cell>6</cell><cell>15</cell><cell>3</cell><cell>1.074 0.179</cell><cell>1.254</cell></row><row><cell>emotions</cell><cell>593</cell><cell>72</cell><cell>6</cell><cell>27</cell><cell>4</cell><cell>1.869 0.311</cell><cell>1.478</cell></row><row><cell>medical</cell><cell cols="2">978 1449</cell><cell>45</cell><cell>94</cell><cell>33</cell><cell cols="2">1.245 0.028 89.501</cell></row><row><cell>enron</cell><cell cols="2">1702 1001</cell><cell>53</cell><cell>753</cell><cell>573</cell><cell cols="2">3.378 0.064 73.953</cell></row><row><cell>birds</cell><cell>322</cell><cell>260</cell><cell>20</cell><cell>89</cell><cell>55</cell><cell cols="2">1.503 0.075 13.004</cell></row><row><cell>genbase</cell><cell cols="2">662 1186</cell><cell>27</cell><cell>32</cell><cell>10</cell><cell cols="2">1.252 0.046 37.315</cell></row><row><cell>cal500</cell><cell>502</cell><cell>68</cell><cell>174</cell><cell>502</cell><cell>502</cell><cell cols="2">26.044 0.150 20.578</cell></row><row><cell>llog</cell><cell cols="2">1460 1004</cell><cell>75</cell><cell>304</cell><cell>189</cell><cell cols="2">1.180 0.016 39.267</cell></row><row><cell>slashdot</cell><cell cols="2">3782 1079</cell><cell>22</cell><cell>156</cell><cell>56</cell><cell cols="2">1.181 0.054 17.693</cell></row><row><cell>corel5k</cell><cell>5000</cell><cell>499</cell><cell>374</cell><cell>3175</cell><cell>2523</cell><cell cols="2">3.522 0.009 189.568</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Best Mean Label-Based Macro-Averaged F-Measure</figDesc><table><row><cell>Dataset</cell><cell cols="3">CC RAkEL-d BPMLL</cell><cell cols="6">LP HOMER BR CLR IBLR-ML MLkNN BRkNN DMLkNN</cell></row><row><cell>yeast</cell><cell>0.451</cell><cell>0.437</cell><cell cols="2">0.436 0.451</cell><cell>0.448 0.387 0.399</cell><cell>0.394</cell><cell>0.377</cell><cell>0.392</cell><cell>0.380</cell></row><row><cell>scene</cell><cell>0.804</cell><cell>0.802</cell><cell cols="2">0.778 0.802</cell><cell>0.800 0.799 0.793</cell><cell>0.749</cell><cell>0.742</cell><cell>0.695</cell><cell>0.750</cell></row><row><cell>emotions</cell><cell>0.624</cell><cell cols="3">0.628 0.690 0.596</cell><cell>0.621 0.604 0.616</cell><cell>0.658</cell><cell>0.629</cell><cell>0.633</cell><cell>0.634</cell></row><row><cell>medical</cell><cell>0.692</cell><cell>0.697</cell><cell cols="2">0.558 0.659</cell><cell>0.611 0.676 0.520</cell><cell>0.434</cell><cell>0.540</cell><cell>0.474</cell><cell>0.505</cell></row><row><cell>enron</cell><cell>0.289</cell><cell>0.288</cell><cell cols="2">0.281 0.278</cell><cell>0.281 0.284 0.286</cell><cell>0.153</cell><cell>0.177</cell><cell>0.169</cell><cell>0.163</cell></row><row><cell>birds</cell><cell>0.158</cell><cell cols="3">0.181 0.343 0.181</cell><cell>0.155 0.157 0.156</cell><cell>0.255</cell><cell>0.226</cell><cell>0.273</cell><cell>0.216</cell></row><row><cell>genbase</cell><cell>0.944</cell><cell>0.943</cell><cell cols="2">0.815 0.941</cell><cell>0.939 0.941 0.931</cell><cell>0.910</cell><cell>0.850</cell><cell>0.837</cell><cell>0.821</cell></row><row><cell>cal500</cell><cell>0.185</cell><cell cols="3">0.179 0.237 0.178</cell><cell>0.199 0.181 0.169</cell><cell>0.178</cell><cell>0.101</cell><cell>0.124</cell><cell>0.107</cell></row><row><cell>llog</cell><cell>0.292</cell><cell>0.300</cell><cell cols="2">0.295 0.297</cell><cell>0.256 0.296 0.281</cell><cell>0.110</cell><cell>0.263</cell><cell>0.255</cell><cell>0.248</cell></row><row><cell>slashdot</cell><cell>0.469</cell><cell>0.472</cell><cell cols="2">0.209 0.474</cell><cell>0.477 0.466 0.151</cell><cell>0.214</cell><cell>0.194</cell><cell>0.164</cell><cell>0.200</cell></row><row><cell>corel5k</cell><cell>0.222</cell><cell>0.217</cell><cell cols="2">0.219 0.210</cell><cell>0.197 0.213 DNF</cell><cell>0.084</cell><cell>0.190</cell><cell>0.186</cell><cell>0.181</cell></row><row><cell cols="2">Average Rank 3.364</cell><cell>3.455</cell><cell cols="2">4.818 4.909</cell><cell>5.455 5.546 7.300</cell><cell>7.909</cell><cell>8.091</cell><cell>8.364</cell><cell>8.546</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Relative rankings on increasing number of Emotions labels</head><label></label><figDesc></figDesc><table></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgement. This research was supported by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Learning multi-label scene classification</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Boutell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Brown</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page" from="1757" to="1771" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Addressing imbalance in multilabel classification: Measures and random resampling algorithms</title>
		<author>
			<persName><forename type="first">F</forename><surname>Charte</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Rivera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Del Jesus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Herrera</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">163</biblScope>
			<biblScope unit="page" from="3" to="16" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">MLTSVM: A novel twin support vector machine to multi-label learning</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">H</forename><surname>Shao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">N</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">Y</forename><surname>Deng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition</title>
		<imprint>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="page" from="61" to="74" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Combining instance-based learning and logistic regression for multilabel classification</title>
		<author>
			<persName><forename type="first">W</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hüllermeier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">76</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="211" to="225" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Knowledge discovery in multi-label phenotype data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Clare</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">D</forename><surname>King</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Lecture Notes in Computer Science</title>
		<imprint>
			<biblScope unit="page" from="42" to="53" />
			<date type="published" when="2001">2001</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Statistical comparisons of classifiers over multiple data sets</title>
		<author>
			<persName><forename type="first">J</forename><surname>Demšar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Mach. Learn. Res</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="1" to="30" />
			<date type="published" when="2006-12">Dec 2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A kernel method for multi-labelled classification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Elisseeff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 14</title>
				<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="681" to="687" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Multilabel classification via calibrated label ranking</title>
		<author>
			<persName><forename type="first">J</forename><surname>Fürnkranz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hüllermeier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Loza Mencía</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Brinker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">73</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="133" to="153" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Multi-label learning: a review of the state of the art and ongoing research</title>
		<author>
			<persName><forename type="first">E</forename><surname>Gibaja</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ventura</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="411" to="444" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Kelleher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mac Namee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>D'arcy</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015</date>
			<publisher>The MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Binary relevance efficacy for multilabel classification</title>
		<author>
			<persName><forename type="first">O</forename><surname>Luaces</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Díez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Barranquero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Del Coz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bahamonde</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Progress in Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="303" to="313" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Two stage architecture for multi-label learning</title>
		<author>
			<persName><forename type="first">G</forename><surname>Madjarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gjorgjevikj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Džeroski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="1019" to="1034" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">An extensive experimental comparison of methods for multi-label learning</title>
		<author>
			<persName><forename type="first">G</forename><surname>Madjarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kocev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gjorgjevikj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Džeroski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Best Papers of Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA&apos;</title>
				<imprint>
			<date type="published" when="2011">2012. 2011</date>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="3084" to="3104" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Classifier chains for multi-label classification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Read</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Pfahringer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Holmes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Frank</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">85</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="333" to="359" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Multi-label classification based on multi-objective optimization</title>
		<author>
			<persName><forename type="first">C</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Kong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Trans. Intell. Syst. Technol</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">22</biblScope>
			<date type="published" when="2014-04">Apr 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">An empirical study of lazy multilabel classification algorithms</title>
		<author>
			<persName><forename type="first">E</forename><surname>Spyromitros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Tsoumakas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Vlahavas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th Hellenic Conference on Artificial Intelligence: Theories, Models and Applications</title>
				<meeting>the 5th Hellenic Conference on Artificial Intelligence: Theories, Models and Applications<address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="401" to="406" />
		</imprint>
	</monogr>
	<note>SETN &apos;08</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Effective and Efficient Multilabel Classification in Domains with Large Number of Labels</title>
		<author>
			<persName><forename type="first">G</forename><surname>Tsoumakas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Katakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Vlahavas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. ECML/PKDD 2008 Workshop on Mining Multidimensional Data (MMD&apos;08)</title>
				<meeting>ECML/PKDD 2008 Workshop on Mining Multidimensional Data (MMD&apos;08)</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">MULAN multi-label dataset repository</title>
		<author>
			<persName><forename type="first">G</forename><surname>Tsoumakas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">S</forename><surname>Xioufis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vilcek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Vlahavas</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Mulan: A Java library for multi-label learning</title>
		<author>
			<persName><forename type="first">G</forename><surname>Tsoumakas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Spyromitros-Xioufis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vilcek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Vlahavas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2411" to="2414" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Random k-labelsets: An ensemble method for multilabel classification</title>
		<author>
			<persName><forename type="first">G</forename><surname>Tsoumakas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Vlahavas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Machine Learning: ECML 2007: 18th European Conference on Machine Learning</title>
				<meeting><address><addrLine>Warsaw, Poland; Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2007">September 17-21, 2007. 2007</date>
			<biblScope unit="page" from="406" to="417" />
		</imprint>
	</monogr>
	<note>Proceedings.</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">No free lunch theorems for optimization</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">H</forename><surname>Wolpert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">G</forename><surname>Macready</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Trans. Evol. Comp</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="67" to="82" />
			<date type="published" when="1997-04">Apr 1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Multi-label classification algorithm derived from k-nearest neighbor rule with label dependencies</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Younes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Abdallah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Denoeux</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">16th European Signal Processing Conference</title>
		<imprint>
			<date type="published" when="2008-08">Aug 2008</date>
			<biblScope unit="page" from="1" to="5" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">A review on multi-label learning algorithms</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">H</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="1819" to="1837" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Multilabel neural networks with applications to functional genomics and text categorization</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">H</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page" from="1338" to="1351" />
			<date type="published" when="2006-10">Oct 2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">ML-KNN: A lazy learning approach to multi-label learning</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">H</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="2038" to="2048" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
