<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Estimation and Feature Selection by Application of Knowledge Mined from Decision Rules Models</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Wieslaw</forename><surname>Paja</surname></persName>
							<email>wpaja@ur.edu.pl</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Faculty of Mathematics and Natural Sciences</orgName>
								<orgName type="institution">University of Rzeszow</orgName>
								<address>
									<addrLine>Prof. St. Pigonia Str. 1</addrLine>
									<postCode>35-310</postCode>
									<settlement>Rzeszow</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Krzysztof</forename><surname>Pancerz</surname></persName>
							<email>kpancerz@wszia.edu.pl</email>
							<affiliation key="aff2">
								<orgName type="institution">University of Information Technology and Management</orgName>
								<address>
									<addrLine>Sucharskiego Str. 2</addrLine>
									<postCode>35-225</postCode>
									<settlement>Rzeszow</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
							<affiliation key="aff3">
								<orgName type="institution">University of Management and Administration</orgName>
								<address>
									<addrLine>Akademicka Str. 4</addrLine>
									<postCode>22-400</postCode>
									<settlement>Zamosc</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Estimation and Feature Selection by Application of Knowledge Mined from Decision Rules Models</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">BBE123A0678669B019C6204081FD7028</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T18:18+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Feature selection</term>
					<term>feature ranking</term>
					<term>decision rules</term>
					<term>dimensionality reduction</term>
					<term>relevance and irrelevance</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Feature selection methods, as a preprocessing step in machine learning, are effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving the comprehensibility of results. However, the recent increase in the dimensionality of data poses a severe challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this work, we introduce a novel concept: relevant feature selection based on information gathered from decision rule models. A new measure of feature rank, based on feature frequency and rule quality, is additionally defined. The efficiency and effectiveness of our method are demonstrated through exemplary use of five real-world datasets. Six different classification algorithms were used to measure the quality of learning models built on the original features and on the selected features.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In the era of acquisition of vast amounts of data, stored in information databases from different domains, efficient analysis and discovery of regularities have become extremely important tasks. Classification and object recognition are applied in many fields of human activity. Data mining is hindered by many factors: a very large number of observations, too many attributes, the insignificance of part of the variables for the classification process, mutual interdependence of conditional variables, the simultaneous presence of variables of different types, the presence of undefined or erroneous variable values, and an uneven distribution of categories of the target variable. Thus, the development of efficient methods for selecting significant features is an important goal.</p><p>Feature selection (FS) methods are frequently used as a preprocessing step in machine learning experiments. An FS method can be defined as a process of choosing a subset of the original features so that the feature space is optimally reduced according to a certain evaluation criterion. Feature selection has been a fruitful field of research and development since the 1970s, and it has proven effective in removing irrelevant features, increasing efficiency in learning tasks, improving learning performance such as predictive accuracy, and enhancing the comprehensibility of the learned results <ref type="bibr">[1]</ref>.</p><p>Feature selection methods are typically divided into three classes based on how they combine the selection algorithm and the model building: filter, wrapper and embedded FS methods. Filter methods select features independently of the learning model. They rely only on general characteristics of the data, such as the correlation of each feature with the variable to be predicted, and retain only the most interesting variables. 
Then, the selected subset becomes a part of the classification model. Such methods are efficient in computation time and robust to overfitting <ref type="bibr" target="#b0">[2]</ref>. However, some redundant but relevant features can remain unrecognized. In turn, wrapper methods evaluate subsets of features, which allows detecting possible interactions between variables [1, 3]. However, the risk of overfitting increases when the number of observations is insufficient, and the computation time grows considerably when the number of variables is large. The third type, embedded methods, performs feature selection as part of the learning process itself. Methods in this group try to combine the advantages of the two approaches mentioned previously: the learning algorithm takes advantage of its own variable selection procedure. Therefore, it needs to know from the start what a good selection is, which limits its exploitation <ref type="bibr" target="#b2">[4]</ref>.</p><p>Kohavi and John [1] observed that there are several definitions of relevance that may be contradictory and misleading. They proposed two degrees of relevance (strong and weak) that are required to encompass all notions usually associated with this term. In their approach, relevance is defined in absolute terms, with the help of the ideal Bayes classifier. In this context, a feature X is strongly relevant when removal of X alone from the data always results in a deterioration of the prediction accuracy of the ideal Bayes classifier. In turn, a feature X is weakly relevant if it is not strongly relevant and there exists a subset of features S such that the performance of the ideal Bayes classifier on S is worse than its performance on S ∪ {X}. A feature is irrelevant if it is neither strongly nor weakly relevant.</p><p>Nilsson et al. <ref type="bibr" target="#b3">[5]</ref> introduced the formal definition of two different feature selection problems:</p><p>1. 
Minimal Optimal Feature Selection (MOFS), which consists in identifying a minimal set of features that yields the optimal quality of classification.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">All Relevant Feature Selection (ARFS)</head><p>Here, the problem is to find all the variables that may, under certain conditions, improve the classification.</p><p>There are two important differences between these problems. The first is the detection of attributes with low importance (ARFS) <ref type="bibr" target="#b4">[6]</ref>, which may be completely obscured, from the point of view of the classifier, by other, more important attributes (MOFS). The second is finding the boundary between the variables weakly, but genuinely, related to the decision and those for which such a relation arises from random fluctuations. The formal definition of all relevant feature selection (ARFS) as a problem distinct from the classical minimal optimal feature selection (MOFS) was proposed in 2007 <ref type="bibr" target="#b3">[5]</ref>.</p><p>In our research, we used the contrast variable concept to distinguish between relevant and irrelevant features <ref type="bibr" target="#b4">[6]</ref>. A contrast variable is, by design, a variable that carries no information about the decision variable; it is added to the system in order to discern relevant from irrelevant variables. Here, it is obtained from a real variable by randomly permuting its values between objects. The use of contrast variables was first proposed by Stoppiglia et al. <ref type="bibr" target="#b5">[7]</ref> and then by Tuv et al. <ref type="bibr" target="#b6">[8]</ref>.</p></div>
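The contrast-variable construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the feature names and the list-of-records data layout are hypothetical:

```python
import random

random.seed(0)  # reproducible permutation

def add_contrast_features(rows, feature_names):
    """For each original feature, append a shadow copy whose values are
    randomly permuted between objects: the marginal distribution is
    preserved, but any relation to the decision variable is destroyed
    by design."""
    out = [dict(r) for r in rows]
    for name in feature_names:
        values = [r[name] for r in rows]
        random.shuffle(values)  # permute values between objects
        for r, v in zip(out, values):
            r[f"{name} (contrast)"] = v
    return out

# Toy dataset with two (hypothetical) descriptive features and a decision
toy = [
    {"deg-malig": 1, "inv-nodes": 0, "class": "no"},
    {"deg-malig": 3, "inv-nodes": 2, "class": "yes"},
    {"deg-malig": 2, "inv-nodes": 1, "class": "yes"},
    {"deg-malig": 1, "inv-nodes": 0, "class": "no"},
]
extended = add_contrast_features(toy, ["deg-malig", "inv-nodes"])
print(sorted(extended[0].keys()))
```

Any original feature whose importance does not clearly exceed that of the best contrast feature can then be treated as indistinguishable from noise.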
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Methods and Algorithms</head><p>During the experiments, the following general procedure was applied. In the first step, a dataset as well as the features for investigation were defined. Then, different ranking measures were applied to estimate the importance of each feature. In order to check the specificity of the feature selection, the dataset was extended by adding contrast variables: each original variable was duplicated and its values were randomly permuted between all objects. Hence, a set of shadow variables, non-informative by design, was added to the original variables. The variables selected as significantly more important than random ones were examined further, using different tests. To define the level of feature importance, six well-known ranking measures were applied: ReliefF, Information Gain, Gain Ratio, Gini Index, SVM weight, and Random Forest. Additionally, our new measure, called RQualityFS, was introduced. It is based on the frequency with which a given feature occurs in a rule model generated from the original dataset, and it also takes into consideration the quality of the rules in which this feature occurs. The rank quality of the i-th feature can be expressed as follows:</p><formula xml:id="formula_1">Q_{A_i} = Σ_{j=1}^{n} Q_{R_j} · {A_i}<label>(1)</label></formula><p>where n is the number of rules inside the model, Q_{R_j} defines the classification quality of the rule R_j, and {A_i} denotes the presence of the i-th attribute: it is equal to 1 if the feature occurs in the rule and to 0 if it does not.</p><p>In turn, the quality of a rule is defined as follows:</p><formula xml:id="formula_2">Q_{R_j} = E_corr / (E_corr + E_incorr)<label>(2)</label></formula><p>where E_corr denotes the number of learning objects correctly matched by the j-th rule and E_incorr the number of learning objects incorrectly matched by this rule.</p><p>During the second step, a test probing the importance of the variables was performed by analyzing the influence of the variables used for model building on the prediction quality. Six different machine learning algorithms were applied to build predictors for the original set of features and for the selected features: Classification Tree (CT), Random Forest (RF), the CN2 decision rule algorithm (CN2), Naive Bayes (NB), k-Nearest Neighbors (kNN), and Support Vector Machine (SVM). During this step, a 10-fold cross-validation paradigm was used. Ten well-known evaluation measures were utilized for each predictor: Classification Accuracy (CA), Sensitivity, Specificity, Area Under the ROC curve (AUC), Information Score (IS), F1 score (F1), Precision, Brier score, Matthews Correlation Coefficient (MCC), and the Informedness (Inform.) ratio <ref type="bibr" target="#b7">[9]</ref>.</p></div>
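Equations (1) and (2) can be checked with a short sketch. The three-rule model below is hypothetical, assuming each rule records the attributes it tests and its matching counts on the learning set:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """A decision rule: the attributes it tests, plus the numbers of
    learning objects it matches correctly and incorrectly."""
    attributes: frozenset
    e_corr: int
    e_incorr: int

    @property
    def quality(self) -> float:
        # Eq. (2): Q_Rj = E_corr / (E_corr + E_incorr)
        return self.e_corr / (self.e_corr + self.e_incorr)

def rquality_rank(rules, attribute):
    # Eq. (1): sum of rule qualities over the rules in which the attribute occurs
    return sum(r.quality for r in rules if attribute in r.attributes)

# Hypothetical three-rule model over two attributes
model = [
    Rule(frozenset({"deg-malig", "node-caps"}), e_corr=18, e_incorr=2),  # Q = 0.90
    Rule(frozenset({"deg-malig"}), e_corr=12, e_incorr=4),               # Q = 0.75
    Rule(frozenset({"node-caps"}), e_corr=5, e_incorr=5),                # Q = 0.50
]
print(round(rquality_rank(model, "deg-malig"), 2))  # 0.90 + 0.75 = 1.65
print(round(rquality_rank(model, "node-caps"), 2))  # 0.90 + 0.50 = 1.4
```

A feature occurring in many high-quality rules thus accumulates a large rank, while a feature absent from the rule model scores zero.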
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Investigated Datasets</head><p>Our initial investigations focus on applying the developed algorithm to several real-world datasets. Five datasets were used during the experiments. Four of them were gathered from the UCI ML repository, while the fifth set had been developed earlier by the authors <ref type="bibr" target="#b8">[10]</ref>. A summary of the datasets is presented in Table <ref type="table" target="#tab_1">1</ref>. These datasets have diverse numbers of objects, features (and feature types), and classes. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Results and Conclusions</head><p>To illustrate the proposed methodology, only the results for the Breast cancer dataset will be presented in detail. The first step of the experiment revealed six features that were recommended as important by all or almost all ranking measures. In Table <ref type="table" target="#tab_2">2</ref>, we can observe that the deg-malig, node-caps, irradiat, inv-nodes, breast, and menopause features form a stable, core set of features which have the highest values of the seven importance measures, particularly of the RQualityFS measure introduced in our investigation.</p><p>In the same table, a comparison with the importance of the contrast variables (rows with the contrast index) is also presented. The most important contrast feature is tumor-size (contrast), for which the RQualityFS measure, defined earlier, equals 2.34. This value was treated as a threshold separating the core, relevant set of attributes from the other, less informative attributes. Most of the measures (except SVM weight) used in this approach show that the selected set of features has higher values of these parameters than the obtained threshold value (underlined values); the selected values are denoted in bold in Table <ref type="table" target="#tab_2">2</ref>. Hereby, we can observe that different measures give different thresholds. [Continuation of Table 2, spilled into the text: irradiat -0.01 0.00 0.00 0.00 0.00 -0.16 0.86 (contrast); breast-quad 0.07 0.00 0.00 0.00 0.03 -0.05 0.00 (contrast); inv-nodes -0.02 0.02 0.02 0.01 0.14 0.07 0.00 (contrast); node-caps -0.04 0.00 0.00 0.00 0.02 -0.03 0.00 (contrast); breast-quad -0.05 0.01 0.01 0.00 0.16 0.13 0.00.]</p><p>The second step of the experiment was devoted to evaluating the prediction quality of the machine learning algorithms described in Section 2. During this step, six different algorithms were applied using the 10-fold cross-validation method. 
The average results for the Breast cancer dataset are shown in Figure <ref type="figure">1</ref>. This procedure was utilized for two specified sets: the original set of all features and the reduced set of selected, important features.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Selection of dataset and features for investigation.</figDesc><table /><note>(a) Application of a set of ranking measures to calculate importance for each feature: i. With set of contrast features. ii. Without contrast features. (b) Definition (selection) of the most important feature subset. 2. Step 2. Application of different machine learning algorithms for classification of unseen objects using the 10-fold cross validation method: (a) Using all original features. (b) Using only selected, important features. 3. Step 3. Comparison of gathered results using different evaluation measures.</note></figure>
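The second-step protocol (train on all features versus only the selected ones, score by 10-fold cross-validation) can be illustrated with a self-contained sketch. A simple k-NN classifier on synthetic data stands in for the six algorithms used in the paper; all feature names and data are illustrative:

```python
import random
from collections import Counter

random.seed(1)

def knn_predict(train, x, features, k=3):
    """Majority vote among the k nearest training objects
    (squared Euclidean distance on the chosen features)."""
    nearest = sorted(train, key=lambda r: sum((r[f] - x[f]) ** 2 for f in features))[:k]
    return Counter(r["class"] for r in nearest).most_common(1)[0][0]

def cv_accuracy(rows, features, folds=10, k=3):
    """Mean classification accuracy under a plain 10-fold split:
    each object is predicted by a model trained on the other folds."""
    rows = rows[:]
    random.shuffle(rows)
    hits = 0
    for i, r in enumerate(rows):
        train = [s for j, s in enumerate(rows) if j % folds != i % folds]
        hits += knn_predict(train, r, features, k) == r["class"]
    return hits / len(rows)

# Synthetic data: "f1" carries the class signal, "noise" does not
data = [{"f1": i % 2 + 0.2 * random.random(),
         "noise": random.random(),
         "class": "A" if i % 2 else "B"} for i in range(60)]
acc_all = cv_accuracy(data, ["f1", "noise"])   # original feature set
acc_sel = cv_accuracy(data, ["f1"])            # selected subset
print(f"CA, all features: {acc_all:.2f}; CA, selected: {acc_sel:.2f}")
```

As in Tables 5-8, comparing the two accuracies shows whether discarding the unselected features costs, or even gains, predictive quality.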
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 .</head><label>1</label><figDesc>A summary characteristic of benchmark datasets</figDesc><table><row><cell>Dataset</cell><cell cols="3"># instances # features # classes</cell></row><row><cell cols="2">Breast cancer 286</cell><cell>9</cell><cell>2</cell></row><row><cell cols="2">Heart disease 303</cell><cell>13</cell><cell>2</cell></row><row><cell cols="2">Lung cancer 32</cell><cell>56</cell><cell>3</cell></row><row><cell cols="2">Primary tumor 339</cell><cell>17</cell><cell>21</cell></row><row><cell cols="2">Skin cancer 548</cell><cell>13</cell><cell>13</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 .</head><label>2</label><figDesc>Ranking of features using seven different measures</figDesc><table><row><cell>Feature</cell><cell cols="3">ReliefF Inf. Gain Gini SVM RF RQualityFS</cell></row><row><cell></cell><cell>gain Ratio</cell><cell>weight</cell></row><row><cell>deg-malig</cell><cell cols="2">0.08 0.08 0.05 0.02 0.07 2.01</cell><cell>8.78</cell></row><row><cell>node-caps</cell><cell cols="2">0.15 0.06 0.08 0.02 0.05 1.21</cell><cell>7.23</cell></row><row><cell>irradiat</cell><cell cols="2">0.13 0.03 0.03 0.01 0.00 0.88</cell><cell>4.94</cell></row><row><cell>inv-nodes</cell><cell cols="2">0.15 0.07 0.05 0.02 0.03 0.32</cell><cell>3.78</cell></row><row><cell>breast</cell><cell cols="2">-0.01 0.00 0.00 0.00 0.06 0.32</cell><cell>3.66</cell></row><row><cell cols="3">menopause -0.03 0.00 0.00 0.00 0.02 0.00</cell><cell>3.14</cell></row><row><cell cols="3">tumor-size -0.01 0.01 0.00 0.00 0.04 0.01</cell><cell>2.34</cell></row><row><cell>(contrast)</cell><cell></cell><cell></cell></row><row><cell>Age</cell><cell cols="2">0.01 0.01 0.01 0.00 0.04 -0.12</cell><cell>2.24</cell></row><row><cell>breast</cell><cell cols="2">0.01 0.00 0.00 0.00 0.00 0.06</cell><cell>2.05</cell></row><row><cell>(contrast)</cell><cell></cell><cell></cell></row><row><cell>tumor-size</cell><cell cols="2">0.07 0.06 0.02 0.01 0.10 0.04</cell><cell>1.74</cell></row><row><cell>age</cell><cell cols="2">0.05 0.01 0.00 0.00 0.01 -0.06</cell><cell>1.27</cell></row><row><cell>(contrast)</cell><cell></cell><cell></cell></row><row><cell>deg-malig</cell><cell cols="2">0.06 0.00 0.00 0.00 0.01 0.46</cell><cell>1.23</cell></row><row><cell>(contrast)</cell><cell></cell><cell></cell></row><row><cell cols="3">menopause 0.09 0.01 0.01 0.00 0.06 0.02</cell><cell>1.16</cell></row><row><cell>(contrast)</cell><cell></cell><cell></cell></row><row><cell>irradiat</cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 5 .</head><label>5</label><figDesc>Average results of Random forest on original (normal font) and selected sets (italic font)</figDesc><table><row><cell>Dataset</cell><cell>CA Sens Spec AUC IS F1 Prec Brier MCC Inform.</cell></row><row><cell cols="2">Breast cancer 0.75 0.59 0.59 0.70 0.08 0.58 0.79 0.37 0.32 0.17</cell></row><row><cell></cell><cell>0.75 0.61 0.61 0.71 0.10 0.61 0.75 0.36 0.33 0.21</cell></row><row><cell cols="2">Heart disease 0.81 0.81 0.81 0.89 0.41 0.81 0.82 0.27 0.63 0.62</cell></row><row><cell></cell><cell>0.77 0.76 0.76 0.86 0.35 0.76 0.77 0.31 0.53 0.53</cell></row><row><cell cols="2">Lung cancer 0.35 0.33 0.65 0.73 0.17 0.34 0.43 0.61 0.01 -0.02</cell></row><row><cell></cell><cell>0.50 0.53 0.75 0.68 0.30 0.50 0.49 0.59 0.26 0.28</cell></row><row><cell cols="2">Primary tumor 0.45 0.21 0.97 0.87 1.03 0.33 0.37 0.71 0.35 0.17</cell></row><row><cell></cell><cell>0.33 0.11 0.96 0.82 0.68 0.37 0.31 0.80 0.25 0.07</cell></row><row><cell cols="2">Skin cancer 0.83 0.79 0.93 0.97 1.11 0.80 0.85 0.27 0.75 0.72</cell></row><row><cell></cell><cell>0.75 0.69 0.90 0.94 0.99 0.68 0.72 0.33 0.61 0.58</cell></row><row><cell>Average</cell><cell>0.64 0.54 0.79 0.83 0.56 0.57 0.65 0.44 0.41 0.33</cell></row><row><cell></cell><cell>0.62 0.54 0.79 0.80 0.48 0.58 0.61 0.48 0.40 0.33</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 6 .</head><label>6</label><figDesc>Average</figDesc><table><row><cell></cell><cell>results of CN2 rules on original (normal font) and selected sets (italic</cell></row><row><cell>font)</cell><cell></cell></row><row><cell>Dataset</cell><cell>CA Sens Spec AUC IS F1 Prec Brier MCC Inform.</cell></row><row><cell cols="2">Breast cancer 0.72 0.57 0.57 0.61 0.12 0.56 0.68 0.46 0.22 0.14</cell></row><row><cell></cell><cell>0.75 0.60 0.60 0.66 0.14 0.61 0.74 0.39 0.31 0.21</cell></row><row><cell cols="2">Heart disease 0.82 0.81 0.81 0.84 0.58 0.81 0.83 0.33 0.64 0.62</cell></row><row><cell></cell><cell>0.74 0.73 0.73 0.76 0.43 0.74 0.75 0.43 0.48 0.47</cell></row><row><cell cols="2">Lung cancer 0.44 0.44 0.71 0.65 0.39 0.44 0.46 0.72 0.15 0.14</cell></row><row><cell></cell><cell>0.56 0.55 0.77 0.69 0.25 0.68 0.58 0.60 0.46 0.32</cell></row><row><cell cols="2">Primary tumor 0.45 0.21 0.97 0.87 1.03 0.33 0.37 0.71 0.35 0.17</cell></row><row><cell></cell><cell>0.33 0.11 0.96 0.82 0.68 0.37 0.31 0.80 0.25 0.07</cell></row><row><cell cols="2">Skin cancer 0.82 0.79 0.93 0.94 1.32 0.81 0.84 0.27 0.75 0.72</cell></row><row><cell></cell><cell>0.76 0.70 0.90 0.92 1.09 0.72 0.78 0.34 0.64 0.60</cell></row><row><cell>Average</cell><cell>0.65 0.56 0.80 0.78 0.69 0.59 0.64 0.50 0.42 0.36</cell></row><row><cell></cell><cell>0.63 0.54 0.79 0.77 0.52 0.62 0.63 0.51 0.43 0.33</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 7 .</head><label>7</label><figDesc>Average results of Naive Bayes classifier on original (normal font) and selected sets (italic font) Lung cancer 0.62 0.64 0.81 0.74 0.67 0.63 0.64 0.72 0.44 0.44 0.60 0.63 0.80 0.68 0.43 0.59 0.61 0.60 0.43 0.43 Primary tumor 0.40 0.17 0.97 0.81 0.98 0.31 0.31 0.75 0.28 0.13 0.38 0.16 0.97 0.80 0.88 0.43 0.33 0.79 0.29 0.13 Skin cancer 0.78 0.77 0.92 0.96 1.24 0.78 0.80 0.27 0.71 0.69 0.73 0.70 0.90 0.94 1.12 0.71 0.73 0.33 0.61 0.59 Average 0.67 0.61 0.84 0.82 0.74 0.64 0.65 0.49 0.48 0.45 0.65 0.58 0.82 0.80 0.62 0.63 0.63 0.48 0.44 0.40</figDesc><table><row><cell>Dataset</cell><cell>CA Sens Spec AUC IS F1 Prec Brier MCC Inform.</cell></row><row><cell cols="2">Breast cancer 0.73 0.66 0.66 0.69 0.16 0.66 0.67 0.43 0.33 0.31</cell></row><row><cell></cell><cell>0.74 0.65 0.65 0.70 0.17 0.66 0.68 0.40 0.33 0.31</cell></row><row><cell cols="2">Heart disease 0.83 0.83 0.83 0.90 0.62 0.83 0.83 0.27 0.66 0.66</cell></row><row><cell></cell><cell>0.78 0.78 0.78 0.87 0.50 0.78 0.78 0.29 0.55 0.55</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 8 .</head><label>8</label><figDesc>Average results of kNN classifier on original (normal font) and selected sets (italic font)</figDesc><table><row><cell>Dataset</cell><cell>CA Sens Spec AUC IS F1 Prec Brier MCC Inform.</cell></row><row><cell cols="2">Breast cancer 0.71 0.60 0.60 0.65 0.16 0.61 0.64 0.47 0.24 0.21</cell></row><row><cell></cell><cell>0.72 0.60 0.60 0.61 0.10 0.61 0.65 0.45 0.25 0.20</cell></row><row><cell cols="2">Heart disease 0.77 0.76 0.76 0.85 0.51 0.76 0.76 0.36 0.53 0.53</cell></row><row><cell></cell><cell>0.70 0.70 0.70 0.80 0.40 0.70 0.70 0.46 0.40 0.40</cell></row><row><cell cols="2">Lung cancer 0.43 0.46 0.72 0.66 0.35 0.44 0.44 0.68 0.17 0.18</cell></row><row><cell></cell><cell>0.53 0.53 0.76 0.62 0.35 0.49 0.47 0.66 0.27 0.29</cell></row><row><cell cols="2">Primary tumor 0.49 0.26 0.98 0.84 1.48 0.41 0.27 0.75 0.26 0.24</cell></row><row><cell></cell><cell>0.37 0.19 0.97 0.82 1.13 0.36 0.24 0.78 0.20 0.16</cell></row><row><cell cols="2">Skin cancer 0.81 0.82 0.93 0.94 1.40 0.82 0.81 0.29 0.75 0.76</cell></row><row><cell></cell><cell>0.77 0.75 0.91 0.92 1.26 0.75 0.76 0.34 0.67 0.66</cell></row><row><cell>Average</cell><cell>0.64 0.58 0.80 0.79 0.78 0.61 0.59 0.51 0.39 0.38</cell></row><row><cell></cell><cell>0.62 0.55 0.79 0.76 0.65 0.58 0.57 0.54 0.36 0.34</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Application of highdimensional feature selection: evaluation for genomic prediction in man</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Bermingham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pong-Wong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Spiliopoulou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hayward</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Rudan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Campbell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">F</forename><surname>Wright</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">F</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Agakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Navarro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Haley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Sci. Rep</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Choosing SNPs using feature selection</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Phuong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">B</forename><surname>Altman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings -2005 IEEE Computational Systems Bioinformatics Conference</title>
				<meeting>-2005 IEEE Computational Systems Bioinformatics Conference<address><addrLine>CSB</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2005">2005. 2005</date>
			<biblScope unit="page" from="301" to="309" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Wrapper-filter feature selection algorithm using a memetic framework</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">S</forename><surname>Ong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dash</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Syst. Man, Cybern. Part B Cybern</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page" from="70" to="76" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Detecting multivariate differentially expressed genes</title>
		<author>
			<persName><forename type="first">R</forename><surname>Nilsson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Pena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Björkegren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tegner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page">150</biblScope>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">All Relevant Feature Selection Methods and Applications</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">R</forename><surname>Rudnicki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wrzesień</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Paja</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Feature Selection for Data and Pattern Recognition</title>
				<editor>
			<persName><forename type="first">U</forename><surname>Stańczyk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><forename type="middle">C</forename><surname>Jain</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg; Berlin</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="11" to="28" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Ranking a Random Feature for Variable and Feature Selection</title>
		<author>
			<persName><forename type="first">H</forename><surname>Stoppiglia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Dreyfus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Dubois</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Oussar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Mach. Learn. Res</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="1399" to="1414" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Feature Selection Using Ensemble Based Ranking Against Artificial Contrasts</title>
		<author>
			<persName><forename type="first">E</forename><surname>Tuv</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Borisov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Torkkola</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Symposium on Neural Networks</title>
				<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="2181" to="2186" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">An introduction to ROC analysis</title>
		<author>
			<persName><forename type="first">T</forename><surname>Fawcett</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognit. Lett</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page" from="861" to="874" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Diagnosing Skin Melanoma: Current versus Future Directions</title>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">S</forename><surname>Hippe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bajcar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Blajdo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Grzymala-Busse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Grzymala-Busse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Knap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Paja</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wrzesien</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">TASK Q</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="289" to="293" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A unified view of performance metrics: translating threshold choice into expected classification loss</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hernandez-Orallo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Flach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ferri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Mach. Learn. Res</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="2813" to="2869" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
