Feature Selection in Under-Bagging Based Kernel ELM for Handling Class Imbalance Dataset Classification

Roshani Choudhary, Jayesh Kumar Kashyap, Sanyam Shukla and Manasi Gyanchandani
Maulana Azad National Institute of Technology, Bhopal, India

Abstract
Learning from class imbalance datasets is a major hurdle in classification, because traditional classifiers are unable to handle class imbalance effectively. Various algorithms have been developed to address this problem, using approaches such as sampling, modifications at the algorithm level, ensembles, and evolutionary techniques. Under-Bagging based Kernel ELM (UBKELM) is a variant of the Extreme Learning Machine (ELM) developed to address class imbalance. UBKELM is an ensemble technique that uses under-sampling to balance the class distribution seen by each component classifier. Feature selection within the component classifiers of such an ensemble is a direction that needs more research. This work proposes Feature Selection in Under-Bagging Based Kernel ELM (FS-UBKELM) for handling class imbalance dataset classification. In FS-UBKELM, one feature is removed from every component classifier in order to enhance classification performance, and the feature to remove is selected using a data complexity measure. To assess the improvement achieved by the proposed method, its results are compared with other state-of-the-art methods for imbalanced classification. The results show that the proposed method significantly improves performance on class imbalance classification.

Keywords
UBKELM, FS-UBKELM, Feature Selection, Data Imbalance, CIL (Class Imbalance Learning)

1. Introduction

Various algorithms have been designed for classification, such as SVM, neural networks, decision trees, and Naïve Bayes. Several factors affect the performance of a classifier; the problems that arise in classification include class imbalance, class overlapping, and small disjuncts. In real-world classification datasets, such as those for cancer detection, intrusion detection, and fraud detection, the class imbalance problem is frequently observed. Class imbalance is the situation in which the distribution of the classes in a dataset is uneven, i.e., the instances of the different classes are present in different proportions. A class whose number of instances is larger than the average number of instances per class is called a majority class, and a class whose number of instances is smaller than the average is called a minority class. On a class imbalance dataset, traditional classifiers are unable to perform well, because their results are skewed in favor of the majority class.

Different methods have been developed to solve the problem of class imbalance classification. These techniques can be classified as data-level, algorithm-level, ensemble, and evolutionary techniques. To reduce the effect of class imbalance on the classifier, data-level methods attempt to balance the class distribution; over-sampling and under-sampling are two examples of data-level methods.
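As an illustration of the data-level idea, the following is a minimal sketch of random under-sampling for a binary dataset in Python/NumPy. It is not the authors' implementation; the function name and the array conventions (X as the feature matrix, y as the label vector) are assumptions made only for this example.

```python
import numpy as np

def random_undersample(X, y, rng=None):
    """Balance a binary dataset by randomly dropping majority-class samples.

    X : (n_samples, n_features) feature matrix
    y : (n_samples,) binary class labels
    Returns a subset of (X, y) with equal class counts.
    """
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]
    n_min = counts.min()

    min_idx = np.where(y == minority)[0]                 # keep every minority sample
    maj_idx = rng.choice(np.where(y == majority)[0],     # keep an equal number of
                         size=n_min, replace=False)      # randomly chosen majority samples
    keep = np.concatenate([min_idx, maj_idx])
    return X[keep], y[keep]
```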
The algorithm-level methods attempt to modify the classifier itself so that it can handle class imbalance, e.g., cost-sensitive methods. The ensemble methods try to improve classification performance by building multiple classifiers and combining their results. The problem of class imbalance is also addressed using evolutionary approaches such as one-class classification, noise reduction, Universum learning, and feature selection. Feature selection is a very popular method for improving the performance of classification problems, but it is rarely used for imbalance learning.

Under-Bagging based Kernel ELM (UBKELM) [4] is a variant of the Extreme Learning Machine (ELM) [1], a generalized single hidden layer feed-forward neural network (SLFN), built to tackle class imbalance classification. In the proposed work we use a feature selection strategy inside UBKELM to improve its performance on class imbalance problems.

2. Literature Review

2.1. Methods to handle Class Imbalance Learning (CIL)

There exist various methods to handle class imbalance classification. Some of these methods are discussed below.

2.1.1. Data Level Methods

The data-level methods try to balance the proportion of the classes in the dataset in order to reduce the bias of the classifier towards the majority class. These methods include over-sampling techniques, under-sampling techniques, and hybrid sampling techniques. Some popular data sampling methods are the synthetic minority over-sampling technique (SMOTE) [9] and random under-sampling (RUS) [5].

2.1.2. Algorithm Level Modifications

The algorithm-level methods modify the algorithm itself to make it suitable for class imbalance classification. Some popular algorithmic methods used to handle class imbalance learning are the weighted ELM (WELM) [3] and the class-specific ELM (CSELM) [10].

2.1.3. Ensemble Techniques

Ensemble techniques build multiple classifiers and then combine their results to make the final decision. The idea is that a decision made by numerous classifiers is superior to a judgement made by a single classifier. Some of the ensemble techniques used to handle class imbalance are Easy-Ensemble and Balance Cascade [5], and BWELM [7].

2.2. Evolutionary Methods

There exist some evolutionary techniques that are used for class imbalance learning, such as one-class classification [12], Universum learning [11], feature selection [6], and noise filtering [13].

2.3. Data Complexity Analysis

It is found in the literature [14] that various data complexities are present in the datasets used for classification, and some of these data complexity measures affect classification performance on class imbalance datasets. This section discusses some of the data complexity measures.

2.4. Fisher's discriminant ratio (F1)

The basic variant of Fisher's discriminant ratio (F1) measures how well two classes are separated when only a specific feature is considered [14].
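As a concrete illustration, the following is a minimal sketch of a single-feature Fisher's discriminant ratio in Python/NumPy, using the common formulation (μ1 − μ2)² / (σ1² + σ2²). The function name and array conventions are assumptions made for this example rather than code from [14].

```python
import numpy as np

def fisher_ratio(feature, y):
    """Single-feature Fisher's discriminant ratio (F1) for a binary problem.

    feature : (n_samples,) values of one feature
    y       : (n_samples,) binary class labels
    A larger value indicates better separation of the two classes along this feature.
    """
    c1, c2 = np.unique(y)
    a, b = feature[y == c1], feature[y == c2]
    # squared difference of the class means over the sum of the class variances
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var() + 1e-12)
```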
2.5. The volume of overlap region (F2)

The volume of overlap region (F2) first computes, for each feature, the length of the overlapping range, i.e., the range of values covered by both classes, normalized by the length of the total range spanned by the values of both classes. The volume of the overlap region of the two classes is then obtained as the product of the normalized overlap lengths of all the features that have overlapping ranges [14].

2.6. Feature efficiency (F3)

Feature efficiency is defined as the fraction of the remaining points that can be separated by a particular feature. The maximum feature efficiency, i.e., the largest percentage of points separable by using a single feature, is used as an estimate of the overlap in a binary classification problem [14].

2.7. Kernel Extreme Learning Machine (KELM)

The Extreme Learning Machine (ELM) is a single hidden layer feed-forward neural network (SLFN) with excellent generalization and fast learning speed. The ELM can be used for both regression and classification. ELM was originally proposed with two variants, sigmoid node-based ELM and Gaussian kernel-based ELM (KELM) [2]; KELM outperforms the ELM based on sigmoid nodes. The Gaussian kernel function is used by KELM to map the input data to the feature space, as shown below:

$$K(\mathbf{u}, \mathbf{v}) = \exp\!\left(-\frac{\lVert \mathbf{u} - \mathbf{v} \rVert^{2}}{2\sigma^{2}}\right)$$

where σ is the kernel width parameter. The kernel matrix Ω of KELM is represented as follows for N training instances:

$$\Omega = \mathbf{H}\mathbf{H}^{T}, \qquad \Omega_{i,j} = h(\mathbf{x}_i) \cdot h(\mathbf{x}_j) = K(\mathbf{x}_i, \mathbf{x}_j), \quad i, j = 1, \ldots, N$$

The following equation can be used to calculate the output of KELM [2]:

$$f(\mathbf{x}) = \begin{bmatrix} K(\mathbf{x}, \mathbf{x}_1) \\ \vdots \\ K(\mathbf{x}, \mathbf{x}_N) \end{bmatrix}^{T} \left(\frac{\mathbf{I}}{C} + \Omega\right)^{-1} \mathbf{T}$$

where C is the regularization parameter and T is the target matrix of the training data.

2.8. Under-Bagging based Kernel ELM (UBKELM)

UBKELM [4] is an under-bagging ensemble that uses the kernelized ELM as its component classifier, combining the capabilities of random under-sampling and bagging. The training and testing process of UBKELM is shown in Figure 1. UBKELM creates several balanced training subsets (BTSS) by randomly under-sampling the majority class samples for each training subset. Each BTSS includes all instances of the minority class together with an equal number of randomly chosen majority class samples. The number of training subsets T depends on the degree of class imbalance and can be obtained using the following equation:

$$T = \left\lceil \frac{\max_{1 \le k \le m} t_k}{\min_{1 \le k \le m} t_k} \right\rceil$$

Here, t_k represents the number of samples belonging to the kth class and m is the number of classes in the dataset.

3. Proposed Work

This paper proposes a variation of UBKELM to effectively handle the class imbalance classification challenge by incorporating feature selection into the UBKELM algorithm. Since UBKELM is an ensemble method, the proposed method removes one feature from every component classification model of UBKELM. To select the feature to be removed, the proposed work uses data complexity analysis; specifically, the volume of overlap region (F2) is used as the complexity measure for feature reduction in the component classifiers. The training and testing process of the proposed method is shown in Figure 2.

Figure 1: UBKELM training and testing process.

3.1. Feature selection

As in UBKELM, the proposed method performs random under-sampling of the majority class samples to create several balanced training subsets (BTSS). Each subset contains all the minority class instances and the same number of randomly selected majority class instances. After the creation of the balanced training subsets, the feature having the least volume of overlap region (F2) value is removed from the corresponding BTSS, as sketched below. The training model of each component is then learned using the reduced number of features.
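The following is a minimal sketch, in Python/NumPy, of the per-feature overlap computation and the feature-removal step described above. The function names and the exact normalization are assumptions made for illustration (following the F2 definition in [14]); the original implementation is in MATLAB and may differ in detail.

```python
import numpy as np

def overlap_per_feature(X, y):
    """Normalized overlap-region length of each feature for a binary BTSS.

    For every feature, the length of the range covered by both classes is
    divided by the length of the total range spanned by the two classes
    (the per-feature ingredient of the F2 measure in [14]).
    """
    c1, c2 = np.unique(y)
    A, B = X[y == c1], X[y == c2]
    lo = np.maximum(A.min(axis=0), B.min(axis=0))      # start of the overlap range
    hi = np.minimum(A.max(axis=0), B.max(axis=0))      # end of the overlap range
    overlap = np.clip(hi - lo, 0.0, None)              # zero if the class ranges do not meet
    total = np.maximum(A.max(axis=0), B.max(axis=0)) - np.minimum(A.min(axis=0), B.min(axis=0))
    return overlap / np.maximum(total, 1e-12)

def remove_least_overlap_feature(X, y):
    """Drop the feature with the least F2-style overlap value from a BTSS,
    following the selection rule described in the text."""
    worst = np.argmin(overlap_per_feature(X, y))
    return np.delete(X, worst, axis=1), worst
```

In FS-UBKELM, this step is applied independently to each balanced training subset before the corresponding kernel ELM component is trained.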
The number of training subsets T is determined in the same manner as in UBKELM.

4. Experimental Setup and Analysis of the Results

In this section, we describe the experimental setup and analyze the results used to evaluate the performance of the proposed method.

4.1. Evaluation Metrics

In a class imbalance scenario, accuracy is not a good measure of classifier performance. Many evaluation metrics are used for evaluating classifiers on class imbalance datasets; precision, recall, G-mean, and F-measure are some of these measures. The confusion matrix for a binary problem is:

|                 | Predicted Positive   | Predicted Negative   |
|-----------------|----------------------|----------------------|
| Actual Positive | True Positive (TP)   | False Negative (FN)  |
| Actual Negative | False Positive (FP)  | True Negative (TN)   |

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

$$\text{G-mean} = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}}$$

4.2. Experimental Setup

We have implemented the proposed FS-UBKELM in MATLAB 2017. The proposed method is compared with four state-of-the-art methods for class imbalance learning: WELM, BELM, UBKELM-SV, and UBKELM-MV. The results of these methods are taken from the UBKELM work [4].

4.3. Parameter Settings

The proposed method uses the RBF kernel function, for which the kernel width parameter, represented as σ in this work, has to be chosen; the value of σ is tuned using grid search. The proposed work uses the regularized version of ELM, i.e., RELM, in which the regularization parameter, referred to as C, also needs to be tuned; the value of C is likewise tuned using grid search.

Figure 2: FS-UBKELM training and testing process.

The range for tuning σ:
The range for tuning C:

4.4. Result Analysis

The G-mean is used for performance evaluation, and the performance of the proposed method is compared with that of the other methods under consideration. Table 1 provides the G-mean of the proposed method together with the other methods. To show that the improvement is significant, the Wilcoxon signed-rank test and the T-test are carried out.

For the paired-sample T-test, the null hypothesis is that the paired differences come from a normal distribution with mean equal to zero and unknown variance; the alternative hypothesis is that the mean of the population of differences is not zero. If the test rejects the null hypothesis at the 5% significance level, the returned value of H is 1; otherwise H is 0. The stats output contains information about the test statistic. Table 2 shows the results of the T-test, which confirm that the method proposed in this paper is significantly better than all the other methods under consideration.

The Wilcoxon signed-rank test returns the p-value of a paired, two-sided test for the null hypothesis that the differences between the two populations come from a distribution with zero median. The test decision is indicated by the logical value H: if H is 1, the null hypothesis is rejected; if H is 0, the test fails to reject the null hypothesis at the 5% significance level. The stats output contains information about the test statistic.
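To make the testing procedure concrete, here is a minimal sketch in Python/SciPy of how such paired tests can be run on per-dataset G-mean values. The example arrays reuse the WELM and FS-UBKELM values of the first five datasets in Table 1 purely for illustration, and the SciPy functions are analogues of, not the same as, the MATLAB routines the authors appear to have used.

```python
import numpy as np
from scipy import stats

# Per-dataset G-mean values, paired by dataset (illustrative values from Table 1).
welm      = np.array([89.76, 73.65, 88.36, 83.39, 95.41])
fs_ubkelm = np.array([92.15, 92.92, 98.62, 87.83, 96.41])

# Paired-sample t-test: null hypothesis is that the mean paired difference is zero.
t_stat, t_p = stats.ttest_rel(welm, fs_ubkelm)

# Wilcoxon signed-rank test: null hypothesis is that the paired differences
# come from a distribution with zero median.
w_stat, w_p = stats.wilcoxon(welm, fs_ubkelm)

alpha = 0.05
print(f"t-test     p = {t_p:.4f}, reject H0: {t_p < alpha}")
print(f"Wilcoxon   p = {w_p:.4f}, reject H0: {w_p < alpha}")
```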
Table 3 shows the Wilcoxon signed-rank test results, which show a considerable performance improvement of the proposed method over the other methods under consideration.

Table 1: Test G-mean

| Datasets        | WELM  | BELM  | UBKELM-SV | UBKELM-MV | FS-UBKELM | σ  | C  |
|-----------------|-------|-------|-----------|-----------|-----------|----|----|
| abalone9-18     | 89.76 | 90.12 | 91.07     | 91.53     | 92.15     | 18 | 36 |
| ecoli-0137vs26  | 73.65 | 78.51 | 81.08     | 77.74     | 92.92     | 8  | 26 |
| ecoli-01_vs_5   | 88.36 | 89.36 | 94.02     | 93.63     | 98.62     | -6 | 12 |
| glass016vs2     | 83.39 | 84.21 | 83.39     | 84.48     | 87.83     | 20 | 42 |
| glass-0123vs456 | 95.41 | 94.21 | 95.45     | 95.24     | 96.41     | -4 | -8 |
| glass2          | 82.59 | 85.50 | 83.26     | 85.94     | 87.45     | 20 | 44 |
| glass4          | 91.17 | 90.34 | 92.86     | 92.91     | 96.69     | 2  | 20 |
| haberman        | 66.26 | 65.14 | 66.49     | 66.70     | 68.55     | 10 | 16 |
| yeast-05679vs4  | 82.21 | 80.96 | 83.45     | 82.24     | 83.20     | 0  | -8 |
| yeast-1289vs7   | 71.41 | 72.67 | 74.73     | 74.28     | 76.75     | 10 | 20 |
| yeast-1458vs7   | 69.32 | 69.87 | 70.15     | 71.24     | 71.91     | -4 | 40 |
| yeast-1_vs_7    | 77.72 | 77.72 | 77.90     | 77.73     | 80.14     | 4  | 8  |
| yeast-2_vs_8    | 77.89 | 78.35 | 81.69     | 80.39     | 81.11     | 10 | 10 |
| yeast4          | 84.98 | 84.74 | 84.83     | 85.27     | 85.92     | -2 | 50 |

Table 2: Statistical T-test

| Methods Compared       | Stats                                     | P value     | H (5%) |
|------------------------|-------------------------------------------|-------------|--------|
| WELM vs FS-UBKELM      | [-7.495145523745117; -1.864425904826311]  | 0.003287743 | 1      |
| BELM vs FS-UBKELM      | [-6.239463381338741; -2.037250904375548]  | 9.38E-04    | 1      |
| UBKELM_SV vs FS-UBKELM | [-4.582737837292964; -1.026833591278464]  | 0.004669434 | 1      |
| UBKELM_MV vs FS-UBKELM | [-5.066495924931664; -0.693075503639768]  | 0.01378597  | 1      |

Table 3: Wilcoxon signed rank test

| Methods Compared       | Signed rank | P value  | H (5%) |
|------------------------|-------------|----------|--------|
| WELM vs FS-UBKELM      | 0           | 1.22E-04 | 1      |
| BELM vs FS-UBKELM      | 0           | 1.22E-04 | 1      |
| UBKELM_SV vs FS-UBKELM | 3           | 6.10E-04 | 1      |
| UBKELM_MV vs FS-UBKELM | 0           | 1.22E-04 | 1      |

5. Conclusion and Future Work

In this paper, a method is proposed for handling class imbalance learning by combining feature selection with an ensemble method. The volume of overlap region (F2) is used for selecting the feature to be removed. In future work, other data complexity measures can also be used for feature reduction in the training process. In addition, feature selection can be combined with other variants of ELM that handle class imbalance, such as the weighted kernel ELM (WKELM), the class-specific kernelized ELM (CSKELM), and the under-bagging reduced kernelized weighted ELM (UBRKWELM).
6. References

[1] Huang, G.-B., Zhou, H., Ding, X., & Zhang, R. (2012). Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 42, 513-529.
[2] Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2006). Extreme learning machine: Theory and applications. Neurocomputing, 70, 489-501.
[3] Zong, W., Huang, G.-B., & Chen, Y. (2013). Weighted extreme learning machine for imbalance learning. Neurocomputing, 101, 229-242.
[4] Raghuwanshi, B. S., & Shukla, S. (2019). Class imbalance learning using underbagging based kernelized extreme learning machine. Neurocomputing, 329, 172-187.
[5] Liu, X., Wu, J., & Zhou, Z. (2009). Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39, 539-550.
[6] Liu, M., Xu, C., Luo, Y., Xu, C., Wen, Y., & Tao, D. (2018). Cost-sensitive feature selection by optimizing F-measures. IEEE Transactions on Image Processing, 27, 1323-1335.
[7] Li, K., Kong, X., Lu, Z., Wenyin, L., & Yin, J. (2014). Boosting weighted ELM for imbalanced learning. Neurocomputing, 128, 15-21.
[8] Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., & Herrera, F. (2012). A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42, 463-484.
[9] Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.
[10] Raghuwanshi, B. S., & Shukla, S. (2018a). Class-specific extreme learning machine for handling binary class imbalance problem. Neural Networks, 105, 206-217.
[11] Qi, Z., Tian, Y., & Shi, Y. (2014). A nonparallel support vector machine for a classification problem with Universum learning. Journal of Computational and Applied Mathematics, 263, 288-298.
[12] Iosifidis, A., Mygdalis, V., Tefas, A., & Pitas, I. (2016). One-class classification based on extreme learning and geometric class information. Neural Processing Letters.
[13] Kang, Q., Chen, X., Li, S., & Zhou, M. (2017). A noise-filtered under-sampling scheme for imbalanced classification. IEEE Transactions on Cybernetics, 47(12), 4263-4274.
[14] Sotoca, J. M., Sánchez, J. S., & Mollineda, R. A. (2005). A review of data complexity measures and their applicability to pattern classification problems. III Taller Nacional de Minería de Datos y Aprendizaje, TAMIDA, 77-83.
[15] Janakiraman, V. M., Nguyen, X., & Assanis, D. (2016). Stochastic gradient based extreme learning machines for stable online learning of advanced combustion engines. Neurocomputing, 177, 304-316.
[16] Xiao, W., Zhang, J., Li, Y., Zhang, S., & Yang, W. (2017). Class-specific cost regulation extreme learning machine for imbalanced classification. Neurocomputing, 261, 70-82.
[17] Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., & Herrera, F. (2012). A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(4), 463-484.
[18] Luo, F., Guo, W., Yu, Y., & Chen, G. (2017). A multi-label classification algorithm based on kernel extreme learning machine. Neurocomputing, 260, 313-320.