=Paper=
{{Paper
|id=Vol-2353/paper8
|storemode=property
|title=Application of Global Optimization Methods to Increase the Accuracy of Classification in the Data Mining Tasks
|pdfUrl=https://ceur-ws.org/Vol-2353/paper8.pdf
|volume=Vol-2353
|authors=Anastasiya Doroshenko
|dblpUrl=https://dblp.org/rec/conf/cmis/Doroshenko19
}}
==Application of Global Optimization Methods to Increase the Accuracy of Classification in the Data Mining Tasks==
Application of global optimization methods to increase
the accuracy of classification in the data mining tasks
Doroshenko A. V.[0000-0002-7214-5108]
Lviv Polytechnic National University, Lviv, Ukraine
anastasia.doroshenko@gmail.com
Abstract. The article describes the solving of data mining task using neural-like
structures of Successive Geometric Transformations Model (NLS SGTM). The
main problems of this task are imbalanced dataset and different weigh of errors.
Therefore, to take into account these features, the method of penalties and re-
wards was used, as well as a piecewise linear approach to classification. The
supplement of the methods used by the final optimization procedure is pro-
posed. The procedure of final optimization using simulated annealing.
Keywords: data mining, classification, imbalance problem, cost-sensitive
learning, imbalanced data, principal components, neural-like structure of suc-
cessive geometric transformations model, NLS SGTM, simulated annealing,
analysis of the principal components, optimization methods.
1. Introduction
In the previous articles [1-2] the methods based on the combination of a Successive
Geometric Transformations Model with the method of penalties and rewards was
described. In addition, was developed the piecewise-linear approach to constructing
separating surfaces in classification tasks.
The purpose of these methods is to solve the data mining tasks of the classification.
The main features of these tasks are large-size datasets, imbalanced dataset and dif-
ferent weight of errors. The main goal of this research is increasing the accuracy of
classification and minimize the number of penalty points.
In order to increase the accuracy of classification, we propose to supplement the
developed methods with final optimization procedures, in particular by the method of
random correction of decomposition elements and the method of simulated annealing.
2. Problem statement
Today the data mining tasks are widespread because most companies today have a
huge amount of accumulated data with information about sales, customers, orders,
and more. This information is a source of hidden knowledge. In turn, the possession
of this knowledge allows this company to take a leading position in the market, to win
the competitive struggle. Among such tasks, one of the most popular is the task of
classification. These tasks are formulated daily; in such spheres of life how as com-
merce, telecommunication, and chemical industry, target marketing, insurance, medi-
cine, bioinformatics, and others. Researchers use different methods to solve classifica-
tion problems [1, 2, 3, 4].
The main features of classification tasks in data mining are imbalanced data, dif-
ferent weight of error, huge amounts of data. These features require using some spe-
cial additional methods to well-known methods of classification to provide high accu-
racy of classification.
Hereby as basic methods of classification, we used the neural-like structure of suc-
cessive geometric transformations model (NLS SGTM) [5, 6]. As an additional
method was used piecewise-linear approach [1] and cost-sensitive learning method [7,
8]. This allowed us to improve the classification accuracy and take into account the
specifics of a specific task. This article proposes to apply global optimization methods
to the neuro-like structure already trained as a result of previous experiments. This
will allow us to find such parameters of the neural-like structure, in which the sum of
points reaches the global max.
3. Increasing the accuracy of classification using random
correction of decomposition elements.
3.1. Analysis of the principal components
The analysis of the principal components is the standard method used to reduce the
dimensionality of data in statistical pattern recognition system and signal processing
systems. However, it is also advisable to use the analysis of the principal components
to solve the problems of data mining because of their high dimensionality [9, 10].
The main task of statistical recognition is the allocation of attributes - the process
in which the data space is transformed into space of attributes, which theoretically has
the same dimension as the input space. Conversions, however, are usually performed
so that a reduced number of the most effective features can represent the data space.
Consequently, only a substantial part of the information contained in the data remains,
the dimension of the data is reduced. If this approach is applied to data mining task,
we will reduce the size of the input data by extracting non-informative features with-
out losing significant data. Consider a more detailed analysis of the principal compo-
nents (in the theory of information is known as Karhunen-Loeve Transform) [11, 12].
Assume that there exists a vector X of dimension m, which we want to convey with
the help of l numbers, where l 0, repeat L times the following actions:
• choose a new solution w' from the vicinity w;
• calculate the change of target function; Δ = E(w')- E(w);
• if Δ≤0, take w = w'; otherwise, if Δ>0, take w = w' with probability exp(-Δ/T)
by generating a random number R from the interval (0,1), then comparing it
with the value exp(-Δ/T); if exp(-Δ/T) > R, take a new solution w = w'; in other
case - ignore it.
3. Reduce the temperature ( T rT ) using the reduction coefficient r, selected from
the interval (0,1) and return to step 2.
4. After lowering the temperature to zero, apply one of the deterministic methods
(Levenberg-Marquardt algorithm, error-return algorithm, fastest-speed algorithm,
etc.) to achieve the minimum of the target function.
The concept of "temperature" in this algorithm is quite formal, since the presented
optimization model is only a mathematical analogy of the annealing process.
The efficiency of the annealing algorithm has an extremely high impact with the
choice of parameters such as the initial temperature Tmax, the coefficient of reduction
of temperature r and the number of cycles L, performed at each temperature level.
The main problem is to determine the threshold level optimal for each annealing
simulation process. For some practical tasks, this level may have different meanings,
but the overall range remains unchanged. As a rule, the initial temperature is selected
so as to ensure the implementation of about 50% of the subsequent random changes in
the solution. Therefore, knowledge of the pre-distribution of such changes makes it
possible to estimate the initial temperature approximately.
Numerous computer experiments [22] prove that in the case where the time limit is
small, the best results give a single implementation. If simulation can be long lasting,
then statistically better results can be achieved thanks to the multiple implementation
of the annealing simulation if the value of the coefficient r is close to 1.
If we compare genetic algorithms with an annealing algorithm, then, in spite of the
significant external difference between the algorithms, they are essentially similar in
nature. An annealing algorithm according to [23] can be considered a genetic algo-
rithm with a population consisting of one instance. Consequently, an algorithm for
simulating annealing of a metal can be regarded as an algorithm that has only a muta-
tion operation, but not cross-linking.
In addition, if we compare these two algorithms from the applied point of view,
then it should be noted that, according to Kohonen's study [7], in the case when the
initial solution is sufficiently close to optimal, the annealing algorithm of the metal
has significant advantages over the genetic algorithms from a computational point of
view.
Since in our study the initial data are pre-processed by methods of fines and incen-
tives with sampling alignment and piecewise linear classification on the basis of the
model of geometric transformations, then the initial solution of the problem is suffi-
ciently close to the optimal. Accordingly, in this case, it is more appropriate to choose
for optimization of the solution.
Improvement of accuracy of the decision of tasks of intellectual data analysis on
the basis of correction of elements of decomposition by algorithm of simulated
annealing
Let's consider a more detailed algorithm of simulated annealing of metal in combi-
nation with methods of fines and incentives and piecewise linear classification on the
basis of a model of geometric transformations.
Fig. 3 depicts the structural scheme of the developed neural network based on the
model of geometric transformations, where x1 , x2 ,..., xn primary features of classifica-
tion objects – input data, PC1, PC2, ..., PCn the principal components derived from input
data, w1 , w2 ,..., wn weight coefficients, y an output that indicate on belonging to
certain classes.
Fig. 3. The scheme of neural-like structure GTM
Functioning of such a neural network can be described by the formula 14.
y i 1 PCwi
n
(14)
The method of simulating annealing of a metal is proposed to be used to op-
timize weight coefficients so that the resulting amount of penalty points is minimal,
that is, the optimization parameter is the amount of penalty points [15, 17].
Fig. 4. The division tree into classes based on NLS SGTM
As can be seen from the flowchart describing the solution of the data mining prob-
lem by combining the method of fines and incentives, simulated annealing and piece-
wise linear approach, a modified annealing method for which the target function is the
number of penalty points, will be applied separately for each cluster [22, 24].
Accordingly, if we have a two-step division into clusters for a sample of n classes,
then we will have division into clusters, which is depicted in Fig. 4, and for each of
the clusters a modified annealing algorithm will be implemented.
5. Experimental results
This article describes the solving of classification task, which was formulated in [1].
The training sample describes the transactions carried out by credit card holders
within two days and consists of 284,807 lines and 31 columns Also, the dataset con-
tains one target feature 'Class', which shows the client's affiliation to one of two
classes - frauds or ordinary clients. The main feature of the dataset is that the data set
is highly unbalanced - only 492 transactions out of 284807 (0.172% of all transac-
tions) have the value of the target field 1, that is, customers are fraudulent. The data-
set has been collected and analyzed during a research collaboration of Worldline and
the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de
Bruxelles) on big data mining and fraud detection [7].
According to the subject area, a matrix was formed. By analyzing this matrix, it
can be seen that a properly classified vector that belongs to the "fraud" class has a
much greater weight than a properly classified "ordinary client" vector. At the same
time, the case where an ordinary customer is classified as fraud has the highest num-
ber of penalty points (Table 1).
Then we used the modified method of imitation of annealing of the metal with pa-
rameters: initial temperature T=Tmax=20895, L=100, R is a random number from the
interval (0,1), r=0,9.
Table 1. Matrix of penalties and rewards solving task
Matrix of Rewards Values of Rewards and Penalties
and Penalties
The vector is The vector is
recognized as class 1 recognized as class 2
The vector belongs
1 -3
to class 1
The vector belongs
-2 5
to class 2
After lowering the temperature to zero, apply one of the deterministic methods
(Levenberg-Marquardt algorithm, error-return algorithm, fastest-speed algorithm,
etc.) to achieve the minimum of the target function. The results of classification are
described (Fig 5, Fig.6).
Fig. 5. Results of classification using NLS SGTM with the method of penalties and rewards
and the method of simulated annealing (in vectors)
Fig. 6. Results of classification using NLS SGTM with the method of penalties and rewards
and the method of simulated annealing (in points)
6. Conclusion
The application of global optimization methods to classification in the data mining
tasks allowed to increase the accuracy of classification, especially in combination
with other methods of classification, such as neural-like structure of successive geo-
metric transformations model. Also, the method of simulated annealing successfully
combined with such methods as piecewise-linear approach and cost-sensitive learning
method. The application of the method of simulated annealing made it possible to
reach the point of the global maximum and minimize the amount of penalty points for
this task.
References
1. Doroshenko, A.: Piecewise-Linear Approach to Classification Based on Geometrical
Transformation Model for Imbalanced Dataset. In: Proceedings of the 2018 IEEE
Second International Conference on Data Stream Mining & Processing (DSMP),
Lviv, pp. 231-235 (2018)
2. Tkachenko, R., Doroshenko, A.: Classification of Imbalanced Classes using the
Committee of Neural Networks. In: Proceedings of the XIIIth International Scientific
and Technical Conference Computer Sciences and Information Technologies (CSIT),
Lviv, 11-14 September 2018. (2018)
3. Tepla, T.L., Izonin, I.V., Duriagina, Z.А., Tkachenko, R.О., Trostianchyn, А.М.,
Lemishka, І.А., Kulyk, V.V., Kovbasyuk, Т.М.: Alloys selection based on the super-
vised learning technique for design of biocompatible medical materials Archives of
Materials Science and Engineering, 93/1, pp. 32-40 (2018)
4. Izonin, I., Trostianchyn, A., Duriagina, Z., Tkachenko, R., Tepla, T., Lotoshynska N.:
The Combined Use of the Wiener Polynomial and SVM for Material Classification
Task in Medical Implants Production. International Journal of Intelligent Systems and
Applications (IJISA), Vol.10, No.9, pp.40-47 (2018).
5. Tkachenko, R., Yurchak, I., Polishchuk, U.: Neurolike networks on the basis of Geo-
metrical Transformation Machine. In: Proceedings of the 2008 International Confer-
ence on Perspective Technologies and Methods in MEMS Design, Polyana, pp. 77-80
(2008). doi: 10.1109/MEMSTECH.2008.4558743
6. Polishchuk, U., Tkachenko, P., Tkachenko, R., Yurchak, I.: Features of the auto-
associative neuro like structures of the geometrical transformation machine (GTM).
In: Proceedings of the 2009 5th International Conference on Perspective Technolo-
gies and Methods in MEMS Design, Zakarpattya, pp. 66–67 (2009)
7. Pozzolo, A.D., Caelen, O., Johnson, R.A., Bontempi, G.: Calibrating probability with
underdamping for unbalanced classification. In: Proceedings of the 2015 IEEE Sym-
posium Series on Computational Intelligence, Cape Town, pp. 159–166 (2015)
doi: 10.1109/SSCI.2015.33
8. Liu, X.: A Benefit-Cost Based Method for Cost-Sensitive Decision Trees. In: Pro-
ceedings of the WRI Global Congress on Intelligent Systems, Xiamen, pp. 463-467
(2009).
9. Ebied, H. M.: Feature extraction using PCA and Kernel-PCA for face recognition. In:
Proceedings of the 2012 8th International Conference on Informatics and Systems
(INFOS), Cairo, pp. MM-72-MM-77 (2012)
10. Subbotin, S.: Quasi-Relief Method of Informative Features Selection for Classifica-
tion. In: Proceedings of the 2018 IEEE 13th International Scientific and Technical
Conference on Computer Sciences and Information Technologies (CSIT), Lviv, pp.
318-321 (2018).
11. Amar, A., Leshem, A., Gastpar, M.: A greedy approach to the distributed Karhunen-
Loève transform. In: Proceedings of the 2010 IEEE International Conference on
Acoustics, Speech and Signal Processing, Dallas, TX, pp. 2970-2973 (2010)
12. Mattera, D., Palmieri, F., Haykin, S.: An explicit algorithm for training support vector
machines. In: Proceedings of the IEEE Signal Processing Letters, vol. 6, no. 9, pp.
243-245 (1999).
13. Riznyk, O., Yurchak, I., Povshuk, O.: Synthesis of optimal recovery systems in dis-
tributed computing using ideal ring bundles. In: Proceedings of the 2016 XII Interna-
tional Conference on Perspective Technologies and Methods in MEMS Design
(MEMSTECH), Lviv, pp. 220-222 (2016)
14. Li, Xin, Qian, Xu, Wang, Ziqiang: Classification rule mining using feature selection
and genetic algorithm. In: Proceedings of the 2009 Asia-Pacific Conference on Com-
putational Intelligence and Industrial Applications, Wuhan, pp. 107-110 (2009)
15. Yue, S., Kaizhang, Liu, W., Wang, Y.: A Fuzzy Clustering Approach Using Reward
and Penalty Functions. In: Proceedings of the 2009 Sixth International Conference on
Fuzzy Systems and Knowledge Discovery, Tianjin, pp. 18-21 (2009)
16. Zhiwei, Ye, Juan, Yang, Xu, Zhang, Zhengbing, Hu: Remote Sensing Textual Image
Classification based on Ensemble Learning. International Journal of Image, Graphics
and Signal Processing(IJIGSP), Vol.8, No.12, pp.21-29 (2016).
17. Tkachenko, R., Cutucu, H., Izonin, I., Doroshenko, A., Tsymbal, Y.. (2019) Non-
iterative Neural-like Predictor for Solar Energy in Libya. In: Ermolayev V., Suárez-
Figueroa M., Yakovyna V., Mayr H., Nikitchenko M., Spivakovsky A. (eds) In: Pro-
ceedings of the 14-th International Conference ICTERI 2018. Volume I: Main Con-
ference. Kyiv, Ukraine, May 14-17, pp.35-45 (2018).
18. Oleynik, A., Subbotin, S.: Parametrical synthesis of neural network models based on
the evolutionary optimization. In: Proceedings of the 2009 10th International Confer-
ence The Experience of Designing and Application of CAD Systems in Microelec-
tronics, Lviv-Polyana, pp. 335-338 (2009).
19. Feiden, D., Tetzlaff, R.: Iterative annealing: a new efficient optimization method for
cellular neural networks. In: Proceedings of the 2001 International Conference on Im-
age Processing (Cat. No.01CH37205), Thessaloniki, Greece, 2001, pp. 549-552 vol.1.
20. Granville, V., Krivanek, M., Rasson, J.-P: Simulated annealing: A proof of conver-
gence. IEEE Transactions on Pattern Analysis and Machine Intelligence. 16 (6): 652–
656 (1994) doi:10.1109/34.295910.
21. Huang, K., Hsieh, Y.: Very fast simulated annealing for pattern detection and seismic
applications. In: Proceedings of the IEEE International Geoscience and Remote Sens-
ing Symposium, Vancouver, BC, pp. 499-502 (2011)
22. Shafti, L. S., Pérez, E.: Data Reduction by Genetic Algorithms and Non-Algebraic
Feature Construction: A Case Study. In: Proceedings of the 2008 Eighth International
Conference on Hybrid Intelligent Systems, Barcelona, pp. 573-578 (2008)
23. Zhengbing, Hu., Bodyanskiy, Y.V., Tyshchenko, O. K., Boiko, O. O.: An Ensemble
of Adaptive Neuro-Fuzzy Kohonen Networks for Online Data Stream Fuzzy Cluster-
ing. International Journal of Modern Education and Computer Science(IJMECS),
Vol.8, No.5, pp.12-18 (2016). doi: 10.5815/ijmecs.2016.05.02