=Paper= {{Paper |id=Vol-2353/paper8 |storemode=property |title=Application of Global Optimization Methods to Increase the Accuracy of Classification in the Data Mining Tasks |pdfUrl=https://ceur-ws.org/Vol-2353/paper8.pdf |volume=Vol-2353 |authors=Anastasiya Doroshenko |dblpUrl=https://dblp.org/rec/conf/cmis/Doroshenko19 }} ==Application of Global Optimization Methods to Increase the Accuracy of Classification in the Data Mining Tasks== https://ceur-ws.org/Vol-2353/paper8.pdf
  Application of global optimization methods to increase
  the accuracy of classification in the data mining tasks

                            Doroshenko A. V.[0000-0002-7214-5108]

                    Lviv Polytechnic National University, Lviv, Ukraine
                         anastasia.doroshenko@gmail.com



       Abstract. The article describes how a data mining task is solved using neural-like
       structures of the Successive Geometric Transformations Model (NLS SGTM). The
       main problems of this task are an imbalanced dataset and different weights of
       errors. To take these features into account, the method of penalties and rewards
       was used, as well as a piecewise-linear approach to classification. We propose
       supplementing these methods with a final optimization procedure, which uses
       simulated annealing.

       Keywords: data mining, classification, imbalance problem, cost-sensitive
       learning, imbalanced data, principal components, neural-like structure of suc-
       cessive geometric transformations model, NLS SGTM, simulated annealing,
       analysis of the principal components, optimization methods.


  1.     Introduction

In previous articles [1-2], we described methods based on combining the Successive
Geometric Transformations Model with the method of penalties and rewards. We also
developed a piecewise-linear approach to constructing separating surfaces in
classification tasks.
   These methods are intended to solve classification tasks in data mining. The main
features of such tasks are large datasets, imbalanced classes, and different weights
of errors. The main goal of this research is to increase classification accuracy and
to minimize the number of penalty points.
   To increase classification accuracy, we propose supplementing the developed
methods with final optimization procedures, in particular the method of random
correction of decomposition elements and the method of simulated annealing.


  2.     Problem statement

Today, data mining tasks are widespread because most companies have accumulated huge
amounts of data on sales, customers, orders, and more. This information is a source of
hidden knowledge, and possessing this knowledge allows a company to take a leading
position in the market and win the competitive struggle. Classification is one of the
most popular of these tasks. Classification tasks arise daily in areas such as
commerce, telecommunications, the chemical industry, target marketing, insurance,
medicine, bioinformatics, and others. Researchers use different methods to solve
classification problems [1, 2, 3, 4].
   The main features of classification tasks in data mining are imbalanced data,
different weights of errors, and huge amounts of data. Because of these features,
well-known classification methods must be supplemented with special methods to
achieve high classification accuracy.
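Why "different weight of errors" matters can be shown with a minimal sketch of a cost-sensitive score in the spirit of the method of penalties and rewards. The reward and penalty values, the binary labels, and the helper names `total_points` and `accuracy` are hypothetical illustrations, not the settings used in the experiments.

```python
# Hypothetical reward/penalty table: (true_label, predicted_label) -> points.
# Label 1 is the rare (minority) class. Values are illustrative only.
POINTS = {
    (0, 0): 0,    # correct rejection of a majority-class object
    (1, 1): 10,   # reward for finding a rare-class object
    (0, 1): -1,   # false alarm: a cheap error
    (1, 0): -5,   # missed rare-class object: an expensive error
}


def total_points(y_true, y_pred, points=POINTS):
    """Sum the reward/penalty points over all (true, predicted) pairs."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(points[(t, p)] for t, p in zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Plain accuracy: the fraction of correctly classified objects."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # imbalanced: 8 vs 2
    pred_a = [0] * 10                          # always predicts the majority class
    pred_b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]    # finds both rare objects, two false alarms
    print(accuracy(y_true, pred_a), total_points(y_true, pred_a))  # 0.8 -10
    print(accuracy(y_true, pred_b), total_points(y_true, pred_b))  # 0.8 18
```

Both predictors have the same accuracy, yet the point totals differ sharply, which is why the sum of points, rather than accuracy alone, is the quantity to be optimized.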
   In this work, we used the neural-like structure of the successive geometric
transformations model (NLS SGTM) [5, 6] as the base classification method. As
additional methods, we used the piecewise-linear approach [1] and cost-sensitive
learning [7, 8]. This allowed us to improve classification accuracy and take into
account the specifics of a particular task. This article proposes applying global
optimization methods to a neural-like structure already trained in previous
experiments. The aim is to find parameters of the neural-like structure at which the
sum of points reaches its global maximum.
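The idea of searching for parameters that maximize the sum of points can be sketched with a generic simulated annealing loop. This is a textbook formulation, not the author's implementation: the Gaussian perturbation, the step size, the initial temperature, the geometric cooling rate, and the toy objective `toy_points` (a stand-in for the sum of points as a function of the trained structure's parameters) are all assumptions.

```python
import math
import random


def simulated_annealing(objective, x0, step=0.1, t0=1.0, cooling=0.995,
                        n_iter=5000, seed=0):
    """Maximize `objective` starting from the parameter vector `x0`.

    A random perturbation of all parameters is always accepted if it does
    not decrease the objective; a worse candidate is accepted with
    probability exp(delta / T). The temperature T decreases geometrically.
    Returns the best parameters seen and their objective value.
    """
    rng = random.Random(seed)
    x = list(x0)
    fx = objective(x)
    best_x, best_f = list(x), fx
    t = t0
    for _ in range(n_iter):
        candidate = [xi + rng.gauss(0.0, step) for xi in x]
        fc = objective(candidate)
        delta = fc - fx
        # Metropolis acceptance rule (for a maximization problem).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            x, fx = candidate, fc
            if fx > best_f:
                best_x, best_f = list(x), fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_f


def toy_points(params):
    """Hypothetical multimodal objective standing in for the sum of points."""
    x, y = params
    return (-((x - 1.0) ** 2 + (y + 0.5) ** 2)
            + 0.3 * math.cos(6 * x) * math.cos(6 * y))


if __name__ == "__main__":
    start = [0.0, 0.0]  # e.g. parameters of an already trained model
    best_x, best_f = simulated_annealing(toy_points, start)
    print(toy_points(start), best_f, best_x)
```

Because occasional downhill moves are accepted while the temperature is high, the search can leave a local maximum of the objective; tracking the best point seen guarantees the result is never worse than the starting parameters.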


   3. Increasing the accuracy of classification using random
    correction of decomposition elements.

3.1.   Analysis of the principal components

The analysis of the principal components is a standard method for reducing the
dimensionality of data in statistical pattern recognition and signal processing
systems. Because data mining problems are typically high-dimensional, it is also
advisable to use the analysis of the principal components to solve them [9, 10].
   The main task of statistical recognition is feature extraction: the process in
which the data space is transformed into a feature space that, theoretically, has the
same dimension as the input space. In practice, however, the transformation is usually
performed so that the data space can be represented by a reduced number of the most
effective features. Consequently, only the substantial part of the information
contained in the data is retained, and the dimension of the data is reduced. Applied
to a data mining task, this approach reduces the size of the input data by discarding
non-informative features without losing significant information. Let us consider the
analysis of the principal components, known in information theory as the
Karhunen-Loeve transform, in more detail [11, 12].
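The reduction described above can be illustrated with a minimal NumPy sketch of principal component analysis via eigendecomposition of the sample covariance matrix. It follows the usual textbook formulation and is not the author's code; the function name `pca`, the synthetic data, and the choice of `l` are assumptions for illustration. Here `m` is the number of input features and `l` the number of retained components.

```python
import numpy as np


def pca(X, l):
    """Project the rows of X (n samples x m features) onto the top-l principal components.

    Returns the projected data (n x l), the component matrix (m x l),
    and the fraction of total variance retained by the l components.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[1] < 2:
        raise ValueError("X must be a 2-D array with at least two features")
    m = X.shape[1]
    if not 1 <= l <= m:
        raise ValueError("l must satisfy 1 <= l <= m")
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # m x m sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # sort components by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :l]              # orthonormal basis of the reduced space
    retained = eigvals[:l].sum() / eigvals.sum()
    return Xc @ components, components, retained


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(200, 2))
    # Three extra features that are almost linear combinations of the first two.
    mix = np.array([[1.0, 2.0, 0.5], [0.3, -1.0, 2.0]])
    X = np.hstack([base, base @ mix + 0.01 * rng.normal(size=(200, 3))])
    Z, W, retained = pca(X, 2)
    print(Z.shape)  # (200, 2)
    print(round(float(retained), 4))
```

In this synthetic example, five features are compressed to two with almost no loss of variance, because three of the features carry nearly redundant information.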
   Assume there is a vector X of dimension m that we want to represent using
l numbers, where l