<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Decision Support System for the Machine Learning Methods Selection in Big Data Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Odessa National Polytechnic University</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Odessa</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ukraine nickolay.rud@gmail.com</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>@yandex.ua</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National University "Odessa Maritime Academy"</institution>
          ,
          <addr-line>Odessa</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Odessa National Maritime University</institution>
          ,
          <addr-line>Odessa</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>This article focuses on the development of a decision support system for selecting machine learning methods in big data mining. The paper presents the results of an analysis of the problem of intelligent processing and analysis of big data. It proposes ways of using metadata as the basis for forming an analytical rating for evaluating machine learning methods. The paper presents the results of designing and applying a decision support system that evaluates machine learning methods for solving data mining problems. The developed decision support system reduces the time a data science analyst spends selecting methods suitable for solving machine learning problems, taking into account the specifics of the input data arrays: their volume, structure and other metadata.</p>
      </abstract>
      <kwd-group>
        <kwd>decision support system</kwd>
        <kwd>big data</kwd>
        <kwd>data mining</kwd>
        <kwd>data science</kwd>
        <kwd>machine learning</kwd>
        <kwd>data analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Currently, there is a steady trend of regular growth in the volume of data
collected by business organizations in the course of their production, operational and
research activities [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The sources of such large data volumes (Big Data) are often various
behavioral factors of customers, the frequency and size of payments for services or
goods, the parameters and characteristics of installed technical equipment, and
medical indicators for diagnosing human health, among others [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2-4</xref>
        ].
      </p>
      <p>
        Due to their statistical visibility and representativeness, the value of such data lies
in the possibility of using them to search for hidden and unobvious relationships
between individual factors (attributes) and the target actions of clients, in order to
adjust and formulate business development strategies [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        In practice, such tasks can be solved using data mining methods, which extract
new knowledge by forming and testing hypotheses about significant relationships
between individual attributes of data samples [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6-8</xref>
        ].
      </p>
      <p>
        Thus, the constant need of companies to ensure a sufficient level of quality in the
provision of goods and services, driven by the high level of competition for the
organization's business targets, requires the use of data science and data mining
methods and technologies in key business processes [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. For this purpose, machine learning (ML) methods can be used to build various
regression and predictive models for business objectives.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Description of Problem</title>
      <p>
        The concepts laid down in this approach allow data mining specialists to conduct
a comprehensive, phased analysis and processing of Big Data, sequentially
implementing the necessary processes, including distributed structuring of
heterogeneous data, their consolidation, aggregation, cleaning and pre-processing, and
the elimination of anomalies, omissions, errors and outlier values [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10-12</xref>
        ].
      </p>
      <p>
        However, all these processes are time-consuming for analysts, who must
experimentally select a mathematical model with the corresponding hyperparameters
and quality assessment metrics. Key factors in an analyst's successful and prompt
solution of a target problem are experience in building ML models, depth of insight
into the business problem, and knowledge of the relevant software tools, technologies
and libraries [
        <xref ref-type="bibr" rid="ref13 ref14">13,14</xref>
        ].
      </p>
      <p>
        This can lead to subjectivity in the analysis results and affect the accuracy and
generalization ability of the resulting ML models. At the same time, the computational
costs of the computer equipment used in the data analysis process are also significant,
which affects the total cost of developing ML models [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>An additional complication is the process of correlating, evaluating and selecting
the required ML method (or combination of methods) to effectively solve the posed
data mining problem (so that the created model achieves a sufficient level of
accuracy, adequacy and generalization ability) without lengthy computational
experiments.</p>
      <p>
        This problem becomes especially relevant when the Big Data sample size exceeds
the permissible amount of disk space in the data warehouses used [
        <xref ref-type="bibr" rid="ref16 ref17 ref18 ref19">16-19</xref>
        ]. If the throughput available for Big Data transmission in serial or parallel mode
over a local or global network is limited, effective analysis of such data becomes
difficult [
        <xref ref-type="bibr" rid="ref20 ref21 ref22 ref23 ref24">20-24</xref>
        ]. A possible solution is to compress, transform or structure the data, extracting
quantitative and qualitative meta-information from them [25-28].
      </p>
      <p>In this regard, an urgent and relevant task is to automate the selection of suitable
ML methods based on analysis of the input Big Data volumes and their various
statistical and probabilistic characteristics, taking into account the specifics of the
subject area and the type of ML problem. This can be done by developing a decision
support system (DSS) with a number of intelligent functions for forming and
accounting for meta-information about data attributes, their structure, level of
generalization and significance.</p>
      <p>Currently, there are various analytical systems on the market for comparing
ML methods for solving classification and regression problems on Big Data;
however, their functionality is limited and does not fully take into account the nature
of the input data, their volume, or the subject area of analysis. DSS that can be
adapted for such ML tasks include Wolfram Mathematica, EIDOS, Expert Choice and
ViEA. However, they offer no flexible configuration for the user's needs, cannot be
extended with new functionality, and their dependencies are not updated regularly.
This makes it impossible to use such solutions to obtain reliable data mining results
[29]. The purpose of this article is to develop a DSS for choosing ML methods to
solve data mining tasks on user-specified data sets, based on their structure and
volume, in order to reduce the time spent on detailed experimental calculations.</p>
    </sec>
    <sec id="sec-3">
      <title>Decision support system development</title>
      <sec id="sec-3-1">
        <title>System concept</title>
        <p>The functioning of the proposed system is carried out in several stages:
- data import;
- formation of meta-information from the loaded data;
- selection of the Data Mining task type and methods;
- specification of criteria and metrics for assessing ML model quality;
- creation and evaluation of the ML models;
- visualization of the ML model results and issuance of a ranked list of the methods
that provide the highest-quality solution to the problem.</p>
        <p>The overall process of the DSS is shown in fig. 1.</p>
        <p>The DSS supports importing various training datasets and test data for ML models
in *.csv, *.xls and *.json formats. If necessary, for processing structured relational
data it supports integration with the SQL database management systems MySQL and
PostgreSQL, as well as the NoSQL system MongoDB. Data loading is performed in
multi-threaded mode in blocks of 2 to 128 MB each, which allows the computing
processes to be distributed and the system to be scaled in the future.</p>
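        <p>As an illustration of the block-wise import described above, the following sketch reads a CSV source in chunks with pandas so that the full table never has to reside in memory at once. Note that pandas sizes chunks in rows rather than megabytes, and the data here are invented for the example.</p>

```python
import io
import pandas as pd

# A minimal sketch of block-wise import: pandas reads the file in chunks,
# so the whole dataset never has to sit in memory at once. The chunk size
# here is in rows (pandas' API), not megabytes as in the described DSS.
csv_data = io.StringIO("a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10)))

chunks = pd.read_csv(csv_data, chunksize=4)  # hypothetical block size
total_rows = 0
for chunk in chunks:
    total_rows += len(chunk)  # each chunk is an ordinary DataFrame

print(total_rows)  # 10
```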
        <p>After importing data, the system reports the success of the performed operations
and provides brief meta-information, which includes: the total number of records in
the loaded data set, the number of features, the ratio of the number of features to the
number of records, and the data volume. The user can set the block size for further
analysis depending on the displayed data.</p>
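        <p>The brief meta-information listed above can be assembled with a few pandas calls; the toy DataFrame below is a stand-in for a user-imported data set.</p>

```python
import pandas as pd

# Hypothetical loaded data set standing in for a user import.
df = pd.DataFrame({"x": range(100), "y": [0.5] * 100, "label": [0, 1] * 50})

meta = {
    "records": len(df),                                     # total number of records
    "features": df.shape[1],                                # number of attributes
    "records_per_feature": len(df) / df.shape[1],           # records-to-features ratio
    "memory_bytes": int(df.memory_usage(deep=True).sum()),  # in-memory data volume
}
print(meta["records"], meta["features"])
```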
        <p>Since the volume of data analyzed in data mining tasks can be large, the efficiency
of computing operations during analysis can suffer. Therefore, the user is invited to
choose one of two options: take a sample of a given size (from 0 to 100%) from the
imported set, or select records randomly, with the possibility of stratified sampling by
a specific column to maintain class proportions. This reduces the amount of RAM
used to store the data and speeds up the computational analysis. To make the analysis
process flexible, the user can select the necessary features, disabling or removing
unnecessary ones from the system, and can specify one or more target output
variables.</p>
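        <p>The stratified sampling option can be sketched with pandas: grouping by the chosen column and sampling the same fraction from every group preserves the class proportions. The 80/20 class split below is illustrative, not taken from the paper.</p>

```python
import pandas as pd

# Hypothetical imported set with an 80/20 class balance.
df = pd.DataFrame({"feature": range(100), "target": [0] * 80 + [1] * 20})

# Stratified 50% sample: draw the same fraction from every class so the
# original 80/20 proportion is maintained in the sample.
sample = df.groupby("target", group_keys=False).sample(frac=0.5, random_state=0)

print(len(sample), int((sample["target"] == 1).sum()))  # 50 rows, 10 positives
```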
        <p>For each column of the dataset table, the system indicates whether it is numeric or
categorical. For columns with numerical values, the following statistical information
is displayed: range of values, standard deviation, mean, median, skewness coefficient,
kurtosis coefficient, a chi-square test of normality, and the Pearson correlation with
the target variable (if one was specified) together with its confidence. For columns
with categorical values, the system displays the number of categories, the relative
mode frequency and the Gini coefficient.</p>
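        <p>Most of the per-column statistics listed above are available directly from NumPy and SciPy. The sketch below computes them for a synthetic numeric column; the use of the D'Agostino-Pearson normaltest as the chi-square-style normality check is an assumption about the exact test the DSS applies.</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
col = rng.normal(loc=5.0, scale=2.0, size=1000)      # synthetic numeric column
target = col * 3 + rng.normal(size=1000)             # synthetic target variable

summary = {
    "range": (float(col.min()), float(col.max())),
    "std": float(col.std()),
    "mean": float(col.mean()),
    "median": float(np.median(col)),
    "skewness": float(stats.skew(col)),
    "kurtosis": float(stats.kurtosis(col)),
}
# D'Agostino-Pearson test: a chi-square-based normality check
norm_stat, norm_p = stats.normaltest(col)
# Pearson correlation with the target, plus its p-value (confidence)
r, r_p = stats.pearsonr(col, target)
print(r > 0.9)  # the synthetic target is strongly correlated with the column
```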
        <p>The system allows the specifics of the data mining task being solved to be taken
into account by setting the priority for assessing ML model quality: operation speed
or achieved accuracy. This is one of the target criteria for forming the ranked list of
methods. The DSS supports two types of data mining tasks: classification and
regression. Depending on the task type, the user must specify the metric for
evaluating the ML models.</p>
        <p>For classification tasks, the following metrics are supported: accuracy (share of
correct answers), recall (share of positive-class objects found), precision (share of
truly positive objects among those classified as positive), F1 (the harmonic mean of
precision and recall), and AUC. For regression tasks, the mean squared error (MSE)
and mean absolute error (MAE) metrics are supported. After the data import is
completed, the analysis process starts in the background. The analysis consists of the
phased creation of model instances with hyperparameter values in the ranges
specified by the user (or in the default range specified for each ML method) and their
assessment by the selected metrics. Detailed logs of the metric estimates obtained at
individual iterations and epochs of ML model training and testing can be viewed in
*.txt format.</p>
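        <p>The supported metrics map directly onto scikit-learn functions; the sketch below evaluates them on tiny hand-made examples.</p>

```python
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, precision_score, recall_score)

# Classification example: 6 objects, one positive missed by the model.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
acc = accuracy_score(y_true, y_pred)    # share of correct answers: 5/6
prec = precision_score(y_true, y_pred)  # 3 of 3 predicted positives are real
rec = recall_score(y_true, y_pred)      # 3 of 4 real positives found
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall

# Regression example.
mse = mean_squared_error([1.0, 2.0], [1.5, 2.5])   # 0.25
mae = mean_absolute_error([1.0, 2.0], [1.5, 2.5])  # 0.5
print(round(acc, 3), prec, rec, round(f1, 3), mse, mae)
```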
        <p>After the system performs the data analysis procedures, the results are displayed as
a summary report on the ML methods most suitable for the criteria selected by the
user; the report is automatically saved to an *.xls file and shown on the system's log
screen. The general report has a tabular structure and contains: the name of the
method, the level of its adequacy for the selected data set and task (in relative units
from 0 to 100), the approximate predicted amount of RAM needed for training and
testing the ML method on the sample, and the predicted time required for the
computing processes.</p>
        <p>The user can select the criteria for ordering the methods in the analysis results
window. In particular, the methods can be sorted by individual metrics, speed,
accuracy, or ease of interpretation of the results (which depends on the data
volume).</p>
        <p>For a simpler and more understandable interpretation of the analysis results, the
DSS supports visualization using a bar chart, where each bar reflects the quality of an
individual model as a rating for a given metric. The constructed visualizations can be
saved in *.png format.</p>
        <p>Based on the developed concept of the system's functioning, its design and
software implementation can proceed.</p>
      </sec>
      <sec id="sec-3-2">
        <title>DSS project implementation</title>
        <p>To display the relationship between users and the system, a use-case diagram was
compiled (fig. 2). The main scenarios of user interaction with the system are:
selecting a data set, viewing data statistics, updating the task, and viewing the model
rating. The server side supports retrieval of an optionally selected part of the data set,
generation of statistics for the imported data set, the computational processes for
compiling the rating and its graphical visualization, and saving the results to a
file.</p>
        <p>The system is divided into components that perform different parts of the task.
These components, and the data exchanged between them, are shown in fig. 3. In this
diagram, the UserInterface component is responsible for user interaction with the
system, providing controls as well as displaying the selected table, its statistics and
the generated ranking list. The TableManager component is responsible for loading
and storing tabular data; it receives the path and data loading mode from the
interface, after which it provides the other components with the dataset table.</p>
        <p>The TableAnalyzer component is responsible for extracting statistics from a table.
After the statistical data are extracted and additional information is obtained from the
user, the ModelRanker component determines which ML models are most likely to
be suitable for the task. The generated model rating is then displayed to the user and
is also saved to the file system by the RankingSaver component.</p>
        <p>To formalize the key functional processes carried out in the system and their
relationships, a sequence diagram has been drawn up, reflecting the basic call
operations (fig. 4). Through the interface, the user sets the path to the data in the
TableManager, which returns meta-information for the selected dataset table. After
the sampling mode is specified and the selected set is confirmed, TableAnalyzer
extracts statistical information and shows the result to the user. Next, the user
specifies the necessary ML methods, hyperparameters and model quality assessment
metrics, and selects the target and input columns (stored in the Preferences object);
based on these, the required metadata is determined and the computational operations
for evaluating the methods are performed.</p>
        <p>The ModelRanker object sequentially loads the resulting ML models to generate a
consolidated rating, storing a detailed calculation log in a file and displaying the
output in short form to the user.</p>
        <p>To visualize the structure of the system's software implementation, a class
diagram was created, a fragment of which is shown in fig. 5. This diagram shows the
structural relationships between the various classes in the DSS. The user interface
object has one-way associations with the table manager object (which in turn is
associated with the table analysis object), the report-saving object and the
rating-creation object. The table-analysis object creates an object containing metadata
and statistical data for each column, which is a necessary step for rating the models.
After the user enters all the necessary data analysis options, UserInterface creates an
object of the preferences class.</p>
        <p>The functionality of this class is used as the basis for creating a rating. After a
rating of models has been created according to the user-specified metrics, this object
is used by the user interface for display and by the ReportSaver class to save it to the
file system. A special method in the ModelRanker class loads the ML model used to
compile the rating from a given path. This simplifies the choice of a suitable model
for producing the ranked list by allowing the model estimation algorithm to be
changed quickly, and will allow analysts to train and use the created models in the
future.</p>
        <p>After receiving the rating, the user can view estimates of the approximate training
time and RAM requirements for each of the methods involved. For non-iterative
learning algorithms, the approximate model creation time is estimated by
extrapolating the training duration measured on a probe of a given number of records
and attributes on the active workstation, using the ratios of the record and attribute
counts to those of the full data set. The approximate training time is calculated by
multiplying the measured training time by the computational complexity with the
ratios substituted in. Since training speed depends heavily on the user's workstation
hardware, the estimate is not exact; the error can reach 15-20%. The approximate
spatial complexity is calculated similarly, using the known memory usage and the
ratios of the table size and feature counts.</p>
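        <p>The extrapolation scheme described above can be sketched as follows; the complexity function and the probe numbers are illustrative assumptions, not the system's exact formula.</p>

```python
# Hedged sketch of the time extrapolation: measure training time on a small
# probe sample, then scale it by an assumed computational-complexity function
# with the full-set/probe ratios substituted in.

def estimate_training_time(measured_seconds, probe_rows, probe_cols,
                           full_rows, full_cols,
                           complexity=lambda n, m: n * m):
    """Scale a measured probe time up to the full data set size."""
    ratio = complexity(full_rows, full_cols) / complexity(probe_rows, probe_cols)
    return measured_seconds * ratio

# e.g. a method linear in rows*cols: 0.5 s on a 1000x10 probe, full set 100000x10
print(estimate_training_time(0.5, 1000, 10, 100000, 10))  # 50.0
```

As noted in the text, such estimates carry a hardware-dependent error on the order of 15-20%, so they serve for ranking rather than precise prediction.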
        <p>Several technologies were used in creating the DSS: the Python 3.7 programming
language, the Pandas library for data analysis and manipulation, the sklearn library
for working with ML models, the XGBoost library, and the NumPy and SciPy
libraries for mathematical operations and scientific computing. PyCharm was used as
the IDE. The matplotlib library was used for the graph visualization interface.</p>
        <p>The system's user interface is implemented as a web application using block
layout. It includes 4 tabs with graphical components for managing data import
(clicking the corresponding button and selecting the desired data set in a dialog box),
displaying data and statistical information in tabular form, selecting and setting
model parameters in text fields and drop-down lists, and viewing the results of
operations as graphs and charts built with the Matplotlib and Chart.js libraries.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and results analysis</title>
      <p>
        During the DSS creation, several algorithm variants were considered for ranking
the models, taking into account the possibilities of training them and the complexity
of implementation, given the need to process each feature of the dataset together with
its metadata. Analysis of the literature [
        <xref ref-type="bibr" rid="ref15 ref16 ref17 ref18 ref19">15-19</xref>
        ] led us to the following algorithm, based on a multilayer perceptron artificial
neural network (NN):
      </p>
      <p>1. For each attribute, the values of its statistics, its metadata, and the task type are
used as input to the NN. The NN model produces values that reflect the relative rating
of the ML models by each metric for that attribute.</p>
      <p>2. The ratings generated for the individual attributes are combined by calculating
the arithmetic mean of all issued ratings.</p>
      <p>3. The obtained relative model ratings by metric are reduced to the range from 0 to
1, so that 0 corresponds to unsuitable models (with a low metric rating) and 1 to the
most suitable. If a certain metric is not compatible with the task type, it is not
displayed to the user.</p>
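      <p>Step 3 amounts to a min-max rescaling of the raw per-metric scores; the values below are invented for illustration.</p>

```python
import numpy as np

# Min-max scale raw per-metric model scores so the worst model maps to 0
# and the best to 1 (illustrative values only).
raw = np.array([0.62, 0.95, 0.71, 0.40])
scaled = (raw - raw.min()) / (raw.max() - raw.min())
print(scaled.round(2).tolist())  # [0.4, 1.0, 0.56, 0.0]
```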
      <p>This algorithm makes it possible to jointly account for the task type, the
meta-information about the data, and the per-attribute statistics in the rating. If a
single feature does not affect the quality of the compared models, the NN can produce
rating values close to 0 for that feature, so it introduces no significant errors and does
not affect the final analysis result. Since all operations in this algorithm are
differentiable, the ML model used can be trained by the stochastic gradient descent
method.</p>
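      <p>Step 2 of the algorithm (averaging the per-attribute rating vectors emitted by the NN) reduces to a single mean over the attribute axis; the NN outputs below are made-up illustrative numbers.</p>

```python
import numpy as np

# The NN emits one rating vector (one score per candidate ML method) for
# every attribute; the final rating is their arithmetic mean.
per_attribute_ratings = np.array([
    [0.9, 0.4, 0.7],  # attribute 1: scores for 3 candidate methods
    [0.8, 0.5, 0.6],  # attribute 2
    [0.1, 0.0, 0.2],  # attribute 3: near-zero, so it barely shifts the mean
])
final_rating = per_attribute_ratings.mean(axis=0)
print(final_rating.round(2).tolist())  # [0.6, 0.3, 0.5]
```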
      <p>The model for ranking is formed using a supervised learning algorithm. Metadata,
the task type, and attribute statistics taken from imported data sets are used as training
input.</p>
      <p>As the target variables, model ratings obtained from experimental model
comparisons using various metrics on the data sets are used. The ratings are converted
to a range from 0 to 1, such that a higher value means the algorithm performs better
relative to the others. The loss function enforces correspondence between the
generated rating and the experimental rating for each ML model.</p>
      <p>The developed DSS has been tested on several experimental model comparisons.
The linear regression, decision tree, random forest, support vector machine (for
classification and regression), XGBoost, logistic regression, and naive Bayes
classifier methods were evaluated.</p>
      <p>The metrics used to assess model quality were AUC, F1, precision and recall for
classification, and MSE and MAE for regression. When creating the models, their
hyperparameters were left at the library's predefined default values, except for the
random forest and XGBoost algorithms, for which the number of trees was set
to 64.</p>
      <p>For the experimental comparison, cross-validation was used, dividing the sample
into 10 equal parts, with a stratified split for the classification problem.</p>
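      <p>The described protocol corresponds to 10-fold cross-validation (stratified for classification), which scikit-learn provides out of the box; the synthetic dataset and model below are placeholders for the experiments in the paper.</p>

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder data and model standing in for the paper's experiments.
X, y = make_classification(n_samples=200, random_state=0)
cv = StratifiedKFold(n_splits=10)  # 10 equal, class-balanced parts
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring="f1")
print(len(scores))  # one F1 score per fold -> 10
```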
      <p>In the study of the system's operation, the Diabetes dataset was used for the
regression task. The data set has 442 records and 10 real-valued attributes. The
features describe age, gender, the weight-to-height ratio (body mass index), average
blood pressure, and 6 blood serum measurements. The target variable is the degree of
diabetes progression.</p>
      <p>At the data cleaning stage, objects in which at least one attribute deviated by more
than 3 standard deviations across all values of that attribute were deleted from the
dataset.</p>
      <p>After cleaning, 97.3% of the original objects remained. Because some of the
models used in the experimental comparison require normalized input data, all
features were normalized to the range from -1 to 1, and the target variable to the
range from 0 to 1.</p>
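      <p>The 3-sigma cleaning and range normalization steps can be sketched as follows on a synthetic column with one injected outlier (the data are illustrative, not the Diabetes set).</p>

```python
import numpy as np

rng = np.random.default_rng(1)
col = rng.normal(size=1000)
col[0] = 12.0  # inject one clear outlier

# Drop objects deviating by more than 3 standard deviations (3-sigma rule).
outlier = np.abs(col - col.mean()) > 3 * col.std()
clean = col[~outlier]

# Scale the feature to [-1, 1]; the target would be scaled to [0, 1] analogously.
scaled = 2 * (clean - clean.min()) / (clean.max() - clean.min()) - 1
print(bool(outlier[0]), float(scaled.min()), float(scaled.max()))
```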
      <p>After testing the models using cross-validation, the regression model metrics
(MSE, MAE) were obtained (Table 1). The results are sorted by the MSE metric (best
to worst).</p>
      <table-wrap id="table-1">
        <label>Table 1.</label>
        <caption>
          <p>Regression model metrics obtained by cross-validation, sorted by MSE (best to worst).</p>
        </caption>
        <table>
          <thead>
            <tr><th>Model Name</th><th>MSE</th><th>MAE</th></tr>
          </thead>
          <tbody>
            <tr><td>Linear Regression</td><td>0.030</td><td>0.141</td></tr>
            <tr><td>Random Forest</td><td>0.033</td><td>0.149</td></tr>
            <tr><td>Support Vector Regressor</td><td>0.034</td><td>0.144</td></tr>
            <tr><td>XGBoost</td><td>0.034</td><td>0.149</td></tr>
            <tr><td>Decision Tree</td><td>0.062</td><td>0.193</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The experimentally obtained metrics were converted into a model ranking, where a
value of 1 is the best result and 0 the worst (Table 2). The results for the experimental
part were obtained using the developed software script in the Anaconda and Jupyter
Notebook environment in the usual step-by-step mode. The results for the DSS part
were obtained by importing the dataset with the default model hyperparameter
settings.</p>
      <p>Then, using the collected statistics on the attributes, the metadata and the data set,
the system generated an ML model rating for this task. As can be seen from the
results, the order of the best models in the rating issued by the system approximately
coincides with the experimental rating.</p>
      <p>Based on this rating, the system was able to determine the best model (according
to the experimental comparison), but with less confidence. It is also noticeable that
the evaluations of the support vector machine and XGBoost were placed in the wrong
order. The evaluation of a poor model, the decision tree, was placed below the others,
though far from the experimental values. The reason for the decision tree's low rating
may be that in regression tasks the tree outputs only discrete values, which increases
the MSE and MAE errors.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Summarizing the comparison of the estimated and experimental model ratings, we
can say that the DSS is capable of fairly accurate ML model estimation depending on
the task type, metadata, and data set statistics; its accuracy reaches 70-75%. The
sources of error in the ML model ratings are the relatively small training set and the
similarity of the test data set to the training one, which negatively affects the model's
ability to generalize data dependencies.</p>
      <p>Since the ranked list of recommended ML methods issued by the DSS is not
always accurate, in tasks with critical requirements for the model's reliability,
completeness and accuracy it is advisable to conduct additional exploratory analysis
on the selected fragment of the data set using the first 3 methods issued by the
system.</p>
      <p>Because the data set table may be larger than free memory, it was important to
make record selection possible without loading the file completely into memory. The
implementation of this functionality depends on which sampling mode is selected and
the format in which the table is saved. With stratified random sampling, memory
usage is higher than in the other modes, since the number of distinct values in the
selected column must be counted. There may also be cases where some data are
missing from the table; this problem must be taken into account when collecting
statistics from the table.</p>
      <p>When compiling statistics for some data sets, missing fragments in the samples
were ignored, which also introduced some errors into the data analysis process. The
ability to fill in missing data using various algorithms is one option for future
development and upgrading of the system.</p>
      <p>25. Chichirin, E.N.: Intelligent methods in simulation of decision making
processes. Computer tools, networks and systems, vol. 17, pp. 86-94 (2018).
26. Kogalovsky, M.P.: Metadata, their properties, functions, classification and
presentation tools. In: 14th All-Russian Scientific Conference "Electronic Libraries:
Perspective Methods and Technologies, Electronic Collections". Yaroslavl (2012).
27. Kogalovsky, M.P.: Metadata in computer systems. Programming, MAIK
Nauka/Interperiodica, vol. 39, pp. 28-46 (2013).
28. Skvortsov, N.A., Bryukhov, D.O., Kalinichenko, L.A., Kovalev, D.,
Stupnikov, S.A.: Metadata on scientific methods to ensure their reuse and
reproducibility of results. RCDL, Yaroslavl (2014).
29. Kalinichenko, L.A., Stupnikov, S.A., Vovchenko, A.E., Kovalev, D.A.:
Conceptual declarative problem specification and solving in data intensive domains.
Informatics and Applications, vol. 7 (2013).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Rudnichenko</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vychuzhanin</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shybaieva</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shybaiev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Otradskaya</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>The use of machine learning methods to automate the classification of text data arrays large amounts. Information management systems and technologies. Problems and solutions</article-title>
          . Ecology, Odessa, pp.
          <fpage>31</fpage>
          -
          <lpage>46</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Sandryhaila</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moura</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          :
          <article-title>Big data analysis with signal processing on graphs: representation and processing of massive data sets with irregular structure</article-title>
          .
          <source>IEEE Signal Process</source>
          . vol.
          <volume>31</volume>
          (
          <issue>5</issue>
          ), pp.
          <fpage>80</fpage>
          -
          <lpage>90</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dietrich</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heller</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Data Science &amp; Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data</article-title>
          . Wiley, Hoboken (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Rudnichenko</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vychuzhanin</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shybaieva</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shybaiev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Big data intellectual analysis in the diagnosis of the transportation systems technical condition</article-title>
          .
          <source>Systems and Means of Transport. Problems of Operation and Diagnostics</source>
          . KSMA, Kherson, pp.
          <fpage>57</fpage>
          -
          <lpage>69</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>T.M.</given-names>
          </string-name>
          :
          <article-title>Machine learning: trends, perspectives, and prospects</article-title>
          .
          <source>Science</source>
          , vol.
          <volume>349</volume>
          (
          <issue>6245</issue>
          ), pp.
          <fpage>255</fpage>
          -
          <lpage>260</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiang</surname>
            ,
            <given-names>R.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Storey</surname>
            ,
            <given-names>V.C.</given-names>
          </string-name>
          :
          <article-title>Business intelligence and analytics: From big data to big impact</article-title>
          .
          <source>MIS Q</source>
          . vol.
          <volume>4</volume>
          , pp.
          <fpage>1165</fpage>
          -
          <lpage>1188</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Vychuzhanin</surname>
            ,
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shibaev</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boyko</surname>
            ,
            <given-names>V.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shibaeva</surname>
            ,
            <given-names>N.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rudnichenko</surname>
            ,
            <given-names>N.D.</given-names>
          </string-name>
          :
          <article-title>Big data mapping in the geopositioning systems for fishing industry</article-title>
          .
          <source>International Scientific and Technical Conference on Computer Sciences and Information Technologies</source>
          . pp.
          <fpage>28</fpage>
          -
          <lpage>31</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Phillips-Wren</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Ai Tools in Decision Making Support Systems: A Review</article-title>
          .
          <source>International Journal on Artificial Intelligence Tools</source>
          . vol.
          <volume>21</volume>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Rudnichenko</surname>
            ,
            <given-names>N.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vychuzhanin</surname>
            ,
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shybaiev</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          :
          <article-title>The use of cluster data analysis to highlight measures of factors affecting the performance similarity of complex technical systems</article-title>
          .
          <source>Informatics and mathematical methods in simulation</source>
          , vol.
          <volume>3</volume>
          , pp.
          <fpage>214</fpage>
          -
          <lpage>219</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Decision Support Systems with Uncertainties in Big Data Environments</article-title>
          .
          <source>Knowledge-Based Systems</source>
          . vol.
          <volume>143</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Rudnichenko</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gezha</surname>
            ,
            <given-names>N.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belyaev</surname>
            ,
            <given-names>K.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuzmin</surname>
            ,
            <given-names>A.D.</given-names>
          </string-name>
          :
          <article-title>Performance analysis of machine learning model ensembles</article-title>
          . In:
          <source>III All-Ukrainian scientific-practical conference of young scientists, students and cadets “Information protection in information and communication systems”</source>
          , Lviv, pp.
          <fpage>259</fpage>
          -
          <lpage>260</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rojas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>An empirical overview of the no free lunch theorem and its effect on real-world machine learning classification</article-title>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kamber</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Data mining: concepts and techniques</article-title>
          , Morgan Kaufmann (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Padhy</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panigrahi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>The survey of data mining applications</article-title>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Sumiran</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>An overview of data mining techniques and their application in industrial engineering</article-title>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ramageri</surname>
            ,
            <given-names>B. M.</given-names>
          </string-name>
          :
          <article-title>Data mining techniques and applications</article-title>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Engels</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bratsas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koupidis</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Musyaffa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Requirements for statistical analytics and data mining</article-title>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Shalev-Shwartz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben-David</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Understanding machine learning: from theory to algorithms</article-title>
          . Cambridge University Press (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M. I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>T.M.</given-names>
          </string-name>
          :
          <article-title>Machine learning: trends, perspectives, and prospects</article-title>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Ayon</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Machine learning algorithms: a review</article-title>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Chugreev</surname>
            ,
            <given-names>V.L.</given-names>
          </string-name>
          :
          <article-title>Decision support systems using machine learning methods and predictive analytics</article-title>
          .
          <source>Problems of economic growth and sustainable development of the Vologda territories</source>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>83</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Sinitsyn</surname>
            ,
            <given-names>E.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tolmachev</surname>
            ,
            <given-names>A.V.</given-names>
          </string-name>
          :
          <article-title>Model of the system of decision support in the financial markets for enterprises on the basis of probabilistic analysis and machine learning</article-title>
          .
          <source>Herald UFU. Economics and Management Series</source>
          . vol.
          <volume>18</volume>
          . pp.
          <fpage>378</fpage>
          -
          <lpage>393</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Savenkov</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          :
          <article-title>Using machine learning methods and algorithms in management decision support systems</article-title>
          .
          <source>Journal of Science and Education</source>
          , vol.
          <volume>55</volume>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>25</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Korneev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Decision support systems in business</article-title>
          .
          <source>Networks and business</source>
          . vol.
          <volume>25</volume>
          , pp.
          <fpage>102</fpage>
          -
          <lpage>110</lpage>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>