<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Statistical Analysis of the Popularity of Programming Language Libraries Based on StackOverflow Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ihor Rishnyak</string-name>
          <email>ihor.v.rishnyak@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yurii Matseliukh</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Taras Batiuk</string-name>
          <email>taras.batiuk.mnsa.2020@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lyubomyr Strembitska</string-name>
          <email>oleksandra.strembitska.sa.2019@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oksana Mlynko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viktoriia Liashenko</string-name>
          <email>viktoriia.liashenko.sa.2019@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrii Lema</string-name>
          <email>andrii.lema.sa.2019@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chyrun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ivan Franko National University of Lviv</institution>
          ,
          <addr-line>University Street, 1, Lviv, 79000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>S. Bandera Street, 12, Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper presents a statistical analysis of trends in the spread of programming language libraries based on a studied dataset. The various problems that arise when using specific libraries of different programming languages over certain periods (most commonly a month) are studied and analyzed. The trends found in the dataset are presented graphically, and the key descriptive characteristics are established, taking the correlation of the data into account. Trends in the behavior of the studied indicators are determined using time-series smoothing methods. A cluster analysis of programming language libraries was performed, making it possible to group the data into clusters and form appropriate data groups for ranking the libraries.</p>
      </abstract>
      <kwd-group>
<kwd>Statistical analysis</kwd>
        <kwd>information technologies</kwd>
        <kwd>business analysis</kwd>
        <kwd>programming language libraries</kwd>
        <kwd>StackOverflow queries</kwd>
        <kwd>data processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>The rapid growth in popularity of programming language libraries, as reflected in Stack Overflow queries, has not yet solved the problem of answering complex technical questions that cannot be resolved through Internet searches. A typical problem is that developers, looking for answers by submitting queries to search engines, get all kinds of results, often spam or incorrect, outdated, and sometimes off-topic material. One often has to find a blog post and then sit with the source for a long time (more than ten minutes) to identify a way to solve a technical problem described in a particular post. Stack Overflow is a place where developers ask questions and get reliable answers. Stack Overflow allows developers to improve their level as programmers by using the experience of others. It increases coding experience even for those who are already experienced, through helping others who have not been able to figure a problem out themselves. In this way, Stack Overflow takes part in forming the technologies of the future. The above proves the relevance of studying the popularity of programming language libraries, where it is crucial to analyze the composition, structure, and issues of the various queries about specific libraries each month. This study is especially relevant for beginners who are now trying to choose a language, and the problem is no less urgent for experienced developers who expand their knowledge by studying each subsequent programming language.</p>
<p>From the point of view of business analysts, this analysis can be considered the creation of a library rating system, i.e., identifying the largest number of queries and determining the most popular languages. Based on the analyzed data, a business analyst will be able to assess the decline, growth, or invariability of the popularity of languages in 2009-2019 and offer a vision of the possible development of specific languages.</p>
<p>The work aims to use the main methods of visualization, graphical display, and primary statistical processing of numerical data presented as a sample or time series in order to identify trends in the studied indicators of programming language libraries, present the nature of those trends, and apply time-series smoothing methods and MS Excel spreadsheets. It also aims to characterize, by methods of correlation analysis, the experimental data presented as time sequences.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
<p>Research on the popularity of programming languages, according to scientists [1-3], is one of the components of the problem of human capital development. Having employees with hard skills who also acquire soft skills is an important task, as it allows solving important social [4-5], economic [6-8], and technical [9-23] issues. Since its inception 12 years ago, the system considered in this paper has provided an opportunity to ask questions about programming and get answers to them [24-36].</p>
<p>Confectioners discuss recipes in culinary forums; students discuss their questions in help groups on Telegram; parents have joint chats on Viber where they solve their children's problems; older people gather by their porches to discuss neighbors or world news. In other words, every group of people or professionals should have a place where they can ask a question, hear an expert's opinion, discuss a topic, or give advice. Therefore, the importance of using the StackOverflow system is beyond doubt.</p>
<p>Its relevance has been described in many articles [24-36] and in videos on YouTube and other social networks. Also, if you practice programming, are interested in a specific question, and enter it into a Google search, one of the first results will be the Stack Overflow site. One example of its relevance comes from Wikipedia, which reports a 2016 study finding that Android developers using Stack Overflow generated ten times more functional code (but less secure, which is a disadvantage) than developers using official documentation [30, 37].</p>
<p>In researching the chosen topic, we considered the HABR website [31], which was created to publish news and opinions related to IT and business. The following libraries of programming languages will be our attributes.</p>
      <p>• Month is the day-month stamp of the data on the library in the StackOverflow system;
• NLTK is the number of queries about the NLTK library (a set of libraries and programs for symbolic and statistical processing of natural language for English, written in the Python programming language);
• spaCy is the number of queries about the spaCy library (an open-source library for advanced natural language processing (NLP) in Python);
• Stanford-NLP is the number of queries about the Stanford NLP library;
• Python is the number of queries about the Python library;
• R is the number of queries about the R library;
• NumPy is the number of queries about the NumPy library (a Python language extension);
• SciPy is the number of queries about the SciPy library;
• MATLAB is the number of queries about the MATLAB library;
• Machine-Learning is the number of queries about machine learning.</p>
<p>In these works [31], the user described his story: for seven years he used the system under discussion, and during this time he "answered 3516 questions, asked 58 of his own, entered the hall of fame in several languages, met many wise people, and actively used all the site's features".</p>
<p>The issues of the most popular programming languages and their libraries are already discussed on the StackOverflow website itself [30, 37].</p>
<p>Is it possible to trust the answers to your questions on the site and actively use them? After all, users are interested in each question and will quickly correct an answer in case of error. The HABR website [31] also explains that the average programmer cannot write code for several hours without a break. Therefore, to avoid unnecessary distractions and overload, you can spend time productively with like-minded people on StackOverflow [30, 37]. A user earns a rating by answering questions, and his "reputation" can rise exponentially, depending on site activity. After a reputation mark of 25,000, the user gets access to all SO statistics and permission to store queries in the user database.</p>
<p>Thus, the SO system is one of the most popular among professional software developers, system administrators, and programmers. All questions are marked with a specific topic tag (or multiple tags, depending on the topics involved) to which the question relates. By clicking on a tag, you can view the list of questions and select the topic that interests you. In our case, these are the topics of libraries of different programming languages.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>To solve the problems in this work, we will use standard methods [38-45].</p>
<p>The correlation field is a graph that establishes a relationship between variables, where the X coordinate of each point corresponds to the value of the factor feature (abscissa) and the Y coordinate to the value of the resultant feature (ordinate) of a particular unit of observation. The number of points on the graph corresponds to the number of observation units. The location of the points indicates the presence and direction of the relationship [38-45].</p>
<p>Building a correlation field is carried out mainly in the following steps. Choose two variables that change over time. Then measure the values of the dependent variable and enter the results in a table. Then construct a coordinate plane, with the X-axis indicating the values of the independent variable and the Y-axis the dependent one. Then mark the points of the correlation field on the graph: on the X-axis, for the first value of the independent variable, mark a point at the height on the Y-axis corresponding to the value of the dependent variable, and so on. The resulting set of points is called the correlation field [38-45]. We analyze the resulting graph and conclude whether a relationship is present or absent.</p>
<p>The correlation coefficient is an indicator used to measure the density of the relationship between traits in the correlation-regression model of linear dependence [46-52]. The value of the correlation coefficient ranges from -1 to +1.</p>
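As a minimal sketch (not taken from the paper, with illustrative data), the sample correlation coefficient for two numeric sequences can be computed as follows:

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient: covariance of the two sequences
    # divided by the product of their standard deviations.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

queries_a = [120, 150, 170, 210, 260]   # hypothetical monthly query counts
queries_b = [100, 140, 160, 200, 240]
r = pearson(queries_a, queries_b)       # always lies in [-1, +1]
```

Two series that grow together, as above, give a coefficient close to +1; a coefficient near zero would indicate no linear relationship.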
<p>The correlation ratio determines the correlation in any of its forms, straight or curvilinear. The correlation ratio can be used to estimate a curvilinear relationship between the values of X and Y. It always has a positive value and lies in the range from 0 to +1. The ratio takes the value zero when the relationship between the features is absent [38-52].</p>
<p>Autocorrelation is the correlation of a function with itself shifted by a certain amount of the independent variable. The autocorrelation function graph can be obtained by plotting the correlation coefficient along the ordinate axis and the lag value along the abscissa axis [38-45]. The autocorrelation function measures the linearity of the relationship between elements of the time series spaced a given number of points apart in time. The graph of an autocorrelation function is called a correlogram.</p>
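A simple empirical estimate of the autocorrelation at a given lag can be sketched as follows (illustrative data, not the paper's dataset):

```python
def autocorr(series, lag):
    # Correlation of the series with itself shifted by `lag` points,
    # using the overall mean (simple empirical estimate).
    n = len(series)
    m = sum(series) / n
    num = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    den = sum((x - m) ** 2 for x in series)
    return num / den

trend = [10, 12, 15, 19, 24, 30, 37, 45]  # hypothetical growing series
# A strongly trended series stays highly autocorrelated at small lags.
r1 = autocorr(trend, 1)
```

Plotting `autocorr(series, lag)` against `lag` produces the correlogram described above.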
<p>The correlation matrix is a table that represents the values of the correlation coefficients for different variables. It shows the numerical value of the correlation coefficient for all combinations of variables. It is generally used when we need to determine the relationship between more than two variables. It consists of rows and columns that contain variables, and each cell contains a coefficient value that reflects the degree of association and linear relationship between two variables [38-45]. In addition, it can be used in specific statistical analyses. In multiple linear regression, where we have several independent variables, a correlation matrix helps determine the degree of association.</p>
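A correlation matrix for several query series can be sketched with NumPy; the library names and counts here are purely illustrative:

```python
import numpy as np

# Hypothetical monthly query counts for three libraries.
nltk_q  = np.array([200, 220, 250, 270, 300, 330])
numpy_q = np.array([400, 430, 480, 520, 560, 610])
scipy_q = np.array([150, 140, 160, 155, 170, 165])

# Each cell (i, j) of the matrix is the correlation between series i and j;
# the diagonal is 1 because every series correlates perfectly with itself.
matrix = np.corrcoef([nltk_q, numpy_q, scipy_q])
```

The matrix is symmetric, so only the triangle above (or below) the diagonal carries distinct information.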
<p>The multiple correlation coefficient describes the intensity of the correlation, or the degree of closeness of the relationship, between a dependent variable and several independent variables [38-45]. Its value cannot be less than the absolute value of any partial or simple correlation coefficient. The primary indicator of the closeness of the connection in multiple correlation is the coefficient of multiple correlation, which takes values from 0 to +1.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
<p>The structure of the dataset [24] is presented in Table 1 and has ten fields: month, NLTK (Natural Language Toolkit), spaCy, Stanford-NLP, Python, R, NumPy, SciPy, MATLAB, and Machine-Learning, with 132 rows covering 12 years.
Charts are used to represent data on a sheet graphically. There are several standard chart types in Excel. Charts can be placed directly on the sheet next to the data used to build them; such charts are called embedded. In addition, a chart can occupy a separate sheet in the workbook, called a chart sheet. No matter how the chart was created, it is always linked to the sheet data: if the data change, the chart is updated automatically [33]. The graphical form of data representation is called a chart. A chart can present sets of numbers, sums of money, percentages, dates, and time values. A chart is created using the Chart Wizard, launched by the Chart Wizard button on the Standard toolbar (Fig. 1). The output range is the range of spreadsheet cells that contains the data to be displayed graphically or as textual explanatory elements. A graphic representation of a single value is called a data element of the chart. A data row is a sequence of data arranged in a single row or column of the spreadsheet and displayed graphically on the chart (Fig. 2). Typically, the values shown in the chart depend on another value or set of text values; such independent values and text values are called data categories [33].</p>
<p>Descriptive statistics [25-29, 32-36] provide the basis for forming competencies in choosing a measurement scale, automating data processing using different formats at the collection stage, presenting results in various forms, graphically presenting results, calculating statistical distribution parameters, and evaluating general population parameters using information technology. It selects the quantitative information necessary (or interesting) for different people. Large data sets must be generalized or collapsed before humans can study them; this is what descriptive statistics does, describing, summarizing, or reducing the properties of data sets to the desired form. Descriptive statistics are used to analyze and interpret statistical data, construct statistical distributions, and calculate the relevant numerical parameters that characterize the study population. They are used to organize information collection, check the quality of data and their interpretation, and present statistical material [25-29, 32-37]. A result of descriptive statistics is shown in Table 2.</p>
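The descriptive characteristics of one query series can be sketched with Python's standard library (the counts below are hypothetical, not the values in Table 2):

```python
import statistics as st

# Hypothetical monthly query counts for one library.
queries = [980, 1020, 1100, 1180, 1250, 1330, 1400]

summary = {
    "mean": st.mean(queries),
    "median": st.median(queries),
    "stdev": st.pstdev(queries),   # population standard deviation
    "min": min(queries),
    "max": max(queries),
}
```

These are the same quantities (mean, median, spread, range) that a spreadsheet's descriptive-statistics tool reports.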
<p>The construction of histograms makes the distribution of the data more apparent [32]. It involves dividing the entire range of possible values of X into a finite number of intervals (rectangular in the multidimensional case) and counting the number of realizations that fall into each of them (Fig. 3).</p>
<p>The cumulate is the curve of accumulated frequencies of the interval variation series [34]. The graph of the integral distribution function F(x) is compared with the cumulate and is also considered in probability theory [34]. The concepts of histogram and cumulate are associated with continuous data and their interval variation series [34]. Their graphs are empirical estimates of the probability density and distribution function (Fig. 3).</p>
      <p>The methods of smoothing time series are the method of moving average, exponential smoothing,
adaptive smoothing, and their modifications [25-29, 32-36]. They are used to reduce the influence of a
random component (random fluctuations) in time series. They make it possible to obtain more "pure"
values, which consist only of deterministic components. Some of the methods aim to highlight some
components, such as trends [25-29, 32-36]. Smoothing methods can be divided into two classes based
on analytical and algorithmic approaches.</p>
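A minimal sketch of the simplest such method, a centered simple moving average, is given below (the series is illustrative; edge handling is one common convention among several):

```python
def moving_average(series, w):
    # Centered simple moving average with odd window w; the first and last
    # (w - 1) / 2 points are left unsmoothed, as is common for short series.
    half = w // 2
    out = list(series)
    for t in range(half, len(series) - half):
        out[t] = sum(series[t - half:t + half + 1]) / w
    return out

raw = [5, 9, 4, 8, 6, 10, 7]          # illustrative noisy series
smooth3 = moving_average(raw, 3)      # w = 3 averages each point with its neighbors
```

Larger windows suppress more of the random component but also flatten genuine short-term features of the trend.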
<p>The simplest way of forecasting is considered to be the approach that determines the forecast estimate from the achieved level using the average level, average growth, or average growth rate, i.e., extrapolation based on the average level of the series [25-29, 32-36]. When extrapolating socioeconomic processes based on the average level of the series, the predicted value is taken as the arithmetic mean of the previous levels of the series. The reliability interval accounts for the uncertainty hidden in the estimate of the mean. However, the projected indicator is assumed to be equal to the average sample value, so the approach does not consider that individual indicator values fluctuated around the average in the past [25-29, 32-36] and will also do so in the future.</p>
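The mean-level extrapolation with a reliability interval can be sketched as follows (illustrative data; the z-value 1.96 assumes an approximately normal sampling distribution):

```python
import statistics as st

def mean_level_forecast(series, z=1.96):
    # Forecast the next value as the arithmetic mean of past levels, with a
    # reliability interval based on the standard error of the mean.
    n = len(series)
    m = st.mean(series)
    se = st.stdev(series) / n ** 0.5
    return m, (m - z * se, m + z * se)

history = [100, 104, 98, 102, 101, 99]   # hypothetical stationary series
forecast, interval = mean_level_forecast(history)
```

As the text notes, this point forecast ignores the fluctuation of individual values around the mean; the interval is what expresses that uncertainty.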
<p>Methods of analytical smoothing include regression analysis and the method of least squares with its modifications [25-29, 32-36]. To identify the primary trend by the analytical method means to attribute to the studied process a single form of development throughout the observation period. Therefore, for these methods, choosing the optimal function for the deterministic trend (growth curve), which smooths the series of observations, is essential.</p>
<p>The algorithm for calculating the weighted average is as follows [25-29, 32-36].</p>
    </sec>
    <sec id="sec-5">
<title>5.1. Smoothing according to Kendel formulas - simple moving average</title>
<p>The data smoothed using smoothing intervals w = 3, 5, 7, 9, 11, 13, 15 are presented in Fig. 4-Fig. 6. The smoothed data for queries about MatLab are calculated using the Kendel formulas for the smoothing interval w = 3 (Fig. 4, a), w = 5 (Fig. 4, b), w = 7 (Fig. 4, c), w = 9 (Fig. 5, a), w = 11 (Fig. 5, b), w = 13 (Fig. 5, c), and w = 15 (Fig. 6).</p>
<p>Forecasting methods based on regression are used for short-term and medium-term forecasting. They do not allow adaptation: the forecasting procedure must be repeated from the beginning when new data are received. The optimal length of the lead period is determined separately for each economic process, taking into account its statistical instability.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Results</title>
<p>The most commonly used method is smoothing time series using moving averages [25-29, 32-36]. The algorithm for calculating the moving average is given by formulas (1) and (2) [25-29, 32-36]. (Figure panels: matlab w = 5; matlab w = 7.)</p>
<p>We smoothed the data using the smoothing interval w = 3, then we smoothed the obtained smoothed data again, but used the smoothing interval w = 5. We continued smoothing the obtained data with a smoothing interval w = 7 and so on up to w = 15.</p>
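The repeated-smoothing procedure can be sketched as a cascade of moving averages; this is a simple centered moving average and does not reproduce the exact Kendel weights:

```python
def moving_average(series, w):
    # Centered simple moving average; edge points are kept unsmoothed.
    half = w // 2
    out = list(series)
    for t in range(half, len(series) - half):
        out[t] = sum(series[t - half:t + half + 1]) / w
    return out

def cascade(series, windows=(3, 5, 7, 9, 11, 13, 15)):
    # Smooth, then smooth the smoothed result with the next larger window,
    # mirroring the w = 3, then w = 5 (w = 3), then w = 7 (w = 5) scheme.
    result = list(series)
    for w in windows:
        result = moving_average(result, w)
    return result

raw = [3, 8, 2, 9, 4, 10, 5, 11, 6, 12, 7, 13, 8, 14, 9, 15, 10]
smoothed = cascade(raw, windows=(3, 5))   # two-stage smoothing, as in the text
```

Each pass removes more of the random component, which is why the cascaded curves in the figures become progressively flatter.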
<p>The smoothed data for queries about MatLab were obtained by smoothing the data again. Fig. 7 presents the smoothed data for queries about MatLab using the smoothing interval w = 5 (w = 3) (a) and w = 7 (w = 5) (b). Fig. 8 shows the smoothed data for queries about MatLab for w = 9 (w = 7) (a), w = 11 (w = 9) (b), w = 13 (w = 11) (c), and w = 15 (w = 13) (d) according to the Kendel formulas.</p>
<p>In both cases, for each smoothing we find the number of turning points and the correlation coefficients between the original values and the smoothed ones.</p>
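Counting turning points can be sketched as follows (illustrative series; a point counts when it is a strict local extremum):

```python
def turning_points(series):
    # A point is a turning point when it is a strict local maximum or minimum.
    count = 0
    for t in range(1, len(series) - 1):
        prev, cur, nxt = series[t - 1], series[t], series[t + 1]
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            count += 1
    return count

noisy = [1, 3, 2, 4, 3, 5, 4, 6]   # hypothetical zig-zag series
tp = turning_points(noisy)
```

A heavily smoothed series should have far fewer turning points than the original, which is one way the quality of the smoothing is judged.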
    </sec>
    <sec id="sec-7">
      <title>5.2. Smoothing according to Pollard formulas</title>
<p>John Pollard's algorithm, proposed in 1975, is used to factorize integers [28]. It is based on Floyd's cycle-finding algorithm and some consequences of the birthday paradox. The algorithm most effectively factors composite numbers with relatively small factors in their decomposition. All of Pollard's ρ-methods construct a numerical sequence whose elements form a loop starting from some number n, which can be illustrated by arranging the numbers in the shape of the Greek letter ρ; this gave the name to the family of methods [28].</p>
<p>We smooth the data for queries about R using the same smoothing intervals (w = 3, 5, 7, 9, 11, 13, 15), as presented in Fig. 9-Fig. 11. The smoothed data for queries about R are calculated using the Pollard formulas for the smoothing interval w = 3 (Fig. 9, a), w = 5 (Fig. 9, b), w = 7 (Fig. 9, c), w = 9 (Fig. 10, a), w = 11 (Fig. 10, b), w = 13 (Fig. 10, c), and w = 15 (Fig. 11).</p>
<p>Figure 10: The smoothed data for queries about R using the smoothing interval w = 9 (a), w = 11 (b), w = 13 (c) according to the Pollard formulas</p>
      <p>We smooth the data using the size of the smoothing interval w = 3, then we smooth the obtained
smoothed data again, but we use the size of the smoothing interval w = 5.</p>
<p>The smoothed data for queries about R were obtained by smoothing the data again. Fig. 12 presents the smoothed data for queries about R using the smoothing interval w = 5 (w = 3) (a) and w = 7 (w = 5) (b). Fig. 13 shows the smoothed data for queries about R for w = 9 (w = 7) (a), w = 11 (w = 9) (b), w = 13 (w = 11) (c), and w = 15 (w = 13) (d) according to the Pollard formulas.</p>
    </sec>
    <sec id="sec-8">
      <title>5.3. Exponential smoothing</title>
<p>To construct the exponential smoothing, each smoothed value combines the previous smoothed value, multiplied by the factor (1 - α), with the current sample element, multiplied by α, so that the sum of the coefficients equals 1. The factor α takes values from zero to one. Graphs of the exponential smoothing for all required values of α follow.</p>
<p>Exponential smoothing of queries about Machine Learning for α = 0.1 (a), α = 0.15 (b), α = 0.2 (c), α = 0.25 (d), and α = 0.3 (e) is presented in Fig. 14.</p>
<p>For each smoothing, we find the number of turning points and the correlation coefficient between the original and smoothed values. The correlation coefficients between the original values and the smoothed ones are given in Table 4.</p>
<p>Figure 14: Exponential smoothing of queries about Machine Learning for α = 0.1 (a), α = 0.15 (b), α = 0.2 (c), α = 0.25 (d), α = 0.3 (e)</p>
      <table-wrap id="table4">
        <label>Table 4</label>
        <caption>
          <p>The correlation coefficients between the original values and the smoothed ones</p>
        </caption>
        <table>
          <thead>
            <tr><th>Factor α</th><th>0.1</th><th>0.15</th><th>0.2</th><th>0.25</th><th>0.3</th></tr>
          </thead>
          <tbody>
            <tr><td>Number of correct turning points</td><td/><td/><td/><td/><td/></tr>
            <tr><td>Correlation coefficient</td><td>0,958867</td><td>0,964152</td><td>0,96739</td><td>0,969568</td><td>0,971129</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-9">
      <title>5.4. Median smoothing</title>
<p>Median smoothing of queries about Python for w = 3, 5, 7, 9, 11, 13, 15 is presented in Fig. 15-Fig. 17.</p>
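Median smoothing can be sketched like the moving average, but taking the window median instead of the mean (illustrative data with a deliberate outlier):

```python
import statistics as st

def median_smooth(series, w):
    # Replace each interior point with the median of the window around it;
    # medians are robust to isolated spikes, unlike the mean.
    half = w // 2
    out = list(series)
    for t in range(half, len(series) - half):
        out[t] = st.median(series[t - half:t + half + 1])
    return out

spiky = [10, 11, 300, 12, 13, 14, 12]   # one outlier at index 2
cleaned = median_smooth(spiky, 3)
```

Note how the spike of 300 disappears entirely, which is the main reason to prefer median smoothing for series with occasional anomalous counts.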
    </sec>
    <sec id="sec-10">
      <title>6. Discussions</title>
    </sec>
    <sec id="sec-11">
      <title>6.1. Data correlation</title>
<p>Correlation analysis is a group of methods that can detect the presence and degree of relationship between several parameters that change randomly [24]. Two samples (data sets) are studied in the simplest case; their multidimensional complexes (groups) are studied in the general case. The purpose of correlation analysis is to determine whether one variable has a significant dependence on another [25]. The main tasks of correlation analysis are the definition and expression of the form of the analytical dependence of the resultant trait y on the factor traits xi.</p>
<p>There are the following stages of correlation analysis [24, 25].
• Identifying the relationship between the features;
• Determining the form of the relationship;
• Determining the strength (tightness) and direction of the relationship.</p>
<p>The advantages of correlation analysis are as follows.
• The ability to establish a new rule for the interaction of functions with each other;
• The ability to estimate the interaction of functions, however they were obtained.</p>
<p>The disadvantages are as follows: the results obtained using the technique can be used only in the field of the given study or one close to it.</p>
<p>A correlation dependence occurs when several values of a function (dependent variable) correspond to the same value of an argument (independent variable) [24].</p>
<p>To construct a correlation field, we considered the definition of the concept. The correlation field (scatter plot) is a graphical representation of the relationship between the two studied sequences [24, 25]. Thus, it is a set of points in a rectangular coordinate system, where the abscissa of each point corresponds to the value of the factor feature (x) and the ordinate to the value of the resultant feature (y) of a particular unit of observation. The number of points on the graph corresponds to the number of observation units. The location of points on the correlation field allows one to judge the nature of the dependence, for example, linear, parabolic, hyperbolic, logistic, logarithmic, exponential, or no dependence [24].</p>
<p>Fig. 18 shows the behavior of the correlation field for queries about the Python programming language, taken by day for a single month. From Fig. 18 it can be seen that the nature of the dependence is linear. The dependence is described by the equation y = 1932x - 17317 with a high coefficient of determination R² = 0,972.</p>
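Such a linear fit and its coefficient of determination can be sketched with NumPy; the data below are synthetic with a known slope, not the paper's query counts:

```python
import numpy as np

# Synthetic monthly counts with a roughly linear upward trend.
t = np.arange(1, 13)
noise = np.array([5, -3, 2, -1, 4, -5, 1, 0, -2, 3, -4, 2])
y = 200 * t + 100 + noise

# Least-squares straight line y = slope * t + intercept.
slope, intercept = np.polyfit(t, y, 1)

# Coefficient of determination R^2 for the fitted line.
pred = slope * t + intercept
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r2 = 1 - ss_res / ss_tot
```

An R² close to 1, as reported for the Python queries, means the straight line explains almost all of the variation in the points.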
<p>The correlation field is built from the input data (x and y) in the form of a scatter plot. Analyzing the location of points on the correlation field, we can judge the nature of the dependence, namely that it is linear. The query dates start in 2009 and are collected until 2019 inclusive, broken down by month. The lowest number of queries for Python was in one month of 2009, and it increased with each passing month, indicating the language's growing popularity and increasing number of users. Data from 2019 to 2021 are not collected in the studied dataset. However, analyzing the statistics, we can predict even more significant growth in the popularity of the programming language, given the queries about its libraries. That is, the data have a growing trend.</p>
<p>We now determine the value of the correlation coefficient. A sample correlation coefficient is used to quantify the closeness of the relationship. The correlation coefficient characterizes the degree of closeness of a linear dependence. In general, when some stochastic dependence relates the X and Y values, the correlation coefficient may have a value in the range -1 ≤ r ≤ +1 [24].</p>
      <p>The formula for calculating the correlation coefficient is as follows [24]:</p>
      <p>r = Σᵢ(xᵢ − x̄)(yᵢ − ȳ) / √( Σᵢ(xᵢ − x̄)² · Σᵢ(yᵢ − ȳ)² ). (4)</p>
      <p>The formula for calculating the autocorrelation coefficient of order k is as follows [24-29]:</p>
      <p>rₖ = Σₜ₌ₖ₊₁ⁿ (yₜ − ȳ)(yₜ₋ₖ − ȳ) / Σₜ₌₁ⁿ (yₜ − ȳ)². (5)</p>
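      <p>The sample correlation coefficient of formula (4) can be written out directly; a minimal sketch on synthetic data:</p>

```python
import numpy as np

def pearson_r(x, y):
    """Sample correlation coefficient, written out as in formula (4)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xd, yd = x - x.mean(), y - y.mean()
    return np.sum(xd * yd) / np.sqrt(np.sum(xd ** 2) * np.sum(yd ** 2))
```

      <p>The result agrees with np.corrcoef (and with Excel's CORREL function used later in the text) and always lies between −1 and +1.</p>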
      <p>The calculated correlation coefficient for queries about Python equals r = 0.98588536. This is a
high correlation coefficient: it shows that there is a dependence, that it is linear, and that it is quite close.</p>
      <p>The correlation ratio is used in the following cases [24-29]:
• there is a nonlinear relationship between the pair of studied features;
• the nature of the sample data (number, density of location on the correlation field) allows, firstly, their
grouping, and secondly, the calculation of "individual" mathematical expectations
within each grouping interval.</p>
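      <p>For completeness, the correlation ratio itself can be sketched as follows: the series is grouped into intervals of x, and the "individual" mathematical expectations are the group means. The equal-width binning here is an illustrative assumption.</p>

```python
import numpy as np

def correlation_ratio(x, y, bins=5):
    """Correlation ratio (eta): between-group variance of the group means
    over the total variance of y, after grouping by intervals of x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    edges = np.linspace(x.min(), x.max(), bins + 1)[1:-1]
    idx = np.digitize(x, edges)                  # interval number of each point
    between = sum(
        (idx == g).sum() * (y[idx == g].mean() - y.mean()) ** 2
        for g in np.unique(idx)
    )
    total = np.sum((y - y.mean()) ** 2)
    return np.sqrt(between / total)
```

      <p>For a strongly linear dependence the correlation ratio is close to 1, matching the conclusion above that computing it adds little once the field is seen to be linear.</p>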
      <p>According to the previously constructed correlation field, the graph is linear, so
it is impractical to calculate the correlation ratio.</p>
      <p>To divide one of the sequences into three equal parts, we split the sequence
corresponding to the number of queries about the Python programming language libraries (Table 5).</p>
      <p>As we can see, the partition is performed so that the number of sample elements in each interval is
the same, namely 44. When there are many observations and correlation coefficients need to
be calculated sequentially for several samples, the obtained coefficients are, for convenience,
summarized in tables called correlation matrices.</p>
      <p>The correlation matrix is a square table where the correlation coefficient between the corresponding
parameters is located at the corresponding row and column intersection [24-29].</p>
      <p>Dividing the sample into three equal parts, we build a correlation matrix (Table 6).</p>
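      <p>A minimal sketch of building such a correlation matrix, assuming the 132 monthly values (3 parts × 44, as in Table 5) are available as one array; the growing series here is synthetic.</p>

```python
import numpy as np

# Synthetic growing monthly series standing in for the 132 Python query counts
months = np.arange(132, dtype=float)
series = 50 * months ** 1.5 + 1000

parts = series.reshape(3, 44)      # rows: 1st, 2nd, 3rd part of the sequence
corr_matrix = np.corrcoef(parts)   # 3x3 symmetric matrix, cf. Table 6
```

      <p>The resulting matrix is square and symmetric with ones on the diagonal, since each part correlates perfectly with itself.</p>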
      <p>The statistical literature [24-29] recommends using the following expression to
calculate the correlation coefficient.</p>
      <p>To calculate the autocorrelation coefficient according to formula (5), we used the CORREL
function. The autocorrelation coefficients for queries about Python are presented in Table 7. The sequence
of autocorrelation coefficients of the first, second, third, etc. orders is called the
autocorrelation function, and its graph is called the correlogram [25-29].</p>
      <p>The correlogram for the queries about Python is presented in Fig. 19. The pattern of the correlogram
shows that the studied series is not stationary, because for a stationary time series the
correlogram must decline rapidly.</p>
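      <p>Computing the correlogram via CORREL-style lagged correlations, as the text describes, can be sketched as follows; the trended series here is synthetic and illustrates why a non-stationary series keeps its autocorrelations near 1 instead of declining.</p>

```python
import numpy as np

def acf(y, max_lag):
    """CORREL-style autocorrelation function: correlation of the series
    with its lag-k shift, for k = 1..max_lag (the correlogram values)."""
    y = np.asarray(y, float)
    return [float(np.corrcoef(y[k:], y[:-k])[0, 1]) for k in range(1, max_lag + 1)]

# A trended (non-stationary) series: the correlogram stays near 1
trended = np.arange(120, dtype=float) + np.sin(np.arange(120))
correlogram = acf(trended, 7)
```

      <p>For pure white noise, by contrast, the same function returns coefficients near zero at every lag.</p>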
      <p>[Fig. 19: Correlogram for queries about Python; autocorrelation coefficients for lags 1-7:
0.987, 0.981, 0.983, 0.978, 0.978, 0.977, 0.977.]</p>
    </sec>
    <sec id="sec-12">
      <title>The cluster data analysis</title>
      <p>To form an "object-property" table from our data, we split the data so that the 2nd, 3rd, 4th, and 5th
columns can be considered objects and the first column a property. To calculate
each of the properties, we use the standard formulas [53-62]. To calculate the properties in the column for
2016, we used only the query data collected in 2016 (Table 8). The term "average" in
Table 8 means the average number of queries about the NumPy library over all 12 months. Accordingly,
"minimum" shows the lowest number of requests during the year (for a month), and "maximum" the
highest. "Volume" is the number of rows for a given year; there are 12 of them every year, one per
month. "Mode" is the value of a quantity that occurs most often among all observations.
Since the query statistics changed every month and no value was ever repeated for even two
months, the mode cannot be determined. "Median" is the number that divides the
list of attribute values into two equal parts, so that there is the same number of units on both sides.
"Standard error" is the approximate standard deviation of the statistical sample. The more data points
involved in calculating the mean, the smaller the standard error [63-79]. "Standard deviation" is the
deviation of all characteristic values from their average value.</p>
      <p>It is one of the essential measures that help determine how much a particular value changes [74-79].
The larger the standard deviation, the wider the range of changes in the values of this
quantity. "Sum" is the total number of requests to the library over the twelve months of each described year.
The "level of reliability" is connected with the probability of rejecting the null hypothesis when it is in fact
correct, i.e., with the probability of an error of the first kind for this task. "Sample variance" measures how far
the random values are spread from their average value; larger variance values indicate more significant
deviations of the values of the random variable from the center of the distribution. "Kurtosis" is a
numerical characteristic of the probability distribution of a real-valued random variable: the kurtosis
coefficient characterizes the "steepness," i.e., the rate of increase of the distribution curve compared to
the normal curve. "Asymmetry" (skewness) measures how asymmetric the distribution can be; in the
opposite, symmetric case, the parts of the distribution to the right and left of the center are
ideal mirror images of each other. "Interval" is the range between the extreme values of the feature in
the group of units. To construct the proximity matrix (Table 9) we used formula (6), by analogy with
the previous Table 8 [53-73].</p>
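      <p>The properties in Table 8 can be reproduced with the standard estimators; the twelve monthly counts below are hypothetical, and kurtosis and asymmetry use the simple moment-based forms (MS Excel applies bias-corrected variants).</p>

```python
import numpy as np

monthly = np.array([1200.0, 1350, 1500, 1480, 1620, 1700,
                    1810, 1905, 2050, 2130, 2248, 2390])  # hypothetical counts

mean, std = monthly.mean(), monthly.std(ddof=1)
z = (monthly - mean) / monthly.std()
properties = {
    "average": mean,
    "standard error": std / np.sqrt(monthly.size),
    "median": float(np.median(monthly)),
    "standard deviation": std,
    "sample variance": monthly.var(ddof=1),
    "kurtosis": float(np.mean(z ** 4) - 3),   # moment-based excess kurtosis
    "asymmetry": float(np.mean(z ** 3)),      # moment-based skewness
    "interval": float(monthly.max() - monthly.min()),
    "minimum": float(monthly.min()),
    "maximum": float(monthly.max()),
    "sum": float(monthly.sum()),
    "volume": int(monthly.size),
}
```

      <p>As in the text, the mode is omitted: with twelve distinct monthly values no value repeats, so it cannot be determined.</p>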
      <p>The resulting proximity matrix (Table 9) is a symmetric matrix with a zero diagonal that indicates the degree
of proximity between objects. Agglomerative hierarchical cluster analysis is performed based on such
a matrix. The choice of the merging strategy is determined by the approach. We chose the
nearest-neighbor strategy, in which the distance between two groups is defined as the distance between the two
closest elements of these groups.</p>
      <p>After performing the cluster analysis procedure sequentially, we obtained proximity matrices for 3
(Table 10) and 2 clusters (Table 11).</p>
      <p>d(i, j) = √( Σₖ (xᵢₖ − xⱼₖ)² ). (6)</p>
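      <p>Assuming formula (6) is the usual Euclidean distance between the objects' property vectors (the standard choice for such proximity matrices), the matrix of Table 9 can be sketched as follows; the property vectors themselves are hypothetical.</p>

```python
import math

def euclid(u, v):
    """Assumed form of formula (6): Euclidean distance between property vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical property vectors for the four objects (columns of Table 8)
objects = [[1200.0, 350.0], [2100.0, 400.0], [1400.0, 380.0], [4100.0, 900.0]]
proximity = [[euclid(u, v) for v in objects] for u in objects]
```

      <p>The result is symmetric with zeros on the diagonal, as described for Table 9 above.</p>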
      <p>The cluster analysis procedure starts with the proximity matrix. In it, we determine the smallest
number. It is 508047.4, located at the intersection of the 1st and 3rd objects. Therefore, we group the 1st and
3rd objects and create a new table. We then determine the minimum number again; this time it is at the
intersection of objects (1,3) and (2), so we group them as well. In this way we build the "union-node-metric"
table (Table 12).</p>
      <p>Our union-node-metric table is formed in 3 steps. In the first, objects 1 and 3 are joined; in
the second, objects (1,3) and 2; in the third, (1,3,2) and 4. At each step a node is
formed, named d5, d6, and d7 respectively, because there are four objects and the node numbering begins after
the 4th. The metric recorded for each node is the minimum value at the corresponding stage of the construction of
the table.</p>
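      <p>The three merge steps can be replayed with a minimal nearest-neighbour agglomeration loop. Only the three merge metrics (508047.4, 983393.4, 2248031) come from the text; the remaining off-diagonal distances below are hypothetical fill-ins chosen to be consistent with the reported merge order.</p>

```python
# Hypothetical proximity values between the four objects; only the three
# reported merge metrics come from the text, the rest are fill-ins.
d = {
    (1, 2): 983393.4, (1, 3): 508047.4, (1, 4): 2600000.0,
    (2, 3): 1200000.0, (2, 4): 2500000.0, (3, 4): 2248031.0,
}

def dist(a, b):
    return d[(min(a, b), max(a, b))]

def group_dist(ca, cb):
    """Nearest-neighbour strategy: distance between the two closest elements."""
    return min(dist(a, b) for a in ca for b in cb)

clusters = [(1,), (2,), (3,), (4,)]
merges = []  # the union-node-metric table: (joined objects, metric)
while len(clusters) > 1:
    # find the pair of current clusters at minimum nearest-neighbour distance
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda p: group_dist(clusters[p[0]], clusters[p[1]]),
    )
    metric = group_dist(clusters[i], clusters[j])
    merged = clusters[i] + clusters[j]
    merges.append((merged, metric))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
```

      <p>Running the loop reproduces the three steps described above: first (1,3), then (1,3,2), then all four objects, with the metric of each node equal to the minimum distance at that stage.</p>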
      <p>The dendrogram constructed for the programming language libraries helps to visualize the results
of the cluster analysis in Fig. 20. We first construct the dendrogram of the clustering of the objects manually in
a draft version and then implement it in a graphical environment. On the dendrogram, the indicators on
the left represent the metric, the labels at the bottom the objects, and the labels at the top each node. Drawing
horizontal lines in the plane of the dendrogram at a given height allows one to select
individual clusters.</p>
      <p>When interpreting the results of the cluster analysis, we observe 3 clusters at level 508047.4, among
which cluster 1 includes objects 1 and 3, the second cluster only object 2, and the third only object
4. At level 983393.4 we observe 2 clusters, of which the first includes the three objects 1, 3, and 2,
and the second only object 4. At level 2248031 we observe one cluster containing all the elements.</p>
    </sec>
    <sec id="sec-13">
      <title>7. Conclusions</title>
      <p>In this work, we learned the basic visualization methods, graphical display, and primary statistical
processing of numerical data represented by a sample of time series.</p>
      <p>We became acquainted with the main methods of extracting the trend in the behavior of the studied
indicator using methods of smoothing time series, and with
presenting the results using an MS Excel spreadsheet.</p>
      <p>We also got acquainted with the methods of correlation analysis of experimental data presented by
time sequences. We learned to build a correlation field, determine the value of the correlation
coefficient, calculate the correlation ratio, plot autocorrelation functions, divide one of the sequences
into three equal parts, build a correlation matrix for them and find multiple correlation coefficients. We
also divided a given set of objects, each characterized by the same set of specific features, into separate
groups using hierarchical agglomerative cluster analysis.</p>
      <p>A library rating system has been created: the libraries with the greatest number of queries have been
identified, and the most popular language has been determined. In the ranking of queries about language libraries,
the first is Python and the least popular is spaCy. The growing popularity of all
language libraries reflects the active development of programming and, most importantly,
people's interest in the field. The obtained data will allow experts to assess the decline, growth, or
invariability of the popularity of languages in the recent period (2009-2019) and offer their vision of
the possible development of specific programming languages.</p>
    </sec>
    <sec id="sec-14">
      <title>8. References</title>
      <p>[1] O. Kuzmin, M. Bublyk, A. Shakhno, O. Korolenko, H. Lashkun, Innovative development of
human capital in the conditions of globalization, E3S Web of Conferences 166 (2020) 13011.
[2] I. Bodnar, M. Bublyk, O. Veres, O. Lozynska, I. Karpov, Y. Burov, P. Kravets, I. Peleshchak, O.</p>
      <p>Vovk, O. Maslak, Forecasting the risk of cervical cancer in women in the human capital
development context using machine learning, CEUR workshop proceedings Vol-2631 (2020)
491-501.
[3] M. Bublyk, V. Vysotska, Y. Matseliukh, V. Mayik, M. Nashkerska, Assessing losses of human
capital due to man-made pollution caused by emergencies, CEUR Workshop Proceedings
Vol-2805 (2020) 74-86.
[4] D. Koshtura, M. Bublyk, Y. Matseliukh, D. Dosyn, L. Chyrun, O. Lozynska, I. Karpov, I.</p>
      <p>Peleshchak, M. Maslak, O. Sachenko, Analysis of the demand for bicycle use in a smart city based
on machine learning, CEUR workshop proceedings Vol-2631 (2020) 172-183.
[5] M. Bublyk, Y. Matseliukh, U. Motorniuk, M. Terebukh, Intelligent system of passenger
transportation by autopiloted electric buses in Smart City, CEUR workshop proceedings Vol-2604
(2020) 1280-1294.
[6] I. Rishnyak, O. Veres, V. Lytvyn, M. Bublyk, I. Karpov, V. Vysotska, V. Panasyuk,
Implementation models application for IT project risk management, CEUR Workshop Proceedings
Vol-2805 (2020) 102-117.
[7] V. Vysotska, A. Berko, M. Bublyk, L. Chyrun, A. Vysotsky, K. Doroshkevych, Methods and tools
for web resources processing in e-commercial content systems, in: Proceedings of 15th
International Scientific and Technical Conference on Computer Sciences and Information
Technologies, CSIT, 1, 2020, pp. 114-118. doi: 10.1109/CSIT49958.2020.9321950.
[8] M. Bublyk, A. Kowalska-Styczen, V. Lytvyn, V. Vysotska, The Ukrainian Economy
Transformation into the Circular Based on Fuzzy-Logic Cluster Analysis, Energies 2021 (14)
5951. doi: 10.3390/en14185951.
[9] A. Berko, I. Pelekh, L. Chyrun, M. Bublyk, I. Bobyk, Y. Matseliukh, L. Chyrun, Application of
ontologies and meta-models for dynamic integration of weakly structured data, in: Proceedings of
the IEEE 3rd International Conference on Data Stream Mining and Processing, DSMP, 2020, pp.
432-437. doi: 10.1109/DSMP47368.2020.9204321.
[10] V.-A. Oliinyk, V. Vysotska, Y. Burov, K. Mykich, V. Basto-Fernandes, Propaganda Detection in
Text Data Based on NLP and Machine Learning, CEUR workshop proceedings Vol-2631 (2020)
132-144.
[11] R. Lynnyk, V. Vysotska, Y. Matseliukh, Y. Burov, L. Demkiv, A. Zaverbnyj, A. Sachenko, I.</p>
      <p>Shylinska, I. Yevseyeva, O. Bihun, DDOS Attacks Analysis Based on Machine Learning in
Challenges of Global Changes, CEUR workshop proceedings Vol-2631 (2020) 159-171.
[12] V. Vysotska, Linguistic Analysis of Textual Commercial Content for Information Resources
Processing, in: Proceedings of the International Conference on Modern Problems of Radio
Engineering, Telecommunications and Computer Science, TCSET, 2016, pp. 709-713. doi:
10.1109/TCSET.2016.7452160.
[13] V. Lytvyn, V. Vysotska, A. Rzheuskyi, Technology for the Psychological Portraits Formation of
Social Networks Users for the IT Specialists Recruitment Based on Big Five, NLP and Big Data
Analysis, CEUR Workshop Proceedings Vol-2392 (2019) 147-171.
[14] V. Lytvyn, V. Vysotska, D. Dosyn, R. Holoschuk, Z. Rybchak,
Application of Sentence Parsing for Determining Keywords in Ukrainian Texts, in: Proceedings
of the International Conference on Computer Sciences and Information Technologies, CSIT, 2017,
pp. 326-331. doi: 10.1109/STC-CSIT.2017.8098797.
[15] Y. Burov, V. Vysotska, P. Kravets, Ontological approach to plot analysis and modeling, CEUR</p>
      <p>Workshop Proceedings Vol-2362 (2019) 22-31.
[16] V. Vysotska, O. Kanishcheva, Y. Hlavcheva, Authorship Identification of the Scientific Text in
Ukrainian with Using the Lingvometry Methods, in: Proceedings of the International Conference
on Computer Sciences and Information Technologies, CSIT, 2018, pp. 34-38. doi:
10.1109/STC-CSIT.2018.8526735.
[17] A. Gozhyj, I. Kalinina, V. Gozhyj, V. Vysotska, Web service interaction modeling with colored
petri nets, in: Proceedings of the International Conference on Intelligent Data Acquisition and
Advanced Computing Systems: Technology and Applications, IDAACS, 1, 2019, pp. 319-323.
doi: 10.1109/IDAACS.2019.8924400.
[18] A. Gozhyj, I. Kalinina, V. Vysotska, S. Sachenko, R. Kovalchuk, Qualitative and Quantitative
Characteristics Analysis for Information Security Risk Assessment in E-Commerce Systems,
CEUR Workshop Proceedings Vol-2762 (2020) 177-190.
[19] L. Podlesna, M. Bublyk, I. Grybyk, Y. Matseliukh, Y. Burov, P. Kravets, O. Lozynska, I. Karpov,
I. Peleshchak, R. Peleshchak, Optimization model of the buses number on the route based on
queueing theory in a smart city, CEUR workshop proceedings Vol-2631 (2020) 502 - 515.
[20] O. Bisikalo, O. Kovtun, V. Kovtun, V. Vysotska, Research of Pareto-Optimal Schemes of Control
of Availability of the Information System for Critical Use, CEUR Workshop Proceedings
Vol-2623 (2020) 174-193.
[21] V. Vysotska, Ukrainian Participles Formation by the Generative Grammars Use, CEUR workshop
proceedings Vol-2604 (2020) 407-427.
[22] V. Vysotska, S. Holoshchuk, R. Holoshchuk, A comparative analysis for English and Ukrainian
texts processing based on semantics and syntax approach, CEUR Workshop Proceedings Vol-2870
(2021) 311-356.
[23] K. Tymoshenko, V. Vysotska, O. Kovtun, R. Holoshchuk, S. Holoshchuk, Real-time Ukrainian
text recognition and voicing, CEUR Workshop Proceedings Vol-2870 (2021) 357-387.
[24] Data Set, 2022. URL: https://www.kaggle.com/aishu200023/stackindex.
[25] M. Bublyk, Y. Matseliukh, Small-batteries utilization analysis based on mathematical statistics
methods in challenges of circular economy, CEUR workshop proceedings Vol-2870 (2021)
1594-1603.
[26] Standard error, 2022. URL: https://ua.nesrakonk.ru/standard-error/.
[27] Standard deviation, 2022. URL: https://studopedia.su/10_11382_standartne-vidhilennya.html.
[28] Statistical models of marketing decisions taking into account the uncertainty factor, 2022. URL:
https://excel2.ru/articles/uroven-znachimosti-i-uroven-nadezhnosti-v-ms-excel.
[29] Grouping of statistical data - BukLib.net Library, 2022. URL: https://buklib.net/books/35946/.
[30] Stack Overflow, 2022. URL: https://en.wikipedia.org/wiki/Stack_Overflow.
[31] StackOverflow is more than just a repository of answers to stupid questions, 2022. URL:
https://habr.com/ru/post/482232/.
[32] TechTrend, 2022. URL: http://techtrend.com.ua/index.php?newsid=20844.
[33] Graphic presentation of information, 2022. URL:
https://studopedia.com.ua/1_132145_grafichnepodannya-informatsii.html.
[34] Construction of an interval variable sequence of continuous quantitative data, 2022. URL:
https://stud.com.ua/93314/statistika/pobudova_intervalnogo_variatsiynogo_ryadu_bezperernih_k
ilkisnih_danih.
[35] Forecasting the trend of the time series by algorithmic methods, 2022. URL:
http://ubooks.com.ua/books/000269/inx42.php.
[36] Wikideck, 2022. URL: https://wp-uk.wikideck.com/.
[37] StackOverflow, 2022. URL: https://ru.stackoverflow.com.
[38] P. Bidyuk, A. Gozhyj, I. Kalinina, V. Vysotska, Methods for Forecasting Nonlinear
NonStationary Processes in Machine Learning, Communications in Computer and Information Science
1158 (2020) 470-485. doi: 10.1007/978-3-030-61656-4_32.
[39] P. Bidyuk, A. Gozhyj, I. Kalinina, V. Vysotska, M. Vasilev, R. Malets, Forecasting Nonlinear
Nonstationary Processes in Machine Learning Task, in: Proceedings of the IEEE 3rd International
Conference on Data Stream Mining and Processing, DSMP, 2020, pp. 28-32. doi:
10.1109/DSMP47368.2020.9204077.
[40] A. B. Lozynskyy, I. M. Romanyshyn, B. P. Rusyn, Intensity Estimation of Noise-Like Signal in
Presence of Uncorrelated Pulse Interferences, Radioelectronics and Communications Systems
62(5) (2019) 214-222. doi: 10.3103/S0735272719050030.
[41] N. Romanyshyn, Algorithm for Disclosing Artistic Concepts in the Correlation of Explicitness and
Implicitness of Their Textual Manifestation, CEUR Workshop Proceedings Vol-2870 (2021)
719-730.
[42] O. Rudenko, O. Bezsonov, Robust Training of ADALINA Based on the Criterion of the Maximum
Correntropy in the Presence of Outliers and Correlated Noise, CEUR Workshop Proceedings
Vol-2870 (2021) 1694-1705.
[43] Y. Yusyn, T. Zabolotnia, Methods of Acceleration of Term Correlation Matrix Calculation in the</p>
      <p>Island Text Clustering Method, CEUR workshop proceedings Vol-2604 (2020) 140-150.
[44] B. Rusyn, V. Ostap, O. Ostap, A correlation method for fingerprint image recognition using
spectral features, in: Proceedings of the International Conference on Modern Problems of Radio
Engineering, Telecommunications and Computer Science, TCSET 2002, 2002, pp. 219–220. doi:
10.1109/TCSET.2002.1015935.
[45] A. Lozynskyy, I. Romanyshyn, B. Rusyn, V. Minialo, Robust Approach to Estimation of the
Intensity of Noisy Signal with Additive Uncorrelated Impulse Interference. In: Proceedings of the
2018 IEEE 2nd International Conference on Data Stream Mining and Processing, DSMP 2018,
2018, pp. 251–254. doi: 10.1109/DSMP.2018.8478625.
[46] N. Boyko, O. Moroz, Comparative Analysis of Regression Regularization Methods for Life</p>
      <p>Expectancy Prediction, CEUR Workshop Proceedings Vol-2917 (2021) 310-326.
[47] L. Mochurad, Optimization of Regression Analysis by Conducting Parallel Calculations, CEUR</p>
      <p>Workshop Proceedings Vol-2870 (2021) 982-996.
[48] R. Yurynets, Z. Yurynets, D. Dosyn, Y. Kis, Risk Assessment Technology of Crediting with the</p>
      <p>Use of Logistic Regression Model, CEUR Workshop Proceedings Vol-2362 (2019) 153-162.
[49] A. Kucher, O. Boyko, K. Ilkanych, A. Fechan, N. Shakhovska, Retrospective analysis by
multifactor regression in the evaluation of the results of fine-needle aspiration biopsy of thyroid
nodules, CEUR Workshop Proceedings Vol-2753 (2020) 443–447.
[50] O. Murzenko, S. Olszewski, O. Boskin, I. Lurie, N. Savina, M. Voronenko, V. Lytvynenko,
Application of a combined approach for predicting a peptide-protein binding affinity using
regulatory regression methods with advance reduction of features, in: Proceedings of the 10th IEEE
International Conference on Intelligent Data Acquisition and Advanced Computing Systems:
Technology and Applications, IDAACS, 2019, 1, pp. 431–435. doi:
10.1109/IDAACS.2019.8924244.
[51] B. van Stein, H. Wang, W. Kowalczyk, T. Bäck, M. Emmerich, Optimally weighted cluster kriging
for big data regression, Lecture Notes in Computer Science (including subseries Lecture Notes in
Artificial Intelligence and Lecture Notes in Bioinformatics) 9385 (2015) 310–321. doi:
10.1007/978-3-319-24465-5_27.
[52] C. L. M. Belusso, S. Sawicki, V. Basto-Fernandes, R. Z. Frantz, F. Roos-Frantz, Price modeling
of IaaS providers using multiple regression [Modelagem de Preços de Provedores de IaaS
Utilizando Regressão Múltipla], in: Iberian Conference on Information Systems and Technologies,
CISTI, 2017. doi: 10.23919/CISTI.2017.7975845.
[53] P. Kravets, Y. Burov, V. Lytvyn, V. Vysotska, Gaming method of ontology clusterization,</p>
      <p>Webology 16(1) (2019) 55-76.
[54] P. Kravets, Y. Burov, O. Oborska, V. Vysotska, L. Dzyubyk, V. Lytvyn, Stochastic Game Model
of Data Clustering, CEUR Workshop Proceedings Vol-2853 (2021) 214-227.
[55] I. Lurie, V. Lytvynenko, S. Olszewski, M. Voronenko, A. Kornelyuk, U. Zhunissova, О. Boskin,
The Use of Inductive Methods to Identify Subtypes of Glioblastomas in Gene Clustering, CEUR
Workshop Proceedings Vol-2631 (2020) 406-418.
[56] Y. Bodyanskiy, A. Shafronenko, I. Klymova, Adaptive Recovery of Distorted Data Based on</p>
      <p>Credibilistic Fuzzy Clustering Approach, CEUR Workshop Proceedings Vol-2870 (2021) 6-15.
[57] Y. Meleshko, M. Yakymenko, S. Semenov, A Method of Detecting Bot Networks Based on Graph
Clustering in the Recommendation System of Social Network, CEUR Workshop Proceedings
Vol-2870 (2021) 1249-1261.
[58] N. Boyko, S. Hetman, I. Kots, Comparison of Clustering Algorithms for Revenue and Cost</p>
      <p>Analysis, CEUR Workshop Proceedings Vol-2870 (2021) 1866-1877.
[59] R. J. Kosarevych, B. P. Rusyn, V. V. Korniy, T. I. Kerod, Image Segmentation Based on the
Evaluation of the Tendency of Image Elements to form Clusters with the Help of Point Field
Characteristics, Cybernetics and Systems Analysis 51(5) (2015) 704-713. doi:
10.1007/s10559-015-9762-5.
[60] S. Babichev, B. Durnyak, I. Pikh, V. Senkivskyy, An Evaluation of the Objective Clustering
Inductive Technology Effectiveness Implemented Using Density-Based and Agglomerative
Hierarchical Clustering Algorithms, Advances in Intelligent Systems and Computing 1020 (2020)
532-553. doi:10.1007/978-3-030-26474-1_37.
[61] S. Babichev, M. A. Taif, V. Lytvynenko, V. Osypenko, Criterial analysis of gene expression
sequences to create the objective clustering inductive technology, in: Proceedings of the
International Conference on Electronics and Nanotechnology, ELNANO, 2017, pp. 244–248. doi:
10.1109/ELNANO.2017.7939756.
[62] S. Babichev, V. Lytvynenko, V. Osypenko, Implementation of the objective clustering inductive
technology based on DBSCAN clustering algorithm, in: Proceedings of the 12th International
Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT,
2017, 1, pp. 479-484. doi: 10.1109/STC-CSIT.2017.8098832.
[63] S. A. Babichev, A. Gozhyj, A. I. Kornelyuk, V. I. Lytvynenko, Objective clustering inductive
technology of gene expression profiles based on SOTA clustering algorithm, Biopolymers and Cell
33(5) (2017) 379–392. doi: 10.7124/bc.000961.
[64] V. Lytvynenko, I. Lurie, J. Krejci, M. Voronenko, N. Savina, M. A. Taif., Two Step Density-Based</p>
      <p>Object-Inductive Clustering Algorithm, CEUR Workshop Proceedings Vol-2386 (2019) 117-135.
[65] S. Mashtalir, O. Mikhnova, M. Stolbovyi, Multidimensional Sequence Clustering with Adaptive</p>
      <p>Iterative Dynamic Time Warping, International Journal of Computing 18(1) (2019) 53-59.
[66] R. Melnyk, R. Tushnytskyy, 4-D pattern structure features by three stages clustering algorithm for
image analysis and classification, Pattern Analysis and Applications 16(2) (2013) 201-211. doi:
10.1007/s10044-013-0326-x.
[67] R. Melnyk, R. Tushnytskyy, Circuit board image analysis by clustering, in: Proceeding of the 4th
International Conference of Young Scientists on Perspective Technologies and Methods in MEMS
Design, MEMSTECH, 2008, pp. 44-45. doi: 10.1109/MEMSTECH.2008.4558732.
[68] N. Shakhovska, V. Yakovyna, N. Kryvinska, An improved software defect prediction algorithm
using self-organizing maps combined with hierarchical clustering and data preprocessing, Lecture
Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and
Lecture Notes in Bioinformatics) 12391 (2020) 414–424. doi: 10.1007/978-3-030-59003-1_27.
[69] S. Babichev, V. Osypenko, V. Lytvynenko, M. Voronenko, M. Korobchynskyi, Comparison
Analysis of Biclustering Algorithms with the use of Artificial Data and Gene Expression Profiles,
in: Proceeding of the IEEE 38th International Conference on Electronics and Nanotechnology,
ELNANO, 2018, pp. 298–304. doi: 10.1109/ELNANO.2018.8477439.
[70] S. Babichev, J. Krejci, J. Bicanek, V. Lytvynenko, Gene expression sequences clustering based on
the internal and external clustering quality criteria, Proceedings of the 12th International Scientific
and Technical Conference on Computer Sciences and Information Technologies, CSIT, 2017, 1,
pp. 91–94. doi: 10.1109/STC-CSIT.2017.8098744.
[71] S. Babichev, V. Lytvynenko, J. Skvor, J. Fiser, Model of the objective clustering inductive
technology of gene expression profiles based on SOTA and DBSCAN clustering algorithms,
Advances in Intelligent Systems and Computing 689 (2018) 21–39. doi:
10.1007/978-3-319-70581-1_2.
[72] N. Shakhovska, V. Vysotska, L. Chyrun, Features of E-Learning Realization Using Virtual
Research Laboratory, in: Proceedings of the International Conference on Computer Sciences and
Information Technologies, CSIT, 2016, pp. 143–148. doi: 10.1109/STC-CSIT.2016.7589891.
[73] N. Shakhovska, V. Vysotska, L. Chyrun, Intelligent Systems Design of Distance Learning
Realization for Modern Youth Promotion and Involvement in Independent Scientific Researches,
Advances in Intelligent Systems and Computing 512 (2017) 175-198. doi:
10.1007/978-3-319-45991-2_12.
[74] M. Emmerich, V. Lytvyn, I. Yevseyeva, V. B. Fernandes, D. Dosyn, V. Vysotska, Preface: Modern
Machine Learning Technologies and Data Science, CEUR Workshop Proceedings Vol-2386
(2019).
[75] M. Emmerich, V. Lytvyn, V. Vysotska, V. Basto-Fernandes, V. Lytvynenko, Preface: Modern
Machine Learning Technologies and Data Science, CEUR Workshop Proceedings Vol-2631
(2020).
[76] M. Emmerich, V. Lytvyn, V. Vysotska, V. B. Fernandes, V. Lytvynenko, Preface: 3rd International
Workshop on Modern Machine Learning Technologies and Data Science, CEUR Workshop
Proceedings Vol-2917 (2021).
[77] P. S. Malachivskyy, Y. V. Pizyur, V. A. Andrunyk, Chebyshev Approximation by the Sum of the
Polynomial and Logarithmic Expression with Hermite Interpolation, Cybernetics and Systems
Analysis 54(5), (2018) 765-770. doi: 10.1007/s10559-018-0078-0.
[78] B. van Stein, H. Wang, W. Kowalczyk, M. Emmerich, T. Bäck, Cluster-based Kriging
approximation algorithms for complexity reduction, Applied Intelligence 50(3) (2020) 778–791.
doi: 10.1007/s10489-019-01549-7.
[79] H. Wang, M. Emmerich, B. Van Stein, T. Back, Time complexity reduction in efficient global
optimization using cluster kriging, in: Proceedings of the 2017 Genetic and Evolutionary
Computation Conference on GECCO, 2017, pp. 889–896. doi: 10.1145/3071178.3071321.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>