<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>MoMLeT+DS 2022: 4th International Workshop on Modern Machine Learning Technologies and Data Science</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Determining the Suitability of Water for Human Consumption</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksiy Tverdokhlib</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Denis Shavaev</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yurii Matseliukh</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aleksandr Gozhyj</string-name>
          <email>alex.gozhyj@gmail.com</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Maria Trzaskowska</string-name>
          <email>atrzaskowska@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maksym Korobchynskyi</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lyubomyr Chyrun</string-name>
          <email>Lyubomyr.Chyrun@lnu.edu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irina Kalinina</string-name>
          <email>irina.kalinina1612@gmail.com</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Leiden-Lviv, The Netherlands-Ukraine.</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Gdansk University of Technology</institution>
          ,
          <addr-line>G. Narutowicza Street 11/12, 80-233</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ivan Franko National University of Lviv</institution>
          ,
          <addr-line>University Street, 1, Lviv, 79000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>S. Bandera Street, 12, Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Military Academy named after Eugene Bereznyak</institution>
          ,
          <addr-line>81 Y. Il'enka Str., Kyiv, 04050</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Petro Mohyla Black Sea National University</institution>
          ,
          <addr-line>Desantnykiv Street, 68, Mykolayiv, 54000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>The work establishes the main trends in determining the suitability of water for human consumption. In our data set, the most common values of the acid-base balance (pH) lie between 6 and 7, the most common sulfate values lie between 300 and 350, the most common organic carbon values lie within 12-15, and most samples are not suitable for drinking. The mean and the most frequent value of pH is 7; the standard deviation of this parameter is small, the values vary within the range 0-14, and the indicator is quite stable. We constructed graphs in Cartesian and polar coordinate systems, derived the quantitative characteristics of descriptive statistics, and built histograms and cumulates. We used the main methods of visualization, graphical representation and primary statistical processing of numerical data, as well as methods of correlation analysis of experimental data presented as time sequences.</p>
        <p>Keywords: analysis method, determining, suitability, water, human consumption, cluster analysis, information technologies, intelligent analysis, system analysis, exponential smoothing, median filtering, data processing</p>
        <p>MoMLeT+DS 2022: 4th International Workshop on Modern Machine Learning Technologies and Data Science, November 25-26, 2022</p>
        <p>ORCID: 0000-0002-7211-7370 (O. Tverdokhlib); 0000-0002-1707-1723 (D. Shavaev); 0000-0002-1721-7703 (Y. Matseliukh); 0000-0002-3517-580X (A. Gozhyj); 0000-0002-0911-945X (A. M. Trzaskowska); 0000-0001-8049-4730 (M. Korobchynskyi); 0000-0002-9448-1751 (L. Chyrun)</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The problem of determining the suitability of water for human consumption is one of the goals of
sustainable development and affects the development of human capital. The impact of the
quality of life on the sustainable development of countries was studied in [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1-4</xref>
        ]. Authors
[
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5-7</xref>
        ] substantiated the role of the state in the preservation of natural resources, scientists [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8-10</xref>
        ] studied
the importance of existing environmental protection systems [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. Also, well-known researchers
[
        <xref ref-type="bibr" rid="ref13 ref14">13-15</xref>
        ] have developed methods for assessing the damage caused by environmental pollution and its impact
on the quality of life of the population. The volume and quality of water bodies affect how much of their
water people consume [16-20]. Everyone uses water in one form or another. Water is present in food,
in the air and in most substances, and about 70% of the human body is water. Water ensures the body's
normal functioning; therefore, any disruption of water intake leads to serious consequences and even
fatalities. When water is plentiful but of dubious quality, people at first do not drink it and go thirsty;
eventually they cannot hold out and drink it anyway, and as a result they develop serious diseases of the
digestive system, which, where adequate medical care is lacking (for example, in Africa or in poor
countries), lead to deaths [16-20]. Therefore, it is very important to determine correctly whether water
from a given source is suitable for consumption.
      </p>
      <p>Our inputs are pH, Water Hardness, Solids, Chloramines, Sulphate, Conductivity,
Organic Carbon, Trihalomethanes, Turbidity, and Potable. From the point of view of analysis, if some
indicators exceed the norm by too much, even treatment will not help. If the indicators are
within the acceptable range, it makes sense to attract investments, social projects, etc. [21-30].
These indicators determine the reagents required for water purification, which, in turn, affects the funds
required to construct treatment facilities [31-42]. If everything is within the norm, residents can be
informed that the water is suitable for drinking, or that boiling it is enough to avoid infection.
Such water may even be suitable for bottling once the turbidity is removed. Lack of water
is a tragedy, especially as the climate warms. The water that evaporates is clean, so the
concentration of pollutants in the remaining water increases; add to this unscrupulous residents upstream
who throw away everything they can get their hands on, and we get a large-scale collapse. Therefore, this
topic is more relevant than ever in a period of total environmental pollution.</p>
      <p>EMAIL: Oleksiy_Tverdokhlib@gmail.com (O. Tverdokhlib); Denis_Shavaev@gmail.com (D. Shavaev). 2022 Copyright for this paper by its authors.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>The most common approaches to detecting and classifying water quality are described in [16-53].
The authors of [16-20] applied a hybrid CNN-LSTM model to predict two water quality
variables, dissolved oxygen (DO) and chlorophyll. The results showed that the hybrid CNN-LSTM model
outperformed the separate CNN and LSTM models. Authors [16-42] compared statistical methods,
including Fuzzy Logic Inference (FLI) and WQI, for water quality assessment in the community of
Ikare, Nigeria [35]; similar systems, based on modern machine learning technology and different AI
methods, are developed as components of a smart city [54-65]. They identified moderate and poor
water quality using the FLI and WQI methods, respectively. They also found that the FLI method was
superior to the WQI method because of the relationship between the measured and standard WQI values
[43-53]. To estimate dissolved oxygen in aquaculture, authors [43] proposed a synthetic model.
Although the CNN-LSTM and sparse-autoencoder-LSTM models showed excellent performance,
they predicted only DO and chlorophyll, so it can be difficult to handle more water quality
variables with such models. In another study [44], the authors applied Extra Tree Regression (ETR), an
ensemble method that combines multiple weak learners, to predict WQI values in the Tsuen River, Hong Kong.
They applied the ETR method to ten water quality variables. The results showed that the ETR method
achieved 98% prediction accuracy, outperforming other state-of-the-art models such as support vector
regression and decision trees. A complete study on the application of methods for river water quality
modelling was conducted by authors [45], who reviewed 51 articles published between 2000 and 2016.
According to this study, artificial neural networks and wavelet neural networks were the most widely used
methods for water quality prediction. In addition, scientists [46] developed an artificial neural network;
in that study, the most significant water quality parameters were found using spatial discriminant analysis
(SDA). However, according to another study [16], such models barely reach an accuracy of 71%. The authors
of [37] applied an artificial neural network to predict WQI in the Akaki River in Ethiopia. In this
analysis, an artificial neural network with eight hidden layers and 15 hidden neurons predicted WQI with
more than 90% accuracy. Also, authors [47] applied an artificial neural network with one hidden layer to
predict the sustainability of water quality in São Paulo, Brazil. Applying neural networks to WQI prediction
requires a large amount of water quality data, which is expensive and time-consuming to collect. Researchers
[41] applied a decision tree to classify water quality status in the Klang River, Malaysia. They
considered three scenarios. In the first scenario, they used six water quality variables; they then
removed water quality parameters such as NH3-N, pH, and SS in each subsequent scenario to evaluate the
ability of the decision tree algorithm in different situations. They achieved classification accuracies of
84.09%, 81.82%, and 77.27% in the three scenarios, which are higher than the 75% classification accuracy
used for comparison [39-41]. This study used 22 water quality samples, making the model computationally
expensive.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>To solve the problems considered in this paper, we use several standard methods:
1. The moving average method [66, 67]. This method estimates the average level over a certain
period. The longer the time interval over which the average is taken, the smoother the resulting
series, but the less accurately it describes the trend of the original time series
[68-70]. The moving average method is the simplest way of smoothing empirical
curves. Its essence is to replace the actual values of the indicator with
averaged values, which vary much less than the original levels of the series.</p>
      <p>Depending on the averaging period, moving averages are calculated over an odd or an even number
of time intervals [71, 72]. A more complex calculation scheme is needed when the moving average covers
an even number of levels. The following algorithm is used for the calculation.</p>
      <p>First, determine the length of the smoothing interval l, which includes l consecutive
levels of the series (l &lt; n) [73-75]. Take into account that the wider the
smoothing interval, the more the fluctuations cancel each other out and the smoother the resulting
trend. The stronger the oscillations, the wider the smoothing interval should be. Next,
divide the entire observation period into sections, with the smoothing interval
sliding along the series with a step equal to 1. Calculate the arithmetic mean of the levels of the series
forming each section. Replace the actual value of the series at the centre of each section with the
corresponding mean. The algorithm for calculating a simple moving average is described in
[76-79]. With an even number of levels in the smoothing interval, the definition of the moving average
is complicated by the fact that the mean must be attributed to the midpoint between the two
central moments of the smoothing interval, and no observation was made at that moment.
If the graph of the time series resembles a straight line, the moving average
does not distort the dynamics of the studied phenomenon.</p>
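      <p>The steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names simple_ma and centred_even_ma and the toy series y are hypothetical, and the even-window case is handled with the common "2 x k" centring (half weights on the two outermost levels). For a series that lies on a straight line, both helpers return the original interior values, which matches the last remark above.</p>
      <preformat>
```r
# Centred simple moving average with an odd window k (base R only).
# Assumption: the helper names and the toy series are illustrative.
simple_ma = function(x, k) {
  stopifnot(k %% 2 == 1, length(x) >= k)
  # sides = 2 centres the window on each level; the ends become NA
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

# Even window: average two adjacent windows so that the result falls
# on an observed time point (half weights on the outermost levels).
centred_even_ma = function(x, k) {
  stopifnot(k %% 2 == 0, length(x) > k)
  w = c(0.5, rep(1, k - 1), 0.5) / k
  as.numeric(stats::filter(x, w, sides = 2))
}

y = c(2, 4, 6, 8, 10, 12, 14)   # a straight-line series
simple_ma(y, 3)                 # interior values equal the original ones
centred_even_ma(y, 4)           # interior values equal the original ones
```
      </preformat>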
      <p>2. Weighted moving average method [66-70, 73, 80]. A more subtle technique, based on the same
idea as the simple moving average, is the weighted moving average. A simple moving average treats
all levels of the series within the smoothing interval as equal, whereas the weighted moving average
assigns to each level a weight that depends on the distance from that level to the middle of the
smoothing interval [66-70, 73, 81-82]. In each section, the value of the central level is replaced
by the value calculated with the formula of the weighted arithmetic mean. The reason is that a simple
moving average fits a straight line (a first-order polynomial) within the smoothing interval, whereas
a weighted moving average uses polynomials of higher orders, preferably of the second or third order.
Therefore, the simple moving average method can be considered a special case of the
weighted moving average method [66-72]. The calculation of a moving average looks like
a simple and safe operation with a completely clear meaning. However, this operation
transforms the time series to a greater extent than it seems at first glance. If the levels of
the series were independent before smoothing, then after the transformation the successive
calculated levels (within the smoothing interval) depend on each other to some extent, because
each level of the smoothed series shares a common part with several previous and subsequent
members. The weighted moving average uses a "window" (smoothing interval) of size w = 2k + 1,
which is shifted successively along the levels of the series and averages the levels it covers.
The calculation formula is given in [66-72, 83-85].</p>
      <p>3. Correlation field [67, 86-88]. A correlation field is a graph that shows the relationship
between two variables, where the X value of each observation unit is the abscissa and the Y value is
the ordinate. The number of points on the graph corresponds to the number
of observation units. The placement of the points shows the presence and direction of the
relationship. A correlation field is usually built in the following steps. Choose
two variables that change over time, measure the values of the dependent variable, and enter
the results in a table. Then build a coordinate grid, with the values of the
independent variable on the X axis and the values of the dependent variable on the
Y axis. After that, mark the points of the correlation field: for each
value of the independent variable on the X axis, mark the point whose Y coordinate is the corresponding
value of the dependent variable. The result is called the correlation field. Finally, analyze
the graph and draw a conclusion [67, 86-89].</p>
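      <p>For the weighted moving average of item 2, a minimal base R sketch is given below. The helper name weighted_ma, the toy series and the choice of weights are assumptions made for illustration: the weights (-3, 12, 17, 12, -3)/35 are the standard ones for a second-order polynomial fitted to a window of w = 5 levels (k = 2). On an exactly quadratic series these weights reproduce the interior levels, while equal weights (the simple moving average) do not.</p>
      <preformat>
```r
# Weighted moving average over a window of w = 2k + 1 levels (base R only).
# Assumption: `weighted_ma`, the weights and the toy series are illustrative.
weighted_ma = function(x, weights) {
  stopifnot(length(weights) %% 2 == 1, length(x) >= length(weights))
  weights = weights / sum(weights)   # normalise so the weights sum to 1
  as.numeric(stats::filter(x, weights, sides = 2))
}

# Weights of a second-order polynomial fitted to 5 points (k = 2).
w_quadratic = c(-3, 12, 17, 12, -3) / 35

time_idx = 1:9
y = time_idx^2                       # an exactly quadratic series
weighted_ma(y, w_quadratic)          # interior values reproduce time_idx^2
weighted_ma(y, rep(1, 5))            # equal weights: simple moving average
```
      </preformat>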
      <p>a. Correlation coefficient.
b. Correlation ratio.
c. Correlation matrix.
d. Autocorrelation.</p>
      <p>
4. Cluster analysis is one of the methods of multivariate statistical analysis; that is, each
observation is represented not by a single indicator but by a set of values of various indicators
[
        <xref ref-type="bibr" rid="ref5">5, 86, 91-99</xref>
        ]. It includes algorithms that form the clusters and distribute the
objects among them. Cluster analysis, first of all, solves the problem
of adding structure to the data and also singles out groups of objects, that is, it looks
for a division of the population into areas where objects accumulate. Cluster analysis makes it
possible to consider fairly large volumes of data, sharply reduce and compress them, and make
them compact.
      </p>
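      <p>As an illustration of how objects described by several indicators are distributed among clusters, the sketch below runs k-means from base R on simulated data. Both the algorithm (k-means with two clusters) and the data are assumptions chosen for the example; the text above does not state which clustering algorithm is used.</p>
      <preformat>
```r
# k-means clustering of objects described by two indicators (base R only).
# Assumption: the data are simulated; they are not the water data set.
set.seed(42)
group_a = cbind(rnorm(50, mean = 0), rnorm(50, mean = 0))
group_b = cbind(rnorm(50, mean = 6), rnorm(50, mean = 6))
obs = scale(rbind(group_a, group_b))   # standardise each indicator

fit = kmeans(obs, centers = 2, nstart = 10)
table(fit$cluster)                     # number of objects in each cluster
fit$centers                            # coordinates of the cluster centres
```
      </preformat>
      <p>Standardising the indicators first keeps an indicator with a large numerical scale from dominating the distance between objects.</p>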
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
    </sec>
    <sec id="sec-5">
      <title>4.1. Analysis of existing software products</title>
      <p>We first downloaded the dataset [89] and familiarized ourselves with it.</p>
      <p>Fig. 1 shows the original dataset in Excel. Its columns are pH, Water hardness, Solids,
Chloramines, Sulfate, Conductivity, Organic carbon, Trihalomethanes, Turbidity, and Potable.</p>
      <p>pH is an important parameter for assessing the acid-alkaline balance of water. Water hardness is
mainly due to calcium and magnesium salts [90]. These salts dissolve from the geological deposits through
which the water moves. Solids: a wide range of inorganic and organic minerals or salts, such as potassium,
calcium, sodium, bicarbonates, chlorides, magnesium, sulfates, etc., can dissolve in water. This is an
important parameter for water use. Chloramines are the main disinfectants used in public water systems.
Sulfates are naturally occurring substances found in minerals, soil and rocks. They are present in
atmospheric air, underground water, plants and food products. Conductivity: pure water is not a good
conductor of electricity but a good insulator [90]. An increase in ion concentration increases the
electrical conductivity of water. Total organic carbon in source waters comes from decaying natural
organic matter and synthetic sources. Trihalomethanes are chemicals found in chlorinated water. The
turbidity of water depends on the amount of suspended solids. Potable indicates whether the water is
safe for human consumption, where 1 means potable and 0 means non-potable [90]. Next, we loaded
our dataset into the RStudio development environment:
water &lt;- read.csv(file = 'D:/water_potability.csv')</p>
      <sec id="sec-5-1">
        <title>Graphical presentation of the dataset.</title>
        <p>For visualization, we use the ggplot2 library, which makes it possible to build clear graphs. First,
we load the library and then build a histogram in polar coordinates:
library(ggplot2)
plot2 &lt;- ggplot(water, aes(x = ph, fill = Organic_carbon)) +
  geom_histogram(binwidth = 15, boundary = -7.5) +
  coord_polar() +
  scale_x_continuous(limits = c(0, 360))
plot2 + labs(title = "Water quality", x = "ph", y = "Organic_carbon")</p>
        <p>Figure 4 shows the dependence of ph on organic_carbon in polar coordinates.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Water acidity to sulfate content:</title>
        <p>water_sorted_ph &lt;- water[order(water$ph), ]
plot1 &lt;- ggplot() + geom_line(aes(y = ph, x = Sulfate), data = water_sorted_ph)
plot1 + labs(title = "Water quality", x = "Sulfate", y = "ph")</p>
        <p>A histogram is a way of graphically presenting tabular data and their distribution. A histogram can
be created using the hist() function in the R programming language. This function accepts a vector of
values for which the histogram is constructed.</p>
        <p>This graph shows the dependence of ph (acidity) on solids. Most of the data lie in the range
from 15000 to 30000 for solids and from 5 to 10 for ph. It can be concluded that most of the water in this
dataset is not of the best quality, and in some places, it is very toxic.</p>
        <p>Fig. 6. ph indicator</p>
        <p>Program code for constructing a histogram of water acidity:
library(ggplot2)
hist(water$ph, main = "Ph histogram", xlab = "Ph", col = "blue")</p>
        <p>Similarly, the program code for building a histogram of water hardness:
hist(water$Hardness, main = "Hardness histogram", xlab = "Hardness", col = "blue")</p>
        <p>It can be concluded that most of the water is not suitable for consumption because the indicators are
too high. The histogram shows that the largest number of cases falls in the interval 6-8, with about 500
cases in the interval 6-7. From Fig. 7, it can be seen that 1200 cases (60% of the entire sample) are
unsuitable for use, and 800 are suitable.</p>
        <p>PerformanceAnalytics is a package of econometric functions for analyzing the performance and
risks of financial instruments or portfolios. Let's try to determine some parameters of the pH indicator:</p>
        <p>Arithmetic mean - the average value of the sample. We use the mean() function:
library(PerformanceAnalytics)
# arithmetic mean
seredne &lt;- mean(water$ph)</p>
        <p>The median is the value that divides the ordered sample in half. We use the median() function; the asymmetry (skewness) is computed with skewness():
# median
median &lt;- median(water$ph)
# asymmetry
asumetrychnist &lt;- skewness(water$ph)</p>
        <p>The range is the difference between the maximum and minimum values of the sample:
# range
interval &lt;- max(water$ph) - min(water$ph)</p>
        <p>Minimum - the smallest value of the sample:
# minimum
minimum &lt;- min(water$ph)</p>
        <p>Maximum - the largest value of the sample:
# maximum
maxsymum &lt;- max(water$ph)</p>
        <p>Sum of all sample values:
# sum
suma &lt;- sum(water$ph)</p>
        <p>Total number of rows with data:
# sample size
Nradkiw &lt;- nrow(water)</p>
        <p>The coefficient of variation is the ratio of the standard deviation to the mean value, expressed
as a percentage:
# coefficient of variation
coef_variacii &lt;- sd(water$ph) / mean(water$ph) * 100</p>
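        <p>The characteristics listed above can also be collected in one helper. The sketch below is an illustration under stated assumptions, not the authors' code: the name describe_indicator and the toy vector are hypothetical, missing values are dropped before the calculation, and the skewness is the moment-based estimate, which may differ from the default of other packages.</p>
        <preformat>
```r
# Descriptive statistics for one indicator, ignoring missing values.
# Assumption: `describe_indicator` and the toy vector are illustrative.
describe_indicator = function(x) {
  x = x[!is.na(x)]                     # drop missing values first
  m = mean(x)
  c(mean = m,
    median = median(x),
    min = min(x),
    max = max(x),
    range = max(x) - min(x),
    sum = sum(x),
    n = length(x),
    # moment-based skewness: third central moment over variance^1.5
    skewness = mean((x - m)^3) / mean((x - m)^2)^1.5,
    cv_percent = sd(x) / m * 100)
}

describe_indicator(c(6.5, 7.0, 7.5, NA, 8.0, 6.0))
```
        </preformat>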
        <p>A cumulate (cumulative frequency curve) represents the distribution as a curve whose ordinates are
proportional to the accumulated frequencies of the variation series. To build a series of accumulated
frequencies, add the frequency of the second class to the frequency of the first, smallest
class, then add the frequency of the third class, etc.</p>
        <p>Cumulates sometimes have an advantage over the variation curve.
ph = water$ph
breaks = seq(0, 14, by = 0.1)
ph.cut = cut(ph, breaks, right = FALSE)
ph.freq = table(ph.cut)
cumfreq0 = c(0, cumsum(ph.freq))
plot(breaks, cumfreq0, main = "ph", xlab = "ph", ylab = "Number")
lines(breaks, cumfreq0)
Hardness = water$Hardness
breaks = seq(74, 315, by = 1)
Hardness.cut = cut(Hardness, breaks, right = FALSE)
Hardness.freq = table(Hardness.cut)
cumfreq0 = c(0, cumsum(Hardness.freq))
plot(breaks, cumfreq0, main = "Hardness", xlab = "Hardness", ylab = "Number")
lines(breaks, cumfreq0)</p>
        <p>A cumulate is a continuous curve drawn in a coordinate system where the values of
the characteristic or the limits of its intervals are plotted on the abscissa axis, and the increasing sum of
frequencies is plotted on the ordinate axis.</p>
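        <p>The accumulation of class frequencies can be checked by hand on a small sample. The values and class limits below are made up for illustration and are not taken from the water dataset.</p>
        <preformat>
```r
# Cumulative frequencies for a small sample (base R only).
# Assumption: the values and class limits are made up for illustration.
x = c(6.2, 6.8, 7.1, 7.4, 7.4, 8.3, 9.0)
breaks = seq(6, 10, by = 1)               # class limits 6, 7, 8, 9, 10
classes = cut(x, breaks, right = FALSE)   # classes [6,7), [7,8), [8,9), [9,10)
freq = table(classes)                     # class frequencies 2, 3, 1, 1
cum_freq = c(0, cumsum(freq))             # accumulated: 0, 2, 5, 6, 7

plot(breaks, cum_freq, type = "b",
     main = "Cumulative frequencies", xlab = "Value", ylab = "Number")
```
        </preformat>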
        <p>Fig. 9. Cumulative indicators of ph</p>
        <p>Having analyzed the cumulates, we can conclude that all indicators show a sharp increase in
pollution that correlates with the increase in the numerical values of the parameters; that is, more
water samples have higher indicator values.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Results and discussion</title>
      <p>Smoothing methods reduce the influence of the random component (random fluctuations) in time
series [66-90]. They make it possible to obtain "cleaner" values that consist only of deterministic
components. Some methods aim to isolate only some components, for example, the trend. We
perform smoothing using different methods and use the following libraries:
• library (tidyverse);
• library (lubridate);
• library (fpp2);
• library (zoo);
• library (pastecs);
• library (TTR).</p>
      <p>We import and number the data:
water &lt;- read.csv(file = 'D:/water_potability1.csv')
id &lt;- c(1:3276)
water &lt;- cbind(id, water)</p>
      <p>1. The moving average method [67]. We use Kendall's formulas for smoothing with the
moving average. Kendall's rank statistic τ is often used in statistical hypothesis testing to
determine whether two variables can be considered statistically dependent. Under the null hypothesis
of independence of X and Y, the sampling distribution of τ has an expected value of zero. The exact
distribution cannot be characterized in terms of common distributions but can be calculated for small
samples; for larger samples, it is common to use a normal approximation with zero
mean and a variance that depends only on the sample size. We smooth our data with
the smoothing intervals w = 3, 5, 7, 9, 11, 13, 15 to obtain seven columns using the
rollmean() function:
ma &lt;- water %&gt;%
  select(id, Hardness) %&gt;%
  mutate(ma1 = rollmean(Hardness, k = 3, fill = NA),
         ma2 = rollmean(Hardness, k = 5, fill = NA),
         ma3 = rollmean(Hardness, k = 7, fill = NA),
         ma4 = rollmean(Hardness, k = 9, fill = NA),
         ma5 = rollmean(Hardness, k = 11, fill = NA),
         ma6 = rollmean(Hardness, k = 13, fill = NA),
         ma7 = rollmean(Hardness, k = 15, fill = NA))</p>
      <p>Next, we visualize the data:
ma %&gt;%
  gather(metric, Hardness, Hardness:ma7) %&gt;%
  ggplot(aes(id, Hardness, color = metric)) + geom_line()
ma1 = rollmean(water$Hardness, k = 3)
ma2 = rollmean(water$Hardness, k = 5)
ma3 = rollmean(water$Hardness, k = 7)
ma4 = rollmean(water$Hardness, k = 9)
ma5 = rollmean(water$Hardness, k = 11)
ma6 = rollmean(water$Hardness, k = 13)
ma7 = rollmean(water$Hardness, k = 15)</p>
      <p>Search for turning points:
tp1 &lt;- turnpoints(ma1)
summary(tp1)
tp2 &lt;- turnpoints(ma2)
summary(tp2)
tp3 &lt;- turnpoints(ma3)
summary(tp3)
tp4 &lt;- turnpoints(ma4)
summary(tp4)
tp5 &lt;- turnpoints(ma5)
summary(tp5)
tp6 &lt;- turnpoints(ma6)
summary(tp6)
tp7 &lt;- turnpoints(ma7)
summary(tp7)</p>
      <p>Visualization of turning points for the seventh smoothing (w = 15):
plot(tp7)
plot(ma7, type = "l")
lines(tp7)</p>
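      <p>To make the notion of a turning point concrete, the sketch below locates local peaks and pits with base R only. It is a simplified illustration: the helper turning_points is hypothetical, it ignores ties between neighbouring levels, and it is not the algorithm implemented in the turnpoints() function of the pastecs package.</p>
      <preformat>
```r
# Locate turning points (local peaks and pits) of a series (base R only).
# Assumption: `turning_points` is an illustrative helper that ignores ties.
turning_points = function(x) {
  s = sign(diff(x))                    # +1 where rising, -1 where falling
  # a turning point is a level where the direction of change flips
  which(head(s, -1) * tail(s, -1) == -1) + 1
}

y = c(1, 3, 2, 4, 5, 3)
turning_points(y)                      # indices 2, 3 and 5
```
      </preformat>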
      <p>We compute the correlation coefficients of the smoothed values with the original ones, taking into
account that each smoothing removes rows at both ends of the series:
cor(water$Hardness[2:3275], ma1)
cor(water$Hardness[3:3274], ma2)
cor(water$Hardness[4:3273], ma3)
cor(water$Hardness[5:3272], ma4)
cor(water$Hardness[6:3271], ma5)
cor(water$Hardness[7:3270], ma6)
cor(water$Hardness[8:3269], ma7)</p>
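      <p>The index ranges above follow one rule: a centred window of odd width k removes (k - 1) / 2 rows at each end. A minimal base R sketch of that rule is shown below; the helper cor_with_smoothed and the simulated series are assumptions for illustration, and stats::filter stands in for rollmean() so that no extra package is needed.</p>
      <preformat>
```r
# Correlation between a series and its centred moving average of odd width k.
# Assumption: `cor_with_smoothed` is illustrative and runs on simulated data.
cor_with_smoothed = function(x, k) {
  half = (k - 1) / 2                        # rows lost at each end
  smoothed = as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
  keep = (half + 1):(length(x) - half)      # rows that survive the smoothing
  cor(x[keep], smoothed[keep])
}

set.seed(1)
x = cumsum(rnorm(200))                      # simulated series, not the water data
r = sapply(c(3, 5, 7, 9, 11, 13, 15), function(k) cor_with_smoothed(x, k))
r
```
      </preformat>
      <p>For a series of 3276 rows and k = 3 the helper keeps rows 2 to 3275, and for k = 15 it keeps rows 8 to 3269, in line with the ranges used above.</p>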
      <p>We smooth the data with the smoothing interval w = 3, then smooth the obtained
data again with the smoothing interval w = 5, continue with the smoothing interval w = 7,
and so on up to w = 15. This gives seven columns:
maRecursive &lt;- water %&gt;%
  select(id, Hardness) %&gt;%
  mutate(ma1 = rollmean(Hardness, k = 3, fill = NA),
         ma2 = rollmean(ma1, k = 5, fill = NA),
         ma3 = rollmean(ma2, k = 7, fill = NA),
         ma4 = rollmean(ma3, k = 9, fill = NA),
         ma5 = rollmean(ma4, k = 11, fill = NA),
         ma6 = rollmean(ma5, k = 13, fill = NA),
         ma7 = rollmean(ma6, k = 15, fill = NA))</p>
      <p>We smooth the data using the smoothing intervals w = 3, 5, 7, 9, 11, 13, 15 to obtain
seven columns. To build a moving average, we took as parameters the hardness and the id of each water
record. The result contains seven columns, one for each interval.</p>
      <sec id="sec-6-1">
        <title>Visualization of smoothing:</title>
        <p>maRecursive %&gt;%
  gather(metric, Hardness, Hardness:ma7) %&gt;%
  ggplot(aes(id, Hardness, color = metric)) + geom_line()
maR1 = maRecursive$ma1[!is.na(maRecursive$ma1)]
maR2 = maRecursive$ma2[!is.na(maRecursive$ma2)]
maR3 = maRecursive$ma3[!is.na(maRecursive$ma3)]
maR4 = maRecursive$ma4[!is.na(maRecursive$ma4)]
maR5 = maRecursive$ma5[!is.na(maRecursive$ma5)]
maR6 = maRecursive$ma6[!is.na(maRecursive$ma6)]
maR7 = maRecursive$ma7[!is.na(maRecursive$ma7)]</p>
        <p>Search for turning points:
tpR1 &lt;- turnpoints(maR1)
summary(tpR1)
tpR2 &lt;- turnpoints(maR2)
summary(tpR2)
tpR3 &lt;- turnpoints(maR3)
summary(tpR3)
tpR4 &lt;- turnpoints(maR4)
summary(tpR4)
tpR5 &lt;- turnpoints(maR5)
summary(tpR5)
tpR6 &lt;- turnpoints(maR6)
summary(tpR6)
tpR7 &lt;- turnpoints(maR7)
summary(tpR7)
Visualization of turning points:
plot(tpR7)
plot(maR7, type = "l")
lines(tpR7)</p>
        <p>We compute the correlation coefficients of the smoothed values with the original ones, taking into
account that each smoothing removes rows at both ends of the series:
cor(water$Hardness[2:3275], maR1)
cor(water$Hardness[4:3273], maR2)
cor(water$Hardness[7:3270], maR3)
cor(water$Hardness[11:3266], maR4)
cor(water$Hardness[16:3261], maR5)
cor(water$Hardness[22:3255], maR6)
cor(water$Hardness[29:3248], maR7)</p>
        <p>This graph shows the fluctuations of the hardness parameter over the entire interval. The main
series to compare are Hardness and ma7: a certain trend is visible. It is hard to see on the graph,
but the end result is a smoother description of the data.</p>
        <p>The turning points are quite numerous. As the smoothing interval increases, the correlation
coefficient decreases because the data are increasingly modified.</p>
        <p>We smooth the data using a smoothing interval of w = 3; then we smooth the resulting
data again using a smoothing interval of w = 5. We continue smoothing the result with the
interval w = 7 and so on until w = 15.</p>
      </sec>
      <sec id="sec-6-2">
        <title>It can be seen that we lost more rows and got less accurate data.</title>
        <p>Fig. 17. Turning points at the smoothing interval w = 15
Fig. 19. Correlation coefficients between smoothed and original data</p>
        <p>The correlation coefficients also differ, but not much, so the relationship with the raw data remains
approximately the same.</p>
        <p>2. Median smoothing [67]. The median smoothing algorithm for a time series computes the median
of the levels within each smoothing interval. The level corresponding to the middle of the smoothing
interval is then replaced by this median value. Median smoothing completely removes single extreme or
anomalous levels that are separated from each other by at least half of the smoothing interval; preserves
sharp changes in the trend (moving average and exponential smoothing blur them); and effectively
removes single levels with very large or very small values that are random and stand out sharply from
the other levels. We smooth the data using smoothing intervals of size w = 3, 5, 7, 9, 11, 13, 15 to obtain
seven columns using the runmed() function:
ms &lt;- water %&gt;% select(id, Hardness) %&gt;% mutate(ms1 = runmed(Hardness, 3), ms2 = runmed(Hardness, 5),
ms3 = runmed(Hardness, 7), ms4 = runmed(Hardness, 9), ms5 = runmed(Hardness, 11), ms6 = runmed(Hardness, 13),
ms7 = runmed(Hardness, 15))</p>
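        <p>The two properties described above, removal of isolated outliers and preservation of sharp level changes, can be illustrated on small hypothetical vectors. This is only a sketch: the vectors spike and step are invented for illustration, and the moving average is computed with stats::filter() for comparison.</p>
        <preformat>
```r
# Hypothetical series with one anomalous spike
spike = c(5, 5, 5, 50, 5, 5, 5)

# Running median and centred moving average, both with w = 3
median_smoothed = runmed(spike, 3)
mean_smoothed = as.numeric(stats::filter(spike, rep(1 / 3, 3), sides = 2))

print(as.numeric(median_smoothed))  # the spike is removed
print(mean_smoothed)                # the spike is spread over its neighbours

# Hypothetical series with a sharp level shift
step = c(1, 1, 1, 1, 9, 9, 9, 9)
print(as.numeric(runmed(step, 3)))  # the jump is preserved
```
        </preformat>
        <p>Unlike the moving average, runmed() returns a value for every position, so no rows are lost at the ends of the series.</p>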
        <p>We used the same smoothing intervals and operations as in the previous point.</p>
        <p>Visualization of smoothing:
ms %&gt;% gather(metric, Hardness, Hardness:ms7) %&gt;% ggplot(aes(id, Hardness, color = metric)) + geom_line()
Turning points search:
tp1 &lt;- turnpoints(ms$ms1)
summary(tp1)
tp2 &lt;- turnpoints(ms$ms2)
summary(tp2)
tp3 &lt;- turnpoints(ms$ms3)
summary(tp3)
tp4 &lt;- turnpoints(ms$ms4)
summary(tp4)
tp5 &lt;- turnpoints(ms$ms5)
summary(tp5)
tp6 &lt;- turnpoints(ms$ms6)
summary(tp6)
tp7 &lt;- turnpoints(ms$ms7)
summary(tp7)</p>
        <p>Visualization of turning points:</p>
        <p>plot (tp7) plot (ms$ms7, type = "l") lines (tp7)</p>
        <p>Now let's find the turning points for the last smoothing with step 15:</p>
      </sec>
      <sec id="sec-6-3">
        <title>Correlation coefficients of smoothed values with original ones:</title>
        <p>cor (water$Hardness,ms$ms1) cor (water$Hardness,ms$ms2) cor (water$Hardness,ms$ms3) cor
(water$Hardness,ms$ms4) cor (water$Hardness,ms$ms5) cor (water$Hardness,ms$ms6) cor (water$Hardness,ms$ms7)</p>
        <p>We smooth the data using a smoothing interval of w = 3, then we smooth the resulting data again
using a smoothing interval of w = 5. We continue smoothing the result with the interval w = 7 and so on
until w = 15. This should give seven columns in a row:
msR &lt;- water %&gt;% select(id, Hardness) %&gt;% mutate(ms1 = runmed(Hardness, 3), ms2 = runmed(ms1, 5),
ms3 = runmed(ms2, 7), ms4 = runmed(ms3, 9), ms5 = runmed(ms4, 11), ms6 = runmed(ms5, 13), ms7 = runmed(ms6, 15))</p>
        <p>Visualization of smoothing:
msR %&gt;% gather(metric, Hardness, Hardness:ms7) %&gt;% ggplot(aes(id, Hardness, color = metric)) + geom_line()
Turning points search:
tp1 &lt;- turnpoints(msR$ms1)
summary(tp1)
tp2 &lt;- turnpoints(msR$ms2)
summary(tp2)
tp3 &lt;- turnpoints(msR$ms3)
summary(tp3)
tp4 &lt;- turnpoints(msR$ms4)
summary(tp4)
tp5 &lt;- turnpoints(msR$ms5)
summary(tp5)
tp6 &lt;- turnpoints(msR$ms6)
summary(tp6)
tp7 &lt;- turnpoints(msR$ms7)
summary(tp7)</p>
        <p>Visualization of turning points:
plot (tp7) plot (msR$ms7, type = "l") lines (tp7)</p>
        <p>Correlation coefficients of smoothed values with original ones:
cor (water$Hardness,msR$ms1) cor (water$Hardness,msR$ms2) cor (water$Hardness,msR$ms3) cor
(water$Hardness,msR$ms4) cor (water$Hardness,msR$ms5) cor (water$Hardness,msR$ms6) cor
(water$Hardness,msR$ms7)</p>
        <p>The graph looks exactly like this because the data has acquired a complete form.</p>
        <p>The correlation coefficient is smaller than for the previous methods, which means that this
method is not quite suitable for the given dataset because it reduces the reliability of the data.</p>
        <p>Correlation analysis [66-80] is a group of methods for detecting the presence and degree of a
relationship between several randomly changing parameters. Special numerical characteristics and their
statistics assess the degree of such a relationship. The correlation appears as a tendency of the average
values of the function to change depending on changes in the argument. The ggpubr library is a library
for data visualization in R. We build a correlation field:
library(ggpubr)
plot(water$ph, water$Solids, main = "Correlation field", xlab = "ph", ylab = "Solids")</p>
        <p>From the graphically presented field, it can be concluded that the indicators correlate quite strongly
[55].</p>
      </sec>
      <sec id="sec-6-4">
        <title>We determine the correlation coefficient:</title>
        <p>correlation &lt;- cor ( water$ph , water$Solids )</p>
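        <p>By default, cor() returns the Pearson coefficient. As a small illustration, the same value can be obtained directly from the definition; the vectors x and y are hypothetical and merely stand in for water$ph and water$Solids.</p>
        <preformat>
```r
# Hypothetical paired observations
x = c(1, 2, 3, 4, 5)
y = c(2, 4, 5, 4, 5)

# Pearson coefficient from its definition: covariation divided by
# the product of the spreads of the two variables
dx = x - mean(x)
dy = y - mean(y)
r_manual = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# The same coefficient from the built-in function
r_builtin = cor(x, y)

print(r_manual)
print(r_builtin)
```
        </preformat>
        <p>If a column happens to contain missing values, cor() returns NA unless an argument such as use = "complete.obs" is supplied.</p>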
        <p>Using the ggscatter method of the ggpubr library, we calculate the correlation relation:
qwe &lt;- ggscatter(water, x = "ph", y = "Solids", add = "reg.line", conf.int = TRUE, cor.coef = TRUE,
cor.method = "pearson", xlab = "ph", ylab = "Solids")</p>
        <p>We divide the data into 3 parts:
ph1 &lt;- water$ph[1:1092]
ph2 &lt;- water$ph[1093:2184]
ph3 &lt;- water$ph[2185:3276]
For the parts, we build a correlation matrix (rcorr):
mydata.rcorr = rcorr(as.matrix(cbind(ph1, ph2, ph3)))</p>
        <p>We find multiple correlation coefficients:
numericData &lt;- cbind(water$id, water$ph, water$Hardness, water$Solids, water$Chloramines, water$Sulfate,
water$Conductivity, water$Organic_carbon, water$Trihalomethanes, water$Turbidity)
chart.Correlation(numericData, histogram = TRUE, pch = 19)</p>
        <p>Let's plot graphs of autocorrelation functions using acf:
data &lt;- cbind(water$ph, water$Solids)
colnames(data) &lt;- c("ph", "Solids")
autocorrelation &lt;- acf(data, lag.max = 1, type = c("correlation"), plot = TRUE, xlab = "ph", ylab = "Solids")</p>
        <p>
          The matrix displays all the coefficients and even graphically displays the relationships. Multiple
correlation coefficients show that the dataset has weak but present relationships, based on which results
can be constructed. Cluster analysis is one of the methods of multivariate statistical analysis; that is,
each observation is represented not by a single indicator but by a set of values of various indicators [
          <xref ref-type="bibr" rid="ref5">5,
86, 91-99</xref>
          ]. It includes algorithms that form clusters and assign objects to them. Above all, cluster analysis
adds structure to the data and selects groups of objects, that is, it divides the population into regions
where objects accumulate. Cluster analysis makes it possible to consider fairly large volumes of data and
to compress them sharply into a compact form.
Fig. 26. Graphic representation of cluster analysis
        </p>
        <p>Because we use the RStudio environment and the R language for this laboratory work, building
clusters does not require forming an "object-property" table from the provided data, preparing the
closely located "original table" and "table-copy", building a proximity matrix, and the like.
We can immediately perform the cluster analysis procedure.</p>
        <p>Performing a cluster analysis procedure using built-in R methods:</p>
        <p>Let's select the parameters ph, Solids and Hardness and build a graphical representation of the
clustering. The factoextra library provides easy-to-use functions to extract and visualize the results
of multivariate data analysis.
library(ggplot2)
library(factoextra)
library(rEMM)
ggplot(water, aes(ph, Solids, col = Hardness)) + geom_point()
Let's build the clustering matrix:
set.seed(55)
cluster &lt;- kmeans(cbind(water$ph, water$Solids), 3, nstart = 10)
cluster
table(cluster$cluster, water$Hardness)
Build a dendrogram:
data &lt;- cbind(water$ph, water$Solids)</p>
        <p>data.hclust = hclust(dist(scale(data, center = apply(data, 2, mean), scale = apply(data, 2, sd))))
plot(data.hclust)
We chose the parameters Solids and Hardness and built a graphical representation of the clustering:</p>
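        <p>The clustering steps can be tried out on a small scale with base R alone. The sketch below uses a hypothetical six-row matrix with two clearly separated groups in place of the water data. It standardizes the columns with scale(), whose defaults (centring by the column means and dividing by the column standard deviations) match the explicit arguments used in the dendrogram code, and then applies both kmeans() and hclust().</p>
        <preformat>
```r
# Hypothetical two-column data with two clearly separated groups
data = cbind(
  ph = c(6.5, 6.8, 7.0, 8.9, 9.1, 9.3),
  Solids = c(10000, 11000, 10500, 30000, 31000, 30500)
)

# Standardize the columns so that the large Solids values
# do not dominate the Euclidean distance
scaled = scale(data)

# k-means on the standardized data (seed fixed only for reproducibility)
set.seed(55)
km = kmeans(scaled, centers = 2, nstart = 10)

# Hierarchical clustering on the same data, cut into two groups
hc = cutree(hclust(dist(scaled)), k = 2)

print(km$cluster)
print(hc)
```
        </preformat>
        <p>On real data the groups are rarely this well separated, so the k-means labels and the dendrogram cut do not have to agree as they do here.</p>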
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusions</title>
      <p>The work establishes the main trends in determining the suitability of water for human consumption:
the most common value of the acid-alkaline balance of water is from 6 to 7, most records in our data set are
not suitable for drinking, the most common value of the sulfate balance of water is from 300 to
350, and the most common values of the carbon balance of water are within 12-15. The average and
most common value of the acid-alkaline balance of water is 7; the standard deviation of this parameter
is insignificant, the values vary in the range of 0-14, and the acid-alkaline balance indicator of
water is quite stable. In this work, we constructed graphs in Cartesian and polar coordinate systems,
derived quantitative characteristics of descriptive statistics, and built histograms and cumulative curves.
In investigating this problem, we used the main methods of visualization, graphic representation and
primary statistical processing of numerical data. Methods of correlation analysis of experimental data
presented as time sequences were also used in the work.</p>
      <p>The most common indicator values determined by histograms:
• The most common indicator of the acid-alkaline balance of water is from 6 to 7;
• Most records in our data set are non-potable water;
• The most common indicator of the sulfate balance of water is from 300 to 350;
• The most common indicators of the carbon balance of water are in the range of 12-15.</p>
      <p>As can be seen from the histogram in Fig. 33, most of the studied water from our dataset is unsuitable
for consumption (more than 1200 records).</p>
      <p>The descriptive statistics of the acidity level are as follows:
• Mean is 7.08599;
• Standard error is 0.035;
• Median is 7.027297;
• Mode is 8.316766;
• Standard deviation is 1.573337;
• Sample variance is 2.474157;
• Kurtosis is 0.6185764;
• Skewness is 0.04891027;
• Range (interval) is 13.7725;
• Minimum is 0.23;
• Maximum is 14;
• Sum is 14249.93;
• Count is 2011;
• Coefficient of variation is 22.2%.</p>
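      <p>Several of these figures can be cross-checked against each other. The following sketch recomputes the standard error, the coefficient of variation and the mean from the reported standard deviation, count and sum; the variable names are chosen here for illustration only.</p>
      <preformat>
```r
# Values reported in the list above
n = 2011
mean_ph = 7.08599
sd_ph = 1.573337
total = 14249.93

standard_error = sd_ph / sqrt(n)        # standard error of the mean
coef_variation = 100 * sd_ph / mean_ph  # coefficient of variation, in percent
mean_check = total / n                  # the sum divided by the count gives the mean

print(round(standard_error, 3))
print(round(coef_variation, 1))
print(round(mean_check, 3))
```
      </preformat>
      <p>The recomputed values agree with the reported standard error of 0.035, the coefficient of variation of 22.2% and the mean of about 7.086.</p>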
      <p>After finding some statistical data for the water acidity level, we saw that this level ranges from 5 to
9. The level of acidity should be in the range of 6.5 - 8.5. We see an average value of 7, which is within
these limits; the standard error is relatively small. The median also falls within these limits.</p>
      <p>We see a minimum of 0.23, which is completely abnormal and can almost be equated to car battery
acid, and a maximum of 14, which can be equated to soapy water. The difference between the maximum
and the minimum is the range (interval), which in our case is 13.7725. Now consider the kurtosis.
For a normal distribution, the excess kurtosis is zero. If the kurtosis of a distribution differs
from zero, then its density curve differs from the density curve of the normal distribution.
Since our kurtosis is positive, the theoretical curve has a higher and "sharper" peak than the normal
curve. If it were negative, the curve would have a lower and flatter peak than the normal curve.</p>
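      <p>The sign of the kurtosis can be illustrated with a moment-based formula: the fourth central moment divided by the squared variance, minus 3. This is a sketch on hypothetical samples; the helper excess_kurtosis() is not part of the original analysis, and statistical packages often apply small-sample corrections, so their values can differ slightly from this simple definition.</p>
      <preformat>
```r
# Moment-based excess kurtosis: fourth central moment over squared variance, minus 3
excess_kurtosis = function(x) {
  d = x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

flat = c(-2, -1, 0, 1, 2)         # hypothetical, evenly spread sample
peaked = c(-3, 0, 0, 0, 0, 0, 3)  # hypothetical sample concentrated at the centre

print(excess_kurtosis(flat))    # negative: flatter than the normal curve
print(excess_kurtosis(peaked))  # positive: sharper peak than the normal curve
```
      </preformat>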
      <p>The coefficient of variation also provides useful information: it describes how much the
numerical values of a characteristic differ among the population units and fluctuate around the
average value that characterizes the population. The smaller the variation, the more homogeneous the
population and the more reliable (typical) the average value. If the coefficient of variation is lower than
33%, the data set is considered quantitatively homogeneous, which corresponds to our result of 22.2%. The
following facts can also be drawn from our results:
• The average and most common value of the acid-alkaline balance of water is 7;
• The standard deviation of this parameter is insignificant;
• Indicators range from 0.23 to 14;
• The acid-alkaline balance indicator of water is quite stable.</p>
    </sec>
    <sec id="sec-8">
      <title>7. References</title>
      <p>M. S. I. Khan, N. Islam, J. Uddin, S. Islam, M. K. Nasir, Water quality prediction and
classification based on principal component regression and gradient boosting classifier approach,
Journal of King Saud University-Computer and Information Sciences 34(8) (2022) 4773-4781.
https://doi.org/10.1016/j.jksuci.2021.06.003.</p>
      <p>T. H. H. Aldhyani, M. Al-Yaari, H. Alkahtani, M. Maashi, Water Quality Prediction Using
Artificial Intelligence Algorithms, Applied Bionics and Biomechanics 2020, Article ID 6659314,
12 p., 2020. https://doi.org/10.1155/2020/6659314.</p>
      <p>S. Chatterjee, S. Sarkar, N. Dey, S. Sen, T. Goto, N. C. Debnath, Water quality prediction:
Multi objective genetic algorithm coupled artificial neural network based approach, in: Int. Conf.
on Industrial Informatics, 2017, pp. 963-968. https://ieeexplore.ieee.org/document/8104902.</p>
      <p>Water quality describes the condition of the water, including chemical, physical, and biological
characteristics, usually with respect to its suitability for a particular purpose such as drinking or
swimming. URL: https://floridakeys.noaa.gov/ocean/waterquality.html.</p>
      <p>Importance of Water Quality and Testing. URL:
https://www.cdc.gov/healthywater/drinking/public/water_quality.html.</p>
      <p>A. N. Ahmed, et al. Machine learning methods for better water quality prediction, Journal of
Hydrology 578 (2019) 124084.</p>
      <p>Y. Chen, L. Song, Y. Liu, L. Yang, D. Li, A review of the artificial neural network models for
water quality prediction. Applied Sciences 10(17) (2020) 5776.</p>
      <p>A. Gozhyj, I. Kalinina, V. Gozhyj, Fuzzy cognitive analysis and modeling of water quality, in:
International Conference on Intelligent Data Acquisition and Advanced Computing Systems:
Technology and Applications (IDAACS), 2017, pp. 289-294.</p>
      <p>M. Linan, B. Gerardo, R. Medina, Self-Organizing Map with Nguyen-Widrow Initialization
Algorithm for Groundwater Vulnerability Assessment, International Journal of Computing 19(1)
(2020) 63-69.</p>
      <p>D.K. Mozgovoy, V.V. Hnatushenko, V.V. Vasyliev, Automated recognition of vegetation and
water bodies on the territory of megacities in satellite images of visible and IR bands, ISPRS Ann.
Photogramm. Remote Sens. Spatial Inf. Sci. IV-3 (2018) 167–172,
https://doi.org/10.5194/isprsannals-IV-3-167-2018.</p>
      <p>W. Wójcik, et al., Hydroecological investigations of water objects located on urban areas, in:
Environmental Engineering V – Proceedings of the 5th National Congress of Environmental
Engineering, 2017, pp. 155–160.</p>
      <p>R.Ya. Kosarevich, et al., Assessment of damages caused by thermal fatigue cracks in water
economizer collector, Fiziko-Khimicheskaya Mekhanika Materialov 40(1) (2004) 109–115.</p>
      <p>O. Alokhina, et al., Solar Activity and Water Content of Closed Lake Ecosystems, in: General
Assembly and Scientific Symposium of the International Union of Radio Science, 2020, 9232274.</p>
      <p>N. Anufrieva, Y. Obukh, B. Rusyn, I. Fartushok, Expert computer system for technical
diagnostics of the efficiency of main constitutive elements of the water steam route, in: The
Experience of Designing and Application of CAD Systems in Microelectronics - Proceedings of
the 9th International Conference, CADSM, 2007, pp. 206.</p>
      <p>N. Anufrieva, Y. Obukh, B. Rusyn, I. Fartushok, Typical damage image database of the main
constitutive elements of the water steam route, in: The Experience of Designing and Application of
CAD Systems in Microelectronics - Proceedings of the International Conference, 2007, pp. 518.</p>
      <p>R. Elmahdi, Predicting Water Quality Variables. URL:
https://scholar.sun.ac.za/bitstream/handle/10019.1/108072/elmahdi_predicting_2020.pdf?sequence=2&amp;isAllowed=y.</p>
      <p>V. Sagan, K. T. Peterson, M. Maimaitijiang, P. Sidike, J. Sloan, B. A. Greeling, S. Maalouf, C.
Adams, Monitoring inland water quality using remote sensing: potential and limitations of spectral
indices, bio-optical simulations, machine learning, and cloud computing. URL:
https://www.sciencedirect.com/science/article/abs/pii/S0012825220302336.</p>
      <p>T. S. Kapalanga, Z. Hoko, W. Gumindoga, L. Chikwiramakomo, Remote-sensing-based
algorithms for water quality monitoring in Olushandja Dam, north-central Namibia, Water Supply
21(5) (2021) 1878-1894.</p>
      <p>Y. F. Zhang, P. J. Thorburn, M. P. Vilas, P. Fitch, Machine learning approaches to improve and
predict water quality data, in: International Congress on Modelling and Simulation-Supporting
Evidence-Based Decision Making: the Role of Modelling and Simulation, MODSIM 2019.</p>
      <p>J. O. Oladipo, A. S. Akinwumiju, O. S. Aboyeji, A. A. Adelodun, Comparison between fuzzy
logic and water quality index methods: A case of water quality assessment in Ikare community,
Southwestern Nigeria, Environmental Challenges 3 (2021) 100038.</p>
      <p>O. S. Aboyeji, S. F. Eigbokhan, Evaluations of groundwater contamination by leachates around
Olusosun open dumpsite in Lagos metropolis, southwest Nigeria, Journal of environmental
management 183 (2016) 333-341.</p>
      <p>M. Yilma, Z. Kiflie, A. Windsperger, N. Gessese, Application of artificial neural network in
water quality index prediction: a case study in Little Akaki River, Addis Ababa, Ethiopia, Modeling
Earth Systems and Environment 4(1) (2018) 175-187.</p>
      <p>D. M. Bushero, Z. A. Angello, B. M. Behailu, Evaluation of hydrochemistry and identification
of pollution hotspots of little Akaki river using integrated water quality index and GIS,
Environmental Challenges 8 (2022) 100587.</p>
      <p>M. F. M. Nasir, M. S. Samsudin, I. Mohamad, M. R. A. Awaluddin, M. A. Mansor, H. Juahir,
N. Ramli, River water quality modeling using combined principle component analysis (PCA) and
multiple linear regressions (MLR): a case study at Klang River, Malaysia, World Applied Sciences
Journal 14 (2011) 73-82.</p>
      <p>M. Hameed, S. S. Sharqi, Z. M. Yaseen, H. A. Afan, A. Hussain, A. Elshafie, Application of
artificial intelligence (AI) techniques in water quality index prediction: a case study in tropical
region, Malaysia, Neural Computing and Applications 28(1) (2017) 893-905.</p>
      <p>J. Y. Ho, et al., Towards a time and cost effective approach to water quality index class
prediction, Journal of Hydrology 575 (2019) 148-165.</p>
      <p>R. Barzegar, M. T. Aalami, J. Adamowski, Short-term water quality variable prediction using
a hybrid CNN–LSTM deep learning model, Stochastic Environmental Research and Risk
Assessment 34(2) (2020) 415-433.</p>
      <p>Z. Li, F. Peng, B. Niu, G. Li, J. Wu, Z. Miao, Water quality prediction model combining sparse
auto-encoder and LSTM network, in: IFAC-PapersOnLine 51(17) (2018) 831-836.</p>
      <p>S. B. H. S. Asadollah, A. Sharafati, D. Motta, Z. M. Yaseen, River water quality index
prediction and uncertainty analysis: A comparative study of machine learning models, Journal of
environmental chemical engineering 9(1) (2021) 104599.</p>
      <p>T. Rajaee, S. Khani, M. Ravansalar, Artificial intelligence-based single and hybrid models for
prediction of water quality in rivers: A review, Chemometrics and Intelligent Laboratory Systems
200 (2020) 103978.</p>
      <p>M. S. Samsudin, A. Azid, S. I. Khalit, M. S. A. Sani, F. Lananan, Comparison of prediction
model using spatial discriminant analysis for marine water quality index in mangrove estuarine
zones, Marine pollution bulletin 141 (2019) 472-481.</p>
      <p>M. Imani, M. M. Hasan, L. F. Bittencourt, K. McClymont, Z. Kapelan, A novel machine
learning application: Water quality resilience prediction Model, Science of the Total Environment
768 (2021) 144459.</p>
      <p>M. Ranjithkumar, L. Robert, Machine Learning Techniques and Cloud Computing to Estimate
River Water Quality-Survey, Inventive communication and computational technologies, Springer,
Singapore, 2021, p. 387-396.</p>
      <p>Y. Trach, R. Trach, M. Kalenik, E. Koda, A. Podlasek, A Study of Dispersed, Thermally
Activated Limestone from Ukraine for the Safe Liming of Water Using ANN Models, Energies
14(24) (2021) 8377.</p>
      <p>Y. Trach, D. Chernyshev, O. Biedunkova, V. Moshynskyi, R. Trach, I. Statnyk, Modeling of
Water Quality in West Ukrainian Rivers Based on Fluctuating Asymmetry of the Fish Population,
Water 14(21) (2022) 3511.</p>
      <p>L. V. Hryhorenko, Drinking water quality influence to the peasants’ morbidity in the Ukrainian
settlements, International Journal of Statistical Distributions and Applications 3(3) (2017) 38-46.</p>
      <p>J. Ober, J. Karwot, S. Rusakov, Tap Water Quality and Habits of Its Use: A Comparative
Analysis in Poland and Ukraine, Energies 15(3) (2022) 981.</p>
      <p>N. Vlasova, M. Bublyk, Intelligent Analysis Impact of the COVID-19 Pandemic on Juvenile
Drug Use and Proliferation, CEUR Workshop Proceedings 3171 (2022) 858–876.</p>
      <p>M. J. Schervish, Theory of Statistics, Springer Science &amp; Business Media, New York, 2012.</p>
      <p>Grouping of statistical data - BukLib.net Library, 2022. URL: https://buklib.net/books/35946/</p>
      <p>O. Prokipchuk, L. Chyrun, M. Bublyk, V. Panasyuk, V. Yakimtsov, R. Kovalchuk, Intelligent
system for checking the authenticity of goods based on blockchain technology, CEUR Workshop
Proceedings Vol-2917 (2021) 618-665.</p>
      <p>C. Baum, An Introduction to Modern Econometrics Using Stata, McGraw Hill, Boston, 2020.</p>
      <p>Standard error, 2022. URL: https://ua.nesrakonk.ru/standard-error/.</p>
      <p>Standard deviation, 2022. URL: https://studopedia.su/10_11382_standartne-vidhilennya.html.</p>
      <p>K.O. Soroka, Fundamentals of Systems Theory and Systems Analysis, Kharkiv, 2004.</p>
      <p>A. Kowalska-Styczen, K. Sznajd-Weron, From consumer decision to market share - unanimity
of majority? JASSS, 19(4) (2016). DOI:10.18564/jasss.3156.</p>
      <p>I.V. Stetsenko, Systems modeling, Cherkasy, 2010.</p>
      <p>S.S. Velykodnyi, Modeling of systems, Odessa, 2018.</p>
      <p>Graphic presentation of information, 2022. URL:
https://studopedia.com.ua/1_132145_grafichne-podannya-informatsii.html.</p>
      <p>Y. Yusyn, T. Zabolotnia, Methods of Acceleration of Term Correlation Matrix Calculation in
the Island Text Clustering Method, CEUR workshop proceedings Vol-2604 (2020) 140-150.</p>
      <p>N. Romanyshyn, Algorithm for Disclosing Artistic Concepts in the Correlation of Explicitness
and Implicitness of Their Textual Manifestation, CEUR Workshop Proceedings Vol-2870 (2021)
719-730.</p>
      <p>B. Rusyn, V. Ostap, O. Ostap, A correlation method for fingerprint image recognition using
spectral features, in: Proceedings of the International Conference on Modern Problems of Radio
Engineering, Telecommunications and Computer Science, TCSET, 2002, pp. 219–220.</p>
      <p>Dataset https://www.kaggle.com/adityakadiwal/water-potability.</p>
      <p>Drinking Water Analysis Solutions
https://resources.perkinelmer.com/labsolutions/resources/docs/BRO_Drinking_Water_Analysis_Solutions_Brochure.pdf.</p>
      <p>S. Babichev, B. Durnyak, I. Pikh, V. Senkivskyy, An Evaluation of the Objective Clustering
Inductive Technology Effectiveness Implemented Using Density-Based and Agglomerative
Hierarchical Clustering Algorithms, Lecture Notes in Computational Intelligence and Decision
Making 1020 (2020) 532-553.</p>
      <p>S. Babichev, V. Lytvynenko, V. Osypenko, Implementation of the objective clustering
inductive technology based on DBSCAN clustering algorithm, in: Proceedings of Int. Scientific
and Technical Conf. on Computer Sciences and Information Technologies, 2017, pp. 479-484.</p>
      <p>O. Veres, Y. Matseliukh, T. Batiuk, S. Teslia, A. Shakhno, T. Kopach, Y. Romanova, I.
Pihulechko, Cluster Analysis of Exclamations and Comments on E-Commerce Products, CEUR
Workshop Proceedings Vol-3171 (2022) 1403-1431.</p>
      <p>S. Babichev, M.A. Taif, V. Lytvynenko, V. Osypenko, Criterial analysis of gene expression
sequences to create the objective clustering inductive technology, in: Proceedings of Int. Conf. on
Electronics and Nanotechnology, 2017, pp. 244–248. doi: 10.1109/ELNANO.2017.7939756.</p>
      <p>A. Kowalska-Styczen, K. Sznajd-Weron, Access to information in word of mouth marketing
within a cellular automata model. Advances in Complex Systems, 15(8) (2012).
DOI:10.1142/S0219525912500804.</p>
      <p>S. A. Babichev, A. Gozhyj, A. I. Kornelyuk, V. I. Lytvynenko, Objective clustering inductive
technology of gene expression profiles based on SOTA clustering algorithm, Biopolymers and Cell
33(5) (2017) 379–392. doi: 10.7124/bc.000961.</p>
      <p>I. Lurie, V. Lytvynenko, S. Olszewski, M. Voronenko, A. Kornelyuk, U. Zhunissova, O.
Boskin, The Use of Inductive Methods to Identify Subtypes of Glioblastomas in Gene Clustering,
CEUR Workshop Proceedings Vol-2631 (2020) 406-418.</p>
      <p>V. Lytvynenko, et al., Two step density-based object-inductive clustering algorithm, CEUR
Workshop Proceedings 2386 (2019) 117–135.</p>
      <p>I. Lurie, et al., Inductive technology of the target clusterization of enterprise's economic
indicators of Ukraine, CEUR Workshop Proceedings 2353 (2019) 848–859.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.</given-names>
            <surname>Kuzmin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bublyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shakhno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Korolenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lashkun</surname>
          </string-name>
          ,
          <article-title>Innovative development of human capital in the conditions of globalization</article-title>
          ,
          <source>E3S Web of Conferences</source>
          <volume>166</volume>
          (
          <year>2020</year>
          )
          <fpage>13011</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Ilyash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yildirim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Doroshkevych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Smoliar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Vasyltsiv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lupak</surname>
          </string-name>
          ,
          <article-title>Evaluation of enterprise investment attractiveness under circumstances of economic development</article-title>
          ,
          <source>Bulletin of Geography. Socio-economic Series</source>
          <volume>47</volume>
          (
          <year>2020</year>
          )
          <fpage>95</fpage>
          -
          <lpage>113</lpage>
          . http://doi.org/10.2478/bog-2020-0006.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Jonek-Kowalska</surname>
          </string-name>
          ,
          <article-title>Housing Infrastructure as a Determinant of Quality of Life in Selected Polish Smart Cities</article-title>
          ,
          <source>Smart Cities</source>
          <volume>5</volume>
          (3) (
          <year>2022</year>
          )
          <fpage>924</fpage>
          -
          <lpage>946</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Maslak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Danylko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Skliar</surname>
          </string-name>
          ,
          <article-title>Automation and Digitalization of Quality Cost Management of Power Engineering Enterprises</article-title>
          , in:
          <source>Proceedings of the 25th IEEE International Conference on Problems of Automated Electric Drive. Theory and Practice</source>
          , PAEP,
          <year>2020</year>
          . https://doi.org/10.1109/MEES52427.2021.9598744
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bublyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kowalska-Styczen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lytvyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <article-title>The Ukrainian Economy Transformation into the Circular Based on Fuzzy-Logic Cluster Analysis</article-title>
          ,
          <source>Energies</source>
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>5951</fpage>
          . doi: https://doi.org/10.3390/en14185951.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Yurynets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yurynets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Budiakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gnylianska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kokhan</surname>
          </string-name>
          ,
          <article-title>Innovation and Investment Factors in the State Strategic Management of Social and Economic Development of the Country. Modeling and Forecasting</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          Vol-
          <volume>2917</volume>
          (
          <year>2021</year>
          )
          <fpage>357</fpage>
          -
          <lpage>372</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matseliukh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bublyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kopach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Korolenko</surname>
          </string-name>
          ,
          <article-title>Network modelling of resource consumption intensities in human capital management in digital business enterprises by the critical path method</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          Vol-
          <volume>2851</volume>
          (
          <year>2021</year>
          )
          <fpage>366</fpage>
          -
          <lpage>380</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Jonek-Kowalska</surname>
          </string-name>
          ,
          <article-title>Towards the Reduction of CO2 Emissions. Paths of Pro-Ecological Transformation of Energy Mixes in European Countries with an Above-Average Share of Coal in Energy Consumption</article-title>
          .
          <source>Resources Policy</source>
          <volume>77</volume>
          (
          <year>2022</year>
          ). doi: 10.1016/j.resourpol.2022.102701.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bublyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matseliukh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mayik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nashkerska</surname>
          </string-name>
          ,
          <article-title>Assessing losses of human capital due to man-made pollution caused by emergencies</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          Vol-
          <volume>2805</volume>
          (
          <year>2020</year>
          )
          <fpage>74</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bublyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matseliukh</surname>
          </string-name>
          ,
          <article-title>Small-batteries utilization analysis based on mathematical statistics methods in challenges of circular economy</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          Vol-
          <volume>2870</volume>
          (
          <year>2021</year>
          )
          <fpage>1594</fpage>
          -
          <lpage>1603</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>I.</given-names>
            <surname>Jonek-Kowalska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wolniak</surname>
          </string-name>
          ,
          <article-title>Economic opportunities for creating smart cities in Poland. Does wealth matter?</article-title>
          ,
          <source>Cities</source>
          <volume>114</volume>
          (
          <year>2021</year>
          )
          <fpage>103222</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I.</given-names>
            <surname>Rishnyak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Veres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lytvyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bublyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Karpov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Panasyuk</surname>
          </string-name>
          ,
          <article-title>Implementation models application for IT project risk management</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          Vol-
          <volume>2805</volume>
          (
          <year>2020</year>
          )
          <fpage>102</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>O.</given-names>
            <surname>Kuzmin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bublyk</surname>
          </string-name>
          ,
          <article-title>Economic evaluation and government regulation of technogenic (Man-Made) damage in the national economy</article-title>
          , in:
          <source>International Scientific and Technical Conference on Computer Sciences and Information Technologies</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wolniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Jonek-Kowalska</surname>
          </string-name>
          ,
          <article-title>The level of the quality of life in the city and its monitoring</article-title>
          ,
          <source>Innovation: The European Journal of Social Science Research</source>
          <volume>34</volume>
          (
          <issue>3</issue>
          ) (
          <year>2021</year>
          )
          <fpage>376</fpage>
          -
          <lpage>398</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>