=Paper=
{{Paper
|id=Vol-2713/paper07
|storemode=property
|title=Fuzzy cluster analysis of indicators for assessing the potential of recreational forest use
|pdfUrl=https://ceur-ws.org/Vol-2713/paper07.pdf
|volume=Vol-2713
|authors=Evstakhii Kryzhanivs'kyi,Liliana Horal,Iryna Perevozova,Vira Shyiko,Nataliia Mykytiuk,Maria Berlous
|dblpUrl=https://dblp.org/rec/conf/m3e2/KryzhanivskyiHP20
}}
==Fuzzy cluster analysis of indicators for assessing the potential of recreational forest use==
<pdf width="1500px">https://ceur-ws.org/Vol-2713/paper07.pdf</pdf>
<pre>
    Fuzzy cluster analysis of indicators for assessing the
            potential of recreational forest use

       Evstakhii Kryzhanivs’kyi[0000-0001-6315-1277], Liliana Horal[0000-0001-6066-5619],
           Iryna Perevozova[0000-0002-3878-802X], Vira Shiyko[0000-0002-2822-0641],
        Nataliia Mykytiuk[0000-0001-3194-3891] and Maria Berlous[0000-0003-2856-9832]

           Ivano-Frankivsk National Technical University of Oil and Gas,
                15 Karpatska Str., Ivano-Frankivsk, 76019, Ukraine
rector@nung.edu.ua, liliana.goral@gmail.com, perevozova@ukr.net,
     viraSh@i.ua, nataliamykytiukmmm@gmail.com, masher@i.ua


       Abstract. The article presents a cluster analysis of the efficiency of recreational
       forest use in the region by separate components of the recreational forest use
       potential. The main stages of the cluster analysis of the recreational forest use
       level based on the predetermined components are determined. Among the
       agglomerative methods of cluster analysis, intended for grouping and combining
       the objects of study, three types are most commonly distinguished:
       the hierarchical method, or the method of tree clustering; the K-means clustering
       method; and the two-way aggregation method. For the correct selection of
       clusters, a comparative analysis of several methods was performed: arithmetic
       mean ranks, hierarchical methods followed by dendrogram construction, and the
       K-means method, which belongs to the reference methods, in which the number of
       groups is specified by the user. The clustering of forestries by twenty analytical
       features was not confirmed by analysis of variance, so the objects were
       re-clustered according to the nine most significant analytical features.
       As a result, the forestries were grouped into four clusters. The cluster
       analysis conducted with different methods shows that their combination
       helps to select reasonable groupings, clearly illustrate the clustering procedure
       and rank the obtained forestry clusters.

       Keywords: cluster analysis, k-means clustering method, forestry, recreation.


1      Introduction

The intensive development of recreation in the world creates motivation to use
significant reserves of recreational resources. To expand the use of forest recreational
resources, it is necessary to involve not only nature reserves but also more and more
forests of state forestry farms. The reserves of
recreational forest use on the territory of Ukraine are significant. Therefore, there is a
need to assess their development on the basis of the classification of forestry areas on
many analytical grounds. Taking into account the fact that such classification is a rather

___________________
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).

time-consuming task, it is proposed to carry out forest clustering with the help of
software.
   The use of cluster analysis methods is dictated primarily by the fact that they help to
build scientifically based classifications, identify internal links between the observed
population units. In addition, cluster analysis methods can be used to compress
information, which is an important factor in the conditions of constant increase and
complication of statistical data flows. That is why this type of statistical analysis is of
great importance when analyzing the development of recreational facilities. It should
be noted that recently cluster analysis has received considerable attention from
domestic and foreign experts in various scientific fields. One of the reasons is that
modern science is increasingly relying on classification for its development. Moreover,
this process deepens as knowledge specialization grows, which in its turn is based
largely on objective classification. Another reason is related to the accompanying
deepening of specialized knowledge, the increase in the number of variables, taken into
account in the analysis of certain objects.
   Clustering the studied forests will support effective management of recreational
areas, taking into account the reserves for improving their development by
selected components, and will help to develop, at the state level, a Strategy of recreational
forest use development in Ukraine that keeps the national recreational
product competitive in the domestic and world markets. Given that
each region of Ukraine is characterized by its own natural and climatic conditions,
ethnic traditions and historical and cultural recreational features, there is a problem of
qualitative analysis and assessment of the level of recreational facilities development.


2      Background

Researchers who have studied the issue of recreational forest management include
Simon Bell [2], William M. Murphy [11], Lloyd C. Irland, Darius Adams, Ralph Alig,
Carter J. Betz, Chi-Chung Chen, Mark Hutchins, Bruce A. McCarl, Ken Skog and Brent
L. Sohngen [6], Nerida Anderson, Rebecca M. Ford, Lauren T. Bennett, Craig Nitschke
and Kathryn J. H. Williams [1], Artti Juutinen, Anna-Kaisa Kosenius and Ville
Ovaskainen [7], Markus A. Meyer, Joachim Rathmann and Christoph Schulz [10], Tina
Gerstenberg, Christoph F. Baumeister, Ulrich Schraml and Tobias Plieninger [5], Kee-
Cheo Lee and Kee-Rae Kang [9], Hyun-Kyu Shin and Hong-Chul Shin [12], Yevstakhii
Kryzhanivskyi, Liliana Horal, Vira Shyiko, Oleksii Holubchak and Nataliia Mykytiuk
[8].
   Markus A. Meyer, Joachim Rathmann and Christoph Schulz showed in [10] that
visitors cluster along major paths or regions in urban and rural forests; that recreation of the
local population is highly driven by relaxation; that forest structures and demographic
factors play a minor role for forest benefits (FB); that forest benefits do not vary strongly within
the area of the forests; and that forest management should focus on avoiding nuisances to
support forest benefits. They found a weak connection between recreational behavior
and demand for specific forest characteristics. For local recreation, they recommend
providing a basic level of highly rated FB and avoiding nuisances rather than designing
forests for a desired appearance.
   Tina Gerstenberg, Christoph F. Baumeister, Ulrich Schraml and Tobias Plieninger
in [5] identified frequencies of activities in urban forests, visualized activity-specific
hot routes, and unveiled the contributions of landscape features to recreational use
intensity. The hot route maps represent an advancement of existing forest function
maps, as they were based on more reliable spatially explicit data on where people move
in forests. They used a public participation mapping procedure as a basis for visualizing
recreational use intensity. These maps may aid forest managers to tailor management
according to residents’ forest uses and preferences, prioritize objectives, and prevent
conflicts between recreational user groups, conservationists and representatives of the
timber industry. They conclude that urban forest managers may promote outdoor
recreation by maintaining large proportions of broadleaved dominated stands. Finally,
accessibility to water bodies as well as unique structural compositions – as represented
by protected habitats – may enhance recreational use [5].
   The purpose of Kee-Cheo Lee and Kee-Rae Kang [9] is to classify the forests by
considering the supplier’s perspective as well as the user’s perspective in order to
provide fundamental materials for the operation of the natural recreation forests. A
factor analysis was conducted to identify the common characteristics of the selected
twelve variables by pre-selection and survey of experts. K-means cluster analysis was
conducted among those factors to classify the natural recreation forests in Korea. Four
factors were drawn after the factor analysis and the factors were named according to
the variables and sizes as ‘The use performance and visiting condition factor’,
‘Education and settlement factor’, ‘Internal activation factor’ and ‘Potential factor’. In
addition, the cluster analysis of the matrix was conducted for the points of the drawn
factors and the final classification consists of five groups. The results of this study may
contribute to providing fundamental materials for the operation and management of
natural recreation forests. Also, it may act as a reference when investigating the natural
recreation forests of Korea. Proposing a classification of natural recreation forests could
be helpful in selecting the proper recreation forest in the future. Based on the established
model, fundamental materials could be provided to improve the profitability of the
natural recreation forests by effectively expanding the number of tourists, creating new
natural recreation forests and proper maintenance and management [9].
   Hyun-Kyu Shin and Hong-Chul Shin in [12] segmented recreational forest visitors
for marketing purposes based on the purpose of their visit. Factor analysis, cluster analysis,
cross tabulation and a t-test were used to find differences in behavioral intention between
clusters. Two clusters were found, and they differ in behavioral
intentions. Cluster 1 (married, 200~300 hundred won income) has higher satisfaction,
revisit intention and recommendation intention. The results suggest that marketers of
recreational forests should apply differentiated marketing strategies and offer varied
facilities and active programs. A survey of a broader region is needed to generalize the
result [12].
   Thus, having considered the scientific works of both foreign and domestic
researchers of the recreational forest management problems and without diminishing
their scientific value to improve development of recreational forest management, it is
possible to consider and necessarily classify the recreational region for a component
that is its own manufacturer [8].


3      Methodology

As is known, a comprehensive evaluation of any economic process or its components
conventionally relies on the calculation of integrated indicators using
different economic and mathematical methods and approaches. A comprehensive
evaluation is required to define the potential of recreational forest management,
considering the development of all its components. Therefore, in [8] we propose to
evaluate the potential of recreational forest use by performing the following steps: to
identify the recreational forest use potential components; to develop and form a system
of quantitative and qualitative indicators (indices) in order to evaluate the efficiency of
recreational forest use potential by its component composition; to evaluate the
efficiency of recreational forest use of the regional territories by individual components
of the recreational forest use potential using certain indicators; to comprehensively
evaluate the efficiency of each recreational forest use potential component; to conduct
an integrated evaluation of the efficiency of recreational forest use by means of using
taxonomic analysis methods and fuzzy set theory; to determine the level of the
recreational forest use potential by comparing the integrated indicator value with its
standard (critical) values [8]. Based on the previous studies of recreational forest
management, the following structural components of recreational forest management
potential can be formed: a resource component, social component, economic
component, innovation and investment component. Each component of recreational
forest use is characterized by a system of performance indicators. According to the
above characteristics of each component, the following system of indicators can be
proposed, considering the attributes of recreational activity, which are listed in table 1
[8].
    Economic and mathematical modelling of evaluation of the recreational forest
management potential determined the efficiency of recreational forest use of regional
territories by individual components of recreational forest management potential using
indicators specified in table 1. A taxonomic method based on determination of
taxonomic indicators of each component [8] was used for this stage.
    To test the methodology of assessing the recreational potential of forest use, a
typical forestry of the Western region of Ukraine, comprising 8 forestries, was selected.
It is worth mentioning that, because of the underdeveloped information and statistical
infrastructure of forestries, it was not possible to calculate the full system of
indicators shown in table 1. However, the taxonomic indicators were calculated from
the available statistical data on the resource and social components of each forestry.
The calculation results are summarized in table 2.
    Therefore, based on the calculations obtained, we can conclude that the level of recreational
forest management in Ukraine is low, as confirmed by the level of recreational forest
management potential (table 2). Of the 8 analysed forestries, only Forestry 1 has an average
potential level; in two forestries the integrated indicator of the recreational forest
management potential level is below average, and the remaining 5
forestries have a low level of recreational forest management. The results
are shown graphically in figure 1 [8].

 Table 1. Evaluation indicators of the recreational forest management potential components.

 Component                           | Indicator | Substantiation
 Resource component                  | Area of recreational territories, km² | Total area of forestry intended for recreational forest use
 Resource component                  | Number of recreational places, quantity | Number of recreational places located on the forestry territory intended for recreational forest management
 Resource component                  | The level of attractiveness of natural and recreational resources | The indicator can be evaluated according to the following criteria: exoticism, uniqueness, aesthetics, comfort, etc.
 Resource component                  | Quality factor of forest vegetation | It describes the level of recreation applicability
 Resource component                  | Exoticism degree (contrast) of recreational territory | It is determined as a contrast ratio degree of the resting place relative to a recreant's permanent residence
 Economic component                  | Proportion of total forestry costs on maintenance of recreational places, % | It shows the proportion of the total costs on maintenance of recreational territories
 Economic component                  | Efficiency factor of recreational forest management | It shows attractiveness of recreational forest management
 Economic component                  | Wear coefficient of recreational fixed assets (FA) | It characterizes wear level of recreational fixed assets
 Economic component                  | Volume of marginal costs for growing 1 ha of recreational forest | They reflect the effect, achieved by improving the forest as a means of labor in recreation sphere
 Economic component                  | Capacity of a single recreational load | It shows the maximum permissible number of persons on recreational territory
 Social component                    | Proportion of recreant employees | It shows a proportion of recreant employees in the total number of staff involved in recreational activities
 Social component                    | Recreational capacity | The capacity of recreation centres (resorts, tourist, health, recreational complexes) is a simultaneous number of recreants that can be located in this centre, without disturbing ecological balance within this centre and surrounding territories
 Social component                    | Recreational load per 1 ha of forest | It determines attendance intensity for any segment of the day, during weekends, weekdays
 Social component                    | The average stay of vacationers on the recreational territory, h | It shows an average length of stay of visitors on the recreational territory of forest area
 Innovation and investment component | Cost amount on marketing activities of recreational territories | It characterizes the development level of marketing activities
 Innovation and investment component | Efficiency of innovation implementation of recreational forest management | It characterizes the innovation level and efficiency of recreational innovation use
 Innovation and investment component | Amount of investments in recreational activity | It shows the amount of investment resources aimed at recreational activities
 Innovation and investment component | Proportion of foreign investments in recreational activities financing | It shows amount of recreational activity financing at the expense of foreign financial sources
 Innovation and investment component | Quantity of the won grants (programs) to finance recreational activities | It characterizes relevance of the recreational sphere development

 Table 2. Taxonomic analysis results of recreational forest management of a typical forestry.

 Indicator                                                              | Forestry 1 | Forestry 2 | Forestry 3 | Forestry 4 | Forestry 5 | Forestry 6 | Forestry 7 | Forestry 8
 Taxonomic indicator of resource component                              |    1.00    |    0.51    |    0.33    |    0.36    |    0.32    |    0.33    |    0.31    |    0.33
 Taxonomic indicator of social component                                |    1.00    |    0.56    |    0.56    |    0.30    |    0.30    |    0.21    |    0.39    |    0.39
 Taxonomic indicator of economic component                              |    0.00    |    0.00    |    0.00    |    0.00    |    0.00    |    0.00    |    0.00    |    0.00
 Taxonomic indicator of innovation and investment component             |    0.00    |    0.00    |    0.00    |    0.00    |    0.00    |    0.00    |    0.00    |    0.00
 Integrated indicator of recreational forest management potential level |    0.50    |    0.27    |    0.22    |    0.16    |    0.16    |    0.14    |    0.17    |    0.18

   Thus, according to the results of economic and mathematical modelling of the
integrated indicator of recreational forest management potential level, it can be
concluded that the recreational forest management potential in Ukraine is low
(figure 1), so measures should be taken to improve recreational activity results and
develop this industry. As the calculations indicate, first of all, it is urgent to develop
economic and innovation investment components of the recreational forest
management potential in Ukraine.
   Thus, having obtained the results of calculating the integrated indicator of the
recreational forest use level in the studied forests, we consider it necessary to conduct
a fuzzy cluster analysis of the forestries based on the individual indicators of forest use
potential for the studied objects. The main stages of cluster analysis of the
recreational forest use level by predetermined components are shown in figure 2.
   To implement the clustering process, it is necessary to develop a matrix of
observations xij. In this case, the original set consists of m elements described by n
parameters, and each element can be interpreted as a point or vector placed in
n-dimensional space with coordinates equal to the values of the n features of a particular
forestry. Thus, in the observation matrix, xij is the value of feature i for forestry j, where
j = 1, ..., m indexes the classification objects (forestries) and i = 1, ..., n indexes the
features of the objects.

            Fig. 1. Integrated indicator of recreational forest management level.

        Stage 1. Selection of the indicators that are most influential in the
                 analysis process
        Stage 2. Standardization of the selected indicators
        Stage 3. Clustering of forestries using the Ward method, complete
                 linkage and the k-means algorithm
        Stage 4. Formulation of conclusions


             Fig. 2. The main stages of cluster analysis of the recreational forest

For a set of w elements, each described by n features, every unit can be interpreted as a
point in n-dimensional space with coordinates equal to the values of the n features of the
analysed unit. Let us represent the matrix as follows:

$$
X = \begin{pmatrix}
x_{11} & x_{12} & \dots & x_{1k} & \dots & x_{1n} \\
x_{21} & x_{22} & \dots & x_{2k} & \dots & x_{2n} \\
\vdots & \vdots &       & \vdots &       & \vdots \\
x_{i1} & x_{i2} & \dots & x_{ik} & \dots & x_{in} \\
\vdots & \vdots &       & \vdots &       & \vdots \\
x_{w1} & x_{w2} & \dots & x_{wk} & \dots & x_{wn}
\end{pmatrix} \tag{1}
$$

where: w is the number of study periods, n is the number of indicators of each
recreational forest management potential component, and x_{ik} is the value of indicator k
of a specific component in year i (k = 1, ..., n; i = 1, ..., w).
   As the indicators for assessing the level of recreational forest use management are
expressed in different units of measurement, they need to be standardized. One of the most
common means of statistical generalization for inhomogeneous populations is the
standardization of indicators by the ratio of the deviation from the mean to a unit of
standardization. In our case, the standard deviation $\sigma_k$ is chosen as the
standardization unit. The features are normalized using the following formula:

$$ z_{ik} = \frac{x_{ik} - \bar{x}_k}{\sigma_k} \tag{2} $$

where

$$ \bar{x}_k = \frac{1}{w} \sum_{i=1}^{w} x_{ik} \tag{3} $$

$$ \sigma_k = \left[ \frac{1}{w} \sum_{i=1}^{w} \left( x_{ik} - \bar{x}_k \right)^2 \right]^{1/2} \tag{4} $$

where: $z_{ik}$ is the standardized value of indicator k for the i-th study period; $x_{ik}$ is the
observed value of indicator k for the i-th study period; $\bar{x}_k$ is the arithmetic mean of
indicator k; $\sigma_k$ is the standard deviation of indicator k; w is the number of periods.
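
Formulas (2)-(4) can be illustrated with a minimal sketch in Python. The function name `standardize` is hypothetical, and the sample matrix reuses the resource and social taxonomic indicators of the 8 forestries from table 2 purely as illustrative input; the paper itself standardizes the raw indicators, which are not reproduced here. The standard deviation uses the 1/w factor of formula (4).

```python
import math

def standardize(matrix):
    """Column-wise z-scores: z_ik = (x_ik - mean_k) / sigma_k, sigma_k with the 1/w factor."""
    w = len(matrix)        # number of rows (study periods or objects)
    n = len(matrix[0])     # number of indicators
    means = [sum(row[k] for row in matrix) / w for k in range(n)]
    sigmas = [math.sqrt(sum((row[k] - means[k]) ** 2 for row in matrix) / w)
              for k in range(n)]
    # Assumption: a constant column (sigma = 0) cannot be standardized and is mapped to zeros.
    return [[(row[k] - means[k]) / sigmas[k] if sigmas[k] > 0 else 0.0
             for k in range(n)]
            for row in matrix]

# Illustrative input only: (resource, social) taxonomic indicators from table 2.
X = [[1.00, 1.00], [0.51, 0.56], [0.33, 0.56], [0.36, 0.30],
     [0.32, 0.30], [0.33, 0.21], [0.31, 0.39], [0.33, 0.39]]
Z = standardize(X)
```

After standardization every column has zero mean and unit variance, so indicators measured in different units contribute comparably to the distances used in clustering.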
   The main feature of clusters is that objects belonging to one of them are more similar
to each other than objects from different clusters. With the STATISTICA software
package, such a classification can be performed simultaneously on a
fairly large number of analytical features. In our case, clusters are groups of forestries
that are geographically concentrated and linked by their level of recreational
potential.
   Among the agglomerative methods of cluster analysis, which are intended for
grouping and combining objects of study, three types are most commonly
distinguished: the hierarchical method (I), or the method of tree clustering; the K-means
clustering method (II); and the two-way aggregation method (III).
   I.        Hierarchical clustering forms clusters by determining
             the distances between objects and makes it possible to visualize the
             results of the study graphically in the form of a dendrogram. These
             distances can be determined in one-dimensional or multidimensional
             space. An important step in conducting a cluster analysis is therefore to
             select an appropriate method for calculating the distances between the
             studied objects. The main distance measures are: Euclidean distance,
             squared Euclidean distance, city-block (Manhattan) distance, Chebyshev
             distance and power distance.
   II.       The K-means clustering method is the most common non-hierarchical
             method of cluster analysis. Unlike hierarchical methods, which do not
             require prior assumptions about the number of clusters, this method
             requires a hypothesis about the most probable number of clusters.
             The K-means clustering method builds k clusters located at
             as large distances from each other as possible, so that each cluster
             contains the observations closest to its mean value. The method is based
             on minimizing the sum of squared distances between each observation and
             the center of its cluster, i.e. the objective function. The choice of the
             number of clusters is based on the research hypothesis. If there is none,
             it is recommended to create 2 clusters, then 3, 4, 5, comparing the
             results obtained. The input is $X^u = \{x_1^u, x_2^u, \dots, x_m^u\}$, a set of
             unlabeled data, and $X_k^l = \{x_1^l, x_2^l, \dots, x_p^l\}$, a set of labeled data in
             class k, with $X^l = \bigcup_{k=1}^{K} X_k^l$. At the output, we want to obtain K
             disjoint sets $\{C_k\}_{k=1}^{K}$ of $X^u$ that minimize the k-means objective
             function. The algorithm is as follows:
   1. Set t = 0.
   2. Initialize the cluster centers:

$$ \mu_k^{0} = \frac{1}{|X_k^l|} \sum_{x \in X_k^l} x \tag{5} $$

   3. Repeat until convergence:
      a) assign the data to clusters:
         for labeled data, assign each $x \in X_k^l$ to the cluster $C_k^{t+1}$;
         for unlabeled data, assign each $x_i^u \in X^u$ to the cluster $C_{k^*}^{t+1}$, where
         $k^* = \arg\min_k \| x_i^u - \mu_k^t \|^2$;
      b) update the centers:

$$ \mu_k^{t+1} = \frac{1}{|C_k^{t+1}|} \sum_{x \in C_k^{t+1}} x \tag{6} $$

      c) set t ← t + 1.
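
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study (which relied on STATISTICA): the function names `mean`, `sq_dist` and `seeded_kmeans` are hypothetical, the data points are invented, every class is assumed to have at least one labeled point, and convergence is taken to mean that the assignment of unlabeled points no longer changes.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(c) / len(points) for c in zip(*points)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seeded_kmeans(labeled, unlabeled, max_iter=100):
    """labeled: dict mapping class index -> list of points (each class non-empty).
    unlabeled: list of points. Returns (centers, assignment of the unlabeled points)."""
    classes = sorted(labeled)
    # Step 2: initial centers are the means of the labeled points of each class.
    centers = {k: mean(labeled[k]) for k in classes}
    assignment = None
    for _ in range(max_iter):
        # Step 3a: labeled points stay in their class; unlabeled points go to the nearest center.
        new_assignment = [min(classes, key=lambda k: sq_dist(x, centers[k]))
                          for x in unlabeled]
        if new_assignment == assignment:
            break  # assignments are stable, so the centers would not change either
        assignment = new_assignment
        # Step 3b: recompute each center from all points currently in the cluster.
        for k in classes:
            members = labeled[k] + [x for x, a in zip(unlabeled, assignment) if a == k]
            centers[k] = mean(members)
    return centers, assignment

# Hypothetical data: one labeled seed per class and four unlabeled points.
labeled = {0: [[0.0, 0.0]], 1: [[10.0, 10.0]]}
unlabeled = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0], [10.0, 9.0]]
centers, assignment = seeded_kmeans(labeled, unlabeled)
```

When no research hypothesis fixes the number of clusters, the same routine can be rerun with 2, 3, 4 and 5 seeded classes and the resulting partitions compared, as recommended above.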
   Another component of the algorithm is based on the Kullback-Leibler (KL) divergence,
which is a measure of the mismatch between two probability distributions. Given
the K-dimensional cluster-assignment probability vectors p and q corresponding
to the points $x_p$ and $x_q$ respectively, the KL divergence between p and q is given by the
formula:

$$ \mathrm{KL}(p \,\|\, q) = \sum_{k=1}^{K} p_k \log \frac{p_k}{q_k}, \tag{7} $$

where K is the number of clusters. In this approach, we use a symmetric variant of the
KL divergence, because the loss function is optimized
for p and q simultaneously:

$$ L(x_p, x_q) = \mathrm{KL}(p \,\|\, q) + \mathrm{KL}(q \,\|\, p) \tag{8} $$

The loss is obtained by first fixing p and calculating the divergence of q from p, and vice
versa.
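
Formulas (7) and (8) can be checked numerically with a short sketch. The function names and the two probability vectors are hypothetical, and the sketch assumes that q has no zero component wherever p is positive (and vice versa); terms with a zero probability in the first argument are taken to contribute nothing.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_k p_k * log(p_k / q_k); terms with p_k = 0 contribute 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def symmetric_kl(p, q):
    """Symmetric variant: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

# Hypothetical cluster-assignment probabilities of two points over K = 3 clusters.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.2, 0.7]
loss = symmetric_kl(p, q)
```

The symmetric loss is zero when both points have identical assignment vectors and grows as the vectors diverge, which is what makes it usable as a pairwise similarity loss.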
   The described method makes it possible to automate the process of cluster data
analysis, especially if the number of clusters is unknown from the beginning. For this
purpose, a model of a neural network-based cluster data analysis system was
described on the basis of the k-means and KL divergence methods.
   III.     The two-way aggregation method is used in cases when you want to
            perform simultaneous clustering of objects (columns) and observations
            (rows) [11].
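
The distance measures listed under method I can be written down in a few lines. This is a sketch with hypothetical function names; the power distance is parameterised here by two exponents p and r, which is an assumption about the intended definition, and the sample vectors are the (resource, social) taxonomic indicators of Forestry 1 and Forestry 2 from table 2, used only as illustration.

```python
def squared_euclidean(a, b):
    """Sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return squared_euclidean(a, b) ** 0.5

def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def power_distance(a, b, p=2, r=2):
    """(sum |x - y|^p)^(1/r); with p = r = 2 this is the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / r)

# Illustrative vectors: (resource, social) indicators of Forestry 1 and Forestry 2.
f1 = [1.00, 1.00]
f2 = [0.51, 0.56]
distances = {
    "euclidean": euclidean(f1, f2),
    "squared_euclidean": squared_euclidean(f1, f2),
    "manhattan": manhattan(f1, f2),
    "chebyshev": chebyshev(f1, f2),
    "power": power_distance(f1, f2, p=2, r=2),
}
```

The choice of measure changes which objects count as neighbours: the Chebyshev distance reacts only to the single largest difference, while the squared Euclidean distance gives progressively more weight to distant objects.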
   The key to adequate results in the cluster analysis of economic objects is a
reasonable choice of the factors by which the grouping is carried out. As factor
characteristics, we used a four-component system of indicators, which are shown in
table 1.
   The main purpose of cluster analysis is to partition the set of studied objects and
features into groups, or clusters, that are homogeneous in an appropriate sense. This means
that the task of classifying data and identifying the appropriate structure in it is solved.
Methods of cluster analysis can be used in different cases, even when it comes to a
simple grouping in which everything comes down to forming groups by degree of
similarity.
   The need for an objective division of different economic objects into groups exists
constantly, because such a classification makes it possible to find methods for effective
management of these objects. Methods of cluster analysis make it possible to solve the
following tasks: classification of objects taking into account the features that reflect the
essence and nature of the objects; verification of the assumptions about the presence of
some structure in the studied set of objects, i.e. search for the existing structure; building
new classifications for phenomena that have been little studied, when it is necessary to
establish the existence of relationships within the population and try to introduce a
structure into it.
   Cluster analysis has certain shortcomings and limitations. In particular, the
composition and number of clusters depend on the selected partitioning criteria. When
the original data set is reduced to a more compact form, certain distortions may occur,
and individual features of particular objects may be lost when their
characteristics are replaced with generalized values of cluster parameters.
   When classifying objects, the possibility that the considered set has no cluster
structure at all is often ignored. In cluster analysis it is assumed that: 1) the chosen
characteristics allow, in principle, a desirable division into clusters; 2) the units of
measurement (scale) are chosen correctly.
   The quality criterion of clustering to some extent reflects the following informal
requirements: 1) within groups, objects must be closely related; 2) objects of different
groups must be far from each other; 3) other things being equal, the distribution of
objects by groups must be uniform. The key point in cluster analysis is the choice of
metric (or measure of proximity of objects), on which the final division of the objects
into groups crucially depends for a given partitioning algorithm.
   The task of cluster analysis is, based on the data of the set X, to divide the set of
objects G into m clusters (subsets) G1, G2, …, Gm (m is an integer), so that each object
belongs to one and only one subset of the partition, objects belonging to the same
cluster are similar, and objects belonging to different clusters are heterogeneous. The
solution to the problem of cluster analysis is a partition that satisfies some criterion
of optimality. This criterion may be a functional that expresses the desirability of
different partitions and groupings; it is called the objective function. For further
research, methods from the theory of complex systems can also be applied, as was done
in [4; 3; 14; 13].
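
As an illustration of such an objective function, the within-cluster sum of squares
that the K-means method minimizes can be written as below. The text does not fix a
particular functional, so this choice is an assumption; the symbols x_j (feature vector
of object j) and c_k (centroid of cluster G_k) are introduced here for illustration only.

```latex
W(G_1, \dots, G_m) = \sum_{k=1}^{m} \sum_{j \in G_k} \lVert x_j - c_k \rVert^2,
\qquad
c_k = \frac{1}{|G_k|} \sum_{j \in G_k} x_j
```

A smaller value of W corresponds to more compact clusters, which matches the first
informal requirement listed above.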

  Let’s perform cluster analysis according to the K-means Clustering method
described above for each of the selected components (table 3).

                       Table 3. Substantiation of the components' indicators.
 Component         Indicator                                                                      Variable
 Resource          Area of recreational territories, km2                                          var2
 component         Number of recreational sites, quantity                                         var3
                   The level of attractiveness of natural and recreational resources              var4
                   Quality factor of forest vegetation                                            var5
                   Exoticism degree (contrast) of recreational territory                          var6
 Economic          A proportion of total forestry costs on maintenance of recreational sites, %   var7
 component         Efficiency factor of recreational forest management                            var8
                   Wear coefficient of recreational fixed assets (FA)                             var9
                   Volume of marginal costs for growing 1 ha of recreational forest               var10
                   Capacity of a single recreational load                                         var11
 Social            Proportion of recreant employees                                               var12
 component         Recreational capacity                                                          var13
                   Recreational load per 1 ha of forest                                           var14
                   The average stay of vacationers on the recreational territory, h               var15
 Innovation and    Cost amount on marketing activities of recreational territories                var16
 investment        Efficiency of innovation implementation of recreational forest management      var17
 component         Amount of investments in recreational activity                                 var18
                   Proportion of foreign investments in recreational activities financing         var19
                   Quantity of grants (programs) won to finance recreational activities           var20

To begin with, we standardize the input data and summarize the results in table 4.

    Table 4. Results of standardizing the features used to assess recreational forest use.
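
The standardization step can be sketched as follows. The paper does not state the exact
formula, so column-wise z-scores are an assumption here. In table 5, Between SS and
Within SS sum to 14 for every variable, which is what 15 observations scaled by the
sample standard deviation would give; the choice of `ddof=1` is inferred from that. The
function name `standardize` and the sample values are hypothetical.

```python
import numpy as np

def standardize(data):
    """Column-wise z-scores: subtract the mean, divide by the sample std (ddof=1)."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=1)  # assumption: sample, not population, deviation
    return (data - mean) / std

# Hypothetical raw values: rows are forestries, columns are two indicators.
raw = np.array([
    [12.0, 3.0],
    [20.0, 5.0],
    [16.0, 10.0],
])
z = standardize(raw)
print(z.round(3))
```

After this transformation every indicator has zero mean and unit sample variance, so
indicators measured in different units contribute comparably to Euclidean distances.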

In the first stage of the cluster analysis, we check whether the selected objects of
study (forestries) form “natural clusters”. To do this, we use the method of hierarchical
classification with the following settings: amalgamation (joining) rule: Complete
Linkage, Single Linkage and Ward’s method; distance metric: Euclidean distances
(non-standardized). The obtained clustering results are shown in figures 3-5.
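
A minimal sketch of this hierarchical step, assuming SciPy as the tool (the paper uses a
different, unnamed statistical package). The matrix `X` is random stand-in data with the
same shape as the study (15 forestries, 19 indicators), not the authors' data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(15, 19))  # hypothetical stand-in for the standardized data

# One linkage tree per amalgamation rule, all with Euclidean distances.
trees = {
    method: linkage(X, method=method, metric="euclidean")
    for method in ("single", "complete", "ward")
}

# Cut every tree into at most 5 flat clusters.
labels = {
    method: fcluster(tree, t=5, criterion="maxclust")
    for method, tree in trees.items()
}
for method, assignment in labels.items():
    print(method, assignment)
```

Each linkage matrix has one row per merge (14 rows for 15 objects); passing it to
`scipy.cluster.hierarchy.dendrogram` draws a tree diagram of the kind shown in the
figures.
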

   [Figure: Tree Diagram for 15 Cases; Single Linkage; Euclidean distances. Vertical axis:
   Linkage Distance (3,0 to 6,0). Leaf order: C_12, C_15, C_10, C_14, C_8, C_7, C_6, C_5,
   C_13, C_4, C_11, C_3, C_2, C_9, C_1.]

                                    Fig. 3. Tree diagram for 15 forestries (Single Linkage).

Complete Linkage defines the distance between clusters as the longest distance between
two objects in different clusters (“the farthest neighbor”). The Euclidean distance is
the geometric distance in n-dimensional space and is calculated by the formula:

                     d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}                     (9)
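
The Euclidean distance and the two neighbor-based linkage rules can be illustrated with
a small sketch. The helper names and the sample points are hypothetical and chosen so
that the results are easy to check by hand.

```python
import numpy as np

def euclidean(x, y):
    """Geometric distance between two points in n-dimensional space."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def cross_distances(a, b):
    """All Euclidean distances between objects of cluster a and cluster b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))

def complete_linkage(a, b):
    """Farthest-neighbor rule: the largest cross-cluster distance."""
    return float(cross_distances(a, b).max())

def single_linkage(a, b):
    """Nearest-neighbor rule: the smallest cross-cluster distance."""
    return float(cross_distances(a, b).min())

cluster_a = [[0.0, 0.0], [1.0, 0.0]]
cluster_b = [[4.0, 0.0], [6.0, 0.0]]
print(euclidean([0, 0], [3, 4]))               # 5.0
print(complete_linkage(cluster_a, cluster_b))  # 6.0
print(single_linkage(cluster_a, cluster_b))    # 3.0
```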

From the obtained calculations and the constructed dendrograms we can conclude that the
investigated forestries form 5 natural clusters. Let’s test this hypothesis by dividing
the original data into 5 clusters with K-means clustering and checking the significance
of the difference between the obtained groups.
   The best results in terms of meaningful interpretation were obtained by using an
iterative method of cluster analysis, in particular the K-means clustering algorithm with
division into five clusters. The clustering results obtained with the previously
mentioned computer program are shown in figure 6.
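
The K-means step can be sketched with a plain Lloyd iteration. This is an illustration
under stated assumptions, not the routine of the statistical package used in the paper:
the initial centers are random observations, and `X` is random stand-in data with the
shape of the study (15 forestries, 19 indicators).

```python
import numpy as np

def nearest(X, centers):
    """Index of the closest center (Euclidean distance) for every row of X."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iteration; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest(X, centers)
        # Move each center to the mean of its members; keep it if it has none.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return nearest(X, centers), centers

X = np.random.default_rng(1).normal(size=(15, 19))  # hypothetical data
labels, centers = kmeans(X, k=5)

within_ss = float(((X - centers[labels]) ** 2).sum())
total_ss = float(((X - X.mean(axis=0)) ** 2).sum())
print(labels, round(within_ss, 3), round(total_ss, 3))
```

The rows of `centers` are the cluster means per variable, which is the quantity plotted
in a "plot of means for each cluster".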

   [Figure: Tree Diagram for 15 Cases; Ward’s method; Euclidean distances. Vertical axis:
   Linkage Distance (2 to 12). Leaf order: C_8, C_7, C_6, C_5, C_12, C_15, C_14, C_13, C_4,
   C_10, C_11, C_3, C_2, C_9, C_1.]

                                 Fig. 4. Tree diagram for 15 forestries (Ward’s method).

   [Figure: Tree Diagram for 15 Cases; Complete Linkage; Euclidean distances. Vertical
   axis: Linkage Distance (2 to 9). Leaf order: C_14, C_15, C_13, C_4, C_8, C_7, C_6, C_5,
   C_10, C_3, C_2, C_12, C_11, C_9, C_1.]

                                Fig. 5. Tree diagram for 15 forestries (Complete Linkage).

   [Figure: Plot of Means for Each Cluster. Horizontal axis: Variables (Var2 to Var20);
   vertical axis: normed values (-4 to 4); one line per cluster, Cluster 1 to Cluster 5.]

        Fig. 6. Average level of normed values of indicators for the selected clusters.

To check the quality of the clustering, an analysis of variance was performed. Its
results (table 5) indicate only moderate quality of the clustering procedure: the
intergroup sums of squares (Between SS) significantly exceed the intragroup ones (Within
SS) for only 9 of the 19 features, and the significance level p falls below 0.05 only
for these 9 features.
   Next, to improve the clustering, we include in the cluster analysis only the 9 most
significant features identified by the preceding analysis of variance. We again use the
method of hierarchical classification with the following settings: amalgamation
(joining) rule: Complete Linkage, Single Linkage and Ward’s method; distance metric:
Euclidean distances (non-standardized). The obtained clustering results are shown in
figures 7-9.
   From the obtained calculations and the constructed dendrograms we can conclude that
the studied forestries form 4 natural clusters. Let’s test this hypothesis by dividing
the original data into 4 clusters with K-means clustering and checking the significance
of the difference between the obtained groups.
   The best results in terms of meaningful interpretation were obtained by using an
iterative method of cluster analysis, in particular the K-means clustering algorithm with
division into four clusters. The clustering results obtained with the previously
mentioned computer program are shown in figure 10.

                          Table 5. Analysis of variance.
                          Analysis of Variance (Approbation)
                          Variable   Between SS   df   Within SS   df          F   signif. p
                          Var2          9,01688    4     4,98312   10    4,52371    0,024111
                          Var3         10,37888    4     3,62112   10    7,16553    0,005449
                          Var4          6,29275    4     7,70725   10    2,04118    0,164208
                          Var5          9,85135    4     4,14865   10    5,93648    0,010325
                          Var6          4,56180    4     9,43820   10    1,20833    0,366127
                          Var7          8,97487    4     5,02513   10    4,46500    0,025055
                          Var8          7,08283    4     6,91717   10    2,55987    0,103927
                          Var9         10,50677    4     3,49323   10    7,51937    0,004596
                          Var10         6,80881    4     7,19119   10    2,36707    0,122708
                          Var11         6,76535    4     7,23465   10    2,33783    0,125890
                          Var12         5,38430    4     8,61570   10    1,56235    0,258010
                          Var13        10,04167    4     3,95833   10    6,34211    0,008287
                          Var14         5,62076    4     8,37924   10    1,67699    0,230981
                          Var15         5,44553    4     8,55447   10    1,59143    0,250834
                          Var16         9,92733    4     4,07267   10    6,09387    0,009470
                          Var17         3,19534    4    10,80466   10    0,73934    0,586234
                          Var18         2,41334    4    11,58666   10    0,52072    0,722951
                          Var19        11,55609    4     2,44391   10   11,82130    0,000831
                          Var20        10,15565    4     3,84435   10    6,60426    0,007224
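
The F values in table 5 follow from the sums of squares and degrees of freedom in the
usual one-way analysis-of-variance way, F = (Between SS / df) / (Within SS / df). The
sketch below recomputes F for three rows copied from the table and selects the features
whose listed p is below 0.05; the function name `f_statistic` is hypothetical.

```python
def f_statistic(between_ss, df_between, within_ss, df_within):
    """Ratio of the between-group mean square to the within-group mean square."""
    return (between_ss / df_between) / (within_ss / df_within)

# Rows copied from table 5: (Between SS, Within SS, listed p); df are 4 and 10.
rows = {
    "Var2": (9.01688, 4.98312, 0.024111),
    "Var4": (6.29275, 7.70725, 0.164208),
    "Var19": (11.55609, 2.44391, 0.000831),
}

f_values = {
    name: f_statistic(between, 4, within, 10)
    for name, (between, within, _) in rows.items()
}
significant = [name for name, (_, _, p) in rows.items() if p < 0.05]

for name, value in f_values.items():
    print(name, round(value, 5))
print(significant)
```

Applying the same p < 0.05 filter to all 19 rows of the table leaves the 9 features used
in the second round of clustering.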


   [Figure: Tree Diagram for 15 Cases; Single Linkage; Euclidean distances. Vertical axis:
   Linkage Distance (1,6 to 3,4). Leaf order: C_12, C_11, C_10, C_6, C_15, C_14, C_13, C_5,
   C_4, C_8, C_7, C_3, C_2, C_9, C_1.]

                                  Fig. 7. Tree diagram for 15 forestries (Single Linkage).

   [Figure: Tree Diagram for 15 Cases; Complete Linkage; Euclidean distances. Vertical
   axis: Linkage Distance (1 to 7). Leaf order: C_6, C_5, C_4, C_15, C_14, C_13, C_8, C_7,
   C_10, C_3, C_12, C_11, C_2, C_9, C_1.]

                                         Fig. 8. Tree diagram for 15 forestries (Complete Linkage).

   [Figure: Tree Diagram for 15 Cases; Ward’s method; Euclidean distances. Vertical axis:
   Linkage Distance (0 to 12). Leaf order: C_15, C_14, C_13, C_6, C_5, C_4, C_8, C_7, C_10,
   C_3, C_12, C_11, C_2, C_9, C_1.]

                                         Fig. 9. Tree diagram for 15 forestries (Ward’s method).

   [Figure: Plot of Means for Each Cluster. Horizontal axis: Variables (Var2, Var3, Var5,
   Var7, Var9, Var13, Var16, Var19, Var20); vertical axis: normed values (-2,0 to 2,5); one
   line per cluster, Cluster 1 to Cluster 4.]

        Fig. 10. Average level of normed values of indicators for the selected clusters.

The distances between the clusters selected by the K-means Clustering Method were
calculated as simple Euclidean distances and are presented in table 6.

                        Table 6. Euclidean distances between clusters.
           Euclidean Distances between Clusters (Approbation)
           Distances below diagonal; squared distances above diagonal
           Cluster Number      No. 1       No. 2       No. 3       No. 4
           No. 1            0,000000    2,445802    1,394819    1,277132
           No. 2            1,563906    0,000000    1,068632    1,250228
           No. 3            1,181025    1,033747    0,000000    1,106422
           No. 4            1,130103    1,118136    1,051866    0,000000
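
Table 6 lists plain distances below the diagonal and their squares above it. The short
check below, using values copied from the table, confirms that reading and picks out the
closest and the most distant pair of clusters; the variable names are hypothetical.

```python
import math

# Below-diagonal entries of table 6: (row cluster, column cluster) -> distance.
distance = {
    (2, 1): 1.563906, (3, 1): 1.181025, (3, 2): 1.033747,
    (4, 1): 1.130103, (4, 2): 1.118136, (4, 3): 1.051866,
}
# Above-diagonal entries of table 6: (row cluster, column cluster) -> squared distance.
squared = {
    (1, 2): 2.445802, (1, 3): 1.394819, (2, 3): 1.068632,
    (1, 4): 1.277132, (2, 4): 1.250228, (3, 4): 1.106422,
}

# Every distance squared should reproduce the mirrored above-diagonal entry.
consistent = all(
    math.isclose(d * d, squared[(j, i)], rel_tol=1e-5)
    for (i, j), d in distance.items()
)
closest = min(distance, key=distance.get)
farthest = max(distance, key=distance.get)
print(consistent, closest, farthest)
```

By these values, clusters 2 and 3 are the closest pair and clusters 1 and 2 the most
distant one.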

To check the quality of the clustering, an analysis of variance was performed. Its
results (table 7) indicate the high quality of the clustering procedure: the intergroup
sums of squares (Between SS) significantly exceed the intragroup ones (Within SS), and
the significance level p is below the normative value (0.05).
   The contribution of each feature to the division of objects into groups is also
characterized by the value of Fisher’s criterion (F-criterion) and its significance level
(p): the higher the former and the smaller the latter, the better the clustering. For
all nine features, without exception, the significance level is below 0.05, which
indicates the high statistical significance of the F-criterion. Depending on the levels
of these indicators, the forestries were grouped into four clusters (table 8).

                       Table 7. Analysis of variance.
         Analysis of Variance (Approbation)
         Variable   Between SS   df   Within SS   df          F   signif. p
         Var2          9,79064    4    4,209359   10    5,81481    0,011049
         Var3         10,46539    4    3,534605   10    7,40210    0,004860
         Var5         10,59468    4    3,405316   10    7,77805    0,004073
         Var7         10,19347    4    3,806533   10    6,69472    0,006896
         Var9         10,47683    4    3,523172   10    7,43423    0,004786
         Var13        10,41854    4    3,581461   10    7,27255    0,005172
         Var16        10,38962    4    3,610375   10    7,19428    0,005373
         Var19        12,47685    4    1,523151   10   20,47867    0,000083
         Var20         8,85809    4    5,141908   10    4,30681    0,027826

                                 Table 8. Forestry clusters.
                               Forestry group   Forestries
                               Cluster 1        1, 2, 9, 11, 12
                               Cluster 2        4, 5, 6
                               Cluster 3        3, 7, 8, 10
                               Cluster 4        13, 14, 15


4      Results and conclusion

For the correct selection of clusters, a comparative analysis of several methods was
performed: the arithmetic mean, hierarchical methods followed by dendrogram
construction, and the K-means Clustering Method, which belongs to the reference methods
in which the number of groups is specified by the user. The cluster analysis using
different methods allows us to state that their combination helps to select well-founded
groupings, visually illustrate the clustering procedure and rank the obtained clusters.
   Thus, the results of the cluster analysis on 9 features confirmed the hypothesis that
the 15 forestries separate into 4 clusters.
   The first cluster is formed by five forestries (1, 2, 9, 11, 12), which are
characterized by an average area of recreational territories; the largest number of
recreational sites and recreational capacity; and the lowest quality factor of forest
vegetation, proportion of total forestry costs on maintenance of recreational sites, wear
coefficient of recreational fixed assets, cost amount on marketing activities of
recreational territories, proportion of foreign investments in recreational activities
financing, and quantity of grants (programs) won to finance recreational activities.
   The second cluster is formed by three forestries (4, 5, 6). This cluster is
characterized by the highest level of the area of recreational territories, quality
factor of forest vegetation, cost amount on marketing activities of recreational
territories, and proportion of foreign investments in recreational activities financing;
an average level of recreational capacity and number of recreational sites; and the
lowest level of the proportion of total forestry costs on maintenance of recreational
sites, wear coefficient of recreational fixed assets, and quantity of grants (programs)
won to finance recreational activities.
   The third cluster includes four forestries (3, 7, 8, 10), which have the following
characteristics: the highest level of wear coefficient of recreational fixed assets,
recreational capacity and quantity of grants (programs) won to finance recreational
activities; an average area of recreational territories, number of recreational sites,
recreational capacity, quality factor of forest vegetation, cost amount on marketing
activities of recreational territories and quantity of grants (programs) won to finance
recreational activities; and the lowest proportion of total forestry costs on
maintenance of recreational sites.
   The fourth cluster includes three forestries (13, 14, 15) and is characterized by the
highest level of the proportion of total forestry costs on maintenance of recreational
sites and wear coefficient of recreational fixed assets; and the lowest number of
recreational sites, recreational capacity, quality factor of forest vegetation, cost
amount on marketing activities of recreational territories and quantity of grants
(programs) won to finance recreational activities.
    The obtained results of clustering will help to develop separate development
strategies for each isolated cluster, which will increase the efficiency of recreational
areas management in the future. In addition, the results can be used to form an effective
model for the development of recreational clusters.


References
 1.   Anderson, N., Ford, R.M., Bennett, L.T., Nitschke, C., Williams, K.J.H.: Core values
      underpin the attributes of forests that matter to people. Forestry: An International Journal
      of Forest Research 91(5), 629–640 (2018). doi:10.1093/forestry/cpy022
 2.   Bell, S.: Forest recreation: New opportunities and challenges for forest managers. Rad.
      Sumar. inst. Izvanredni broj 10, 155–160 (2005)
 3.   Bielinskyi, A., Soloviev, V., Semerikov, S., Solovieva, V.: Detecting stock crashes using
      Levy distribution. CEUR Workshop Proceedings 2422, 420–433 (2019)
 4.   Derbentsev, V., Semerikov, S., Serdyuk, O., Solovieva, V., Soloviev, V.: Recurrence based
      entropies for sustainability indices. E3S Web of Conferences 166, 13031 (2020).
      doi:10.1051/e3sconf/202016613031
 5.   Gerstenberg, T., Baumeister, C.F., Schraml, U., Plieninger, T.: Hot routes in urban forests:
      The impact of multiple landscape features on recreational use intensity. Landscape and
      Urban Planning 203, 103888 (2020). doi:10.1016/j.landurbplan.2020.103888
 6.   Irland, L.C., Adams, D., Alig, R., Betz, C.J., Chen, C.-C., Hutchins, M., McCarl, B.A., Skog,
      K., Brent, L.: Assessing Socioeconomic Impacts of Climate Change on US Forests,
      Wood-Product Markets, and Forest Recreation: The effects of climate change on forests
      will trigger market adaptations in forest management and in wood-products industries and
      may well have significant effects on forest-based outdoor recreation. BioScience 51(9),
      753–764 (2001). doi:10.1641/0006-3568(2001)051[0753:ASIOCC]2.0.CO;2
 7.   Juutinen, A., Kosenius, A., Ovaskainen, A.: Estimating the benefits of recreation-oriented
      management in state-owned commercial forests in Finland: A choice experiment. Journal
      of Forest Economics 20(4), 396–412 (2014). doi:10.1016/j.jfe.2014.10.003
 8.   Kryzhanivskyi, Ye., Horal, L., Shyiko, V., Holubchak, O., Mykytiuk, N.: Economic and
      Mathematical Modelling for Evaluation of Potential Recreational Forest Utilization.
      Advances in Economics, Business and Management Research 99, 173–178 (2019)
 9.   Lee, K.-C., Kang, K.-R.: Classification of Recreation Forests through Cluster Analysis.
      Journal of the Korean Institute of Landscape Architecture 37(1), 9–17 (2009)
10.   Meyer, M.A., Rathmann, J., Schulz, C.: Spatially-explicit mapping of forest benefits and
      analysis of motivations for everyday-life’s visitors on forest pathways in urban and rural
      contexts. Landscape and Urban Planning 185, 83–95 (2019).
      doi:10.1016/j.landurbplan.2019.01.007
11.   Murphy, W.: Forest Recreation in a Commercial Environment. In: Small-scale forestry and
      rural development: The intersection of ecosystems, economics and society. Proceedings of
      IUFRO 3.08 Conference, Hosted By Galway-Mayo Institute of Technology, Galway,
      Ireland, 18-23 June 2006, pp. 347–356
12.   Shin, H.-K., Shin, H.-C.: Market Segmentation on Recreational Forest Visitors by Cluster
      Analysis. The Journal of the Korea Contents Association 10(3), 364–372 (2010).
      doi:10.5392/JKCA.2010.10.3.364
13.   Soloviev, V., Bielinskyi, A., Solovieva, V.: Entropy Analysis of Crisis Phenomena for DJIA
      Index. CEUR Workshop Proceedings 2393, 434–449 (2019)
14.   Soloviev, V.N., Belinskiy, A.: Complex Systems Theory and Crashes of Cryptocurrency
      Market. Communications in Computer and Information Science 1007, 276–297 (2019)

</pre>