<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparative Analysis of Two Approaches to the Clustering of Respondents (Based on Survey Results)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Iryna Strutynska</string-name>
          <email>strutynska@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Halyna Kozbur</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lesia Dmytrotsa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olha Hlado</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liliya Melnyk</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>State Higher Vocational School in Nowy Sącz</institution>
          ,
          <addr-line>Nowy Sącz</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ternopil Ivan Puluj National Technical University</institution>
          ,
          <addr-line>Ternopil</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper proposes an algorithm for solving the problem of clustering survey respondents, including the steps of collecting and preparing data, summarizing key results, and setting goals for future work. Two approaches to clustering, iterative and hierarchical, are applied in order to produce consistent and comprehensible results. The iterative method is implemented in MS Excel using the Data Mining add-in; the hierarchical method is implemented in Python using machine learning libraries. Hard clusters with a sufficient degree of similarity within each cluster and sufficient differences from the other clusters were identified, and the main characteristics of the obtained clusters are described. It has been experimentally established that agglomerative hierarchical clustering is more effective for clustering mixed-type data obtained from a survey of respondents.</p>
      </abstract>
      <kwd-group>
        <kwd>digital maturity</kwd>
        <kwd>clustering methods</kwd>
        <kwd>mixed-type data</kwd>
        <kwd>security</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Today, one of the fastest and most convenient ways to obtain information is to
survey the target audience directly on a specific topic. With the development of
information technology, such surveys are increasingly shifting from personal
or telephone communication to online questionnaires. This makes it possible to reach a
larger audience in a shorter time and with fewer human resources. The advantages
of such surveys include convenience of expression; partial or complete anonymity
of results; the ability to complete the survey in whatever way is convenient for the respondent;
and no need to communicate with employees of the survey organization. Online
surveys are a particularly effective way of obtaining information if the target
audience consists of web users. Data collection is only part of the complex task of getting
the required information. Further processing and analysis of the data, with conclusions
and recommendations, complete the data cycle. Segmentation, or clustering, is
one of the most important and interesting tasks of data analysis. This paper offers an
algorithm for solving the problem of clustering the respondents of an online survey,
including the steps of collecting and preparing data, summarizing key findings, and setting
future goals.</p>
      <p>
        The problem of clustering numerical data obtained from a series of measurements
was described in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A similar grouping of respondents was addressed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The main purpose of that work was to provide recommendations based on
the respondents’ identified political preferences and to compare the clustering
method with other generally accepted methods of providing such recommendations.
The problem of clustering categorical data using a probabilistic approach and the
GACUC algorithm was investigated in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The main statements regarding the
clustering of mixed-type data and the application of the algorithms and metrics chosen in this paper
were discussed in [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">4,5,6,7</xref>
        ]. However, the problem of processing and clustering
mixed-type data obtained from questionnaires has not been researched so far.
      </p>
      <p>This paper studies the clustering of respondents using two approaches, iterative and
hierarchical, in order to produce consistent and comprehensible results. The iterative
method is implemented in MS Excel using the Data Mining add-in; the hierarchical method is
implemented in Python code using machine learning libraries.</p>
      <p>
        The survey whose results were taken as inputs to the clustering task was
conducted among small and medium-sized enterprises in the Ternopil region and concerned the use
of digital technologies and tools in their business activities [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>The clustering problem: types of approaches and solution algorithms</title>
      <p>Clustering, or cluster analysis of data, is a machine learning task of splitting
a set of objects into subsets (clusters) so that the objects assigned to one cluster are
as similar as possible to each other and the objects assigned to different clusters are as
different as possible. This approach does not require labeled data.</p>
      <p>Common contemporary tasks that use cluster analysis include text
analysis for news broadcasting, image grouping, consumer segmentation, and community
identification on social networks.</p>
      <p>The variability of tasks, types of datasets and expected results has led to the
formation of a large number of methods and approaches to clustering, which differ in
their understanding of the concept of “cluster”, in how the parameters of the
algorithms (number of expected clusters, density threshold, distance metrics, etc.) are adjusted to the
specifics of the dataset, and in the subsequent use of the results. This makes it
difficult to select a single algorithm and its parameters for each type of
task.</p>
      <p>
        For this reason, clustering can also be called an interactive machine learning task
“with reinforcement”, in which the algorithm parameters are repeatedly adjusted by experiment
until stable and interpretable results are obtained [
        <xref ref-type="bibr" rid="ref10 ref9">9,10</xref>
        ].
      </p>
      <p>
        There is no single common way to classify clustering methods and algorithms. One
approach is to distinguish clustering methods by the cluster models used
(connectivity-based or hierarchical, centroid-based, distribution-based, density-based, overlapping
clustering, etc.). Another approach groups methods by their key
characteristics (probabilistic, logical, graph-theoretic, hierarchical, neural, frequency
algorithms, etc.) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>One of the simplest ways to classify clustering methods is to divide them into two
groups: hierarchical and non-hierarchical. Hierarchical cluster analysis methods are
divided into ascending (agglomerative) and descending (divisive) ones and can be represented graphically in the form
of dendrograms. With each subsequent step, the number of clusters
increases for divisive methods and decreases for agglomerative
ones.</p>
      <p>
        The largest group among non-hierarchical methods is the iterative one. In an iterative
approach, cluster centers are defined and the elements of the data set are redistributed by
proximity to the selected centers. These methods include the k-means,
Expectation-Maximization, and mean-shift algorithms, among others [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>The similarity of cluster elements and the closeness of clusters are determined by
predefined metrics.</p>
    </sec>
    <sec id="sec-3">
      <title>Data Collection</title>
      <p>The input data set was obtained through Google Forms questionnaires completed by
executives of companies of various legal forms (limited liability companies (LLC),
individual entrepreneurs) and lines of business (construction, trade, repair, logistics, services,
etc.).</p>
      <p>Respondents answered 35 questions regarding two main aspects of running their
business:
• forms, organizations and spheres of activity;
• level of informatization of business activities (use of digital tools in their work,
work with social networks, planning services, analytics or advertising).</p>
      <p>Due to the specifics of the information requested and the use of different categories
of questions, both quantitative and categorical data were received. For example,
information about the number of employees in an organization was obtained in the form
of natural numbers, and information about the presence of a business model was
presented as a binary “yes” or “no” answer. There were some open-ended questions
regarding the respondent’s attitude to a particular problem related to informatization of
the business structure. Such responses were excluded from the general clustering
dataset.</p>
      <p>Numerous mechanical errors and blank answers were found in the collected data.
These problems were resolved by manual processing. However, as the number of
respondents increases, such processing will require unifying the possible
answer options for each question or restricting all answers
to a choice among the suggested options.</p>
    </sec>
    <sec id="sec-4">
      <title>Data Preparation</title>
      <p>Data preparation consisted of cleaning and encoding the data; no missing values
were identified in this study.</p>
      <sec id="sec-4-1">
        <title>Cleaning data</title>
        <p>Both the hierarchical agglomerative algorithm and the EM method were run on the
same data set, so the data-cleaning pre-processing was the same for both
methods.</p>
        <p>The answers to the open-ended questions were reduced to a specific template, for
example, only “yes” or “no”, or otherwise unified, as shown in Fig. 1.
Attributes such as the “automatically calculated questionnaire time” or the “respondent’s
personal attitude” were marked as uninformative and removed from the task input. All
manipulations were performed manually due to the small dimension of the task.
Using the MS Excel add-in requires no special training, and the add-in accepts a simple
spreadsheet of values of any type. Therefore, data encoding was not performed for clustering
with the Data Mining add-in in MS Excel.</p>
        <p>Data encoding was required to work with the Python machine learning libraries, since
most algorithms apply mathematical operations to quantitative data. For questions
whose answers could be ranked (more or less, better or worse), ranked value coding was
used. Responses were coded from 0 to some positive number, where 0 meant the
answer “no” or a number close to 0, and the other values were ranked in order of
increasing manifestation of the attribute.</p>
        <p>Answer options that cannot be ranked are nominal data and were denoted by
arbitrary numeric codes. For further computational work of the algorithm with such
data, the Gower metric was used, which makes it possible to work with
quantitative and categorical data at the same time. The respondents' coded answers are
shown in Fig. 2.</p>
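        <p>The coding scheme described above can be illustrated with a minimal Python sketch. The answer scales, the question names (employees, has_business_model, social_media_use, sphere) and the helper functions are hypothetical and serve only as an illustration; they are not taken from the questionnaire itself.</p>
        <preformat>
```python
# Hypothetical answer scales; the real questionnaire wording is not reproduced here.
ORDINAL_SCALE = ["no", "rarely", "sometimes", "regularly"]   # ranked: 0 = "no"
NOMINAL_VALUES = ["trade", "construction", "services"]       # unordered categories


def encode_ordinal(answer, scale):
    """Return the rank of an answer on an ordered scale (0 = weakest)."""
    return scale.index(answer)


def encode_nominal(answer, values):
    """Return an arbitrary integer code; only equality of codes is meaningful."""
    return values.index(answer)


def encode_respondent(raw):
    """Encode one respondent's raw answers into a numeric tuple (a 'point')."""
    return (
        int(raw["employees"]),                              # quantitative
        1 if raw["has_business_model"] == "yes" else 0,     # binary yes/no
        encode_ordinal(raw["social_media_use"], ORDINAL_SCALE),
        encode_nominal(raw["sphere"], NOMINAL_VALUES),
    )


raw_answer = {
    "employees": "12",
    "has_business_model": "yes",
    "social_media_use": "sometimes",
    "sphere": "services",
}
point = encode_respondent(raw_answer)
print(point)  # (12, 1, 2, 2)
```
        </preformat>
        <p>Ranked codes preserve order, so differences between them are meaningful, whereas nominal codes may only be compared for equality. The distance metric has to treat the two kinds of columns differently for this reason.</p>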
        <p>Fig. 2. The survey respondents' coded answers</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Solving the clustering problem</title>
      <p>The main objectives of the study were:
• experimentally finding the optimal number of clusters and their characteristic
features for an interpretable (understandable) segmentation of business structures
according to their level of digital maturity, using several methods;
• comparing the results obtained by different methods and determining the most
effective one for a particular data analysis task.</p>
      <p>
        The study used two methods of clustering:
1. Using the Data Mining add-in for MS Excel spreadsheets [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Clustering
capabilities in MS Excel are represented by iterative algorithms: k-means and
Expectation-Maximization. The EM algorithm was selected for this study;
2. Using the functions of machine learning libraries for the Python programming
language [
        <xref ref-type="bibr" rid="ref14">14,15</xref>
        ].
      </p>
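      <p>
        To describe how the two algorithms work, we introduce the following notation: N
respondents <inline-formula><tex-math>R = \{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N\}</tex-math></inline-formula> and M questions <inline-formula><tex-math>Q = \{q_1, q_2, \ldots, q_M\}</tex-math></inline-formula>. Every
participant <inline-formula><tex-math>\vec{r}_l \in R</tex-math></inline-formula>, <inline-formula><tex-math>l \in \overline{1,N}</tex-math></inline-formula>, answered each of the questions <inline-formula><tex-math>q_k \in Q</tex-math></inline-formula>, <inline-formula><tex-math>k \in \overline{1,M}</tex-math></inline-formula>, so the
result is a matrix of responses of dimension <inline-formula><tex-math>N \times M</tex-math></inline-formula>, in which each respondent is
represented as <inline-formula><tex-math>\vec{r}_l = \{a_{l1}, a_{l2}, \ldots, a_{lk}, \ldots, a_{lM}\}</tex-math></inline-formula>, where <inline-formula><tex-math>a_{lk}</tex-math></inline-formula> is the answer of the
l-th respondent to the k-th question (Fig. 3). In what follows, we call this tuple a point [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Let us consider the principles of the selected methods.
      </p>
      <p>
        For the hierarchical agglomerative method, the Gower metric (1) proposed in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] was used to calculate the
distance matrix:
        <disp-formula><label>(1)</label><tex-math><![CDATA[d(\vec{r}_i, \vec{r}_j) = \frac{1}{M} \sum_{k=1}^{M} d_{ij}^{k},]]></tex-math></disp-formula>
where <inline-formula><tex-math>d_{ij}^{k} = d(a_{ik}, a_{jk})</tex-math></inline-formula> is the distance between the answers to the k-th question, and M
is the number of answers in the tuple. The distance matrix <inline-formula><tex-math>D_k</tex-math></inline-formula> for the k-th
question is symmetric:
        <disp-formula><tex-math><![CDATA[D_k = \begin{pmatrix}
0 & d_{12}^{k} & d_{13}^{k} & \ldots & d_{1N}^{k} \\
  & 0 & d_{23}^{k} & \ldots & d_{2N}^{k} \\
  &   & 0 & \ldots & d_{3N}^{k} \\
  &   &   & \ldots & \ldots \\
  &   &   &   & 0
\end{pmatrix}]]></tex-math></disp-formula>
      </p>
      <p>The symmetric matrix D of distances between individual points
looks like:
        <disp-formula><tex-math><![CDATA[D = \begin{pmatrix}
0 & d(\vec{r}_1, \vec{r}_2) & d(\vec{r}_1, \vec{r}_3) & \ldots & d(\vec{r}_1, \vec{r}_N) \\
  & 0 & d(\vec{r}_2, \vec{r}_3) & \ldots & d(\vec{r}_2, \vec{r}_N) \\
  &   & 0 & \ldots & d(\vec{r}_3, \vec{r}_N) \\
  &   &   & \ldots & \ldots \\
  &   &   &   & 0
\end{pmatrix}]]></tex-math></disp-formula>
      </p>
      <p>The elements of the matrix D are the averages of the pairwise
distances, calculated by formula (1). All question weights are taken to be 1.</p>
      <p>The way of measuring the distance <inline-formula><tex-math>d_{ij}^{k}</tex-math></inline-formula> depends on the type of data in the k-th question. If <inline-formula><tex-math>a_{ik}</tex-math></inline-formula>
and <inline-formula><tex-math>a_{jk}</tex-math></inline-formula> are quantitative, then the distance <inline-formula><tex-math>d_{ij}^{k}</tex-math></inline-formula> is expressed by formula (2), given here in the standard Gower form [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]:
        <disp-formula><label>(2)</label><tex-math><![CDATA[d_{ij}^{k} = \frac{|a_{ik} - a_{jk}|}{\max_{l} a_{lk} - \min_{l} a_{lk}}.]]></tex-math></disp-formula>
In this case, <inline-formula><tex-math>d_{ij}^{k} \in [0; 1]</tex-math></inline-formula>. If <inline-formula><tex-math>a_{ik}</tex-math></inline-formula> and <inline-formula><tex-math>a_{jk}</tex-math></inline-formula> are nominal data that cannot be ordered,
then the distance is calculated by formula (3):
        <disp-formula><label>(3)</label><tex-math><![CDATA[d_{ij}^{k} = \begin{cases} 0, & a_{ik} = a_{jk}, \\ 1, & a_{ik} \neq a_{jk}. \end{cases}]]></tex-math></disp-formula>
Here <inline-formula><tex-math>d_{ij}^{k} = 0</tex-math></inline-formula> means identical answers of the respondents <inline-formula><tex-math>\vec{r}_i</tex-math></inline-formula> and <inline-formula><tex-math>\vec{r}_j</tex-math></inline-formula> to the k-th question,
and <inline-formula><tex-math>d_{ij}^{k} = 1</tex-math></inline-formula> means the maximal difference. As a consequence, for the averaged distances
calculated by formula (1), all values <inline-formula><tex-math>d(\vec{r}_i, \vec{r}_j) \in [0; 1]</tex-math></inline-formula>.</p>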
      <p>The distance between individual clusters was determined by the farthest-neighbour
(complete-linkage) method. The clusters that are closest under the selected metric are merged, the distances from the newly created cluster to
the other clusters are recalculated, the distance matrix is updated, and
clustering continues. The farthest-neighbour method makes it possible to identify rather compact
and stable structures corresponding to the task.</p>
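      <p>As an illustration of formulas (1)-(3), the following Python sketch computes a Gower distance matrix for a few encoded points. It assumes the standard Gower definitions (range-normalized absolute difference for quantitative or ranked answers, simple mismatch for nominal answers) and equal question weights. The function name gower_matrix, the is_nominal flag list and the sample points are hypothetical.</p>
      <preformat>
```python
import numpy as np


def gower_matrix(points, is_nominal):
    """Pairwise Gower distances for encoded answers.

    points     : array of shape (N, M), one row per respondent
    is_nominal : sequence of M booleans; True marks an unordered question
    """
    points = np.asarray(points, dtype=float)
    n, m = points.shape
    total = np.zeros((n, n))
    for k in range(m):
        col = points[:, k]
        if is_nominal[k]:
            # formula (3): 0 for identical answers, 1 otherwise
            d_k = (col[:, None] != col[None, :]).astype(float)
        else:
            # formula (2): absolute difference scaled by the answer range
            value_range = col.max() - col.min()
            if value_range == 0:
                d_k = np.zeros((n, n))      # all answers are equal
            else:
                d_k = np.abs(col[:, None] - col[None, :]) / value_range
        total += d_k
    return total / m                        # formula (1), all weights equal to 1


# Three illustrative points: employees, yes/no, ranked answer, nominal code.
points = [
    [12, 1, 2, 2],
    [2, 0, 0, 0],
    [7, 1, 2, 1],
]
D = gower_matrix(points, is_nominal=[False, False, False, True])
print(np.round(D, 3))
```
      </preformat>
      <p>The resulting matrix is symmetric with a zero diagonal and values in [0; 1], so it can be passed directly to a clustering routine that accepts precomputed distances.</p>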
      <sec id="sec-5-1">
        <title>Expectation-Maximization method</title>
        <p>In contrast to the proposed modification of the agglomerative method, the fuzzy
clustering EM algorithm available in the Data Mining Add-in for Microsoft Excel was
selected from among the iterative algorithms. The main idea of the method is to
assume that the elements of the input data set are independent random variables
that follow a certain distribution law, in most cases a normal (Gaussian) distribution [16,17,18,19].</p>
        <p>When using the EM method, every object in the dataset is considered to belong to all
clusters with different probabilities. Before the algorithm starts, the number of
clusters K and the initial approximate parameters of each of the K distributions of the
input data are specified. Iterations incrementally improve the distribution parameters
up to a predetermined level of model accuracy. Upon completion of the algorithm, each
object is assigned to the cluster to which it belongs with the highest probability.</p>
        <p>The algorithm is based on the iterative repetition of two consecutive steps, as
shown in Fig. 4:
1. Expectation: calculating the probability (likelihood) of the points belonging to
each of the clusters;
2. Maximization: improving the values of the distribution parameters to maximize the
likelihood of the points belonging to the clusters.</p>
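        <p>The two steps can be sketched in Python for the simplest case, a one-dimensional Gaussian mixture. This is an illustrative sketch only and is not the implementation used by the Data Mining add-in. The function name em_gmm_1d, the quantile-based initialization and the sample data are assumptions introduced for the example.</p>
        <preformat>
```python
import numpy as np


def em_gmm_1d(x, k=2, n_iter=50):
    """EM for a 1-D Gaussian mixture.

    Returns (weights, means, variances, resp), where resp[i, j] is the
    probability that point i belongs to cluster j.
    """
    x = np.asarray(x, dtype=float)
    # deterministic start: cluster centres at evenly spaced quantiles of the data
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var() + 1e-6)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # Expectation: membership probabilities of every point in every cluster
        diff = x[:, None] - means[None, :]
        dens = np.exp(-0.5 * diff**2 / variances) / np.sqrt(2 * np.pi * variances)
        joint = weights * dens
        resp = joint / joint.sum(axis=1, keepdims=True)
        # Maximization: re-estimate weights, means and variances
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        diff = x[:, None] - means[None, :]
        variances = (resp * diff**2).sum(axis=0) / nk + 1e-6
    return weights, means, variances, resp


x = [0.0, 0.2, -0.1, 0.1, 10.0, 10.2, 9.9, 10.1]
weights, means, variances, resp = em_gmm_1d(x)
hard_labels = resp.argmax(axis=1)   # final assignment by highest probability
print(np.round(means, 2), hard_labels)
```
        </preformat>
        <p>Each row of resp sums to 1, which reflects the fuzzy nature of the method: a point belongs to every cluster with some probability, and the hard label is taken only at the end.</p>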
      </sec>
      <sec id="sec-5-2">
        <title>Adjustment of algorithm parameters</title>
        <p>
          The MS Cluster Task Wizard in MS Excel allows you to select the desired parameters
and adjust their values [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. For this task, the list of questions affecting the
result was adjusted in the algorithm settings, and the number of clusters and the
cluster seed of the EM clustering method were set.
        </p>
        <p>Creating a clustering model in Python with sklearn.cluster.AgglomerativeClustering
generally involves specifying three parameters: the number of
clusters, the distance metric between elements, and the linkage criterion that defines the distance between clusters. The
distance metric between the elements may be one of those listed in [21] or
may be calculated separately and passed as a precomputed matrix. The cluster model is created as follows:</p>
        <p>
          <preformat>from sklearn.cluster import AgglomerativeClustering

# m: predefined number of clusters; distances: precomputed distance matrix
model = AgglomerativeClustering(n_clusters=m, metric="precomputed", linkage="complete")
labels = model.fit_predict(distances)</preformat>
where m is the predefined number of clusters and distances is the distance matrix previously
calculated by the Gower metric (1)-(3).
        </p>
        <p>Let us consider the clustering results of each method and compare them.
The clustering output of the Data Mining add-in for MS Excel provides a
breakdown of the dataset into clusters, with the ability to visualize them and to view statistics and
cluster profiles.</p>
        <p>
          In Python, the clustering methods of the sklearn library
return a one-dimensional numeric array indicating which cluster each input
tuple belongs to. Further analysis and visualization of the result are carried out
separately. The algorithm for selecting the optimal number of clusters is described
in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
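        <p>A small runnable sketch of this workflow is given below. The distance matrix is a toy example with illustrative values, not survey data. The fallback to the affinity keyword is an assumption made so that the sketch also runs on older scikit-learn releases, where the metric parameter of AgglomerativeClustering had that name.</p>
        <preformat>
```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Toy symmetric distance matrix with a zero diagonal (illustrative values only).
distances = np.array([
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.1, 0.8, 0.9],
    [0.2, 0.1, 0.0, 0.9, 0.8],
    [0.9, 0.8, 0.9, 0.0, 0.1],
    [0.8, 0.9, 0.8, 0.1, 0.0],
])
m = 2  # predefined number of clusters

try:
    model = AgglomerativeClustering(n_clusters=m, metric="precomputed", linkage="complete")
except TypeError:
    # older scikit-learn releases name this parameter `affinity`
    model = AgglomerativeClustering(n_clusters=m, affinity="precomputed", linkage="complete")

labels = model.fit_predict(distances)   # one cluster index per input point
score = silhouette_score(distances, labels, metric="precomputed")
print(labels, round(score, 3))
```
        </preformat>
        <p>The labels array only identifies cluster membership; cluster profiles and quality indicators such as the Silhouette index have to be computed from it in a separate step.</p>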
      </sec>
      <sec id="sec-5-3">
        <title>The results of hierarchical agglomerative clustering</title>
        <p>The matrix of distances D between the points of the input set now looks like this
(the fragment of the matrix is shown in Table 1):</p>
        <p>Table 1. Matrix of distances between questions</p>
        <p>As a result of the agglomerative clustering algorithm implemented with Python sklearn, a stable
partition into 5 clusters was obtained, with a satisfactory value of the quality metric,
the Silhouette index [22], of approximately 0.16. A comparative analysis of the clusters obtained
by main characteristics is shown in Table 2. The percentages indicate the proportion
of respondents in each cluster who answered the same question the same way. The
share of respondents who answered the selected questions identically varied from 40
to 100%. To distinguish the characteristic features of the formed clusters, we keep only the
values greater than 80% and compare the clusters on that basis.</p>
        <p>According to the results, the largest number of respondents (16) was assigned to the
first cluster, the main characteristics of which are:
• lack of experience with any digital tools;
• absence of the company from the Internet environment;
• inefficiency of the website, if there is one.</p>
        <p>The second cluster was formed by 5 companies, the main characteristics of which
are defined as follows:
• presence of the company in the Internet space;
• use of simple tools to a limited extent.</p>
        <p>Two more respondents formed the third cluster, which is characterized by:
• effective functioning of the site and the purchase chain;
• use of most digital tools, including advertising;
• the work of a marketer to promote a brand or product.</p>
        <p>The fourth cluster consists of 10 respondents, and its characteristic features are:
• no functioning purchase chain on the site;
• non-use of sophisticated digital tools;
• presence in social networks only.</p>
        <p>The fifth cluster consists of only one company that successfully uses virtually all
digital tools with the help of specialists.</p>
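        <p>The selection of characteristic features by an agreement threshold can be sketched as follows. For each cluster and each question, the share of respondents giving the most common answer is computed, and only questions with a share of at least 80% are kept. The function name cluster_profile and the sample points and labels are hypothetical and do not reproduce the survey data.</p>
        <preformat>
```python
from collections import Counter


def cluster_profile(points, labels, threshold=0.8):
    """For each cluster, list the questions whose most common answer is shared
    by at least `threshold` of the cluster's respondents.

    Returns {cluster: {question_index: (modal_answer, share)}}.
    """
    profile = {}
    for cluster in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        features = {}
        for k in range(len(members[0])):
            answer, count = Counter(row[k] for row in members).most_common(1)[0]
            share = count / len(members)
            if share >= threshold:
                features[k] = (answer, share)
        profile[cluster] = features
    return profile


# Illustrative encoded answers (three questions) and cluster labels.
points = [(0, 0, 1), (0, 0, 2), (0, 1, 1), (0, 0, 1), (0, 0, 3), (1, 1, 2), (1, 1, 2)]
labels = [0, 0, 0, 0, 0, 1, 1]
profile = cluster_profile(points, labels)
print(profile)
```
        </preformat>
        <p>In this toy example the third question is dropped from the profile of cluster 0, because only 60% of its members give the same answer.</p>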
        <p>A sufficient degree of difference between clusters and a sufficient degree of
similarity of elements within each cluster (80-100%) make it possible to clearly identify
the groups and rank them by the level of use of digital technologies and
tools in business activities. Table 3 shows the ranking of types of business structures
in order of decreasing digital maturity.</p>
        <p>The clustering performed by the EM method in MS Excel proved to be unstable.
Because the EM algorithm belongs to the group of iterative fuzzy clustering methods, this result is
normal and is acceptable for a particular class of tasks. The comparative
characteristics of the clusters obtained are shown in Table 4. In contrast to agglomerative
clustering, the degree of uniformity of answers within the clusters is much
lower and fluctuates on average within 60-70%. The degree of difference
between clusters is also low. Repeated application of the EM method did not improve
the quality of the results.</p>
        <p>As with the previous clustering method, a cluster was identified that included
two business entities that actively use digital technology in their business. The
differences between the other clusters are small and the differences in the percentages of
answers to the questions are minimal, so it is impossible to distinguish the
characteristics of each subset. The inability to distinguish clusters with distinct features does not
meet the objective of the study.</p>
        <p>In this study, we conducted an experimental comparison of two approaches,
hard and soft clustering, to the clustering of respondents according to the results of an online survey carried out with the Google
Forms service. Hard clustering was
implemented with Python tools and the hierarchical agglomerative method, while soft
clustering was performed with the Data Mining add-in for MS Excel and the
iterative EM method.</p>
        <p>A comparative analysis of the results obtained by the two methods showed the
following:
• using hierarchical agglomerative clustering, we obtained 5 clusters that are sufficiently
different from each other and have a high degree of similarity between the elements
of each cluster (60-100% depending on the question). The distinguishing features of the clusters were
identified (use of social networks, advertising offices and services, analytical tools,
search engine optimization of sites, etc.);
• the EM method did not yield good clustering results and did not
achieve the goal of the task; the results of the EM method changed
with each run of the algorithm.</p>
        <p>It has been experimentally established that the method of agglomerative
hierarchical clustering is an effective method for solving the problem of clustering of
mixed-type data obtained from the survey of respondents.</p>
        <p>In addition to improving the parameters of the algorithm, the tasks for further
study are: eliminating mechanical errors in entered answers and empty
values; handling the diversity of the data, which complicates their unification and
proper ordering; and selecting the mathematical metrics used as arguments of the clustering
functions and for calculating clustering quality.</p>
        <p>15. Scikit. Clustering documentation. Scikit learn [Online]. Available:
https://scikit-learn.org/stable/modules/clustering.html</p>
        <p>16. Expectation-maximization algorithm, Wikipedia [Online]. Available:
https://en.wikipedia.org/wiki/Expectation-maximization_algorithm</p>
        <p>17. Yair Weiss. Bayesian motion estimation and segmentation. PhD thesis, Massachusetts
Institute of Technology, May 1998.</p>
        <p>18. Zinchenko D. EM-algoritm klasterizatsii [EM-clustering algorithm], AlgoWiki [Online],
2016. Available:
https://algowiki-project.org/ru/Участник:Noite/EMалгоритм_кластеризации</p>
        <p>19. Oreshkov V. EM – masshtabiruyemyy algoritm klasterizatsii [EM – scalable clustering
algorithm], BaseGroup Labs [Online], 2013. Available:
https://basegroup.ru/community/articles/em</p>
        <p>20. Lotfi, E. (2018). [image] Available at:
https://www.researchgate.net/profile/Elaachak_Lotfi/publication/322641344/figure/fig3/AS:585607373922305@1516631086753/flowchart-of-EM-algorithm.png [Accessed 15 Oct.
2019].</p>
        <p>21. Scikit. Agglomerative Clustering. Scikit learn [Online]. Available:
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html</p>
        <p>22. Scikit. Silhouette Score. Scikit learn [Online]. Available:
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cherezov</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tyuakchev</surname>
            <given-names>N.</given-names>
          </string-name>
          <article-title>Obzor osnovnykh metodov klassifikatsii I klasterizatsii dannykh [Overview main methods of data classification and clustering]</article-title>
          .
          <source>Vestnik VGU, Seria: Sistemnyi analiz i informatsionnye tekhnologii</source>
          , no. 2, pp.
          <fpage>25</fpage>
          -
          <lpage>29</lpage>
          ,
          <year>2009</year>
          . Available at: http://www.vestnik.vsu.ru/pdf/analiz/2009/02/2009-02-05.pdf (Accessed 14 August 2019). (In Russian).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Katakis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Tsapatsoulis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Tziouvas</surname>
            and
            <given-names>F. Mendes.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Clustering Online Poll Data: Towards a Voting Assistance System</article-title>
          ,
          <source>2012 Seventh International Workshop on Semantic and Social Media Adaptation and Personalization</source>
          , http://www.katakis.eu/wpcontent/uploads/2014/11/katakissmap12.pdf. doi.org/10.1109/SMAP.2012.19
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>J. McCaffrey</surname>
          </string-name>
          ,
          <article-title>Machine Learning Using C#</article-title>
          . Syncfusion,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chae</surname>
            ,
            <given-names>S. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.-M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>W. Y.</given-names>
          </string-name>
          , (
          <year>2006</year>
          ).
          <article-title>Cluster analysis with balancing weights on mixed-type data</article-title>
          .
          <source>The Korean communications in statistics</source>
          ,
          <volume>13</volume>
          (
          <issue>3</issue>
          ). doi.org/10.5351/CKSS.2006.13.3.719
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gower</surname>
            <given-names>JC</given-names>
          </string-name>
          (
          <year>1967</year>
          )
          <article-title>A comparison of some methods of cluster analysis</article-title>
          .
          <source>Biometrics</source>
          <volume>23</volume>
          :
          <fpage>623</fpage>
          -
          <lpage>637</lpage>
          doi.org/10.2307/2528417
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>William</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Rand</surname>
          </string-name>
          .
          <article-title>Objective Criteria for the Evaluation of Clustering Methods</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          . Vol.
          <volume>66</volume>
          , No.
          <volume>336</volume>
          (
          <issue>Dec</issue>
          .,
          <year>1971</year>
          ), pp.
          <fpage>846</fpage>
          -
          <lpage>850</lpage>
          doi.org/10.2307/2284239
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gower</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          , (
          <year>1971</year>
          ).
          <article-title>A General Coefficient of Similarity and Some of Its Properties</article-title>
          . Biometrics,
          <volume>27</volume>
          (
          <issue>4</issue>
          ), p.
          <fpage>859</fpage>
          . doi.org/10.2307/2528823
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>I.</given-names>
            <surname>Strutynska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kozbur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dmytrotsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Bodnarchuk</surname>
          </string-name>
          and
          <string-name>
            <surname>O.Hlado.</surname>
          </string-name>
          (
          <year>2019</year>
          )
          <article-title>Small and Medium Business Structures Clustering Method Based on Their Digital Maturity</article-title>
          . 2019
          <source>International Scientific-Practical Conference Problems of Infocommunications. Science and Technology</source>
          , pp.
          <fpage>278</fpage>
          -
          <lpage>282</lpage>
          (in print)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <article-title>Klasterizatsia [Clustering]</article-title>
          , Wikiconspekts, ITMO University [Online].
          <year>2019</year>
          . Available: http://neerc.ifmo.ru/wiki/index.php?title=Кластеризация (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <article-title>Cluster analysis</article-title>
          , Wikipedia [Online]. Available: https://en.wikipedia.org/wiki/Cluster_analysis
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Zatserkovnyi</surname>
            <given-names>V. I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burachek</surname>
            <given-names>V. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhelezniak</surname>
            <given-names>O. O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tereshchenko</surname>
            <given-names>A. O.</given-names>
          </string-name>
          <article-title>Heoinformatsiini systemy i bazy danykh [Geoinformation systems and databases]</article-title>
          . Nizhyn, Ukraine: NDU im. M. Hoholia,
          <year>2017</year>
          . pp.
          <fpage>77</fpage>
          -
          <lpage>95</lpage>
          . Available at: https://studfiles.net/preview/6440954 (In Ukrainian).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. Google Forms, https://www.google.com/intl/uk_ua/forms</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Microsoft</surname>
          </string-name>
          (
          <year>2017</year>
          , Dec).
          <article-title>Cluster Wizard (Data Mining Add-ins for Excel)</article-title>
          ,
          <source>Microsoft Docs [Online]</source>
          . Available: https://docs.microsoft.com/en-us/sql/analysis-services/clusterwizard-data-mining-add-ins-for-excel?view=sql-server-2014
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Python</surname>
          </string-name>
          , https://www.python.org.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>