<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Speech Vocal Clustering Methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>T. Utkina</string-name>
          <email>t.utkina@chdtu.edu.ua</email>
          <xref ref-type="aff" rid="aff1"/>
        </contrib>
        <contrib contrib-type="author">
          <string-name>K. Rudakov</string-name>
          <email>k.rudakov@chdtu.edu.ua</email>
          <xref ref-type="aff" rid="aff1"/>
        </contrib>
        <contrib contrib-type="author">
          <string-name>I. Zubko</string-name>
          <email>i.zubko@chdtu.edu.ua</email>
          <xref ref-type="aff" rid="aff1"/>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. Chychuzhko</string-name>
          <email>m.chychuzhko@chdtu.edu.ua</email>
          <xref ref-type="aff" rid="aff1"/>
        </contrib>
        <contrib contrib-type="author">
          <string-name>E. Fedorov</string-name>
          <email>fedorovee@ukr.net</email>
          <email>ineks-kiev@ukr.net</email>
          <xref ref-type="aff" rid="aff0"/>
        </contrib>
        <aff id="aff1">
          <institution>Cherkasy State Technological University</institution>
          ,
          <addr-line>Shevchenko blvd., Cherkasy</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff0">
          <institution>E. O. Paton Electric Welding Institute</institution>
          ,
          <addr-line>Kyiv, Bozhenko str., 11, 03680</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The problem of increasing the clustering efficiency of vocal speech sounds is considered. Centroid and medoid clustering methods that use normalized distances, which increases the efficiency of clustering, are proposed. Characteristics and quality criteria based on them are suggested for these methods. These methods have been investigated on the TIMIT database and are intended for intelligent biometric identification systems.</p>
      </abstract>
      <kwd-group>
        <kwd>vocal speech sound</kwd>
        <kwd>centroid</kwd>
        <kwd>medoid</kwd>
        <kwd>clustering</kwd>
        <kwd>inter-cluster distance</kwd>
        <kwd>intra-cluster distance</kwd>
        <kwd>compression ratio</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Automated biometric identification of a person means decision-making based on
acoustic and visual information, which improves the quality of recognition of the
person under investigation. Unlike the traditional approach [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], computer biometric
identification speeds up and increases the likelihood of recognition, which is
especially critical in conditions of limited time.
      </p>
      <p>
        A special class of biometric identification of a person is formed by methods based
on acoustic information [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>To increase the efficiency of analysis, storage and transmission of a speech signal, knowledge of the structure of vocal speech sounds is required, for which the clustering methods reviewed below are used.</p>
      <p>
        Traditional methods (without the use of artificial neural networks and
metaheuristics) [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6">3-6</xref>
        ] search for a solution faster than metaheuristic methods, and some of them do not require setting of the number of clusters; however, they perform only a directed search for a solution, and some of them require setting of the number of clusters or additional parameters.
      </p>
      <p>
        Neural network methods [
        <xref ref-type="bibr" rid="ref10 ref7 ref8 ref9">7-10</xref>
        ] perform a search for a solution faster than
metaheuristic methods, but perform only a directed search for a solution and require exact
setting of the number of clusters.
      </p>
      <p>
        Metaheuristic methods [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14">11-14</xref>
        ] perform a random search for a solution, but they search for a solution longer than other methods and require exact setting of the number of clusters.
      </p>
      <p>Thus, only some traditional methods do not require an exact setting of the number
of clusters, but instead require setting of additional parameters.</p>
      <p>The aim of the work is to increase the clustering efficiency of vocal speech sounds
by providing an adaptive number of clusters, a minimum number of set parameters
and parallel information processing.</p>
      <p>To achieve this goal, it is necessary to solve the following tasks: 1. to develop a centroid clustering method; 2. to create medoid clustering methods; 3. to determine the characteristics and quality criterion of the clustering methods; 4. to conduct a numerical study of the proposed clustering methods.</p>
    </sec>
    <sec id="sec-2">
      <title>Problem statement</title>
      <p>The problem of increasing the clustering efficiency of vocal speech sound comes down to the problem of finding such a vector of parameters $\theta^*$ that satisfies the criterion $F = SSWB + \frac{1}{C} \to \min$, where $SSWB$ is the ratio of the sums of average intra-cluster and inter-cluster distances and $C$ is the compression ratio of speech sound.</p>
    </sec>
    <sec id="sec-3">
      <title>Traditional clustering methods include:</title>
      <p>1. Partition-based (partitioning-based) or center-based methods</p>
      <p>
        In this case, a cluster is a set of objects, each of which is closer to the center of
this cluster than to the center of any other cluster. The center of the cluster is
usually a centroid (average value of the coordinates of all objects in the cluster) or a
medoid (an object of the cluster, the average difference of which from other objects
in the cluster is minimal). These methods consider intersecting areas with a high
density of objects as different clusters, for example, K-means [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], PAM (k-medoids) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], FCM [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], ISODATA [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] methods. The advantage of these methods is a quick search for a solution; some of them do not require setting of the number of clusters and are weakly sensitive to noise or random outliers. The disadvantage of these methods is that they perform only a directed search for a solution; they have problems with clusters of different shapes, sizes and densities; some of them require setting of the number of clusters, are sensitive to noise or random outliers, or require setting of additional parameters.
      </p>
      <p>2. Model mixture (distribution-based, model-based) methods.</p>
      <p>
        In this case, a cluster is described by the probability density function. These
methods should be used when clusters are of different sizes, and the set of objects
of all clusters can be described by a mixture of distribution densities, for example,
EM method [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The advantage of these methods is a quick search for a solution and the determination of the shape and size of each cluster by the corresponding distribution density. The disadvantage of these methods is that they perform only a directed search for a solution, require precise setting of the number of clusters, are sensitive to noise or random outliers, and make it difficult to choose the appropriate distribution density.
      </p>
      <p>3. Density-based methods.</p>
      <p>
        In this case, a cluster is an area with a high density of objects, which is separated from other high-density areas by areas with a low density of objects. These methods consider intersecting areas with a high density of objects as a single cluster. These methods should be used when noise and random outliers are present, clusters have different shapes and sizes, and the number of clusters is unknown. Methods that define clusters (for example, the DBSCAN method [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]) and
those that provide visualization of clusters (for example, OPTICS method [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]) are distinguished. The advantage of these methods is that they do not require setting of the number of clusters, allow clusters of various shapes and sizes, and are weakly sensitive to noise or random outliers. The disadvantage of these methods is that they perform only a directed search for a solution, have problems with clusters of heterogeneous density, and search for a solution slowly.
      </p>
      <p>4. Hierarchical methods.</p>
      <p>
        These methods provide visualization of a cluster tree called a dendrogram. On a
dendrogram, pairs of clusters that are joined (in the case of agglomerative methods)
or obtained as a result of separation (in the case of divisive methods) are connected
by a U-shaped arc, the height of which corresponds to the distance between clusters.
By the method of constructing a dendrogram, these methods are divided into agglomerative or bottom-up ones – each object is considered as a singleton cluster, after which pairs of the closest clusters are merged stepwise (for example, centroid linkage, Ward, single linkage, complete linkage and group average methods [
        <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
        ]) and divisive or top down ones – all objects
are considered as one cluster, and at each step one of the constructed clusters is
divided into a couple of clusters (for example, DIANA, DISMEA methods [
        <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
        ]). The advantage of these methods is that they do not require setting of the number of clusters, provide a visual representation, and some of them are not strongly biased toward clusters of a certain shape and size and are weakly sensitive to noise or random outliers. The disadvantage of these methods is that they perform only a directed search for a solution, search for a solution slowly, and some of them are strongly biased toward clusters of a certain shape and size, are sensitive to noise or random outliers, or tend to discard high-cardinality clusters.
      </p>
      <p>Usually, the methods listed above either require setting of the number of clusters, search for a solution slowly, or require setting of additional parameters, which decreases the clustering efficiency.</p>
      <p>Therefore, the urgent task is to increase the clustering efficiency of vocal speech
sounds by providing an adaptive number of clusters, a minimum number of set
parameters and parallel information processing.</p>
    </sec>
    <sec id="sec-4">
      <title>Methods of clustering of vocal speech sound</title>
      <sec id="sec-4-1">
        <title>Centroid clustering of vocal speech sound based on the minimum distance method</title>
        <p>Centroid clustering of vocal speech sound based on the author's minimum distance method includes the following steps:</p>
        <p>1. Set a set of samples of vocal speech sound $S = \{s_i(n)\}$, $i = \overline{1, I}$, $n = \overline{1, N}$, which are in a single amplitude-time window, where $I$ is the number of samples and $N$ is the length of a sample. Set the number of quantization levels of the speech signal $L$ (for an 8-bit sound sample $L = 256$). Set the normalized threshold $\varepsilon \in (0, 1)$. Set the number of clusters $K = 0$.</p>
        <p>2. Calculate the normalized squared distance between each pair of sound samples: $D_{ij} = \frac{\|s_i - s_j\|^2}{N L^2}$, $i = \overline{1, I}$, $j = \overline{1, I}$.</p>
        <p>3. Calculate the distance between each sound sample and the whole set of sound samples: $d_i = \sum_{j=1}^{I} D_{ij}$, $i = \overline{1, I}$.</p>
        <p>4. Determine the number of the sound sample with the minimum distance: $i^* = \arg\min_i d_i$.</p>
        <p>5. Set the sound sample with the minimum distance as the new cluster center, i.e. $m_{K+1} = s_{i^*}$; set the number of sound samples in the new cluster to one, i.e. $a_{K+1} = 1$; increase the number of clusters, i.e. $K = K + 1$.</p>
        <p>6. Set the sound sample number $i = 1$.</p>
        <p>7. If $i^* = i$, then go to step 13.</p>
        <p>8. Calculate the normalized squared distance between the $i$th sound sample and the cluster centers: $D_k = \frac{\|s_i - m_k\|^2}{N L^2}$, $k = \overline{1, K}$.</p>
        <p>9. Calculate the smallest normalized squared distance between the $i$th sound sample and the cluster centers: $d^* = \min_k D_k$, $k = \overline{1, K}$.</p>
        <p>10. Determine the number of the cluster with the minimum distance: $k^* = \arg\min_k D_k$, $k = \overline{1, K}$.</p>
        <p>11. If $d^* \le \varepsilon$, then calculate the new center of the $k^*$th cluster, i.e. $m_{k^*} = \frac{a_{k^*} m_{k^*} + s_i}{a_{k^*} + 1}$, and increase the number of sound samples in the $k^*$th cluster, i.e. $a_{k^*} = a_{k^*} + 1$.</p>
        <p>12. If $d^* > \varepsilon$, then set the $i$th sample as the new cluster center, i.e. $m_{K+1} = s_i$; set the number of sound samples in the new cluster to one, i.e. $a_{K+1} = 1$; increase the number of clusters, i.e. $K = K + 1$.</p>
        <p>13. If $i \ne I$, then go to the next sample, i.e. $i = i + 1$, and go to step 7.</p>
        <p>The method results in an adaptive set of centroids.</p>
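        <p>For illustration, a minimal NumPy sketch of this method follows; the function name and the sample layout (one sample per row, values on the quantization scale) are assumptions made for the sketch, not part of the method.</p>
        <preformat>
# A minimal NumPy sketch of the centroid clustering method based on the
# minimum distance (steps 1-13 above); names are illustrative assumptions.
import numpy as np

def centroid_min_distance(S, eps, L=256):
    """S: (I, N) array of sound samples; eps: normalized threshold in (0, 1)."""
    I, N = S.shape
    norm = N * L ** 2                          # normalization constant N * L^2
    # Step 2: normalized squared distances between each pair of samples.
    diff = S[:, None, :].astype(float) - S[None, :, :]
    D = (diff ** 2).sum(axis=2) / norm
    # Steps 3-4: the sample with the minimum total distance to the whole set.
    i_star = int(D.sum(axis=1).argmin())
    # Step 5: the first cluster center is that sample itself.
    centers = [S[i_star].astype(float)]
    counts = [1]
    # Steps 6-13: assign the remaining samples or spawn new clusters.
    for i in range(I):
        if i == i_star:
            continue
        d = [((S[i] - m) ** 2).sum() / norm for m in centers]
        k_star = int(np.argmin(d))
        if d[k_star] > eps:                    # step 12: open a new cluster
            centers.append(S[i].astype(float))
            counts.append(1)
        else:                                  # step 11: update the running mean
            a = counts[k_star]
            centers[k_star] = (a * centers[k_star] + S[i]) / (a + 1)
            counts[k_star] += 1
    return np.array(centers)
        </preformat>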
      </sec>
      <sec id="sec-4-2">
        <title>Medoid clustering of vocal speech sound based on the minimax distance method</title>
        <p>Medoid clustering of vocal speech sound based on the minimax distance method, unlike the traditional version, preliminarily determines the number of the sound sample with the minimum distance for a non-random choice of the first cluster center and includes the following steps:</p>
        <p>1. Set a set of samples of vocal speech sound $S = \{s_i(n)\}$, $i = \overline{1, I}$, $n = \overline{1, N}$, which are in a single amplitude-time window, where $I$ is the number of samples and $N$ is the length of a sample. Set the number of clusters $c = 0$.</p>
        <p>2. Calculate the squared distance between each pair of sound samples: $D_{ij} = \|s_i - s_j\|^2$, $i = \overline{1, I}$, $j = \overline{1, I}$.</p>
        <p>3. Calculate the distance between each sound sample and the whole set of sound samples: $d_i = \sum_{j=1}^{I} D_{ij}$, $i = \overline{1, I}$.</p>
        <p>4. Determine the number of the sound sample with the minimum distance: $i^* = \arg\min_i d_i$.</p>
        <p>5. Set the sound sample with the minimum distance as the new cluster center, i.e. $c_{c+1}(n) = s_{i^*}(n)$, $n = \overline{1, N}$, and increase the number of clusters, i.e. $c = c + 1$.</p>
        <p>6. Calculate the squared distance between each sound sample and each cluster center: $D_{ik} = \|s_i - c_k\|^2$, $i = \overline{1, I}$, $k = \overline{1, c}$.</p>
        <p>7. Calculate the minimax squared distance between the sound samples and the cluster centers: $d^* = \max_i \min_k D_{ik}$, $i = \overline{1, I}$, $k = \overline{1, c}$.</p>
        <p>8. Determine the number of the sound sample with the minimax squared distance: $i^* = \arg\max_i \min_k D_{ik}$, $i = \overline{1, I}$, $k = \overline{1, c}$.</p>
        <p>9. If $c = 1$, then set the sound sample with the minimax squared distance as the new cluster center, i.e. $c_{c+1}(n) = s_{i^*}(n)$, $n = \overline{1, N}$, increase the number of clusters, i.e. $c = c + 1$, and go to step 6.</p>
        <p>10. Calculate the average squared distance between the cluster centers: $\bar{\delta}^2 = \frac{2}{c^2 - c} \sum_{i=1}^{c} \sum_{j=i+1}^{c} \|c_i - c_j\|^2$.</p>
        <p>11. Verification of the termination condition. If $d^* > \frac{\bar{\delta}^2}{2}$, then set the sound sample with the minimax squared distance as the new cluster center, i.e. $c_{c+1}(n) = s_{i^*}(n)$, $n = \overline{1, N}$, increase the number of clusters, i.e. $c = c + 1$, and go to step 6.</p>
        <p>12. Calculate the distance between each sound sample and each cluster center: $D_{ik} = \|s_i - c_k\|$, $i = \overline{1, I}$, $k = \overline{1, c}$.</p>
        <p>13. Determine for each sound sample the cluster center closest to it: $u_i = \arg\min_k D_{ik}$, $i = \overline{1, I}$, $k = \overline{1, c}$.</p>
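        <p>A minimal NumPy sketch of this method follows, under the same illustrative assumptions as above.</p>
        <preformat>
# A minimal NumPy sketch of the medoid clustering method based on the
# minimax distance (steps 1-13 above); names are illustrative assumptions.
import numpy as np

def medoid_minimax(S):
    """S: (I, N) array of sound samples; returns medoids and labels."""
    # Steps 2-4: pairwise squared distances; the first medoid is the sample
    # with the minimum total distance to the whole set.
    diff = S[:, None, :].astype(float) - S[None, :, :]
    D = (diff ** 2).sum(axis=2)
    centers = [int(D.sum(axis=1).argmin())]
    while True:
        # Steps 6-8: the sample farthest from its closest current center.
        d_min = D[:, centers].min(axis=1)
        i_star = int(d_min.argmax())
        d_star = d_min[i_star]
        if len(centers) == 1:                  # step 9: second center is unconditional
            centers.append(i_star)
            continue
        # Step 10: average squared distance between the cluster centers
        # (the diagonal of Dc is zero, so the sum runs over ordered pairs).
        c = len(centers)
        Dc = D[np.ix_(centers, centers)]
        delta2 = Dc.sum() / (c * c - c)
        if d_star > delta2 / 2:                # step 11: termination condition
            centers.append(i_star)
        else:
            break
    labels = D[:, centers].argmin(axis=1)      # steps 12-13: nearest medoid
    return S[centers], labels
        </preformat>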
      </sec>
      <sec id="sec-4-3">
        <title>Medoid clustering of vocal speech sound based on the subtractive clustering method</title>
        <p>Medoid clustering of vocal speech sound based on the subtractive clustering method, unlike the traditional version, uses the normalization of squared distances for calculating the potentials of sound samples, which makes it possible to specify normalized standard deviations, and includes the following steps:</p>
        <p>1. Set a set of samples of vocal speech sound $S = \{s_i(n)\}$, $i = \overline{1, I}$, $n = \overline{1, N}$, which are in a single amplitude-time window, where $I$ is the number of samples and $N$ is the length of a sample. Set the number of quantization levels of the speech signal $L$ (for an 8-bit sound sample $L = 256$). Set the threshold to stop the method $\delta \in (0, 1)$. Set the standard deviations $\sigma_a$ and $\sigma_b$, $\sigma_b > \sigma_a$, $\sigma_a \in (0, 1)$, $\sigma_b \in (0, 1)$. Set the number of clusters $c = 0$.</p>
        <p>2. Calculate the potential of each sound sample: $P(i) = \sum_{j=1}^{I} e^{-4 \|s_i - s_j\|^2 / (\sigma_a^2 N L^2)}$, $i = \overline{1, I}$.</p>
        <p>3. Determine the number of the sound sample with the maximum potential: $i^* = \arg\max_i P(i)$, $i = \overline{1, I}$.</p>
        <p>4. Set the potential $P(i^*)$ as the new cluster potential, i.e. $P(c + 1) = P(i^*)$.</p>
        <p>5. Calculate the new potential of each sound sample: $P(i) = P(i) - P(c + 1) \, e^{-4 \|s_i - c_{c+1}\|^2 / (\sigma_b^2 N L^2)}$, $i = \overline{1, I}$.</p>
        <p>6. Verification of the termination condition. If $\frac{P(c + 1)}{P(1)} > \delta$, then set the sound sample with the highest potential as the new cluster center, i.e. $c_{c+1}(n) = s_{i^*}(n)$, $n = \overline{1, N}$, increase the number of clusters, i.e. $c = c + 1$, and go to step 3.</p>
        <p>7. Calculate the distance between each sound sample and each cluster center: $D_{ik} = \|s_i - c_k\|$, $i = \overline{1, I}$, $k = \overline{1, c}$.</p>
        <p>8. Determine for each sound sample the cluster center closest to it: $u_i = \arg\min_k D_{ik}$, $i = \overline{1, I}$, $k = \overline{1, c}$.</p>
        <p>The method results in an adaptive set of medoids.</p>
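        <p>A minimal NumPy sketch of this modified method follows; the loop structure and names are illustrative assumptions.</p>
        <preformat>
# A minimal NumPy sketch of the modified subtractive clustering method
# (steps 1-8 above) with normalized squared distances; names are
# illustrative assumptions.
import numpy as np

def medoid_subtractive(S, sigma_a, sigma_b, delta, L=256):
    """S: (I, N) samples; sigma_b > sigma_a; delta in (0, 1) stops the method."""
    I, N = S.shape
    diff = S[:, None, :].astype(float) - S[None, :, :]
    D = (diff ** 2).sum(axis=2) / (N * L ** 2)      # normalized squared distances
    # Step 2: potential of each sample built from the normalized distances.
    P = np.exp(-4.0 * D / sigma_a ** 2).sum(axis=1)
    centers = []
    P_first = None
    while True:
        i_star = int(P.argmax())                    # step 3: maximum potential
        P_new = P[i_star]                           # step 4: new cluster potential
        if P_first is None:
            P_first = P_new                         # potential of the first cluster
        if P_new / P_first > delta:                 # step 6: termination condition
            centers.append(i_star)
            # Step 5: subtract the scaled potential around the new center.
            P = P - P_new * np.exp(-4.0 * D[:, i_star] / sigma_b ** 2)
        else:
            break
    labels = D[:, centers].argmin(axis=1)           # steps 7-8: nearest medoid
    return S[centers], labels
        </preformat>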
      </sec>
      <sec id="sec-4-4">
        <title>Medoid clustering of vocal speech sound based on the minimum average distance method</title>
        <p>Medoid clustering of vocal speech sound based on the author's method of minimum average distance includes the following steps:</p>
        <p>1. Set a set of samples of vocal speech sound $S = \{s_i(n)\}$, $i = \overline{1, I}$, $n = \overline{1, N}$, which are in a single amplitude-time window, where $I$ is the number of samples and $N$ is the length of a sample. Set the number of quantization levels of the speech signal $L$ (for an 8-bit sound sample $L = 256$). Set the radius of the neighborhood of sound samples $\varepsilon \in (0, 1)$. Set the number of clusters $c = 0$. Set the set of samples of speech sound that have not fallen into existing clusters, $\tilde{S} = \{s_i(n)\}$.</p>
        <p>2. Calculate the normalized squared distance between each pair of sound samples: $D_{ij} = \frac{\|s_i - s_j\|^2}{N L^2}$, $i = \overline{1, I}$, $j = \overline{1, I}$.</p>
        <p>3. Calculate the distance between each sound sample and the whole set of sound samples: $d_i = \sum_{j=1}^{I} D_{ij}$, $i = \overline{1, I}$.</p>
        <p>4. Determine the neighborhood of each sound sample: $U_{i,\varepsilon} = \{ j \mid D_{ij} \le \varepsilon, \; j = \overline{1, I} \}$, $i = \overline{1, I}$.</p>
        <p>5. Determine the neighborhood of the sound sample with the minimum average distance: $i^* = \arg\min_{i \in \tilde{S}} d_i$, $U^* = U_{i^*,\varepsilon}$.</p>
        <p>6. Calculate the new distance between each sound sample and the set of sound samples that have not fallen into existing clusters: $d_i = 0$ for $i \in U^*$, and $d_i = d_i - \sum_{k \in U_{i,\varepsilon} \cap U^*} D_{ik}$ for $i \notin U^*$, $i \in \tilde{S}$.</p>
        <p>7. Determine the new neighborhood for each sound sample that has not fallen into existing clusters.</p>
        <p>8. Set the sound sample with the minimum average distance as the new cluster center, i.e. $c_{c+1}(n) = s_{i^*}(n)$, $n = \overline{1, N}$, and increase the number of clusters, i.e. $c = c + 1$.</p>
        <p>9. Verification of the termination condition. If $\tilde{S} \setminus U^* \ne \emptyset$, then $\tilde{S} = \tilde{S} \setminus U^*$ and go to step 5.</p>
        <p>10. Calculate the distance between each sound sample and each cluster center (codebook vector): $D_{ik} = \|s_i - c_k\|$, $i = \overline{1, I}$, $k = \overline{1, c}$.</p>
        <p>11. Determine for each sound sample the cluster center closest to it: $u_i = \arg\min_k D_{ik}$, $i = \overline{1, I}$, $k = \overline{1, c}$.</p>
        <p>The method results in an adaptive set of medoids.</p>
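        <p>A minimal NumPy sketch of this method follows; the set-based bookkeeping is one straightforward way to realize steps 5-9 and is an illustrative assumption.</p>
        <preformat>
# A minimal NumPy sketch of the author's medoid clustering method based on
# the minimum average distance (steps 1-11 above); names are illustrative
# assumptions.
import numpy as np

def medoid_min_avg_distance(S, eps, L=256):
    """S: (I, N) samples; eps: neighborhood radius in (0, 1)."""
    I, N = S.shape
    diff = S[:, None, :].astype(float) - S[None, :, :]
    D = (diff ** 2).sum(axis=2) / (N * L ** 2)      # step 2: normalized distances
    d = D.sum(axis=1)                               # step 3: distance to the set
    U = [set(np.flatnonzero(eps >= D[i])) for i in range(I)]   # step 4
    remaining = set(range(I))                       # samples not yet clustered
    centers = []
    while True:
        # Step 5: the remaining sample with the minimum (average) distance.
        i_star = min(remaining, key=lambda i: d[i])
        U_star = U[i_star]
        # Steps 6-7: zero out the captured neighborhood, discount the rest.
        for i in remaining:
            if i in U_star:
                d[i] = 0.0
            else:
                d[i] -= sum(D[i, k] for k in U[i].intersection(U_star))
                U[i] = U[i].difference(U_star)
        centers.append(i_star)                      # step 8: new medoid
        remaining = remaining.difference(U_star)    # step 9: termination condition
        if not remaining:
            break
    labels = D[:, centers].argmin(axis=1)           # steps 10-11: nearest medoid
    return S[centers], labels
        </preformat>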
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Determination of characteristics and quality criterion for clustering methods of vocal speech sound</title>
      <p>To evaluate the clustering methods, the following characteristics are used in the work:</p>
      <p>1. The sum of average intra-cluster distances:</p>
      <p>─ in the case of centroid clustering methods $SSW = \sum_{k=1}^{K} \left( \sum_{i=1}^{I} A_k(s_i) \, \|s_i - m_k\|^2 \right) \big/ \sum_{i=1}^{I} A_k(s_i)$,</p>
      <p>─ in the case of medoid clustering methods $SSW = \sum_{k=1}^{c} \left( \sum_{i=1}^{I} A_k(s_i) \, \|s_i - c_k\|^2 \right) \big/ \sum_{i=1}^{I} A_k(s_i)$, where $A_k(s_i) = 1$ if $s_i \in A_k$ and $A_k(s_i) = 0$ otherwise.</p>
      <p>2. The sum of inter-cluster distances:</p>
      <p>─ in the case of centroid clustering methods $SSB = \sum_{k=1}^{K} \sum_{l=k+1}^{K} \|m_k - m_l\|^2$,</p>
      <p>─ in the case of medoid clustering methods $SSB = \sum_{k=1}^{c} \sum_{l=k+1}^{c} \|c_k - c_l\|^2$.</p>
      <p>3. The ratio of the sums of average intra-cluster and inter-cluster distances $SSWB = SSW / SSB$.</p>
      <p>In addition, to assess the quality of the clustering methods, the following characteristic – the compression ratio of speech sound – is proposed in the work: $C = I / c$, where $I$ is the number of samples of vocal speech sound and $c$ is the number of clusters.</p>
      <p>The following criterion for the quality of clustering, which means choosing such a parameter vector $\theta$ that minimizes the sum of the inverse of the compression ratio and the ratio of the sums of average intra-cluster and inter-cluster distances, is formulated in the paper:</p>
      <p>$F = SSWB + \frac{1}{C} \to \min$. (1)</p>
      <p>For centroid clustering of vocal speech sound based on the minimum distance method $\theta = \varepsilon$. For medoid clustering of vocal speech sound based on the subtractive clustering method $\theta = (\sigma_a, \sigma_b, \delta)$. For medoid clustering of vocal speech sound based on the minimum average distance method $\theta = \varepsilon$.</p>
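      <p>A minimal NumPy sketch of these characteristics and of criterion (1) follows; since the inter-cluster formulas are reconstructed, the SSB term is written as the sum of squared distances over all pairs of cluster centers, which is one consistent reading of the text.</p>
      <preformat>
# A minimal NumPy sketch of the characteristics and of criterion (1);
# the pairwise form of SSB is an assumption made for the sketch.
import numpy as np

def quality_criterion(S, centers, labels):
    """S: (I, N) samples; centers: (c, N) array; labels: (I,) cluster indices."""
    labels = np.asarray(labels)
    I, c = S.shape[0], centers.shape[0]
    # SSW: sum of average intra-cluster (squared) distances.
    ssw = sum(((S[labels == k] - centers[k]) ** 2).sum(axis=1).mean()
              for k in range(c) if np.any(labels == k))
    # SSB: sum of inter-cluster (squared) distances between centers.
    ssb = sum(((centers[k] - centers[l]) ** 2).sum()
              for k in range(c) for l in range(k + 1, c))
    sswb = ssw / ssb
    C = I / c                       # compression ratio
    return sswb + 1.0 / C           # criterion (1): F = SSWB + 1/C -> min
      </preformat>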
    </sec>
    <sec id="sec-6">
      <title>Experiments and results</title>
      <p>Numerical experiments were carried out on a notebook with an Intel Core i5 (8th Gen) CPU, using the MATLAB package and CUDA parallel information processing on a GeForce 920M graphics card with $N_s = 1024$ threads per block. The most time-consuming step of all four proposed clustering methods, step 2 (computational complexity $O(I^2)$, where $I$ is the number of samples of vocal speech sound), was parallelized, which made it possible to speed up the search for a solution.</p>
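      <p>Since step 2 is the same pairwise-distance computation in every method, the following vectorized sketch shows a CPU analogue of the parallelized kernel; the function name and the normalization flag are illustrative assumptions.</p>
      <preformat>
# Step 2 of each proposed method builds an I x I matrix of pairwise squared
# distances, which is the O(I^2) part that was parallelized. A vectorized
# sketch via ||s_i - s_j||^2 = ||s_i||^2 + ||s_j||^2 - 2 s_i . s_j, a form
# that maps directly onto GPU matrix kernels (the normalization by N * L^2
# is skipped in the minimax variant):
import numpy as np

def pairwise_squared_distances(S, L=256, normalize=True):
    """S: (I, N) array of sound samples."""
    S = S.astype(float)
    sq = (S ** 2).sum(axis=1)                        # ||s_i||^2 for every sample
    D = sq[:, None] + sq[None, :] - 2.0 * (S @ S.T)  # squared Euclidean distances
    np.maximum(D, 0.0, out=D)                        # clip tiny negatives from rounding
    return D / (S.shape[1] * L ** 2) if normalize else D
      </preformat>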
      <p>For speech signals containing vocal sounds, the sampling frequency $f_d = 8$ kHz and the number of quantization levels $L = 256$ were set. The length of a sample of vocal speech sound was $N = 256$.</p>
      <p>For the modified method of subtractive clustering, the following fixed parameter values were set: $\delta = 0.01$, $\sigma_b = 1.25 \sigma_a$.</p>
      <p>The results of a numerical study of the proposed clustering methods for vocal sounds of people from the TIMIT database are presented in Table 1.</p>
      <p>The results presented in Table 1 show that the author's method of minimum average distance provides the smallest $F$ value, calculated according to (1). Its compression ratio is approximately 7.5, i.e. the number of stored samples is reduced by about 7.5 times.</p>
      <p>Based on the experiments, the following conclusions can be drawn.</p>
      <p>The author’s method of minimum distance should be used only when centroids are
required, and not medoids, since it has the largest F and the lowest compression ratio.</p>
      <p>The modified method of minimax distance performs a coarser adjustment of the number of clusters than the modified method of subtractive clustering and the author's method of minimum average distance, because it does not use parameters. On the other hand, the operator does not need to set any parameters whose values must be established empirically.</p>
      <p>The modified method of subtractive clustering performs a finer adjustment of the number of clusters than the modified method of minimax distance, because it uses parameters. It requires a more complex setup than the author's method of minimum average distance, because it uses three parameters whose values must be established empirically.</p>
      <p>The author's method of minimum average distance performs a finer adjustment of the number of clusters than the modified method of minimax distance, because it uses parameters. It requires a simpler setup than the modified method of subtractive clustering, because it uses only one parameter whose value must be established empirically.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>The article considers the problem of increasing the clustering efficiency of vocal speech sounds. The following clustering methods are proposed: the author's method of minimum distance, the author's method of minimum average distance, the modified method of minimax distance (unlike the traditional version, it preliminarily determines the number of the sound sample with the minimum distance for a non-random choice of the first cluster center), and the modified method of subtractive clustering (unlike the traditional version, it uses the normalization of squared distances in calculating the potentials of sound samples, so that normalized standard deviations can be specified). Characteristics and a quality criterion based on them are proposed for these methods. The proposed methods make it possible to increase the clustering efficiency of vocal speech sounds by providing an adaptive number of clusters, a minimum number of specified parameters and parallel information processing. The proposed methods are intended for software implementation on a GPU using CUDA technology, which speeds up the search for a solution. Software that implements the proposed methods has been developed and investigated on the TIMIT database. The conducted experiments have confirmed the operability of the developed software and allow us to recommend it for use in practice when solving problems of biometric identification of a person. Prospects for further research are to test the proposed methods on a wider set of test databases.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shree</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <source>Applications of Speaker Recognition. Procedia Engineering</source>
          .
          <volume>38</volume>
          ,
          <fpage>3122</fpage>
          -
          <lpage>3126</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Weychan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marciniak</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dabrowski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Implementation aspects of speaker recognition using Python language and Raspberry Pi platform</article-title>
          .
          <source>2015 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA)</source>
          (
          <year>2015</year>
          ). doi: 10.1109/SPA.2015.7365153
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Brusco</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shireman</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steinley</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>A comparison of latent class, K-means, and Kmedian methods for clustering dichotomous data</article-title>
          .
          <source>Psychological Methods</source>
          .
          <volume>22</volume>
          ,
          <fpage>563</fpage>
          -
          <lpage>580</lpage>
          (
          <year>2017</year>
          ). doi: 10.1037/met0000095
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lenarczyk</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piotrowski</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Speaker recognition system based on GMM multivariate probability distributions built-in a digital watermarking token</article-title>
          .
          <source>Przeglad Elektrotechniczny</source>
          .
          <volume>89</volume>
          (
          <issue>2</issue>
          ),
          <fpage>59</fpage>
          -
          <lpage>63</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ram</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jalal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jalal</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A Density Based Algorithm for Discovering Density Varied Clusters in Large Spatial Databases</article-title>
          .
          <source>International Journal of Computer Applications</source>
          .
          <volume>3</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          (
          <year>2010</year>
          ). doi: 10.5120/739-1038
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Krishnamurthy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balakrishnan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Efficient Active Algorithms for Hierarchical Clustering</article-title>
          .
          <source>CoRR abs/1206.4672</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Haykin</surname>
            ,
            <given-names>S.S.:</given-names>
          </string-name>
          <article-title>Neural networks and learning machines</article-title>
          .
          <source>Pearson</source>
          , Delhi (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Du</surname>
          </string-name>
          , K.-L.,
          <string-name>
            <surname>Swamy</surname>
            ,
            <given-names>M.N.S.</given-names>
          </string-name>
          :
          <source>Neural Networks and Statistical Learning</source>
          (
          <year>2014</year>
          ). doi: 10.1007/978-1-4471-5571-3
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Sivanandam</surname>
            ,
            <given-names>S.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sumathi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deepa</surname>
            ,
            <given-names>S.N.</given-names>
          </string-name>
          :
          <article-title>Introduction to neural networks using MATLAB 6.0</article-title>
          . Tata McGraw-Hill Education, New Delhi (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Larin</surname>
            ,
            <given-names>V.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fedorov</surname>
            ,
            <given-names>E.E.</given-names>
          </string-name>
          :
          <article-title>Combination of PNN network and DTW method for identification of reserved words, used in aviation during radio negotiation</article-title>
          .
          <source>Radioelectronics and Communications Systems</source>
          .
          <volume>57</volume>
          ,
          <fpage>362</fpage>
          -
          <lpage>368</lpage>
          (
          <year>2014</year>
          ). doi: 10.3103/s0735272714080044
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abraham</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Konar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Metaheuristic clustering</article-title>
          . Springer Berlin, Berlin (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Saemi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hosseinabadi</surname>
            ,
            <given-names>A.A.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kardgar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balas</surname>
            ,
            <given-names>V.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ebadi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Nature Inspired Partitioning Clustering Algorithms: A Review and Analysis</article-title>
          .
          <source>Soft Computing Applications. Advances in Intelligent Systems and Computing</source>
          ,
          <fpage>96</fpage>
          -
          <lpage>116</lpage>
          (
          <year>2017</year>
          ). doi: 10.1007/978-3-319-62524-9_9
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Brownlee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Clever algorithms: nature-inspired programming recipes</article-title>
          . Creative Commons, Melbourne (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Subbotin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliinyk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levashenko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaitseva</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <article-title>Diagnostic rule mining based on artificial immune system for a case of uneven distribution of classes in sample</article-title>
          .
          <source>Communications</source>
          .
          <volume>3</volume>
          ,
          <fpage>3</fpage>
          -
          <lpage>11</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Steinley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brusco</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <article-title>Initializing K-means Batch Clustering: A Critical Evaluation of Several Techniques</article-title>
          .
          <source>Journal of Classification</source>
          .
          <volume>24</volume>
          ,
          <fpage>99</fpage>
          -
          <lpage>121</lpage>
          (
          <year>2007</year>
          ). doi: 10.1007/s00357-007-0003-0
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kaufman</surname>
          </string-name>
          , L.,
          <string-name>
            <surname>Rousseeuw</surname>
            ,
            <given-names>P.J.:</given-names>
          </string-name>
          <article-title>Finding groups in data: an introduction to cluster analysis</article-title>
          . Wiley-Interscience, Hoboken, NJ (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>K.-L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>M.-S.:</given-names>
          </string-name>
          <article-title>A novel fuzzy clustering algorithm based on a fuzzy scatter matrix with optimality tests</article-title>
          .
          <source>Pattern Recognition Letters</source>
          .
          <volume>26</volume>
          ,
          <fpage>639</fpage>
          -
          <lpage>652</lpage>
          (
          <year>2005</year>
          ). doi: 10.1016/j.patrec.2004.09.016
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Memarsadeghi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mount</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Netanyahu</surname>
            ,
            <given-names>N.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moigne</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          :
          <article-title>A Fast Implementation Of The Isodata Clustering Algorithm</article-title>
          .
          <source>International Journal of Computational Geometry &amp; Applications</source>
          .
          <volume>17</volume>
          ,
          <fpage>71</fpage>
          -
          <lpage>103</lpage>
          (
          <year>2007</year>
          ). doi: 10.1142/S0218195907002252
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Hastie</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tibshirani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>The Elements of statistical learning: data mining, inference, and prediction</article-title>
          . Springer, New York (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Borah</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhattacharyya</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>DDSC : A Density Differentiated Spatial Clustering Technique</article-title>
          .
          <source>Journal of Computers</source>
          .
          <volume>3</volume>
          ,
          <fpage>72</fpage>
          -
          <lpage>79</lpage>
          (
          <year>2008</year>
          ). doi: 10.4304/jcp.3.2.72-79
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>VDBSCAN: Varied Density Based Spatial Clustering of Applications with Noise</article-title>
          .
          <source>2007 International Conference on Service Systems and Service Management</source>
          (
          <year>2007</year>
          ). doi: 10.1109/ICSSSM.2007.4280175
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Aggarwal</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reddy</surname>
            ,
            <given-names>C.K.</given-names>
          </string-name>
          :
          <article-title>Data Clustering: Algorithms and Applications</article-title>
          . Chapman and Hall/CRC, Boca Raton, FL (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Gan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Data clustering: theory, algorithms, and applications</article-title>
          . SIAM, Philadelphia (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>