Model of Adaptive Language Synthesis Based On Cosine Conversion Furies with the Use of Continuous Fractions

Lyubomyr Chyrun [0000-0002-9448-1751]

Ivan Franko National University of Lviv, Lviv, Ukraine
Lyubomyr.Chyrun@lnu.edu.ua

Abstract. This article describes an adaptive model for the synthesis of voice signals in a digital signal processor. The use of continued fractions in a digital signal processor is proposed, and their realization by means of multi-cell structures is given. This procedure is used to implement a model of the human vocal tract.

Keywords. Language Synthesis, Adaptive Synthesis, digital signal processor, Cosine Conversion Furies, Continuous Fractions, Speech synthesis system

1 Introduction

Modern speech recognition systems integrate technologies from such fields of modern science as signal processing, pattern recognition, natural language processing, and linguistics. The wide use of such systems has driven rapid growth in digital signal processing (DSP). Previously, the field was dominated by vector-oriented processors and algebraic methods, while the current generation of DSP relies on sophisticated statistical models and uses complex software for practical implementation. Modern speech recognition models can handle continuous speech input with vocabularies of hundreds of thousands of words in real operating environments.

Linear predictive analysis is historically the most important of the voice analysis technologies. Its basis is the source-filter model, in which the filter is an ideal linear filter.

2 Analytical Review of Literature and Other Sources

Linear predictive coding is most commonly used in speech analysis and synthesis, and in transmitting or storing speech signals. For this purpose, ideal lattice (cell) structures are typically used to model the human vocal tract.
These structures with reflection coefficients were first formulated by Markel and Gray [1] and by Makhoul [2]. A state-space model of a non-ideal cell structure with two and four multipliers per section for digital signal processors was analyzed in [3]. The general system of voice synthesis given in [4] is presented in Fig. 1.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Fig. 1. Speech synthesis system (block labels: frequency, periodic oscillations, white noise, amplitudes $K_V$ and $K_U$, voiced/unvoiced switch, difference signal, model coefficients, model of the vocal tract, synthesized speech)

In the general case, the problem of linear prediction is as follows [9-12]. Let $s(n)$ be a voice signal, and let

$$\tilde{s}(n) = \sum_{k=1}^{p} \alpha_k s(n-k)$$

be its predicted value. The prediction error is then

$$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k s(n-k).$$

We want to minimize this error to find the best, or optimal, values $\alpha_k$. Define the short-term error:

$$E = \sum_n e^2(n) = \sum_n \left[ s(n) - \sum_{k=1}^{p} \alpha_k s(n-k) \right]^2 = \sum_n \left[ s^2(n) - 2 s(n) \sum_{k=1}^{p} \alpha_k s(n-k) + \left( \sum_{k=1}^{p} \alpha_k s(n-k) \right)^2 \right]$$

$$= \sum_n s^2(n) - 2 \sum_n \sum_{k=1}^{p} \alpha_k s(n) s(n-k) + \sum_n \left( \sum_{k=1}^{p} \alpha_k s(n-k) \right)^2 .$$

We minimize the error with respect to each $\alpha_l$, $1 \le l \le p$, by differentiating $E$ and setting the result to zero:

$$\frac{\partial E}{\partial \alpha_l} = 0 = -2 \sum_n s(n) s(n-l) + 2 \sum_n \left( \sum_{k=1}^{p} \alpha_k s(n-k) \right) s(n-l).$$

In the case of the covariance method, we start by slightly redefining the terms:

$$\sum_n s(n) s(n-l) = \sum_{k=1}^{p} \alpha_k \sum_n s(n-k) s(n-l) \quad \text{or} \quad c(l,0) = \sum_{k=1}^{p} \alpha_k c(k,l),$$

where $c(k,l) = \sum_n s(n-k) s(n-l)$. This equation is also known as the linear prediction equation (Yule-Walker equation). The $\alpha_k$ are called linear prediction coefficients, or predictor coefficients.
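As a minimal numerical sketch of the linear prediction equation above, the following Python code builds the terms $c(k,l)$ from a signal segment and solves for the predictor coefficients. It assumes the convention $\tilde{s}(n) = \sum_k \alpha_k s(n-k)$ and a summation range $n = p, \ldots, N-1$; the function names `lpc_covariance` and `synth_ar2` and the synthetic test signal are illustrative assumptions, not part of the original work.

```python
import numpy as np

def lpc_covariance(s, p):
    """Covariance-method predictor coefficients alpha_1..alpha_p.

    Assumes the convention s~(n) = sum_k alpha_k * s(n - k) and sums
    over n = p .. N-1, so that every s(n - k) lies inside the segment.
    """
    s = np.asarray(s, dtype=float)
    N = len(s)
    # Column k-1 holds the delayed samples s(n - k) for n = p .. N-1.
    past = np.column_stack([s[p - k:N - k] for k in range(1, p + 1)])
    C = past.T @ past      # C[l-1, k-1] = c(k, l) = sum_n s(n-k) s(n-l)
    c = past.T @ s[p:]     # c[l-1]      = c(l, 0) = sum_n s(n)   s(n-l)
    return np.linalg.solve(C, c)

def synth_ar2(n_samples, a1=1.3, a2=-0.6, seed=0):
    """Hypothetical test signal: s(n) = a1*s(n-1) + a2*s(n-2) + noise."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n_samples)
    s = np.zeros(n_samples)
    for n in range(2, n_samples):
        s[n] = a1 * s[n - 1] + a2 * s[n - 2] + w[n]
    return s

if __name__ == "__main__":
    alpha = lpc_covariance(synth_ar2(4000), p=2)
    print(alpha)  # should lie close to the generating values (1.3, -0.6)
```

For a signal that really is autoregressive of order two, the estimated coefficients should approach the generating values as the segment gets longer.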
When the equations are written out for all values of $l$, they can be expressed in matrix form

$$c = C\alpha,$$

where

$$C = \begin{bmatrix} c(1,1) & c(1,2) & \cdots & c(1,p) \\ c(2,1) & c(2,2) & \cdots & c(2,p) \\ \vdots & \vdots & \ddots & \vdots \\ c(p,1) & c(p,2) & \cdots & c(p,p) \end{bmatrix}, \quad c = \begin{bmatrix} c(1,0) \\ c(2,0) \\ \vdots \\ c(p,0) \end{bmatrix}, \quad \alpha = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix}.$$

To solve this equation, the matrix has to be inverted:

$$\alpha = C^{-1} c.$$

This method is called the covariance method. Note that the covariance matrix is symmetric. An efficient way to solve this equation is the Cholesky method (the covariance matrix is factored into lower and upper triangular matrices).

Using a slightly different approach to minimizing the error, we can solve the linear prediction equation by the autocorrelation method

$$\alpha = R^{-1} r,$$

where

$$R = \begin{bmatrix} r(0) & r(1) & \cdots & r(p-1) \\ r(1) & r(0) & \cdots & r(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(p-1) & r(p-2) & \cdots & r(0) \end{bmatrix}, \quad r = \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(p) \end{bmatrix}, \quad \alpha = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix}.$$

The matrix of the system is symmetric and all elements along each diagonal are equal (a Toeplitz matrix). For a nonzero signal segment this means that the inverse matrix always exists and that the poles of the resulting synthesis filter lie inside the unit circle, so the filter is stable.

Autoregressive modeling using least squares prediction, or linear prediction, forms the basis of a wide range of signal processing and communication applications, including adaptive filtering and control, speech modeling and coding, adaptive channel equalization, parametric spectrum estimation, and system identification. To implement linear prediction or modeling of data, it is necessary to determine the values of the linear prediction coefficients as well as the model order. Commonly used model selection methods include the Akaike information criterion, the minimum description length method of Schwarz, and the predictive least squares principle of Rissanen.
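The autocorrelation method and an order selection step can be sketched together in Python. The sketch below solves $R\alpha = r$ for each candidate order and scores it with the common autoregressive form of the Akaike criterion, $N \ln \sigma_p^2 + 2p$. That particular form, the biased autocorrelation estimate, and the function names `autocorr`, `lpc_autocorrelation` and `aic_order` are assumptions made for illustration, not taken from the original text.

```python
import numpy as np

def autocorr(s, p):
    """Biased autocorrelation estimates r(0..p) of the segment s."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    return np.array([s[:N - k] @ s[k:] for k in range(p + 1)]) / N

def lpc_autocorrelation(s, p):
    """Solve R alpha = r; return alpha and the mean squared prediction error."""
    r = autocorr(s, p)
    # Symmetric Toeplitz matrix: R[i, j] = r(|i - j|).
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    alpha = np.linalg.solve(R, r[1:])
    err = r[0] - alpha @ r[1:]
    return alpha, err

def aic_order(s, max_order):
    """Return the order minimizing N*ln(err_p) + 2*p, and all scores."""
    N = len(s)
    scores = []
    for p in range(1, max_order + 1):
        _, err = lpc_autocorrelation(s, p)
        scores.append(N * np.log(err) + 2 * p)
    return int(np.argmin(scores)) + 1, scores

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w = rng.standard_normal(4000)
    s = np.zeros(4000)
    for n in range(2, 4000):   # hypothetical second-order test signal
        s[n] = 1.3 * s[n - 1] - 0.6 * s[n - 2] + w[n]
    print(aic_order(s, 6)[0])
```

The first term of the score rewards a smaller prediction error, while the second term penalizes each additional coefficient.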
In their original form, the first two criteria strike an explicit balance between how well the model fits the data and a penalty for model complexity. Intuitively, in the information criterion method the primary purpose is to minimize the number of bits required to describe the data [13-18]. When the data can be modeled parametrically and then encoded in blocks, blocks of similar data are grouped, and the model is penalized by the additional number of bits required to encode its parameters [19-24].

However, the voice model based on the cosine Fourier transform has better properties for speech synthesis: it is less sensitive to quantization effects and, as a result, produces more natural synthesized speech [25-29]. The parameters of this model are the coefficients of the cosine Fourier transform. The model is based on the cosine expansion of the logarithmic short-term voice spectrum, and the synthesis is implemented by an approximate inverse Fourier cosine transform using continued fractions [30-32]. This approach is parametric and is not based on any simplifying assumptions about the voice model, because both the poles and the zeros of the voice model are taken into account [33-44].

3 The Voice Model Based on a Cosine Fourier Transform

Suppose that we have the logarithmic spectrum $\ln |S(e^{j\omega T})|$ of the voice data segment $s(n)$, where $T$ is the sampling interval, $f_s = 1/T$ is the sampling frequency, and $\omega$ is the angular frequency. This function can be expressed using the real Fourier cosine transform coefficients $c_n$:

$$\ln |S(e^{j\omega T})| = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega T}. \qquad (1)$$

The complex coefficients $g_n$ of the cosine Fourier transform of a stable minimum-phase discrete system are causal and are related to $c_n$ as follows:

$$g_n = c_n, \quad n = 0, \; N_F/2,$$
$$g_n = 2c_n, \quad 0 < n < N_F/2, \qquad (2)$$
$$g_n = 0, \quad n < 0,$$

where $N_F$ is the size of the applied FFT.
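Relations (1) and (2) can be illustrated with a short NumPy sketch. It estimates the coefficients $c_n$ as the inverse FFT of the log-magnitude spectrum and then forms the causal coefficients $g_n$. The function names, the small offset added before the logarithm, and the two-sample test sequence are assumptions introduced here for illustration only.

```python
import numpy as np

def cosine_coefficients(segment, n_fft=512):
    """Coefficients c_n of the log-magnitude spectrum, as in (1)."""
    spectrum = np.fft.fft(segment, n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # offset guards against log(0)
    # For a real segment the log magnitude is even, so c_n is real and c_n = c_{-n}.
    return np.fft.ifft(log_mag).real

def minimum_phase_coefficients(c):
    """Causal coefficients g_n built from c_n according to (2)."""
    n_fft = len(c)
    half = n_fft // 2
    g = np.zeros(n_fft)
    g[0] = c[0]
    g[half] = c[half]
    g[1:half] = 2.0 * c[1:half]
    return g   # indices above n_fft/2 (negative n) remain zero

if __name__ == "__main__":
    # Hypothetical minimum-phase sequence 1 + 0.5 z^-1.
    c = cosine_coefficients([1.0, 0.5])
    g = minimum_phase_coefficients(c)
    print(g[:4])
```

Exponentiating the FFT of `g` should reproduce the spectrum of a minimum-phase sequence, which gives a simple consistency check on (2).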
A digital filter whose logarithmic magnitude response approximates the function $\ln |S(e^{j\omega T})|$ is determined by the transfer function

$$\tilde{S}(z) = e^{c_0} \exp\left( \sum_{n=1}^{N_0-1} 2 c_n z^{-n} \right) = e^{c_0} \prod_{n=1}^{N_0-1} \exp\left( 2 c_n z^{-n} \right), \qquad (3)$$

where $0 < N_0 \le N_F/2$. The coefficient $e^{c_0}$ equals the RMS value of the difference signal in the cosine Fourier transform model. In our experiments on the voice model we used $f_s = 8$ kHz, $N_F = 512$, a voice segment length of 25 ms with 12 ms overlap, and $N_0 = 25$.

It follows from (3) that the transfer function $\tilde{S}(z)$ is the product of the transcendental transfer functions

$$H_n(z) = e^{2 c_n z^{-n}}, \quad 0 < n \le N_0 - 1. \qquad (4)$$

The corresponding impulse response is

$$h_n(m) = \begin{cases} \dfrac{(2c_n)^i}{i!}, & m = ni, \; i = 0, 1, 2, \ldots \\ 0, & m \ne ni. \end{cases} \qquad (5)$$

This means that the transfer function $\tilde{S}(z)$ has the form of a cascade (Fig. 2):

$$\tilde{S}(z) = e^{c_0} \prod_{n=1}^{N_0-1} H_n(z). \qquad (6)$$

Fig. 2. Voice model of cosine Fourier transform (cascade from the input $x(m)$ through the gain $e^{c_0}$ and the sections $e^{2c_1 z^{-1}}$, $e^{2c_2 z^{-2}}$, $e^{2c_3 z^{-3}}$, ..., $e^{2c_{N_0-1} z^{-(N_0-1)}}$ to the output $y(m)$)

To implement a transfer function $H_n(z)$ with a digital filter, it is necessary to find an approximation $\tilde{H}_n(z)$ that can be realized in practice. One option for approximating the exponential function in (4) is continued fractions [4]. Another possibility is to use a Padé approximation. The transfer function of a practical voice model based on the cosine Fourier transform then takes the form

$$\tilde{\tilde{S}}(z) = e^{c_0} \prod_{n=1}^{N_0-1} \tilde{H}_n(z). \qquad (7)$$

4 Approximation by Expansion into Continued Fractions

The exponential function can be expanded into a continued fraction as follows [5], [6], [7]:

$$e^x = \frac{1}{1 -}\;\frac{x}{1 +}\;\frac{x}{2 -}\;\frac{x}{3 +}\;\cdots\;\frac{x}{2 -}\;\frac{x}{2s+1}, \qquad (8)$$

where the parameter is $x = 2 c_n z^{-n}$.
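The expansion (8) can be checked numerically by truncating it after a given number of elements. The Python sketch below assumes that the partial denominators follow the pattern 1, 1, 2, 3, 2, 5, 2, 7, ... with alternating signs, as read from (8); the function names are illustrative. Exact rational arithmetic with `Fraction` makes the truncated values easy to compare with closed-form rational expressions.

```python
from fractions import Fraction
import math

def partial_denominator(i):
    """Partial denominators of (8): 1, 1, 2, 3, 2, 5, 2, 7, ... (assumed pattern)."""
    if i == 0:
        return 1
    return i if i % 2 == 1 else 2

def exp_convergent(x, elements):
    """Value of (8) truncated after `elements` partial denominators."""
    value = partial_denominator(elements - 1)
    # Work from the innermost element outwards.
    for i in range(elements - 2, -1, -1):
        sign = -1 if i % 2 == 0 else 1   # signs alternate: -, +, -, + ...
        value = partial_denominator(i) + sign * x / value
    return 1 / value

if __name__ == "__main__":
    x = Fraction(1, 2)
    for k in range(1, 8):
        approx = exp_convergent(x, k)
        print(k, approx, float(approx) - math.exp(0.5))
```

With three elements the truncation equals $(2+x)/(2-x)$, and with five elements it equals $(12+6x+x^2)/(12-6x+x^2)$, so odd truncation lengths give numerator and denominator of equal degree.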
The accuracy of the approximation of the voice model depends not only on the number of cosine Fourier transform coefficients in (3), but also on the number of elements of the continued fraction in (8), that is, on the length of the continued fraction, which is determined by $s$. A finite continued fraction for the function $e^x$ can also be expressed as a sequence of rational functions that approximate the exponential function with increasing accuracy:

$$e^x \approx \frac{1}{1}, \; \frac{1}{1-x}, \; \frac{2+x}{2-x}, \; \frac{6+2x}{6-4x+x^2}, \; \frac{12+6x+x^2}{12-6x+x^2}, \; \ldots \qquad (9)$$

These functions are known as Padé approximations of the exponential function. It is recommended to use an odd number of elements of the continued fraction in (8). This leads to an approximation of the exponential function by a rational function with equal degrees of the numerator and denominator polynomials in (9). The following approximations were chosen:

$$\tilde{H}_1(z) = \frac{2+x}{2-x}, \qquad \tilde{H}_2(z) = \frac{12+6x+x^2}{12-6x+x^2},$$

$$\tilde{H}_3(z) = \frac{120+60x+12x^2+x^3}{120-60x+12x^2-x^3}, \qquad (10)$$

$$\tilde{H}_4(z) = \frac{1680+840x+180x^2+20x^3+x^4}{1680-840x+180x^2-20x^3+x^4},$$

where $z$ is the variable of the z-transform and $x = 2 c_n z^{-n}$.

In the general case, a better approximation can be achieved by taking more elements of the continued fraction. The corresponding cell structure is described by the difference equations

$$\lambda_{2s+1}(m) = \frac{c_n}{2s-1} \lambda_{2s}(m-n),$$
$$\lambda_{2s}(m) = \frac{c_n}{2s-1} \lambda_{2s-1}(m-n) - \lambda_{2s+1}(m),$$
$$\vdots$$
$$\lambda_5(m) = \frac{c_n}{3} \lambda_4(m-n) + \lambda_6(m),$$
$$\lambda_4(m) = \frac{c_n}{3} \lambda_3(m-n) - \lambda_5(m),$$
$$\lambda_3(m) = c_n \lambda_2(m-n) + \lambda_4(m),$$
$$\lambda_2(m) = 2 c_n \lambda_1(m-n) - \lambda_3(m), \qquad (11)$$
$$\lambda_1(m) = x(m) + \lambda_2(m), \qquad y(m) = \lambda_1(m),$$

where $\lambda_i(m)$ denote the internal signals of the cells, $x(m)$ is the input and $y(m)$ is the output of the section. The signs in (11) alternate in the same way as the signs of the continued fraction (8).

5 Structure of Adaptive Synthesis

As noted above, the approximation error for $e^x$ is determined by the number of elements of the continued fraction used in the expansion of the exponential function. This error further depends on the magnitudes of the cosine Fourier transform coefficients $c_n$.
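The dependence of the approximation error on the number of cells and on the magnitude of $c_n$ can be explored with a small simulation. The Python sketch below runs one section of the cell structure (11) on a unit impulse and compares the result with the exact impulse response (5). The signs of the recursion are an assumption, reconstructed so that two, four and six cells reproduce the rational functions in (10); the internal signal array `lam` and all function names are likewise illustrative.

```python
import math
import numpy as np

def cell_section(x, c_n, n, s):
    """Filter x with one section of 2*s cells approximating exp(2*c_n*z^-n).

    Assumed recursion (signs reconstructed from the continued fraction):
      lam_i(m) = gain_i * lam_{i-1}(m - n) -/+ lam_{i+1}(m), i = 2 .. 2s+1
      lam_1(m) = x(m) + lam_2(m),  y(m) = lam_1(m)
    with "-" for even i, "+" for odd i, and lam_{2s+2} = 0.
    """
    x = np.asarray(x, dtype=float)
    top = 2 * s + 1
    lam = np.zeros((top + 2, len(x)))   # row i holds lam_i; row top+1 stays zero
    for m in range(len(x)):
        for i in range(top, 1, -1):     # evaluate from the innermost cell outwards
            gain = 2.0 * c_n if i == 2 else c_n / (2 * (i // 2) - 1)
            delayed = lam[i - 1, m - n] if m >= n else 0.0
            sign = -1.0 if i % 2 == 0 else 1.0
            lam[i, m] = gain * delayed + sign * lam[i + 1, m]
        lam[1, m] = x[m] + lam[2, m]
    return lam[1].copy()

def exact_impulse_response(c_n, n, length):
    """Impulse response (5): (2*c_n)^i / i! at m = n*i, zero elsewhere."""
    h = np.zeros(length)
    for i in range((length - 1) // n + 1):
        h[n * i] = (2.0 * c_n) ** i / math.factorial(i)
    return h

def max_error(c_n, n, s, length=16):
    """Largest deviation between the cell structure and (5) on a unit impulse."""
    impulse = np.zeros(length)
    impulse[0] = 1.0
    approx = cell_section(impulse, c_n, n, s)
    return float(np.max(np.abs(approx - exact_impulse_response(c_n, n, length))))

if __name__ == "__main__":
    for c_n in (0.1, 0.3, 0.7):
        print(c_n, [max_error(c_n, n=1, s=s) for s in (1, 2, 3)])
```

In this sketch the error shrinks as cells are added and grows with the magnitude of the coefficient, which is the behaviour that motivates choosing the number of cells per section adaptively.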
On the basis of a statistical analysis of the Fourier cosine coefficients describing the voice model of a male speaker, the following assignment was made with regard to the stability of the system and a well-defined safety margin for the transfer functions $H_n(z)$ in equation (4). From the above it follows that the functions $H_n(z)$ can be approximated as follows:

$n = 1, 2, 3$: $\tilde{H}_3(z)$
$n = 4, 5$: $\tilde{H}_2(z)$
$n = 6, 7, \ldots, 25$: $\tilde{H}_1(z)$

With respect to both the total approximation error and the number of arithmetic operations required for the practical implementation of voice modeling, it is more effective to use an adaptive continued fraction structure. The number of cells (Fig. 3) can be selected according to the magnitudes of the cosine Fourier transform coefficients. The following adaptive empirical rule can be used:

for $|c_n| \le 0.3$, two cells, corresponding to $\tilde{H}_1(z)$,
for $|c_n| \le 0.5$, four cells, corresponding to $\tilde{H}_2(z)$,
for $|c_n| \le 1$, six cells, corresponding to $\tilde{H}_3(z)$,
for $|c_n| > 1$, eight cells, corresponding to $\tilde{H}_4(z)$.

For example, the voice model of the stationary part (24 ms) of the vowel sound “е” has the coefficients

$c_0 = 0.491$ (logarithm of the value of the difference signal), $c_1 = 0.700$, $c_2 = 0.354$, $c_3 = 0.026$, $c_4 = 0.205$, $c_5 = 0.159$, $c_6 = 0.027$, $c_7 = 0.310$, $|c_{8 \ldots 25}| < 0.3$.

Using the empirical rule above, the voice model of the cosine Fourier transform presented in Fig. 3 can be built.

Fig. 3. Voice model of cosine Fourier transform of the stationary part of the vowel “е”: $p(n)$ is the excitation signal; $s(n)$ is the synthesized voice signal (cascade: $e^{c_0}$, then $\tilde{H}_3(z)$ for $c_1$, $\tilde{H}_2(z)$ for $c_2$, $\tilde{H}_1(z)$ for $c_3$, $c_4$, $c_5$, $c_6$, $\tilde{H}_2(z)$ for $c_7$, and $\tilde{H}_1(z)$ for $c_8, \ldots, c_{25}$)

In the practical implementation of the transcendental transfer functions $H_n(z)$, the following numerical results were obtained:

Table 1.
The value of the transcendental transfer function $H_1(z) = e^{z^{-1}}$

| $n$ | $H_n(z)$, exact values | $\tilde{H}_n(z)$, approximate values |
|---|---|---|
| 1 | 1.00000000000000 | 0.99993896484375 |
| 2 | 1.00000000000000 | 0.99993896484375 |
| 3 | 0.50000000000000 | 0.49999648242188 |
| 4 | 0.16666666666667 | 0.16666549414062 |
| 5 | 0.08333333333333 | 0.06092749023438 |
| 6 | 0.00138888888889 | 0.00003051757812 |
| 7 | 0.00019841269841 | 0.00030517578125 |
| 8 | 0.00002480158730 | -0.0012207031250 |
| 9 | 0.00000271636432 | -0.00003051757812 |

We also present our numerical results in the following diagram (Fig. 4).

Fig. 4. The value of the transcendental transfer function $H_1(z) = e^{z^{-1}}$ (exact and approximate values for $n = 1, \ldots, 5$)

6 Conclusions

Voice modeling based on the cosine Fourier transform is in fact a form of spectral synthesis of voice signals and is not based on any simplifying a priori assumptions about the speech production system. It also contains information about the spectrum of the vocal tract excitation.

The voice modeling procedure based on Fourier cosine transforms requires more arithmetic operations than approaches based on linear predictive coding, but the structure of the digital filter can be optimized.

Continued fractions are a useful tool not only in speech synthesis. High-order approximations of transcendental functions can be used in biological and industrial modeling systems. The direct implementation of continued fractions further enables the implementation of multi-cell structures.

References

1. Markel, J.D., Gray, A.H.: Linear Prediction of Speech. Berlin, Springer Verlag. (1976)
2. Makhoul, J.: Stable and Efficient Lattice Methods for Linear Prediction. In: IEEE Trans. Acoustics, Speech and Signal Processing, ASSP-25(5), 423-428. (1977)
3. Vich, R., Smekal, Z.: Continued Fractions in Digital Filter Synthesis. In: Proc. of Inter. Scient. Colloquium, Ilmenau, Germany, 353-356. (1995)
4.
Vich, R., Smekal, Z.: Digital Filter Realization of Nonrational Transfer Functions. In: Proc. of the First European Conference on Signal Analysis and Prediction, ECSAP-97, Prague, Czech Republic, 179-182. (1997)
5. Shmoylov, V.I.: Periodicheskiye tsepnyye drobi. Akademicheskiy Ekspress, Lviv, Ukraine. (1998)
6. Shmoylov, V.I., Sloboda M.Z.: Raskhodyashchiyesya nepreryvnyye drobi. Merkator, Lviv, Ukraine. (1999)
7. Shmoylov, V.I., Chyrun L.V.: Kompleksnyye chisla i nepreryvnyye drobi. Merkator, Lviv, Ukraine. (2001)
8. Strum, R.D., Kirk, D.E.: First Principles of Discrete Systems and Digital Signal Processing. Massachusetts; Addison-Wesley Publishing Company. (1988)
9. Vavruk, Y.Y., Rashkevych, Y.M.: Osoblyvosti realizatsiyi prystroyiv obminu informatsiyeyu v systemakh tsyfrovoyi obrobky syhnaliv. In: Modelyuvannya ta informatsiyni tekhnolohiyi, 4, 119-123. (1999)
10. Vavruk, Y.Y., Rashkevych, Y.M., Tsmots I.H.: Otsinka osnovnykh kharakterystyk protsesoriv upravlinnya ta obrobky informatsiyi na NVIS. In: Komp'yuterna inzheneriya ta informatsiyni tekhnolohiyi, 386, 5-11. (1999)
11. Rashkevych, Y.M.: Peretvorennya chasovoho masshtabu movnykh syhnaliv. Akad. Ekspres, Lviv, Ukraine. (1997)
12. Vavruk, Y.Y., Rashkevych, Y.M., Tsmots I.H.: Pidkhody do pobudovy ta vyboru elementnoyi bazy protsesoriv upravlinnya ta obrobky syhnaliv. In: Modelyuvannya ta informatsiyni tekhnolohiyi, 3, 160-168. (1999)
13. Rzheuskyi, A., Gozhyj, A., Stefanchuk, A., Oborska, O., Chyrun, L., Lozynska, O., Mykich, K., Basyuk, T.: Development of Mobile Application for Choreographic Productions Creation and Visualization. In: CEUR Workshop Proceedings, Vol-2386, 340-358. (2019)
14. Berko, A., Alieksieiev, V., Lytvyn, V.: Knowledge-based Big Data Cleanup Method. In: CEUR Workshop Proceedings, Vol-2386, 96-106. (2019)
15. Bisikalo, O., Ivanov, Y., Sholota, V.: Modeling the Phenomenological Concepts for Figurative Processing of Natural-Language Constructions.
In: CEUR Workshop Proceedings, Vol-2362, 1-11. (2019)
16. Kravets, P.: Adaptive method of pursuit game problem solution. In: Modern Problems of Radio Engineering, Telecommunications and Computer Science Proceedings of International Conference, TCSET, 62-65. (2006)
17. Kravets, P.: Game methods of construction of adaptive grid areas. In: The Experience of Designing and Application of CAD Systems in Microelectronics, Proceedings of the 7th International Conference, CADSM, 513-516. (2003)
18. Malachivskyy, P.S., Pizyur, Y.V., Andrunyk, V.A.: Chebyshev Approximation by the Sum of the Polynomial and Logarithmic Expression with Hermite Interpolation. In: Cybernetics and Systems Analysis 54(5), 765-770. (2018)
19. Pasichnyk, V., Shestakevych, T., Kunanets, N., Andrunyk, V.: Analysis of completeness, diversity and ergonomics of information online resources of diagnostic and correction facilities in Ukraine. In: CEUR Workshop Proceedings, 2105, 193-208. (2018)
20. Babichev, S., Taif, M.A., Lytvynenko, V., Osypenko, V.: Criterial analysis of gene expression sequences to create the objective clustering inductive technology. In: 37th International Conference on Electronics and Nanotechnology, ELNANO, 244-248. (2017)
21. Babichev, S., Korobchynskyi, M., Lahodynskyi, O., Korchomnyi, O., Basanets, V., Borynskyi, V.: Development of a technique for the reconstruction and validation of gene network models based on gene expression profiles. In: Eastern-European Journal of Enterprise Technologies, 1 (4-91), 19-32. (2018)
22. Babichev, S., Lytvynenko, V., Osypenko, V.: Implementation of the objective clustering inductive technology based on DBSCAN clustering algorithm. In: Proceedings of the 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT, 479-484. (2017)
23.
Lytvyn, V., Vysotska, V., Peleshchak, I., Rishnyak, I., Peleshchak, R.: Time Dependence of the Output Signal Morphology for Nonlinear Oscillator Neuron Based on Van der Pol Model. In: International Journal of Intelligent Systems and Applications, 10, 8-17 (2018)
24. Gozhyj, A., Kalinina, I., Vysotska, V., Gozhyj, V.: The method of web-resources management under conditions of uncertainty based on fuzzy logic, 2018 IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2018 – Proceedings 1, 343-346 (2018)
25. Gozhyj, A., Vysotska, V., Yevseyeva, I., Kalinina, I., Gozhyj, V.: Web Resources Management Method Based on Intelligent Technologies, Advances in Intelligent Systems and Computing, 871, 206-221 (2019)
26. Su, J., Vysotska, V., Sachenko, A., Lytvyn, V., Burov, Y.: Information resources processing using linguistic analysis of textual content. In: Intelligent Data Acquisition and Advanced Computing Systems Technology and Applications, Romania, 573-578 (2017)
27. Vysotska, V.: Linguistic Analysis of Textual Commercial Content for Information Resources Processing. In: Modern Problems of Radio Engineering, Telecommunications and Computer Science, TCSET’2016, 709-713 (2016)
28. Lytvyn, V., Sharonova, N., Hamon, T., Vysotska, V., Grabar, N., Kowalska-Styczen, A.: Computational linguistics and intelligent systems. In: CEUR Workshop Proceedings, Vol-2136 (2018)
29. Vysotska, V., Chyrun, L.: Analysis features of information resources processing. In: Computer Science and Information Technologies, Proc. of the Int. Conf. CSIT, 124-128 (2015)
30. Vysotska, V., Chyrun, L., Chyrun, L.: Information Technology of Processing Information Resources in Electronic Content Commerce Systems. In: Computer Science and Information Technologies, CSIT’2016, 212-222 (2016)
31.
Vysotska, V., Rishnyak, I., Chyrun L.: Analysis and evaluation of risks in electronic commerce, CAD Systems in Microelectronics, 9th International Conference, 332-333 (2007)
32. Lytvyn, V., Pukach, P., Bobyk, I., Vysotska, V.: The method of formation of the status of personality understanding based on the content analysis. In: Eastern-European Journal of Enterprise Technologies, 5/2(83), 4-12 (2016)
33. Lytvyn, V., Vysotska, V., Mykhailyshyn, V., Rzheuskyi, A., Semianchuk, S.: System Development for Video Stream Data Analyzing. In: Advances in Intelligent Systems and Computing, 1020, 315-331. (2020)
34. Vysotska, V., Chyrun, L.: Methods of information resources processing in electronic content commerce systems. In: Proceedings of 13th International Conference: The Experience of Designing and Application of CAD Systems in Microelectronics, CADSM 2015-February. (2015)
35. Andrunyk, V., Chyrun, L., Vysotska, V.: Electronic content commerce system development. In: Proceedings of 13th International Conference: The Experience of Designing and Application of CAD Systems in Microelectronics, CADSM 2015-February. (2015)
36. Alieksieieva, K., Berko, A., Vysotska, V.: Technology of commercial web-resource processing. In: Proceedings of 13th International Conference: The Experience of Designing and Application of CAD Systems in Microelectronics, CADSM 2015-February. (2015)
37. Lytvyn, V., Peleshchak, I., Vysotska, V., Peleshchak, R.: Satellite spectral information recognition based on the synthesis of modified dynamic neural networks and holographic data processing techniques, 2018 IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2018 – Proceedings 1, 330-334 (2018)
38. Emmerich, M., Lytvyn, V., Yevseyeva, I., Fernandes, V. B., Dosyn, D., Vysotska, V.: Preface: Modern Machine Learning Technologies and Data Science (MoMLeT&DS-2019). In: CEUR Workshop Proceedings, Vol-2386. (2019)
39.
Lytvyn, V., Vysotska, V., Mykhailyshyn, V., Peleshchak, I., Peleshchak, R., Kohut, I.: Intelligent system of a smart house. In: 3rd International Conference on Advanced Information and Communications Technologies, AICT, 282-287. (2019)
40. Kravets, P., Burov, Y., Lytvyn, V., Vysotska, V.: Gaming method of ontology clusterization. In: Webology, 16(1), 55-76. (2019)
41. Gozhyj, A., Kalinina, I., Gozhyj, V., Vysotska, V.: Web service interaction modeling with colored petri nets. In: Proceedings of the 2019 10th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, IDAACS 2019, 1,8924400, pp. 319-323 (2019)
42. Lytvyn, V., Peleshchak, I., Peleshchak, R., Vysotska, V.: Information Encryption Based on the Synthesis of a Neural Network and AES Algorithm. In: 3rd International Conference on Advanced Information and Communications Technologies, AICT, 447-450. (2019)
43. Shu, C., Dosyn, D., Lytvyn, V., Vysotska V., Sachenko, A., Jun, S.: Building of the Predicate Recognition System for the NLP Ontology Learning Module. In: Proceedings of the 2019 10th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, IDAACS 2019, 2,8924410, pp. 802-808 (2019)
44. Lytvyn, V., Vysotska, V., Shakhovska, N., Mykhailyshyn, V., Medykovskyy, M., Peleshchak, I., Fernandes, V. B., Peleshchak, R., Shcherbak, S.: A Smart Home System Development. In: Advances in Intelligent Systems and Computing IV, 1080, 804-830. (2020)