<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Topological structure of Ukrainian tongue twisters based on speech sound analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tetiana Kovaliuk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Yurchuk</string-name>
          <email>i.a.yurchuk@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Gurnik</string-name>
          <email>olga.gurnick@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>MoDaST-2024: 6th International Workshop on Modern Data Science Technologies</institution>
          ,
          <addr-line>May 31 - June 1, 2024, Lviv-Shatsk</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Separate Structural Unit “Vocational College of Engineering, Management and Land Management of National Aviation University”</institution>
          ,
          <addr-line>Metrobudivska str. 5-a, Kyiv, UA-03065</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Natural language processing occupies a central place at the current stage of the development of artificial intelligence and of machine learning as its component. This is due not only to the fact that the ability to conduct a meaningful dialogue is one of the basic qualities of human intelligence, but also to the fact that there is currently a vast amount of information in social networks, news feeds, etc., which requires automated processing for specific goals (prevention of terrorist activity and threats, detection of fakes, etc.). Models that distinguish meanings, capture the content of texts, continue dialogues, and understand the topic of conversation are therefore useful. Every language contains classes of texts (poems, idioms, colloquialisms) that are more complex than ordinary narrative sentences and require natural language processing algorithms to be trained more thoroughly. In this work, the authors study tongue twisters to understand their sound composition and structural features, paying special attention to speech therapy applications. The speech sounds were classified by labialization, volume, hardness and softness, and place and method of creation. A topological analysis of their structure was implemented: in particular, the Betti numbers were calculated and the obtained results were generalized.</p>
      </abstract>
      <kwd-group>
        <kwd>Ukrainian tongue twister</kwd>
        <kwd>persistent homology</kwd>
        <kwd>text vectorization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>For every language, tongue twisters are an important speech genre. They are short, syntactically correct phrases, spoken without context, with especially complicated articulation and combinations of sounds that contain different phonemes and are difficult to pronounce. They are a way to develop the speech skills of children of preschool and primary school age, both for improvement and for the therapeutic purpose of eliminating defects. Public figures, actors, and singers also use tongue twisters to sharpen their skills and build confidence in speeches, performances, and recitations.</p>
      <p>Tongue twisters make up a relatively small part of the language in terms of the number of available texts, because they are often devoid of content and focus on the alternation of certain sounds, or rather on the difficulty of reproducing them with the speech apparatus (tongue, lips, etc.).</p>
      <p>
        In a previous work by I. Yurchuk and O. Gurnik [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the detection of tongue twisters in the Ukrainian language using letter-based vectorization was implemented, with an average detection rate of 80%. The main drawback of that work was that the articulatory complexity of the sounds was not taken into account; only the letters that made up the text were coded.
      </p>
      <p>This work continues the study of Ukrainian tongue twisters, with an emphasis on their use in speech therapy. For this purpose, a speech sound analysis of each tongue twister was carried out: each speech sound was vectorized by mapping it into a seven-dimensional space, so that a cloud of points was assigned to each tongue twister, which was then investigated using topological data analysis. In particular, Betti numbers were calculated for each tongue twister, and the obtained values were analyzed.</p>
      <p>The purpose of this work is to study the features of tongue twisters in terms of topological invariants, for use by speech therapists who deal both with the elimination of speech defects and with the general development of the language skills of people of any age (primary school children, public figures, elderly people recovering from diseases affecting the brain).</p>
      <p>The aim of the research is to propose topological structures whose construction is informative for understanding the nature of a tongue twister, and to establish a dataset whose integration into a machine learning method can support this understanding in future research.</p>
      <p>To achieve this purpose, the major research objectives are:</p>
      <p>1. To form a dataset of tongue twisters used by speech therapists and to carry out their sound analysis.</p>
      <p>2. In accordance with speech therapy requirements, to form criteria and features for each sound and to build a mapping into a real space of a certain dimension.</p>
      <p>3. To conduct a topological analysis of each tongue twister and to analyze the obtained results.</p>
      <p>It should be noted that this approach is motivated by the lack of a dataset of a size that would guarantee high accuracy when applying machine learning methods directly.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>Works related to the study of tongue twisters, their influence on speech, and the application of topological data analysis to language processing are considered.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the authors established a basis for the implementation of prosodic strategies in speech intervention by using tongue twisters with speakers (mean age 54.5 years) with spastic or mixed-spastic dysarthria of varying etiology (cerebral palsy, multiple sclerosis, multiple system atrophy).
      </p>
      <p>
        Tongue twisters also play an important role in detecting not only speech defects but also physiological ones, in particular tumors. T. Bressmann, A. Foltz, J. Zimmermann, and J. C. Irish [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed outcome measures for affected speech production: the patients' speech acceptability, the rate of errors, the time needed to produce the tongue twisters, the pause duration between item repetitions, and the tongue shape during production. These measures helped to show that surgical resection of the tongue changed the error rate in the speech production of speakers with a partial glossectomy. To reproduce a tongue twister, the speaker has to balance speed and accuracy; therefore, the presence of a lingual tumor and the subsequent glossectomy require a patient to allocate more resources to the phonological planning of the tongue twister because of the structural alteration of the tongue.
      </p>
      <p>
        We should remark that tongue twisters can be an effective instrument for researching inner speech, which plays a key role in a variety of cognitive activities, including writing, personal thought, reasoning, and memorization, see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Among the work implemented so far in language processing using topological data analysis, we highlight the following: providing a distance measure between poets' literary styles [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], an investigation of interpretable topological features of transformer-based language models related to surface and structural properties [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the realization of a persistence bag-of-words, an analog of bag-of-words that is a stable vectorized representation enabling seamless integration with machine learning [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and text classification and visualization [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8-10</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>It is known that there are several methods or, more precisely, paradigms for machine processing of texts. Let us briefly review the main ones:</p>
      <p>Neural networks: Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) are classes of neural networks developed specifically for processing sequential data such as text, audio, time series, etc. The basic idea of recurrent neural networks is that they can remember the previous state (information) and use it to process the next input in the sequence. LSTM has additional internal structures (gates), and GRU has mechanisms of forgetting and updating. The best choice between LSTM and GRU depends on the size of the data and the specifics of the task. LSTM can be useful when long-term memory is important, but it requires more resources to train. GRU is less complex and faster to train, but may be less powerful on some problems. Word2vec uses a neural network model to learn word associations from a large text corpus. It can detect synonyms or suggest additional words for a partial sentence.</p>
      <p>Transformers: BERT (Bidirectional Encoder Representations from Transformers) is a deep learning model based on the Transformer architecture and used to solve Natural Language Processing (NLP) problems. BERT is one of the most effective models for context-based language understanding and has gained significant popularity since its launch. Tasks that can be solved with BERT include text classification, named entity recognition, question answering, and many other natural language processing tasks. BERT has an impressive ability to understand complex language constructions and semantics thanks to its ability to model context in both directions.</p>
      <p>Unsupervised learning algorithms: GloVe maps words to a meaningful space in which the distance between words is related to semantic similarity. Training is performed on aggregated global statistics of pairwise co-occurrence of corpus words, and the resulting representations demonstrate interesting linear substructures of the word vector space.</p>
      <p>We have to remark that all of the above approaches require large datasets and painstaking work on their cleaning and labeling. The main disadvantage of all of them is the fact that the larger the sample (dataset), the better the results; moreover, the amount of training data is expressed in thousands of units. That is why the authors propose the approach described in this section.</p>
      <p>In this section, the vectorization of words, the dataset, and the main terms of persistent homology are considered.</p>
      <sec id="sec-3-1">
        <title>3.1. Principles of speech sound coding</title>
        <p>Every speech sound s corresponds to a vector v(s) = (x1, x2, x3, x4, x5, x6, x7), where:</p>
        <p>x1 is the ordinal number of the speech sound in the text.</p>
        <p>x2 is the ordinal number of the word in the text which contains the speech sound s.</p>
        <p>x3 equals 1 for labialized vowel speech sounds and 2 for non-labialized vowel speech sounds. If a speech sound is a consonant, x3 equals zero.</p>
        <p>x4 codes a consonant sound by volume: sonorous, voiced or voiceless. If a speech sound is a vowel, x4 equals zero.</p>
        <p>x5 codes a consonant sound by the place of creation: labial, nasal, lingual or laryngeal. If a speech sound is a vowel, x5 equals zero.</p>
        <p>x6 codes a consonant sound by the method of creation: closed (breakthrough) sounds are created at the moment of breakthrough of the closed speech organs by an air stream (they are also called breakthrough, explosive, or instantaneous, because the creation of such sounds is fast and cannot be prolonged); fricative sounds are made when a stream of exhaled air passes through a gap of the speech organs (whistling and hissing; they can be lengthened, drawn out); closed-through sounds combine moments of closure and breakthrough during their creation; affricates (closed-cleft or merged); trembling (or vibrating). If a speech sound is a vowel, x6 equals zero.</p>
        <p>x7 codes a consonant sound by hardness and softness: hard, soft, softened (palatalized) or semi-softened (semi-palatalized). If a speech sound is a vowel, x7 equals zero.</p>
        <p>Let us consider an example of mapping the tongue twister “Yila Maryna malynu” into ℝ⁷, see Table 1. Every speech sound corresponds to a unique point in ℝ⁷. Moreover, all coordinates of a point are non-negative integers.</p>
        <p>Let us determine the importance of the first coordinate in this vectorization process. Some speech sounds can be identical in terms of labialization, volume, hardness, softness, and place and method of creation, and can be components of the same word. However, the sequence of their pronunciation will always differ. In the example in Table 1, such speech sounds are “y” and “a”.</p>
        <p>Since the mapping is carried out into a seven-dimensional space, any visualization is complicated for human perception, so it is necessary to reduce the dimension.</p>
        <p>In Fig. 1, there are two projections of the points corresponding to the tongue twister “Yila Maryna malynu” into three-dimensional space with respect to different coordinates.</p>
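        <p>The coding described above can be sketched as follows. This is a minimal Python sketch: the FEATURES table, its integer category codes, and the transliterated sound names are illustrative assumptions, since the paper fixes the feature axes but not the exact integer assigned to every category.</p>

```python
# A sketch of the seven-coordinate coding from Sec. 3.1.
# The integer category codes below (e.g. 1 = sonorous for volume,
# 1 = labial / 3 = lingual for place) are illustrative assumptions.

# hypothetical feature table: (labialization, volume, place, method, hardness)
FEATURES = {
    "a": (2, 0, 0, 0, 0),   # non-labialized vowel
    "i": (2, 0, 0, 0, 0),
    "y": (2, 0, 0, 0, 0),
    "u": (1, 0, 0, 0, 0),   # labialized vowel
    "m": (0, 1, 1, 1, 1),   # sonorous, labial, closed, hard
    "n": (0, 1, 3, 1, 1),   # sonorous, lingual, closed, hard
    "l": (0, 1, 3, 3, 1),   # sonorous, lingual, closed-through, hard
    "r": (0, 1, 3, 5, 1),   # sonorous, lingual, trembling, hard
}

def code_twister(text):
    """Map a transliterated twister to a cloud of points in R^7:
    (sound position, word index, labialization, volume, place,
    method, hardness)."""
    points, word, pos = [], 1, 0
    for ch in text:
        if ch == " ":          # word boundary: advance the word index
            word += 1
            continue
        pos += 1
        points.append((pos, word) + FEATURES[ch])
    return points

cloud = code_twister("yila maryna malynu")
```

        <p>Each of the 16 speech sounds of the example becomes one point; the first two coordinates keep otherwise identical sounds distinct, as discussed above.</p>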
      </sec>
      <sec id="sec-3-2">
        <title>3.2. A dataset</title>
        <p>For this research, the authors compiled a dataset that contains tongue twisters from open sources that are used by speech therapists for the purpose of eliminating and preventing speech defects in children's speech. The tongue twisters have different numbers of speech sounds and are oriented toward different types of speech problems. In Fig. 2 there is a histogram of the number of speech sounds in a tongue twister.</p>
        <p>There are 100 tongue twisters in the dataset. It should be noted that tongue twisters containing no more than 50 sounds make up the majority of the dataset. Most likely, this is because long tongue twisters are rarely used for therapeutic purposes. The most widely used tongue twisters contain from 30 to 40 speech sounds.</p>
        <p>As can be seen from the histogram, this distribution is far from normal. Therefore, following general practice, it would be necessary to remove atypical tongue twisters from the sample. However, given the small amount of data in the dataset, the authors avoid this.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Persistent homologies</title>
        <p>
          To construct and analyze the structure of tongue twisters, concepts of topological data analysis will be used. In particular, we will be interested in the Betti numbers and their geometric interpretation, see [
          <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
          ].
        </p>
        <p>The zero Betti number (β0 = rank H0^{i,j}) is the number of connected components of the space. The first Betti number (β1 = rank H1^{i,j}) is the number of independent cycles in the space. The second Betti number (β2 = rank H2^{i,j}) is the number of 2-spheres in the space. To calculate these invariants we used the l-th persistent homology Hl^{i,j} = Im fl^{i,j} for 0 ≤ i &lt; j ≤ k+1, where fl^{i,j}: Hl(Ki) → Hl(Kj), i &lt; j, is the map induced by inclusion. In other words, Hl^{i,j} = Zl^i/(Bl^j ∩ Zl^i), where Zl^i is the group of l-cycles of Ki and Bl^j is the group of l-boundaries of Kj (a set {Kr}, r = 1, …, k, of Vietoris-Rips complexes is the filtration for any finite set {p1, p2, …, pn}, where Ki ⊆ Kj for i &lt; j). There is a method for their calculation based on matrix algebra, the persistence barcode, and the persistence diagram. An l-cycle is an l-chain with empty boundary; the group of 1-cycles is the kernel of the 1-st boundary homomorphism, Z1 = ker ∂1. A 1-boundary is a 1-chain that is the boundary of a 2-chain; the group of 1-boundaries is the image of the 2-nd boundary homomorphism, B1 = Im ∂2. A 1-chain is a formal sum of 1-simplices of a simplicial complex K, with standard notation c = Σ ai σi, where σi is a 1-simplex of K and ai is either 1 or 0. Similar definitions hold for groups of higher order.</p>
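        <p>To make the geometric meaning of β0 concrete, here is a small self-contained Python sketch (not the method used in the paper, which relies on GUDHI): at a fixed scale ε, β0 of the Vietoris-Rips complex equals the number of connected components of the graph joining points at distance at most ε.</p>

```python
from itertools import combinations
import math

def betti0(points, eps):
    """Number of connected components (the zero Betti number) of the
    Vietoris-Rips complex at scale eps, via union-find on the
    eps-neighborhood graph."""
    parent = list(range(len(points)))

    def find(i):
        # path-halving find
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) > eps:
            continue
        parent[find(i)] = find(j)   # merge the two components

    return len({find(i) for i in range(len(points))})

# two tight pairs of points, far apart from each other
pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.0, 0.9)]
```

        <p>At small scales every point is its own component; as ε grows, components merge, which is exactly the decreasing behavior of β0 reported in Sec. 5.</p>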
        <p>
          For the subsequent calculations, we used the GUDHI library, a generic open source C++ library with a Python interface for Topological Data Analysis (TDA) and Higher Dimensional Geometry Understanding, see [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>In Table 2, the main geometric structures, such as a circle, a disk, and a sphere, and their Betti numbers are presented.</p>
        <p>Let us consider the tongue twister “Yila Maryna malynu” from the previous sections. Using the GUDHI library, the values of βi were obtained and Table 3 was formed. For any fixed scale εi, the geometric structure consists of more than one connected component, each of which in turn is a two-dimensional disk.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Algorithm and experiments</title>
      <p>For analyzing the topological structure of tongue twisters, the authors propose the following algorithm:</p>
      <p>Step 1. Every speech sound of a twister is coded according to Sec. 3.1. If a twister consists of n speech sounds {s1, s2, …, sn}, then it corresponds to a set X = {(x1^1, x2^1, …, x7^1), …, (x1^n, x2^n, …, x7^n)}. In other words, a tongue twister is considered as a cloud of points in ℝ⁷ with non-negative integer coordinates. We also normalize it by a standard function and map the cloud of points into I7, where I7 = [0; 1]^7 is the seven-dimensional unit cube. Let us denote the result by X̃.</p>
      <p>Step 2. Construct on X̃ the filtration by Vietoris-Rips complexes, where {K1, K2, …, Kk} is a finite set with Ki ⊆ Kj for i &lt; j, and compute β0 = rank H0^{i,j}, β1 = rank H1^{i,j}, β2 = rank H2^{i,j}, β3 = rank H3^{i,j}, β4 = rank H4^{i,j} and β5 = rank H5^{i,j} for every fixed scale εi, i = 1, …, k.</p>
      <p>Step 3. For every twister of the dataset, we apply the previous steps and obtain a set D which consists of vectors with k × 6 coordinates.</p>
      <p>In Fig. 4, there is a pipeline of the algorithm. Coding corresponds to Step 1, and the computation of βi, i = 0, …, 5, to Step 2. We remark on the following:</p>
      <p>• The output of Coding is a cloud of points in I7 = [0; 1]^7, the seven-dimensional unit cube.</p>
      <p>• The output of the computation of βi, i = 0, …, 5, is N ordered sets of k × 6 non-negative integers.</p>
      <p>• At the output of each of the steps, a dataset of numbers is obtained. Potentially, even at the first step, a tongue twister is vectorized. However, the amount of output data differs between two different tongue twisters, so such a vectorization cannot be used as an input for machine learning methods that require a fixed dimensionality of the data, such as neural networks, transformers, etc.</p>
      <p>• The output dataset can be provided to a machine learning algorithm after preprocessing. Since the values of the Betti numbers are undefined for some fixed values of εi, the authors recommend setting them to -1 in these cases. This value has no interpretation from a geometric point of view, so it will not affect the understanding of the structure. As a result, the dataset will consist of k × 6 integer numbers per tongue twister.</p>
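      <p>The normalization of Step 1 and the -1 convention for the output vectors can be sketched as follows. This is a minimal sketch: normalize is an assumed min-max implementation of the “standard function” mentioned in Step 1, and feature_vector is a hypothetical helper applying the -1 fill for undefined Betti numbers.</p>

```python
import numpy as np

def normalize(cloud):
    """Step 1 (second half): min-max normalize a coded twister so its
    point cloud lies in the unit cube [0, 1]^7 (assumed "standard
    function"; the paper does not name the exact normalization)."""
    x = np.asarray(cloud, dtype=float)
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    span[span == 0] = 1.0            # constant coordinates map to 0
    return (x - lo) / span

def feature_vector(betti_per_scale, k):
    """Steps 2-3: flatten per-scale Betti numbers (b0..b5) into one
    vector of k * 6 integers, writing -1 at scales where they are
    undefined, as the authors recommend."""
    out = []
    for i in range(k):
        out.extend(betti_per_scale.get(i, (-1,) * 6))
    return out

cube = normalize([(1, 1, 2, 0, 0, 0, 0), (3, 2, 1, 0, 0, 0, 0)])
vec = feature_vector({0: (4, 1, 0, 0, 0, 0)}, k=2)
```

      <p>Every twister then yields a fixed-length vector of k × 6 integers, suitable as machine learning input after preprocessing.</p>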
    </sec>
    <sec id="sec-5">
      <title>5. Results and discussions</title>
      <p>Let us summarize the following features of the structures that appear from the topological data analysis and characterize tongue twisters.</p>
      <p>The number of connected components of the space (β0) decreases with increasing value of ε; this effect is general for persistent homologies. The mean value of β0 for ε = 0.55 is equal to 4, see Fig. 5.</p>
      <p>The mean value of β1 is equal to 0 for ε = 0, …, 0.45; it equals 1 for ε = 0.55, …, 0.8 and is zero again afterwards.</p>
      <p>For all tongue twisters, the values βi, i = 3, 4, 5 satisfy β3 = β4 = β5 = 0. This has the following effect on their topological structure: there are no geometric structures with nested spheres of dimension two or higher in the existing tongue twister dataset. Let us calculate the mean values of β0 and β1 at fixed values of ε and construct a histogram, see Fig. 7. For the considered dataset, there are geometric structures that are a disjoint union of a finite number of two-dimensional disks and one-dimensional circles.</p>
      <p>In the case of structures with one connected component, some of them are tori. The authors applied PCA (Principal Component Analysis), which showed that the most informative features for this dataset are the sets of Betti numbers for ε = 0.2, 0.3, 0.45, 0.55.</p>
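      <p>The PCA step can be reproduced with a small SVD-based sketch (the authors' exact tooling is not stated; scikit-learn's PCA would behave the same way on the Betti-number feature matrix):</p>

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component,
    computed via SVD of the centered data matrix."""
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

# toy feature matrix with one dominant direction, standing in for the
# per-twister Betti-number vectors of Sec. 4
X = [[0.0, 0.00], [1.0, 0.01], [2.0, -0.01], [3.0, 0.02]]
ratios = explained_variance_ratio(X)
```

      <p>Components with large explained-variance ratios point at the most informative scales, which is how the values ε = 0.2, 0.3, 0.45, 0.55 were selected.</p>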
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>A vectorization of tongue twisters was obtained that is based on the complexity of the sounds in pronunciation (the speech therapy component) and takes into account sonority, the place of a sound's creation, and the method of its creation. As a result, a seven-dimensional vector corresponds to each sound of a tongue twister.</p>
      <p>Based on the formed cloud of points, the Betti numbers were calculated for each fixed value of the parameter of formation of the simplicial complex. As a result, each of the tongue twisters exhibits a clearly defined cyclic structure, which is close to a circle.</p>
      <p>In the future, this result will provide an opportunity to improve the percentage of tongue twister recognition among ordinary sentences, as well as an opportunity to form an artificial dataset for neural network approaches.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>I.</given-names>
            <surname>Yurchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Gurnik</surname>
          </string-name>
          ,
          <article-title>Tongue twisters detection in Ukrainian by using TDA</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          <volume>3396</volume>
          (
          <year>2023</year>
          ), pp.
          <fpage>163</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kember</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Connaghan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <article-title>Inducing speech errors in dysarthria using tongue twisters</article-title>
          ,
          <source>Int. J. of Language &amp; Communication Disorders</source>
          <volume>52</volume>
          (
          <issue>4</issue>
          ) (
          <year>2017</year>
          )
          <fpage>469</fpage>
          -
          <lpage>478</lpage>
          . doi:10.1111/1460-6984.12285.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bressmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Foltz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Irish</surname>
          </string-name>
          ,
          <article-title>Production of tongue twisters by speakers with partial glossectomy</article-title>
          ,
          <source>J. Clinical Linguistics &amp; Phonetics</source>
          <volume>28</volume>
          (
          <issue>12</issue>
          ) (
          <year>2014</year>
          )
          <fpage>951</fpage>
          -
          <lpage>964</lpage>
          . doi:10.3109/02699206.2014.938833.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Corley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Brocklehurst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Moat</surname>
          </string-name>
          ,
          <article-title>Error Biases in Inner and Overt Speech: Evidence from Tonguetwisters</article-title>
          ,
          <source>Journal of Experimental Psychology: Learning, Memory, and Cognition</source>
          ,
          <volume>37</volume>
          (
          <issue>1</issue>
          ) (
          <year>2011</year>
          )
          <fpage>162</fpage>
          -
          <lpage>175</lpage>
          . doi:10.1037/a0021321.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Paluzo-Hidalgo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gonzalez-Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Gutierrez-Naranjo</surname>
          </string-name>
          ,
          <article-title>Towards a philological metric through a topological data analysis approach</article-title>
          ,
          <source>ArXiv</source>
          ,
          <year>2019</year>
          , abs/1912.09253.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kushnareva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cherniavskii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mikhailov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Artemova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Barannikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Piontkovskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Piontkovski</surname>
          </string-name>
          , E. Burnaev,
          <article-title>Artificial text detection via examining the topology of attention maps</article-title>
          ,
          <source>in: Proceedings of Conference on Empirical Methods in Natural Language Processing</source>
          , Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>635</fpage>
          -
          <lpage>649</lpage>
          . doi:10.18653/v1/2021.emnlp-main.50.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zielinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lipinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Juda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zeppelzauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dłotko</surname>
          </string-name>
          ,
          <article-title>Persistence bag-of-words for topological data analysis</article-title>
          ,
          <source>in: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, Macao</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4489</fpage>
          -
          <lpage>4495</lpage>
          . doi:10.24963/ijcai.2019/624.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Elyasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Moghadam</surname>
          </string-name>
          ,
          <article-title>An introduction to a new text classification and visualization for natural language processing using topological data analysis, 2019, arXiv</article-title>
          . URL: https://arxiv.org/abs/1906.01726.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <article-title>Topological data analysis of two cases: text classification and business customer relationship management</article-title>
          ,
          <source>J. of Physics</source>
          ,
          <volume>1550</volume>
          (
          <year>2020</year>
          ). doi:10.1088/1742-6596/1550/3/032081.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Sh.</given-names>
            <surname>Gholizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Savle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seyeditabari</surname>
          </string-name>
          , W. Zadrozny,
          <article-title>Topological data analysis in text classification: extracting features with additive information</article-title>
          , arXiv,
          <year>2020</year>
          . URL: https://arxiv.org/abs/2003.13138.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Carlsson</surname>
          </string-name>
          ,
          <article-title>Topology and data</article-title>
          ,
          <source>Bull.Amer.Math.Soc</source>
          ,
          <volume>46</volume>
          (
          <issue>2</issue>
          ) (
          <year>2009</year>
          ):
          <fpage>255</fpage>
          -
          <lpage>308</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I.</given-names>
            <surname>Yurchuk</surname>
          </string-name>
          ,
          <article-title>Digital image segmentation based on the persistent homologies</article-title>
          ,
          <source>in: Proceedings of the 1st International Workshop on Information-Communication Technologies and Embedded Systems</source>
          , ICTES, Mykolaiv, Ukraine,
          <year>2019</year>
          , pp.
          <fpage>226</fpage>
          -
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <article-title>The GUDHI library</article-title>
          . URL: https://gudhi.inria.fr/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>