<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Topological structure of Ukrainian tongue twisters based on speech sound analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tetiana Kovaliuk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Yurchuk</string-name>
          <email>i.a.yurchuk@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Gurnik</string-name>
          <email>olga.gurnick@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>MoDaST-2024: 6th International Workshop on Modern Data Science Technologies</institution>
          ,
          <addr-line>May 31 - June 1, 2024, Lviv-Shatsk</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Separate Structural Unit “Vocational College of Engineering, Management and Land Management of National Aviation University”</institution>
          ,
          <addr-line>Metrobudivska str. 5-a, Kyiv, UA-03065</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Natural language processing occupies a central place at the current stage of the development of artificial intelligence and of machine learning as its component. This is due not only to the fact that the ability to conduct a meaningful dialogue is one of the basic qualities of human intelligence, but also to the fact that there is currently a vast amount of information in social networks, news feeds, etc., which requires automated processing for specific goals (prevention of terrorist activity and threats, detection of fakes, etc.). Models that distinguish meanings, capture the content of texts, continue dialogues, and understand the topic of conversation are therefore useful. Every language contains classes of texts (poems, idioms, colloquialisms) that are more complex than ordinary narrative sentences and require natural language processing algorithms to be trained more thoroughly. In this work, the authors study tongue twisters to understand their sound composition and structural features, paying special attention to speech therapy applications. The speech sounds were classified by labialization, volume, hardness and softness, and place and method of creation. A topological analysis of their structure was implemented: in particular, the Betti numbers were calculated and the obtained results were generalized.</p>
      </abstract>
      <kwd-group>
        <kwd>Ukrainian tongue twister</kwd>
        <kwd>persistent homology</kwd>
        <kwd>text vectorization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>For every language, tongue twisters are an important speech genre. They are short, syntactically correct phrases, spoken without context, with especially complicated articulation and combinations of sounds that contain different phonemes and are difficult to pronounce. They are a way to develop the speech skills of children of preschool and primary school age, both for improvement and for the therapeutic purpose of eliminating defects. Public figures, actors, and singers also use tongue twisters to sharpen their skills and build confidence in speeches, performances, and recitations.</p>
      <p>Tongue twisters make up a relatively small part of the language in terms of the number of available texts, because they are often devoid of content and focus on the alternation of certain sounds, or rather on the difficulty of reproducing them with the speech apparatus (tongue, lips, etc.).</p>
      <p>
        In a previous work by I. Yurchuk and O. Gurnik [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the detection of tongue twisters in the Ukrainian language using letter-based vectorization was implemented, with an average detection rate of 80%. The main drawback of that work was that the articulatory complexity of the sounds was not taken into account; only the letters that made up the text were coded.
      </p>
      <p>This work continues the study of Ukrainian tongue twisters, with an emphasis on their use in speech therapy. For this purpose, a speech sound analysis of each tongue twister was carried out: each speech sound was vectorized by mapping it into a seven-dimensional space, so that a cloud of points was assigned to each tongue twister, which was then investigated using topological data analysis. In particular, Betti numbers were calculated for each tongue twister, and the obtained values were analyzed.</p>
      <p>The purpose of this work is to study the features of tongue twisters in terms of topological invariants, for use by speech therapists who deal both with the elimination of speech defects and with the general development of the language skills of people of any age (primary school children, public figures, elderly people recovering from diseases affecting the brain).</p>
      <p>The aim of the research is to propose topological structures whose construction is informative for understanding the nature of a tongue twister, and to establish a dataset whose integration into a machine learning method can support this understanding in future research.</p>
      <p>To achieve this purpose, the major research objectives are:</p>
      <p>1. To form a dataset of tongue twisters used by speech therapists and to carry out their sound analysis.</p>
      <p>2. In accordance with speech therapy requirements, to form criteria and features for each sound and to build a mapping into a real space of a certain dimension.</p>
      <p>3. To conduct a topological analysis of each tongue twister and to analyze the obtained results.</p>
      <p>It should be noted that this approach is motivated by the lack of a dataset of a size that would guarantee high accuracy when applying machine learning methods directly.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>Works related to the study of tongue twisters, their influence on speech, and the application of topological data analysis to language processing are considered.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the authors established a basis for the implementation of prosodic strategies in speech intervention by using tongue twisters with speakers (mean age 54.5 years) with spastic or mixed-spastic dysarthria of varying etiology (cerebral palsy, multiple sclerosis, multiple system atrophy).
      </p>
      <p>
        Tongue twisters also play an important role in detecting not only speech defects but also physiological ones, in particular tumors. T. Bressmann, A. Foltz, J. Zimmermann, and J. C. Irish [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed outcome measures for affected speech production: the patients' speech acceptability, the rate of errors, the time needed to produce the tongue twisters, the pause duration between item repetitions, and the tongue shape during production. These measures helped to show that surgical resection of the tongue changed the error rate in the speech production of speakers with a partial glossectomy. To reproduce a tongue twister, the speaker has to balance speed and accuracy; therefore, the presence of a lingual tumor and the subsequent glossectomy require a patient to allocate more resources to the phonological planning of the tongue twister because of the structural alteration of the tongue.
      </p>
      <p>
        We should remark that tongue twisters can be an effective instrument for researching inner speech, which plays a key role in a variety of cognitive activities, including writing, personal thought, reasoning, and memorization, see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Among the work implemented so far in language processing using topological data analysis, we highlight the following: providing a distance measure between poets' literary styles [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], an investigation of interpretable topological features of transformer-based language models related to surface and structural properties [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the realization of a persistence bag-of-words, an analog of bag-of-words that is a stable vectorized representation enabling seamless integration with machine learning [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and text classification and visualization [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8-10</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>It is known that there are several methods or, more precisely, paradigms for machine processing of texts. Let us briefly review the main ones:</p>
      <p>Neural networks: Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) are classes of neural networks developed specifically for processing sequential data such as text, audio, time series, etc. The basic idea of recurrent neural networks is that they can remember the previous state (information) and use it to process the next input in the sequence. LSTM has additional internal structures (gates), and GRU has mechanisms of forgetting and updating. The best choice between LSTM and GRU depends on the size of the data and the specifics of the task. LSTM can be useful when long-term memory is important, but it requires more resources to train. GRU is less complex and faster to train, but may be less powerful on some problems. Word2vec uses a neural network model to learn word associations from a large text corpus. It can detect synonyms or suggest additional words for a partial sentence.</p>
      <p>Transformers: BERT (Bidirectional Encoder Representations from Transformers) is a deep learning model based on the Transformer architecture and used to solve Natural Language Processing (NLP) problems. BERT is one of the most effective models for context-based language understanding and has gained significant popularity since its launch. Tasks that can be solved with BERT include text classification, named entity recognition, question answering, and many other natural language processing tasks. BERT has an impressive ability to understand complex language constructions and semantics thanks to its ability to model context in both directions.</p>
      <p>Unsupervised learning algorithms: GloVe maps words to a meaningful space in which the distance between words is related to semantic similarity. Training is performed on aggregated global statistics of pairwise co-occurrence of corpus words, and the resulting representations demonstrate interesting linear substructures of the word vector space.</p>
      <p>We have to remark that all of the above approaches require large datasets and painstaking work on their cleaning and labeling. The main disadvantage of all of them is the fact that the larger the sample (dataset), the better the results; moreover, the amount of training data is expressed in thousands of units. That is why the authors propose the approach described in this section.</p>
      <p>In this section, the vectorization of words, the dataset, and the main terms of persistent homology are considered.</p>
      <sec id="sec-3-1">
        <title>3.1. Principles of speech sound coding</title>
        <p>Every speech sound s corresponds to a vector v(s) = (x1, x2, x3, x4, x5, x6, x7), where:</p>
        <p>x1 is the ordinal number of the speech sound in the text.</p>
        <p>x2 is the ordinal number of the word in the text which contains the speech sound s.</p>
        <p>x3 equals 1 for labialized vowel speech sounds and 2 for non-labialized vowel speech sounds. If a speech sound is a consonant, x3 equals zero.</p>
        <p>x4 codes a consonant sound by volume: sonorous, voiced or voiceless. If a speech sound is a vowel, x4 equals zero.</p>
        <p>x5 codes a consonant sound by the place of creation: labial, nasal, lingual or laryngeal. If a speech sound is a vowel, x5 equals zero.</p>
        <p>x6 codes a consonant sound by the method of creation: closed (breakthrough) sounds are created at the moment of breakthrough of the closed speech organs by an air stream (they are also called breakthrough, explosive, or instantaneous, because the creation of such sounds is fast and cannot be prolonged); fricative sounds are made when a stream of exhaled air passes through a gap of the speech organs (whistling and hissing; they can be lengthened, drawn out); closed-through sounds combine moments of closure and breakthrough during their creation; affricates (closed-cleft or merged); trembling (or vibrating). If a speech sound is a vowel, x6 equals zero.</p>
        <p>x7 codes a consonant sound by hardness and softness: hard, soft, softened (palatalized) or semi-softened (semi-palatalized). If a speech sound is a vowel, x7 equals zero.</p>
        <p>Let us consider an example of mapping the tongue twister “Yila Maryna malynu” into ℝ⁷, see Table 1. Every speech sound corresponds to a unique point in ℝ⁷. Moreover, all coordinates of a point are non-negative integers.</p>
        <p>Let us determine the importance of the first coordinate in this vectorization process. Some speech sounds can be identical in terms of labialization, volume, hardness, softness, and place and method of creation, and can be components of the same word. However, the sequence of their pronunciation will always differ. In the example in Table 1, such speech sounds are “y” and “a”.</p>
        <p>Since the mapping is carried out into a seven-dimensional space, any visualization is complicated for human perception, so it is necessary to reduce the dimension.</p>
        <p>In Fig. 1, there are two projections of the points corresponding to the tongue twister “Yila Maryna malynu” into three-dimensional space with respect to different coordinates.</p>
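        <p>The coding described above can be sketched as follows. This is a minimal Python sketch: the FEATURES table, its integer category codes, and the transliterated sound names are illustrative assumptions, since the paper fixes the feature axes but not the exact integer assigned to every category.</p>

```python
# A sketch of the seven-coordinate coding from Sec. 3.1.
# The integer category codes below (e.g. 1 = sonorous for volume,
# 1 = labial / 3 = lingual for place) are illustrative assumptions.

# hypothetical feature table: (labialization, volume, place, method, hardness)
FEATURES = {
    "a": (2, 0, 0, 0, 0),   # non-labialized vowel
    "i": (2, 0, 0, 0, 0),
    "y": (2, 0, 0, 0, 0),
    "u": (1, 0, 0, 0, 0),   # labialized vowel
    "m": (0, 1, 1, 1, 1),   # sonorous, labial, closed, hard
    "n": (0, 1, 3, 1, 1),   # sonorous, lingual, closed, hard
    "l": (0, 1, 3, 3, 1),   # sonorous, lingual, closed-through, hard
    "r": (0, 1, 3, 5, 1),   # sonorous, lingual, trembling, hard
}

def code_twister(text):
    """Map a transliterated twister to a cloud of points in R^7:
    (sound position, word index, labialization, volume, place,
    method, hardness)."""
    points, word, pos = [], 1, 0
    for ch in text:
        if ch == " ":          # word boundary: advance the word index
            word += 1
            continue
        pos += 1
        points.append((pos, word) + FEATURES[ch])
    return points

cloud = code_twister("yila maryna malynu")
```

        <p>Each of the 16 speech sounds of the example becomes one point; the first two coordinates keep otherwise identical sounds distinct, as discussed above.</p>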
      </sec>
      <sec id="sec-3-2">
        <title>3.2. A dataset</title>
        <p>For this research, the authors compiled a dataset that contains tongue twisters from open sources that are used by speech therapists for the purpose of eliminating and preventing speech defects in children's speech. The tongue twisters have different numbers of speech sounds and are oriented toward different types of speech problems. In Fig. 2 there is a histogram of the number of speech sounds in a tongue twister.</p>
        <p>There are 100 tongue twisters in the dataset. It should be noted that tongue twisters containing no more than 50 sounds make up the majority of the dataset. Most likely, this is because long tongue twisters are rarely used for therapeutic purposes. The most widely used tongue twisters contain from 30 to 40 speech sounds.</p>
        <p>As can be seen from the histogram, this distribution is far from normal. Therefore, following general practice, it would be necessary to remove atypical tongue twisters from the sample. However, given the small amount of data in the dataset, the authors avoid this.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Persistent homologies</title>
        <p>
          To construct and analyze the structure of tongue twisters, concepts of topological data analysis will be used. In particular, we will be interested in the Betti numbers and their geometric interpretation, see [
          <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
          ].
        </p>
        <p>The zero Betti number (β0 = rank H0^{i,j}) is the number of connected components of the space. The first Betti number (β1 = rank H1^{i,j}) is the number of independent cycles in the space. The second Betti number (β2 = rank H2^{i,j}) is the number of 2-spheres in the space. To calculate these invariants we used the l-th persistent homology Hl^{i,j} = Im fl^{i,j} for 0 ≤ i &lt; j ≤ k+1, where fl^{i,j}: Hl(Ki) → Hl(Kj), i &lt; j, is the map induced by inclusion. In other words, Hl^{i,j} = Zl^i/(Bl^j ∩ Zl^i), where Zl^i is the group of l-cycles of Ki and Bl^j is the group of l-boundaries of Kj (a set {Kr}, r = 1, …, k, of Vietoris-Rips complexes is the filtration for any finite set {p1, p2, …, pn}, where Ki ⊆ Kj for i &lt; j). There is a method for their calculation based on matrix algebra, the persistence barcode, and the persistence diagram. An l-cycle is an l-chain with empty boundary; the group of 1-cycles is the kernel of the 1-st boundary homomorphism, Z1 = ker ∂1. A 1-boundary is a 1-chain that is the boundary of a 2-chain; the group of 1-boundaries is the image of the 2-nd boundary homomorphism, B1 = Im ∂2. A 1-chain is a formal sum of 1-simplices of a simplicial complex K, with standard notation c = Σ ai σi, where σi is a 1-simplex of K and ai is either 1 or 0. Similar definitions hold for groups of higher order.</p>
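        <p>To make the geometric meaning of β0 concrete, here is a small self-contained Python sketch (not the method used in the paper, which relies on GUDHI): at a fixed scale ε, β0 of the Vietoris-Rips complex equals the number of connected components of the graph joining points at distance at most ε.</p>

```python
from itertools import combinations
import math

def betti0(points, eps):
    """Number of connected components (the zero Betti number) of the
    Vietoris-Rips complex at scale eps, via union-find on the
    eps-neighborhood graph."""
    parent = list(range(len(points)))

    def find(i):
        # path-halving find
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) > eps:
            continue
        parent[find(i)] = find(j)   # merge the two components

    return len({find(i) for i in range(len(points))})

# two tight pairs of points, far apart from each other
pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.0, 0.9)]
```

        <p>At small scales every point is its own component; as ε grows, components merge, which is exactly the decreasing behavior of β0 reported in Sec. 5.</p>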
        <p>
          For the subsequent calculations, we used the GUDHI library, a generic open source C++ library with a Python interface for Topological Data Analysis (TDA) and Higher Dimensional Geometry Understanding, see [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>In Table 2, the main geometric structures, such as a circle, a disk, and a sphere, and their Betti numbers are presented.</p>
        <p>Let us consider the tongue twister “Yila Maryna malynu” from the previous sections. Using the GUDHI library, the values of βi were obtained and Table 3 was formed. For any fixed scale εi, the geometric structure consists of more than one connected component, each of which in turn is a two-dimensional disk.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Algorithm and experiments</title>
      <p>For analyzing the topological structure of tongue twisters, the authors propose the following algorithm:</p>
      <p>Step 1. Every speech sound of a twister is coded according to Sec. 3.1. If a twister consists of n speech sounds {s1, s2, …, sn}, then it corresponds to a set X = {(x1^1, x2^1, …, x7^1), …, (x1^n, x2^n, …, x7^n)}. In other words, a tongue twister is considered as a cloud of points in ℝ⁷ with non-negative integer coordinates. We also normalize it by a standard function and map the cloud of points into I7, where I7 = [0; 1]^7 is the seven-dimensional unit cube. Let us denote the result by X̃.</p>
      <p>Step 2. Construct on X̃ the filtration by Vietoris-Rips complexes, where {K1, K2, …, Kk} is a finite set with Ki ⊆ Kj for i &lt; j, and compute β0 = rank H0^{i,j}, β1 = rank H1^{i,j}, β2 = rank H2^{i,j}, β3 = rank H3^{i,j}, β4 = rank H4^{i,j} and β5 = rank H5^{i,j} for every fixed scale εi, i = 1, …, k.</p>
      <p>Step 3. For every twister of the dataset, we apply the previous steps and obtain a set D which consists of vectors with k × 6 coordinates.</p>
      <p>In Fig. 4, there is a pipeline of the algorithm. Coding corresponds to Step 1, and the computation of βi, i = 0, …, 5, to Step 2. We remark on the following:</p>
      <p>• The output of Coding is a cloud of points in I7 = [0; 1]^7, the seven-dimensional unit cube.</p>
      <p>• The output of the computation of βi, i = 0, …, 5, is N ordered sets of k × 6 non-negative integers.</p>
      <p>• At the output of each of the steps, a dataset of numbers is obtained. Potentially, even at the first step, a tongue twister is vectorized. However, the amount of output data differs between two different tongue twisters, so such a vectorization cannot be used as an input for machine learning methods that require a fixed dimensionality of the data, such as neural networks, transformers, etc.</p>
      <p>• The output dataset can be provided to a machine learning algorithm after preprocessing. Since the values of the Betti numbers are undefined for some fixed values of εi, the authors recommend setting them to -1 in these cases. This value has no interpretation from a geometric point of view, so it will not affect the understanding of the structure. As a result, the dataset will consist of k × 6 integer numbers per tongue twister.</p>
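      <p>The normalization of Step 1 and the -1 convention for the output vectors can be sketched as follows. This is a minimal sketch: normalize is an assumed min-max implementation of the “standard function” mentioned in Step 1, and feature_vector is a hypothetical helper applying the -1 fill for undefined Betti numbers.</p>

```python
import numpy as np

def normalize(cloud):
    """Step 1 (second half): min-max normalize a coded twister so its
    point cloud lies in the unit cube [0, 1]^7 (assumed "standard
    function"; the paper does not name the exact normalization)."""
    x = np.asarray(cloud, dtype=float)
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    span[span == 0] = 1.0            # constant coordinates map to 0
    return (x - lo) / span

def feature_vector(betti_per_scale, k):
    """Steps 2-3: flatten per-scale Betti numbers (b0..b5) into one
    vector of k * 6 integers, writing -1 at scales where they are
    undefined, as the authors recommend."""
    out = []
    for i in range(k):
        out.extend(betti_per_scale.get(i, (-1,) * 6))
    return out

cube = normalize([(1, 1, 2, 0, 0, 0, 0), (3, 2, 1, 0, 0, 0, 0)])
vec = feature_vector({0: (4, 1, 0, 0, 0, 0)}, k=2)
```

      <p>Every twister then yields a fixed-length vector of k × 6 integers, suitable as machine learning input after preprocessing.</p>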
    </sec>
    <sec id="sec-5">
      <title>5. Results and discussions</title>
      <p>Let us summarize the following features of the structures that appear from the topological data analysis and characterize tongue twisters.</p>
      <p>The number of connected components of the space (β0) decreases with increasing value of ε; this effect is general for persistent homologies. The mean value of β0 for ε = 0.55 is equal to 4, see Fig. 5.</p>
      <p>The mean value of β1 is equal to 0 for ε = 0, …, 0.45; it equals 1 for ε = 0.55, …, 0.8 and is zero again afterwards.</p>
      <p>For all tongue twisters, the values βi, i = 3, 4, 5 satisfy β3 = β4 = β5 = 0. This has the following effect on their topological structure: there are no geometric structures with nested spheres of dimension two or higher in the existing tongue twister dataset. Let us calculate the mean values of β0 and β1 at fixed values of ε and construct a histogram, see Fig. 7. For the considered dataset, there are geometric structures that are a disjoint union of a finite number of two-dimensional disks and one-dimensional circles.</p>
      <p>In the case of structures with one connected component, some of them are tori. The authors applied PCA (Principal Component Analysis), which showed that the most informative features for this dataset are the sets of Betti numbers for ε = 0.2, 0.3, 0.45, 0.55.</p>
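      <p>The PCA step can be reproduced with a small SVD-based sketch (the authors' exact tooling is not stated; scikit-learn's PCA would behave the same way on the Betti-number feature matrix):</p>

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component,
    computed via SVD of the centered data matrix."""
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

# toy feature matrix with one dominant direction, standing in for the
# per-twister Betti-number vectors of Sec. 4
X = [[0.0, 0.00], [1.0, 0.01], [2.0, -0.01], [3.0, 0.02]]
ratios = explained_variance_ratio(X)
```

      <p>Components with large explained-variance ratios point at the most informative scales, which is how the values ε = 0.2, 0.3, 0.45, 0.55 were selected.</p>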
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>A vectorization of tongue twisters was obtained that is based on the complexity of the sounds in pronunciation (the speech therapy component) and takes into account sonority, the place of a sound's creation, and the method of its creation. As a result, a seven-dimensional vector corresponds to each sound of a tongue twister.</p>
      <p>Based on the formed cloud of points, the Betti numbers were calculated for each fixed value of the parameter of formation of the simplicial complex. As a result, each of the tongue twisters exhibits a clearly defined cyclic structure, which is close to a circle.</p>
      <p>In the future, this result will provide an opportunity to improve the percentage of tongue twister recognition among ordinary sentences, as well as an opportunity to form an artificial dataset for neural network approaches.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>I.</given-names>
            <surname>Yurchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Gurnik</surname>
          </string-name>
          ,
          <article-title>Tongue twisters detection in Ukrainian by using TDA</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          <volume>3396</volume>
          (
          <year>2023</year>
          ), pp.
          <fpage>163</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kember</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Connaghan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <article-title>Inducing speech errors in dysarthria using tongue twisters</article-title>
          ,
          <source>Int. J. of Language &amp; Communication Disorders</source>
          <volume>52</volume>
          (
          <issue>4</issue>
          ) (
          <year>2017</year>
          )
          <fpage>469</fpage>
          -
          <lpage>478</lpage>
          . doi:10.1111/1460-6984.12285.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bressmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Foltz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Irish</surname>
          </string-name>
          ,
          <article-title>Production of tongue twisters by speakers with partial glossectomy</article-title>
          ,
          <source>J. Clinical Linguistics &amp; Phonetics</source>
          <volume>28</volume>
          (
          <issue>12</issue>
          ) (
          <year>2014</year>
          )
          <fpage>951</fpage>
          -
          <lpage>964</lpage>
          . doi:10.3109/02699206.2014.938833.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Corley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Brocklehurst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Moat</surname>
          </string-name>
          ,
          <article-title>Error Biases in Inner and Overt Speech: Evidence from Tonguetwisters</article-title>
          ,
          <source>Journal of Experimental Psychology: Learning, Memory, and Cognition</source>
          ,
          <volume>37</volume>
          (
          <issue>1</issue>
          ) (
          <year>2011</year>
          )
          <fpage>162</fpage>
          -
          <lpage>175</lpage>
          . doi:10.1037/a0021321.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Paluzo-Hidalgo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gonzalez-Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Gutierrez-Naranjo</surname>
          </string-name>
          ,
          <article-title>Towards a philological metric through a topological data analysis approach</article-title>
          ,
          <source>ArXiv</source>
          ,
          <year>2019</year>
          , abs/1912.09253.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kushnareva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cherniavskii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mikhailov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Artemova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Barannikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Piontkovskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Piontkovski</surname>
          </string-name>
          , E. Burnaev,
          <article-title>Artificial text detection via examining the topology of attention maps</article-title>
          ,
          <source>in: Proceedings of Conference on Empirical Methods in Natural Language Processing</source>
          , Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>635</fpage>
          -
          <lpage>649</lpage>
          . doi:10.18653/v1/2021.emnlp-main.50.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zielinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lipinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Juda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zeppelzauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dłotko</surname>
          </string-name>
          ,
          <article-title>Persistence bag-of-words for topological data analysis</article-title>
          ,
          <source>in: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, Macao</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4489</fpage>
          -
          <lpage>4495</lpage>
          . doi:10.24963/ijcai.2019/624.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Elyasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Moghadam</surname>
          </string-name>
          ,
          <article-title>An introduction to a new text classification and visualization for natural language processing using topological data analysis, 2019, arXiv</article-title>
          . URL: https://arxiv.org/abs/1906.01726.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <article-title>Topological data analysis of two cases: text classification and business customer relationship management</article-title>
          ,
          <source>J. of Physics</source>
          ,
          <volume>1550</volume>
          (
          <year>2020</year>
          ). doi:10.1088/1742-6596/1550/3/032081.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Sh.</given-names>
            <surname>Gholizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Savle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seyeditabari</surname>
          </string-name>
          , W. Zadrozny,
          <article-title>Topological data analysis in text classification: extracting features with additive information</article-title>
          , arXiv,
          <year>2020</year>
          . URL: https://arxiv.org/abs/2003.13138.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Carlsson</surname>
          </string-name>
          ,
          <article-title>Topology and data</article-title>
          ,
          <source>Bull.Amer.Math.Soc</source>
          ,
          <volume>46</volume>
          (
          <issue>2</issue>
          ) (
          <year>2009</year>
          ):
          <fpage>255</fpage>
          -
          <lpage>308</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I.</given-names>
            <surname>Yurchuk</surname>
          </string-name>
          ,
          <article-title>Digital image segmentation based on the persistent homologies</article-title>
          ,
          <source>in: Proceedings of the 1st International Workshop on Information-Communication Technologies and Embedded Systems</source>
          , ICTES, Mykolaiv, Ukraine,
          <year>2019</year>
          , pp.
          <fpage>226</fpage>
          -
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <article-title>The GUDHI library</article-title>
          . URL: https://gudhi.inria.fr/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>