<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>A Method for Biometric Coding of Speech Signals based on Adaptive Empirical Wavelet Transform</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksandr Lavrynenko</string-name>
          <email>oleksandrlavrynenko@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maksym Zaliskyi</string-name>
          <email>maksym.zaliskyi@npp.nau.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Denys Bakhtiiarov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anatolii Taranenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yevhen Gabrousenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Aviation University</institution>
          ,
          <addr-line>1 Lubomyr Huzar ave., 03058 Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this research, a biometric speech coding method is developed in which the empirical wavelet transform is used to extract biometric features of speech signals for voice identification of the speaker. The method differs from existing ones in that it uses a set of adaptive bandpass Meyer wavelet filters together with Hilbert spectral analysis to determine the instantaneous amplitudes and frequencies of the internal empirical modes. This makes it possible to apply multiscale wavelet analysis to biometric coding of speech signals based on an adaptive empirical wavelet transform, which increases the efficiency of spectral analysis by decomposing high-frequency speech oscillations into their low-frequency components, namely the internal empirical modes. In addition, a biometric method for coding speech signals based on mel-frequency cepstral coefficients has been improved: it applies the basic principles of adaptive spectral analysis using the empirical wavelet transform, which significantly improves the division of the Fourier spectrum into adaptive bands corresponding to the formant frequencies of the speech signal.</p>
      </abstract>
      <kwd-group>
        <kwd>speech signal</kwd>
        <kwd>biometric coding</kwd>
        <kwd>speaker identification</kwd>
        <kwd>information protection</kwd>
        <kwd>voice authentication</kwd>
        <kwd>wavelet transform</kwd>
        <kwd>bandpass wavelet filters</kwd>
        <kwd>mel-frequency cepstral coefficients</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The development of new methods and means of ensuring information security is intended
primarily to prevent threats of access to information resources by unauthorized persons. To solve
this problem, it is necessary to have identifiers and create identification procedures for all users.
Modern identification and authentication include various systems and methods of biometric
identification [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
      <p>
        One of the most common biometric characteristics of a person is his or her voice, which has a
set of individual characteristics that are relatively easy to measure (e.g., the frequency spectrum of
the voice signal). The advantages of voice identification also include ease of application and use,
and the fairly low cost of devices used for identification (e.g., microphones) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Voice identification capabilities cover a very wide range of tasks, which distinguishes them
from other biometric systems. First of all, voice identification has been widely used for a long time
in various systems for differentiating access to physical objects and information resources. Its new
application in systems based on telecommunication channels seems promising. For example, in
mobile communications, voice can be used to manage services, and the introduction of voice
identification helps protect against fraud [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Voice identification also plays an important role in solving such an important task as protecting
speech information. This identification is used to create new technical means and software and
hardware devices for protecting speech information, in particular, from leakage through acoustic,
vibroacoustic, and other channels [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Voice identification is of particular importance in the investigation of crimes, especially in the
field of computer information, and in the formation of the evidence base for such an investigation.
In these cases, it is often necessary to identify an unknown voice recording. Voice identification is
an important practical task when searching for a suspect based on a voice recording in
telecommunication channels. Determining such characteristics of the speaker’s voice as gender,
age, nationality, dialect, and emotional coloring of speech is also important in the field of forensics
and anti-terrorism. The identification results are important in conducting phonoscopic
examinations, and in carrying out expert forensic research based on the theory of forensic
identification [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>Thus, the development of new methods of voice identification is a promising and relevant
scientific and technical task in providing biometric authentication in information and
telecommunication systems.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature review and problem statement</title>
      <p>
        The paper investigates a well-known method of biometric coding of speech signals based on
mel-frequency cepstral coefficients (MFCC) [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], which consists of finding the average values of the coefficients of the discrete cosine transform (DCT)
$c[n] = \sum_{m=0}^{N_f - 1} E[m] \cos\left(\frac{\pi n (m + \frac{1}{2})}{N_f}\right), \quad n = 0, \ldots, N_f - 1,$
of the logarithmized energy of the spectrum [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
$E[m] = \ln\left(\sum_{k=0}^{N-1} |X[k]|^2 H_m[k]\right), \quad m = 0, \ldots, N_f - 1,$
computed from the discrete Fourier transform (DFT)
$X[k] = \sum_{n=0}^{N-1} x[n] w[n] e^{-\frac{2\pi j}{N} kn}, \quad k = 0, \ldots, N - 1,$
processed with a triangular filter [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
$H_m[k] = \begin{cases} 0, &amp; k &lt; f[m-1] \\ \frac{k - f[m-1]}{f[m] - f[m-1]}, &amp; f[m-1] \le k &lt; f[m] \\ \frac{f[m+1] - k}{f[m+1] - f[m]}, &amp; f[m] \le k \le f[m+1] \\ 0, &amp; k &gt; f[m+1], \end{cases}$
where
$f[m] = \frac{N}{F_s} M^{-1}\left(M(F_{\min}) + m \frac{M(F_{\max}) - M(F_{\min})}{N_f + 1}\right)$
in the mel scale $M(F) = 1127.01048 \times \ln(1 + F/700)$ [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
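      <p>For illustration, the following sketch implements the MFCC pipeline described above in Python
with NumPy; the parameters (sampling rate, number of filters, frequency range) are assumptions
chosen to match the 300–3400 Hz telephone band discussed later, not values fixed by the paper.</p>
      <preformat>
import numpy as np

def mel(f):
    # Mel scale M(F) = 1127.01048 * ln(1 + F/700)
    return 1127.01048 * np.log(1.0 + f / 700.0)

def inv_mel(m):
    # Inverse of the mel scale
    return 700.0 * (np.exp(m / 1127.01048) - 1.0)

def mfcc(frame, fs=8000, n_filters=24, f_min=300.0, f_max=3400.0):
    n = len(frame)
    spectrum = np.abs(np.fft.rfft(frame)) ** 2            # |X[k]|^2
    # Filter edge bins f[m]: uniform on the mel scale, mapped back to DFT bins
    mels = np.linspace(mel(f_min), mel(f_max), n_filters + 2)
    bins = np.floor((n + 1) * inv_mel(mels) / fs).astype(int)
    energies = np.empty(n_filters)
    for m in range(1, n_filters + 1):                     # triangular filters H_m[k]
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        h = np.zeros(len(spectrum))
        h[lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        h[c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
        energies[m - 1] = np.log(np.sum(spectrum * h) + 1e-12)  # E[m]
    # DCT: c[n] = sum_m E[m] cos(pi * n * (m + 1/2) / N_f)
    m_idx = np.arange(n_filters) + 0.5
    return np.array([np.sum(energies * np.cos(np.pi * k * m_idx / n_filters))
                     for k in range(n_filters)])
      </preformat>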
      <p>
        The problem is that the presented method of biometric encoding of speech signals based on
MFCC does not meet the condition of adaptability [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
$\bigcup_{n=1}^{N} \Lambda_n = [0, \pi],$
where $\Lambda_n = [\omega_{n-1}, \omega_n]$ are the segments of the Fourier spectrum $[0, \pi]$ of the speech signal under
study, which is divided into $N$ adjacent segments with boundaries $\omega_n$ (where $\omega_0 = 0$ and $\omega_N = \pi$).
This leads to suboptimal extraction of biometric features of speech signals and to a decrease in
the probability of recognizing the voice features of a person [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13–15</xref>
        ].
Therefore, it is necessary to develop a new method of biometric coding of speech signals based on the
empirical wavelet transform (EWT). This method should differ from existing approaches by
constructing a system of adaptive bandpass Meyer wavelet filters, followed by the use of Hilbert
spectral analysis to determine the instantaneous amplitudes and frequencies of the functions of the
internal empirical modes. The application of this method will reveal the biometric characteristics of
speech signals and increase the efficiency of their coding.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Purpose and research objectives</title>
      <p>The developed method includes the following steps (see Fig. 1). The speech signal, whose
frequency range is from 300 to 3400 Hz, is divided into $K$ frames of 20 ms in length with $N$ samples
each, overlapping by half the frame length to ensure the stationarity of the process (see Fig. 2) [16].
Each frame is weighted by the Hamming window
$w[n] = 0.53836 - 0.46164 \times \cos\left(\frac{2\pi n}{N - 1}\right), \quad n = 0, \ldots, N - 1.$</p>
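      <p>A minimal framing sketch in Python, assuming an 8 kHz sampling rate (so a 20 ms frame is
N = 160 samples); the half-frame overlap and the Hamming window follow the expressions above.</p>
      <preformat>
import numpy as np

def frames(signal, fs=8000, frame_ms=20):
    N = int(fs * frame_ms / 1000)              # samples per 20 ms frame
    step = N // 2                              # frames overlap by 1/2 frame length
    n = np.arange(N)
    # Hamming window w[n] = 0.53836 - 0.46164*cos(2*pi*n/(N-1))
    w = 0.53836 - 0.46164 * np.cos(2 * np.pi * n / (N - 1))
    out = [signal[i:i + N] * w for i in range(0, len(signal) - N + 1, step)]
    return np.array(out)                       # K x N matrix of windowed frames
      </preformat>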
      <p>The values of the indexes $k$ correspond to the frequencies
$f[k] = \frac{F_s}{N} k, \quad k = 0, \ldots, N/2,$
where $F_s$ is the sampling rate of the speech signal.
The Fourier spectrum, normalized in frequency to $[0, \pi]$ and in amplitude to $[0, 1]$, is divided into
$N$ segments $\Lambda_n = [\omega_{n-1}, \omega_n]$, where $\omega_n = (\Omega_n + \Omega_{n+1})/2$ are the segment boundaries ($\omega_0 = 0$ and
$\omega_N = \pi$), and $\Omega_n$ are the local maxima of the frequency spectrum characterizing the biometric features
of speech signals; it is then obvious that $\bigcup_{n=1}^{N} \Lambda_n = [0, \pi]$ (see Figure 3) [18, 19].
Each boundary (filter cutoff frequency) $\omega_n$ has a transient phase of width $2\tau_n$, where $\tau_n$ is chosen
in proportion to $\omega_n$: $\tau_n = \gamma \omega_n$, and the parameter $\gamma$ must meet the condition [20]
$\gamma &lt; \min_n \frac{\omega_{n+1} - \omega_n}{\omega_{n+1} + \omega_n}, \quad 0 &lt; \gamma &lt; 1,$
which guarantees the absence of overlap between the transition regions $2\tau_n$ and ensures the
orthogonality of the basis of the bandpass Meyer wavelet filters $\{\phi_1(\omega), \{\psi_n(\omega)\}_{n=1}^{N}\}$.
      </p>
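      <p>A sketch of the adaptive segmentation step under the definitions above: the boundaries
ω_n are the midpoints between neighbouring local maxima Ω_n of the normalized spectrum. Keeping
the N largest maxima is an assumption for illustration; the paper does not fix a selection rule here.</p>
      <preformat>
import numpy as np
from scipy.signal import argrelextrema

def ewt_boundaries(frame, n_segments):
    spec = np.abs(np.fft.rfft(frame))
    spec = spec / spec.max()                        # normalize amplitudes to [0, 1]
    maxima = argrelextrema(spec, np.greater)[0]     # candidate local maxima Omega_n
    # keep the n_segments largest maxima, ordered by frequency (assumed rule)
    top = np.sort(maxima[np.argsort(spec[maxima])[-n_segments:]])
    omega = np.pi * top / (len(spec) - 1)           # map DFT bins to [0, pi]
    bounds = (omega[:-1] + omega[1:]) / 2           # omega_n = (Omega_n + Omega_n+1)/2
    return np.concatenate(([0.0], bounds, [np.pi])) # omega_0 = 0, omega_N = pi
      </preformat>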
      <p>Then $\forall n &gt; 0$, the adaptive basis $\{\phi_1(\omega), \{\psi_n(\omega)\}_{n=1}^{N}\}$ is set by the scaling function $\hat{\phi}_n(\omega)$ and
the wavelet functions $\hat{\psi}_n(\omega)$, which correspond to the low-pass filter and the $N - 1$ bandpass Meyer
filters for each spectrum segment $\Lambda_n$ [21]:
$\hat{\phi}_n(\omega) = \begin{cases} 1, &amp; |\omega| \le (1-\gamma)\omega_n \\ \cos\left[\frac{\pi}{2} \beta\left(\frac{1}{2\gamma\omega_n}(|\omega| - (1-\gamma)\omega_n)\right)\right], &amp; (1-\gamma)\omega_n \le |\omega| \le (1+\gamma)\omega_n \\ 0, &amp; \text{otherwise}, \end{cases}$
$\hat{\psi}_n(\omega) = \begin{cases} 1, &amp; (1+\gamma)\omega_n \le |\omega| \le (1-\gamma)\omega_{n+1} \\ \cos\left[\frac{\pi}{2} \beta\left(\frac{1}{2\gamma\omega_{n+1}}(|\omega| - (1-\gamma)\omega_{n+1})\right)\right], &amp; (1-\gamma)\omega_{n+1} \le |\omega| \le (1+\gamma)\omega_{n+1} \\ \sin\left[\frac{\pi}{2} \beta\left(\frac{1}{2\gamma\omega_n}(|\omega| - (1-\gamma)\omega_n)\right)\right], &amp; (1-\gamma)\omega_n \le |\omega| \le (1+\gamma)\omega_n \\ 0, &amp; \text{otherwise}, \end{cases}$
where the function $\beta(x)$ must meet the conditions
$\beta(x) = \begin{cases} 0, &amp; x \le 0 \\ 1, &amp; x \ge 1 \end{cases} \quad \text{and} \quad \beta(x) + \beta(1 - x) = 1 \quad \forall x \in [0, 1].$
In practice, the following polynomial function is used [22]:
$\beta(x) = x^4 (35 - 84x + 70x^2 - 20x^3).$
As can be seen from the scaling function $\hat{\phi}_n(\omega)$ and the wavelet functions $\hat{\psi}_n(\omega)$, adaptability is
achieved by building bandpass filters centered around the frequencies $\omega_n$, which characterize the
biometrics of the speech.</p>
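      <p>The Meyer filter bank above can be sketched directly from the piecewise definitions; this is a
frequency-domain construction on a grid of |ω| values, with β(x) taken as the stated polynomial.</p>
      <preformat>
import numpy as np

def beta(x):
    # beta(x) = x^4 (35 - 84x + 70x^2 - 20x^3), clipped so beta = 0 for x &lt;= 0 and 1 for x &gt;= 1
    x = np.clip(x, 0.0, 1.0)
    return x**4 * (35 - 84 * x + 70 * x**2 - 20 * x**3)

def meyer_scaling(omega, w1, gamma):
    # Low-pass scaling filter phi_1 around the first boundary w1
    phi = np.zeros_like(omega)
    phi[np.abs(omega) &lt;= (1 - gamma) * w1] = 1.0
    t = (np.abs(omega) &gt;= (1 - gamma) * w1) &amp; (np.abs(omega) &lt;= (1 + gamma) * w1)
    phi[t] = np.cos(np.pi / 2 * beta((np.abs(omega[t]) - (1 - gamma) * w1) / (2 * gamma * w1)))
    return phi

def meyer_wavelet(omega, wn, wn1, gamma):
    # Bandpass filter psi_n between the boundaries wn and wn1
    psi = np.zeros_like(omega)
    psi[(np.abs(omega) &gt;= (1 + gamma) * wn) &amp; (np.abs(omega) &lt;= (1 - gamma) * wn1)] = 1.0
    up = (np.abs(omega) &gt;= (1 - gamma) * wn1) &amp; (np.abs(omega) &lt;= (1 + gamma) * wn1)
    psi[up] = np.cos(np.pi / 2 * beta((np.abs(omega[up]) - (1 - gamma) * wn1) / (2 * gamma * wn1)))
    dn = (np.abs(omega) &gt;= (1 - gamma) * wn) &amp; (np.abs(omega) &lt;= (1 + gamma) * wn)
    psi[dn] = np.sin(np.pi / 2 * beta((np.abs(omega[dn]) - (1 - gamma) * wn) / (2 * gamma * wn)))
    return psi
      </preformat>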
      <p>Then the detail coefficients $W_f^{\varepsilon}(n, t)$ are given by scalar products with the empirical wavelet
functions
$W_f^{\varepsilon}(n, t) = \langle f, \psi_n \rangle = \int f(\tau) \psi_n(\tau - t) d\tau = \left(\hat{f}(\omega) \hat{\psi}_n(\omega)\right)^{\vee},$
and the approximation coefficients $W_f^{\varepsilon}(0, t)$ by a scalar product with the scaling function
$W_f^{\varepsilon}(0, t) = \langle f, \phi_1 \rangle = \int f(\tau) \phi_1(\tau - t) d\tau = \left(\hat{f}(\omega) \hat{\phi}_1(\omega)\right)^{\vee},$
where $\hat{\psi}_n(\omega)$ and $\hat{\phi}_1(\omega)$ are defined by the equations of the wavelet functions and the
scaling function given above, respectively [23–25].
The reconstruction of the speech signal $f(t)$ from the wavelet coefficients of detail $W_f^{\varepsilon}(n, t)$
and approximation $W_f^{\varepsilon}(0, t)$ is given by the following expression:
$f(t) = W_f^{\varepsilon}(0, t) \ast \phi_1(t) + \sum_{n=1}^{N} W_f^{\varepsilon}(n, t) \ast \psi_n(t) = \left(\hat{W}_f^{\varepsilon}(0, \omega) \hat{\phi}_1(\omega) + \sum_{n=1}^{N} \hat{W}_f^{\varepsilon}(n, \omega) \hat{\psi}_n(\omega)\right)^{\vee}.$
Then the internal empirical modes of the studied signal $f(t)$ are given by the formulas
$f_0(t) = W_f^{\varepsilon}(0, t) \ast \phi_1(t),$
$f_n(t) = W_f^{\varepsilon}(n, t) \ast \psi_n(t),$
and the orthogonality of the expansion is proved by the fact that [26–28]
$f(t) = \sum_{n=0}^{N} f_n(t).$</p>
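      <p>Combining the pieces, a sketch of the transform itself (building on the ewt_boundaries and
meyer_* sketches above): filtering the spectrum once gives the coefficients W(n, t), and multiplying
by the filter again in the frequency domain corresponds to the convolution W(n, t) ∗ ψ_n(t) that
defines each empirical mode f_n(t); the modes then sum back to the frame.</p>
      <preformat>
import numpy as np

def ewt_modes(frame, bounds, gamma):
    n = len(frame)
    spec = np.fft.fft(frame)
    omega = np.abs(np.fft.fftfreq(n) * 2 * np.pi)       # |omega| on [0, pi]
    # f_0(t): coefficients filtered by phi_1, i.e. spectrum times phi_1^2
    phi = meyer_scaling(omega, bounds[1], gamma)
    modes = [np.real(np.fft.ifft(spec * phi ** 2))]
    for k in range(1, len(bounds) - 1):
        # f_n(t): coefficients filtered by psi_n, i.e. spectrum times psi_n^2
        psi = meyer_wavelet(omega, bounds[k], bounds[k + 1], gamma)
        modes.append(np.real(np.fft.ifft(spec * psi ** 2)))
    return np.array(modes)              # np.sum(modes, axis=0) recovers the frame
      </preformat>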
      <p>To determine the instantaneous frequency and amplitude of the internal empirical modes (IEMs)
of the speech signal, we will resort to Hilbert spectral analysis.</p>
      <p>The Hilbert transform (HT) of the EWT $x(t)$ is given by the following expression:
$y(t) = \frac{1}{\pi} P \int_{-\infty}^{\infty} \frac{x(\tau)}{t - \tau} d\tau,$
where $P$ denotes the Cauchy principal value of the singular integral [29, 30].</p>
      <p>With the help of the HT of the EWT $x(t)$, we can obtain the analytic signal
$z(t) = x(t) + i y(t) = a(t) e^{i\theta(t)},$
where $i = (-1)^{1/2}$.
Then the instantaneous amplitude and frequency of the EWT can be expressed as
$a(t) = \sqrt{x^2 + y^2}, \quad \omega(t) = d\theta / dt,$
where the instantaneous frequency $\omega(t)$ is determined by the rate of change of the
instantaneous phase
$\theta(t) = \arctan(y / x),$
and the EWT $x(t)$ can be expressed as the real part of the following equation [31]:
$x(t) = \Re\left\{\sum_{j=1}^{n} a_j(t) \exp\left[i \int \omega_j(t) dt\right]\right\}.$
Then the Hilbert energy density spectrum is defined as
$S_{i,j} = H(t_i, \omega_j) = \frac{1}{\Delta t \times \Delta \omega} H\left[\sum_{k=1}^{n} a_k^2(t)\right],$
where the intervals $\Delta t \times \Delta \omega$ represent the values of $a^2(t)$ at a given time and frequency [32].
Let us set the threshold function (see Fig. 4). Let us assume that the probability $P$ of recognizing the
frequency and amplitude of a function with the harmonic distribution law $x(t) = A \times \sin(\omega t + \varphi)$ is 1,
while for a function with the uniform distribution law
$x(t) = \begin{cases} \frac{1}{b - a}, &amp; x \in [a, b] \\ 0, &amp; x \notin [a, b] \end{cases}$
it is 1/2 [34, 35].</p>
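      <p>A short sketch of the Hilbert step with SciPy: scipy.signal.hilbert returns the analytic signal
z(t) = x(t) + i y(t), from which a(t), θ(t), and ω(t) follow as defined above (the 8 kHz rate is an
assumed parameter).</p>
      <preformat>
import numpy as np
from scipy.signal import hilbert

def instantaneous(mode, fs=8000):
    z = hilbert(mode)                          # analytic signal z(t)
    a = np.abs(z)                              # instantaneous amplitude a(t)
    theta = np.unwrap(np.angle(z))             # instantaneous phase theta(t)
    omega = np.diff(theta) * fs / (2 * np.pi)  # instantaneous frequency, Hz
    return a, omega
      </preformat>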
      <p>Then the theoretical criterion for finding the maximum possible probability of recognizing the
biometric speech features of the analyzed frame is written in the following way, based on
the balance between the energy of the biometric speech features and their number:
$P = \frac{\sqrt{\sum_{k=i}^{N} |C_k|^2}}{\sqrt{\sum_{k=1}^{N} |C_k|^2}} = \frac{N - i}{N}, \quad i = 1, \ldots, N,$
where $C$ is the Hilbert energy spectrum of length $N$, and the threshold is $T = C_i$ [36–38].</p>
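      <p>As a rough illustration of this criterion, the sketch below scans the Hilbert energy spectrum C for
the first index i whose residual energy ratio P reaches a target balance; the target_p parameter is an
assumption for demonstration, not a value prescribed by the paper.</p>
      <preformat>
import numpy as np

def select_threshold(c, target_p=0.8):
    # P = sqrt(sum_{k=i..N} |C_k|^2) / sqrt(sum_{k=1..N} |C_k|^2) = (N - i)/N
    c = np.asarray(c, dtype=float)
    total = np.sqrt(np.sum(np.abs(c) ** 2))
    for i in range(1, len(c) + 1):
        p = np.sqrt(np.sum(np.abs(c[i - 1:]) ** 2)) / total  # tail energy ratio
        if p &lt;= target_p:
            return i, c[i - 1]                 # index i and threshold T = C_i
    return len(c), c[-1]
      </preformat>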
    </sec>
    <sec id="sec-4">
      <title>4. Results and discussion</title>
      <p>In this system, to evaluate the results of automatic recognition of voice control commands, a
classifier built by the criterion of minimum distance is used. The indicator used is the variance of the
difference between the mathematical expectation of the recognition features (obtained with the
developed method) of the reference voice images stored in the database and the mathematical
expectation of the recognition features obtained with the developed method at the testing level
of the system.</p>
      <p>The variance of the difference of the mathematical expectations of two samples
of voice control commands (recognition features based on the developed method) is written as
follows:
$D = \left(\frac{\sum_{i=1}^{n} x_i}{n} - \frac{\sum_{i=1}^{n} \bar{x}_i}{n}\right)^2,$
where $x_i$ are the recognition features based on the developed method stored in the base of reference
voice images, $\bar{x}_i$ are the recognition features based on the developed method at the system testing
level, and $n$ is the number of recognition features based on the developed method.</p>
      <p>The decision on biometric identification of voice commands is made according to the criterion
of minimum variance, i.e., the smallest deviation of the compared recognition features based on the
developed method within a certain recognition threshold, which is given by the following rule:
if $D_{min} &lt; \Theta$ then “identified!”, else “not identified!”,
where $D_{min}$ is the minimum variance and $\Theta = 1 - \Delta$ is a given threshold of acceptable recognition (in
practice, $\Delta = 0.80 \ldots 0.90$ is usually used).
The minimum variance that falls within the specified threshold of acceptable recognition is the
best comparison result, which means that the command is identified (recognized): “identified!”
Otherwise, the voice command fails biometric identification (is not recognized): “not
identified!”</p>
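      <p>A minimal sketch of this decision rule: D is the squared difference of the sample means of the
reference and test feature vectors, compared against Θ = 1 − Δ (Δ = 0.85 as used in the experiments
below).</p>
      <preformat>
import numpy as np

def identify(x_ref, x_test, delta=0.85):
    # D = ((1/n) sum x_i - (1/n) sum x̄_i)^2, decision: D_min &lt; Theta
    d = (np.mean(x_ref) - np.mean(x_test)) ** 2
    theta = 1.0 - delta                        # threshold of acceptable recognition
    return d &lt; theta                           # True corresponds to "identified!"
      </preformat>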
      <p>The paper details the results of preliminary experimental research, based on which
conclusions are drawn about the feasibility of further scientific and practical application of the
system for recognizing voice control commands based on cepstral analysis and the developed
algorithm for calculating the recognition features, together with a thorough justification of the
scientific and technical significance of the conducted experimental research.</p>
      <p>All scientific-experimental studies of the voice control command recognition system set
out below (Tables 1–3) were carried out taking into account the criterion of minimum distance,
i.e., the variance of the difference between the mathematical expectations of the compared
recognition features based on the developed method; depending on it, the values of the minimum
variance Dmin varied, thereby giving an objective assessment of the quality (reliability) of
recognition of voice control commands in the testing mode of the system. The decision on
biometric identification of voice commands is made by the criterion of minimum variance Dmin in a
given threshold of acceptable recognition Θ = 1 – Δ = 0.15, where Δ = 0.85.
In the first experiment (Table 1), we compared the recognition features based on the developed
method of the voice commands of control subject No. 1: “up”, “down”, “right”, and “left”, which were
stored at the training level in the base of reference voice images, with the recognition features
based on the developed method of the voice commands of the same control subject No. 1, but
in the system testing mode (the recognition features of the voice commands spoken in the testing
mode are compared with the recognition features of the voice commands spoken earlier in the
system training mode).</p>
      <p>From the obtained results (Table 1) it can be seen that the recognition features based on the
developed method of the voice commands of control subject No. 1 meet the criterion of minimum
variance Dmin in the given threshold of acceptable recognition Θ = 0.15: “up” is
Dmin = 0.0311, “down” is Dmin = 0.0648, “right” is Dmin = 0.0123, “left” is Dmin = 0.0112. Based on this,
the decision about positive biometric identification of the spoken voice commands is made (the
voice commands are recognized). In the other cases (Table 1), the values of Dmin do not meet the
selected criterion, which means that the recognition features based on the developed method of
the spoken voice commands do not coincide with the recognition features based on the developed
method that are stored in the database of reference voice images, i.e. the voice commands are not
recognized.</p>
      <sec id="sec-4-1">
        <title>Training Testing Control Subject No. 1 Control Subject No. 2</title>
        <p>left
In the second experiment (Table 2), we compared the recognition features based on the developed
method of the spoken voice commands of control subject No. 2 in the testing mode with the
recognition features based on the developed method of the voice commands of control subject No.
1 spoken earlier in the system training mode.</p>
        <p>From the obtained results (Table 2), we can conclude that the recognition features based on the
developed method of voice commands of the control subject No. 2 meet the criterion of minimum
variance Dmin in a given threshold of acceptable recognition Θ = 0.15: “up” is Dmin = 0.0451, “down”
is Dmin = 0.0482, “right” is Dmin = 0.0967, “left” is Dmin = 0.0703, and therefore, a decision is made
about the positive result of recognizing the spoken voice commands.</p>
        <p>In all other cases, voice commands are not recognized because the resulting values do not meet
the specified recognition criterion.</p>
      <sec id="sec-4-2">
        <title>Voice commands up down right left</title>
        <p>In the third experiment (Table 3), the recognition features based on the developed method of the
spoken voice commands of control subject No. 3 in the testing mode were compared with the
recognition features based on the developed method of voice commands of control subject No. 1
spoken earlier in the training mode of the system, which are stored in the database of reference
voice images of control commands.</p>
        <p>The obtained values of the comparison results: “up” is Dmin = 0.0602, “down” is Dmin = 0.0912,
“right” is Dmin = 0.0846, “left” is Dmin = 0.0785, fully meet the criterion Dmin &lt; Θ, where Θ = 0.15, and
therefore, the decision about the positive result of recognizing the spoken voice commands is
made. As for the other obtained resultant values, they do not meet the specified recognition
criterion, and thus, the voice commands are not recognized.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The paper develops a method of biometric coding of speech signals based on the empirical
wavelet transform, which differs from existing methods by constructing a set of adaptive bandpass
Meyer wavelet filters with the subsequent application of Hilbert spectral analysis to find the
instantaneous amplitudes and frequencies of the functions of the internal empirical modes, which
makes it possible to determine the biometric features of speech signals and increase the efficiency
of their coding.</p>
      <p>The paper details the results of preliminary experimental studies, based on which conclusions
are drawn about the feasibility of further scientific and practical application of the developed
system for recognizing voice control commands based on cepstral analysis and the algorithm for
calculating the recognition features, as well as a justification of the scientific significance of the
study.</p>
      <p>A comparative evaluation of the calculated values obtained according to the chosen criterion of
minimum distance, which is the main indicator of the quality of voice command recognition, has
been carried out.</p>
      <p>In the first experiment (Table 1), we compared the recognition features based on the developed
method of voice commands of control subject No. 1: “up”, “down”, “right”, and “left”, which were
stored at the training level in the base of reference voice images, with the recognition features
based on the developed method of voice commands of the same control subject No. 1, but already
in the system testing mode.</p>
      <p>From the obtained results (Table 1) we can see that the recognition features based on the
developed method of voice commands of the control subject No. 1 meet the criterion of minimum
variance Dmin in the given threshold of acceptable recognition Θ = 0.15: “up” is Dmin = 0.0311,
“down” is Dmin = 0.0648, “right” is Dmin = 0.0123, “left” is Dmin = 0.0112, based on this, the decision
about positive biometric identification of the spoken voice commands is made.</p>
      <p>In the second experiment (Table 2), we compared the recognition features based on the
developed method of the spoken voice commands of control subject No. 2 in the testing mode with
the recognition features based on the developed method of the voice commands of control subject
No. 1 spoken earlier in the system training mode.</p>
      <p>From the obtained results (Table 2), we can conclude that the recognition features based on the
developed method of voice commands of the control subject No. 2 meet the criterion of minimum
variance Dmin in a given threshold of acceptable recognition Θ = 0.15: “up” is Dmin = 0.0451, “down”
is Dmin = 0.0482, “right” is Dmin = 0.0967, “left” is Dmin = 0.0703, and therefore, a decision is made
about the positive result of recognizing the spoken voice commands.</p>
      <p>In the third experiment (Table 3), the recognition features based on the developed method of the
spoken voice commands of the control subject No. 3 in the testing mode were compared with the
recognition features based on the developed method of the voice commands of the control subject
No. 1 spoken earlier in the system training mode. The obtained values of the comparison results:
“up” is Dmin = 0.0602, “down” is Dmin = 0.0912, “right” is Dmin = 0.0846, “left” is Dmin = 0.0785, fully
meet the criterion Dmin &lt; Θ, where Θ = 0.15, and therefore, the decision about the positive result of
recognizing the spoken voice commands is made.</p>
      <p>A software complex has been developed, including means for compiling a database of reference
voice images of control subjects for training and testing of the voice control system, and a program
modeling the proposed methods and algorithms for recognizing voice control commands in the
MATLAB environment.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>While preparing this work, the authors used the AI programs Grammarly Pro to correct text
grammar and Strike Plagiarism to search for possible plagiarism. After using these tools, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
             
            <surname>Kinkiri</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
           Keates,
          <article-title>Speaker identification: variations of a human voice</article-title>
          ,
          <source>in: International Conference on Advances in Computing and Communication Engineering (ICACCE)</source>
          ,
          <year>2020</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICACCE49060.
          <year>2020</year>
          .9154998
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>M. </surname>
          </string-name>
          <article-title>M. </article-title>
          <string-name>
            <surname>Abdulghani</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
           L. 
          <string-name>
            <surname>Walters</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
           H. 
          <article-title>Abed, Voice signature recognition for UAV pilots identity verification</article-title>
          ,
          <source>in: International Conference on Computational Science and Computational Intelligence (CSCI)</source>
          ,
          <year>2023</year>
          ,
          <fpage>125</fpage>
          -
          <lpage>129</lpage>
          . doi:
          <volume>10</volume>
          .1109/CSCI62032.
          <year>2023</year>
          .00026
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
             
            <surname>Singhal</surname>
          </string-name>
          , D. K. 
          <string-name>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>Analysis of classifiers for gender identification using voice signals</article-title>
          ,
          <source>in: 5th International Conference on Information Systems and Computer Networks (ISCON)</source>
          ,
          <year>2021</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . doi:
          <volume>10</volume>
          .1109/ISCON52037.
          <year>2021</year>
          .9702469
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L. H.</given-names>
            <surname> Palivela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
             
            <surname>Dharmalingam</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
           Elangovan,
          <article-title>Voice authentication system</article-title>
          ,
          <source>in: International Conference on Data Science, Agents &amp; Artificial Intelligence (ICDSAAI)</source>
          ,
          <year>2023</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICDSAAI59313.
          <year>2023</year>
          .10452482
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
             
            <surname>Aliaskar</surname>
          </string-name>
          , et al.,
          <article-title>Human voice identification based on the detection of fundamental harmonics</article-title>
          ,
          <source>in: 7th International Energy Conference (ENERGYCON)</source>
          ,
          <year>2022</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . doi:
          <volume>10</volume>
          .1109/ENERGYCON53164.
          <year>2022</year>
          .9830471
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
             A. 
            <surname>Jabbar</surname>
          </string-name>
          , et al.,
          <article-title>Stable implementation of voice activity detector using zero-phase zero frequency resonator on FPGA</article-title>
          ,
          <source>in: International Conference and Expo on Real Time Communications at IIT (RTC)</source>
          ,
          <year>2023</year>
          ,
          <fpage>13</fpage>
          -
          <lpage>18</lpage>
          . doi:
          <volume>10</volume>
          .1109/RTC58825.
          <year>2023</year>
          .10304243
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
             
            <surname>Kuzmin</surname>
          </string-name>
          , et al.,
          <article-title>Method for correcting the mathematical model in case of empirical data asymmetry</article-title>
          , in: Integrated Computer Technologies in Mechanical Engineering,
          <source>ICTM 2022, Lecture Notes in Networks and Systems</source>
          , vol.
          <volume>657</volume>
          ,
          <year>2023</year>
          ,
          <fpage>249</fpage>
          -
          <lpage>260</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>031</fpage>
          -36201- 9_
          <fpage>21</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
             
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
             
            <surname>Gurugubelli</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
           K. 
          <article-title>Vuppala, Study on the effect of emotional speech on language identification</article-title>
          ,
          <source>in: National Conference on Communications (NCC)</source>
          ,
          <year>2020</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . doi:
          <volume>10</volume>
          .1109/NCC48643.
          <year>2020</year>
          .9056015
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname> Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
             P. 
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
             
            <surname>Kumar Das</surname>
          </string-name>
          ,
          <article-title>Effectiveness of feature collaboration in speaker identification for voice biometrics</article-title>
          , in: International Conference on Computer, Electrical &amp; Communication
          <string-name>
            <surname>Engineering</surname>
          </string-name>
          (ICCECE),
          <year>2023</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICCECE51049.
          <year>2023</year>
          .10085318
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N. J.</given-names>
             
            <surname>Perdana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
             E. 
            <surname>Herwindiati</surname>
          </string-name>
          , N. H. 
          <article-title>Sarmin, Voice recognition system for user authentication using Gaussian mixture model</article-title>
          ,
          <source>in: International Conference on Artificial Intelligence in Engineering and Technology (IICAIET)</source>
          ,
          <year>2022</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . doi:
          <volume>10</volume>
          .1109/IICAIET55139.
          <year>2022</year>
          .9936856
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
             
            <surname>Lavrynenko</surname>
          </string-name>
          , et al.,
          <article-title>A method for extracting the semantic features of speech signal recognition based on empirical wavelet transform</article-title>
          ,
          <source>Radioelectron. Comput. Syst</source>
          .
          <volume>3</volume>
          (
          <issue>107</issue>
          ) (
          <year>2023</year>
          )
          <fpage>101</fpage>
          -
          <lpage>124</lpage>
          . doi:
          <volume>10</volume>
          .32620/reks.
          <year>2023</year>
          .
          <volume>3</volume>
          .
          <fpage>09</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
             
            <surname>Barhoush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
             
            <surname>Hallawa</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
           
          <article-title>Schmeink, Robust automatic speaker identification system using shuffled MFCC features</article-title>
          ,
          <source>in: International Conference on Machine Learning and Applied Network Technologies (ICMLANT)</source>
          ,
          <year>2021</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICMLANT53170.
          <year>2021</year>
          .9690530
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] E. J. van Rensburg,
          <string-name>
            <given-names>R.</given-names>
             A. 
            <surname>Botha</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
           
          <article-title>Haskins, Identifying duress through voice during speaker authentication</article-title>
          , in: International Conference on Electrical,
          <source>Computer and Energy Technologies (ICECET)</source>
          ,
          <year>2023</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICECET58911.
          <year>2023</year>
          .10389204
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>O.</given-names>
             
            <surname>Lavrynenko</surname>
          </string-name>
          , et al.,
          <article-title>A wavelet-based steganographic method for text hiding in an audio signal</article-title>
          ,
          <source>Sensors</source>
          <volume>22</volume>
          (
          <issue>15</issue>
          ) (
          <year>2022</year>
          )
          <article-title>5832</article-title>
          . doi:
          <volume>10</volume>
          .3390/s22155832
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>