<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Method of Remote Biometric Identification of a Person by Voice based on Wavelet Packet Transform</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksandr Lavrynenko</string-name>
          <email>oleksandrlavrynenko@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bohdan Chumachenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maksym Zaliskyi</string-name>
          <email>maksym.zaliskyi@npp.nau.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serhii Chumachenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Denys Bakhtiiarov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Aviation University</institution>
          ,
          <addr-line>1 Lubomyr Huzar ave., Kyiv, 03058</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>150</fpage>
      <lpage>162</lpage>
      <abstract>
        <p>In this research, the task of extracting speech signal recognition features for voice identification of a person in a remote mode was solved. The remote mode imposes several restrictions, namely: (1) minimum processing time of the speech signal realization, since the required recognition reliability is achieved through statistical processing of the results; (2) reduced dimensionality of the recognition features, since the extraction of recognition features and their classification occur on the transmitting side of the communication channel, which in turn imposes constraints related to computing power and noise in the communication channel. After analyzing these conditions of the voice identification system, the question arose of developing a method for extracting speech signal recognition features that would provide more informative spectral characteristics of the speech signal and thereby improve the efficiency of their further classification under the influence of noise. In this paper, we consider the possibility of applying the theory of time-scale analysis to solve this problem, namely, the development of a method for extracting recognition features based on the wavelet packet transform using the orthogonal Meyer basis wavelet function and subsequent averaging of the wavelet coefficients that fall within the frequency band of the corresponding wavelet packet. Experimental studies have shown the ability of the developed method to generate recognition features for speech signals with a close frequency-temporal structure based on wavelet packets in the Meyer basis. In particular, it was found that at a signal-to-noise ratio of 10 dB the features obtained with the developed method give a very acceptable result, being 1.6–2 times more robust to noise than the features obtained from the traditional Fourier spectrum, for which the total root mean square error deviation of the obtained features is already unacceptable at a signal-to-noise ratio of 20 dB.</p>
      </abstract>
      <kwd-group>
        <kwd>speech signal</kwd>
        <kwd>recognition features</kwd>
        <kwd>wavelet transform</kwd>
        <kwd>wavelet Meyer function</kwd>
        <kwd>spectral analysis</kwd>
        <kwd>voice identification</kwd>
        <kwd>biometric authentication</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The development of new methods and means of
ensuring information security is intended
primarily to prevent threats of access to
information resources by unauthorized persons.
To solve this problem, it is necessary to have
identifiers and create identification procedures
for all users. Modern identification and
authentication include various systems and
methods of biometric identification [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The
development of identification systems based on
biometric measurements is associated with a
whole range of advantages: such systems are
more reliable because biometric indicators are
more difficult to fake; modern microprocessor
technology makes biometric methods more
convenient than conventional identification
methods; and, finally, measurements are much
easier to automate [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6 ref9">2–6</xref>
        ].
      </p>
      <p>
        One of the most common biometric
characteristics of a person is his or her voice,
which has a set of individual characteristics that
are relatively easy to measure (for example, the
frequency spectrum of the speech signal). The
advantages of voice identification also include
ease of application and use, and the fairly low cost
of devices used for identification (e. g.,
microphones) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Voice identification capabilities cover a very
wide range of tasks, which distinguishes them
from other biometric systems. First of all, voice
identification has been widely used for a long
time in various systems for differentiating access
to physical objects and information resources. Its
new application in remote voice identification
systems, where a person is identified through a
telecommunications channel, seems promising.
For example, in mobile communications, voice
can be used to manage services, and the
introduction of voice identification helps protect
against fraud [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Voice identification is of particular
importance in the investigation of crimes, in
particular in the field of computer information,
and in the formation of the evidence base for such
an investigation. In these cases, it is often
necessary to identify an unknown voice
recording. Voice identification is an important
practical task when searching for a suspect based
on a voice recording in telecommunication
channels. Determining such characteristics of the
speaker’s voice as gender, age, nationality,
dialect, and emotional coloring of speech is also
important in the field of forensics and
antiterrorism. The identification results are
important in conducting phonoscopic
examinations, and in carrying out expert forensic
research based on the theory of forensic
identification [
        <xref ref-type="bibr" rid="ref10">9</xref>
        ].
      </p>
      <p>
        Voice identification in real-world
environments faces the following serious
challenges. Firstly, such identification is subject
to all kinds of hardware distortions and noise
caused by the peculiarities of equipment and
devices for recording, processing, and storing
information. Secondly, external acoustic noise
inevitably superimposes the speech signal, which
can significantly distort individual informative
characteristics. Given this, identification systems
that have demonstrated fairly high efficiency in
laboratory conditions may show much lower
reliability when analyzing speech information
with external noise. Finally, in several tasks,
identification has to be performed in very difficult
conditions of overlapping voices of several
speakers, in particular, with similar acoustic
characteristics. It should be noted that there have
been virtually no studies of voice identification
capabilities for this most difficult case [
        <xref ref-type="bibr" rid="ref11">10</xref>
        ].
      </p>
      <p>
        Voice identification involves a set of
technical, algorithmic, and mathematical
methods that cover all stages, from voice
recording to voice data classification. The
discussed difficulties and shortcomings lead to
the conclusion that further development of voice
identification systems requires the development
of new approaches aimed at processing large
arrays of experimental speech signals, their
effective analysis, and reliable classification. This
indicates the relevance of research on the
creation of new mathematical methods for
processing, analyzing, and classifying voice data
that would ensure the reliability and accuracy of
person identification [
        <xref ref-type="bibr" rid="ref12">11</xref>
        ].
      </p>
      <p>Traditionally, the methods that provide the
required level of classification reliability under
given conditions are of practical interest for
speech signal recognition. Until recently, the
dominant approach to the construction of
biometric voice identification devices was not
to impose restrictions on the processing time of
the speech signal, since the required
recognition reliability was achieved by
statistical processing of the results obtained, as
well as by increasing the dimensionality of the
recognition features, and as a rule, the process
of extracting recognition features and their
classification took place on the transmitting
side of the communication channel.</p>
      <p>
        However, in the case of remote voice
identification in modern mobile radio
communication systems, it is difficult to ensure
these conditions, since the identification of a
person is carried out on the receiving side, and
this, in turn, imposes certain constraints related to
computing power and the influence of noise in
the communication channel. An additional
requirement is often the need to make a
classification decision in a time-sensitive
environment [
        <xref ref-type="bibr" rid="ref13">12</xref>
        ].
      </p>
      <p>
        In this case, it is necessary to move to other
methods that can provide the necessary contrast
of the formed speech signal recognition features
and allow the use of voice identification
technologies in a remote mode based on modern
mobile radio communication systems, which will
significantly expand the scope of this type of
technology. In this paper, we consider the
possibility of applying the theory of time-scale
analysis to solve this problem [
        <xref ref-type="bibr" rid="ref14">13</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature</title>
    </sec>
    <sec id="sec-3">
      <title>Analysis and</title>
    </sec>
    <sec id="sec-4">
      <title>Problem Statement</title>
      <p>In general, recognition is the process of
assigning the object under study, in this case a
speech signal, to one of a finite number of classes
$\Omega_1, \Omega_2, \dots, \Omega_M$. The object is
described by an $N$-dimensional vector
$x = [x_1, x_2, \dots, x_N]^T$ of observed values of the
features $x_1, x_2, \dots, x_N$ that reflect
the most important properties of objects for
recognition. The set of features $x$, as a rule, is
the same for all recognition classes
$\Omega_1, \Omega_2, \dots, \Omega_M$.</p>
      <p>Thus, we consider the task of recognizing
whether the object under study belongs to one of a finite
number of classes $\Omega_1, \Omega_2, \dots, \Omega_M$,
described by a set of features $x_1, x_2, \dots, x_N$
that is the same for all classes. Differences
between classes will be manifested only in
differences in the characteristics of the features of
different objects. Then, for any set of features
$x_1, x_2, \dots, x_N$, one can set rules according to
which any two classes $\Omega_l$ and $\Omega_k$ are
assigned a vector of parameters called
interclass distances that express the degree of
difference in the characteristics of the recognition
features [16]. The task, therefore, is to form
recognition features under the specified
conditions, namely, to ensure the quality of
speech signal recognition feature extraction
under the influence of noise in the
communication channel, which in turn will
allow the use of voice identification in a
remote mode.</p>
      <p>An integral part of the speech signal
recognition process is the definition of a set of
features $x_1, x_2, \dots, x_N$, i.e., the formation of
recognition features in such a way as to ensure
the required classification reliability with the
minimum possible dimension $N$.</p>
      <p>In the considered approach to solving the problem of
speech signal recognition, an important point is
the choice of a method for forming recognition
features. The use of approaches based on the
traditional Fourier spectral-time analysis for
this purpose is associated with certain
difficulties. First, there are high requirements
for the input speech signal stream in terms of
signal-to-noise ratio. Secondly, there is a lack of
classification reliability for multicomponent
and low-stationary signals, such as speech
signals. Thirdly, a significant number of
realizations is needed. The desire to overcome
these limitations within the framework of
traditional approaches of classical spectral
signal processing leads to difficult-to-implement
variants of speech signal recognition
devices and solutions that are unacceptable for
the conditions under consideration [17].</p>
      <p>Thus, we formulate the research objective:
to develop a method that allows the formation
of contrasting recognition features for
automatic remote identification of a person by
voice under the conditions of restrictions on
the duration of the processed realization at a
signal-to-noise ratio of less than 20 dB under
conditions of partial or complete a priori
uncertainty about their structure.</p>
    </sec>
    <sec id="sec-5">
      <title>3. Proposed Method</title>
      <p>Currently, methods of analyzing signals based
on the wavelet transform
decompose the input signal into a system of
basis wavelets—functions, each of which is a
shifted and scaled copy of the generating
(mother) wavelet. A characteristic property
of wavelet functions (hereinafter referred to as
wavelets) is their finite energy at full
localization in both the frequency and time
domains.</p>
      <p>Thus, any sequence of discrete samples of
the speech signal $s(t_i)$ can be represented as an
ordered set of coefficients of decomposition by
a system of scaling functions and wavelet
functions:
$$s(t_i) = \sum_{k=1}^{2^{-m}N} a_{m,k}\,\varphi_{m,k}(t_i) + \sum_{j=1}^{m} \sum_{k=1}^{2^{-j}N} d_{j,k}\,\psi_{j,k}(t_i),$$
where $m$ is the number of decomposition
levels, and $a_{m,k}$ and $d_{j,k}$ are the
approximating and detailing coefficients [18].
The basis functions of the multiple-scale
analysis are defined as:
$$\varphi_{m,k}(t) = 2^{-m/2}\varphi(2^{-m}t - k), \quad (1)$$
$$\psi_{j,k}(t) = 2^{-j/2}\psi(2^{-j}t - k). \quad (2)$$
Here, in (1) and (2), $\sqrt{2}$ is the normalizing
factor, $k = 0, \pm 1, \pm 2, \dots$, and $m \in \mathbb{Z}$.</p>
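      <p>As an illustration, the expansion above can be computed in a few
lines of code. The following is a minimal sketch assuming the
PyWavelets library, whose discrete Meyer wavelet 'dmey' serves here as
a stand-in for the Meyer basis used later in the paper; the frame
length and level are arbitrary example values.</p>
      <preformat>
import numpy as np
import pywt

rng = np.random.default_rng(0)
s = rng.standard_normal(256)               # stand-in for a 256-sample speech frame

m = 3                                      # number of decomposition levels
coeffs = pywt.wavedec(s, 'dmey', level=m)  # [a_m, d_m, d_(m-1), ..., d_1]
a_m, details = coeffs[0], coeffs[1:]

s_rec = pywt.waverec(coeffs, 'dmey')       # inverse transform
print(np.allclose(s, s_rec[:len(s)]))      # True: the expansion is exact
      </preformat>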
      <p>In practice, to quickly calculate the values of
the wavelet coefficients $a_{m,k}$ and $d_{m,k}$, a
sequential separation scheme called the pyramid
or Mallat algorithm is used, which is interpreted as a
sequential two-band filtering of the input speech
signal using cascaded low-pass (h) and high-pass
(g) filter blocks (Fig. 1) [19].</p>
      <p>In Fig. 1, for the wavelet coefficients $a_{m,k}$ and
$d_{m,k}$, the first index $m$ corresponds to the
number of the decomposition level, and the
second index $k = 0, 1, \dots, 2^{-m}N - 1$ corresponds
to the ordinal value of the wavelet coefficient at
the decomposition level $m$. According to the
theory of multiple-scale analysis, the values of
$a_{m,k}$ and $d_{m,k}$ can be obtained based on the
coefficients calculated at the previous stages of
speech signal decomposition:
$$a_{m,k} = \frac{1}{\sqrt{2}} \sum_{n} a_{m-1,n}\,h_{n+2k},$$
$$d_{m,k} = \frac{1}{\sqrt{2}} \sum_{n} a_{m-1,n}\,g_{n+2k},$$
where $h_n$ and $g_n$ are sequences that define the
characteristics of filters H and G at the $m$-th level
of wavelet decomposition [20].</p>
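      <p>The filtering step described above can be sketched directly. In
the snippet below (an illustration, not the authors' implementation),
the decomposition filters h and g are taken from PyWavelets and one
pyramid step is carried out as a convolution followed by downsampling
by two; the Haar wavelet is an arbitrary choice.</p>
      <preformat>
import numpy as np
import pywt

w = pywt.Wavelet('haar')                   # any orthogonal wavelet will do
h, g = np.array(w.dec_lo), np.array(w.dec_hi)

def mallat_step(a_prev):
    # convolve with the filters, then keep every second sample
    a = np.convolve(a_prev, h)[1::2]       # approximation coefficients a_m
    d = np.convolve(a_prev, g)[1::2]       # detail coefficients d_m
    return a, d

x = np.arange(8, dtype=float)
a1, d1 = mallat_step(x)                    # one two-band filtering step
print(a1, d1)                              # matches pywt.dwt(x, 'haar') up to boundary handling
      </preformat>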
      <p>
        The number of multiplication operations
required to calculate all the coefficients of the
discrete wavelet transform for a data set of size $N$
and filter vectors h and g of length $L$ is $2LN$.
The same number of operations is
required to recover or calculate all the spectral
components. So, to analyze a speech signal on a
wavelet basis, you need to perform $4LN$
operations. The number of complex
multiplication operations for the fast Fourier
transform is $N \log_2 N$, which is comparable to
or even greater than in the case of the discrete
wavelet transform [
        <xref ref-type="bibr" rid="ref15">21</xref>
        ].
      </p>
      <p>The interpretation of the coefficients of the
discrete wavelet transform is somewhat more
complicated than the Fourier coefficients. If the
analyzed speech signal is sampled at a
frequency of 8 kHz and consists of 256 samples,
then the top frequency of the signal is 4 kHz.</p>
      <p>Then the coefficients of the first level of
decomposition (128) occupy the frequency
band [2.0, 4.0] kHz. The second-level wavelet
coefficients (64) are responsible for the [1.0,
2.0] kHz frequency band. They are placed
before the first-level wavelet coefficients. The
procedure is repeated until there is 1 wavelet
coefficient and 1 scaling coefficient at level 8.</p>
      <p>The total number of coefficients is
(1+1+2+4+8+16+32+64+128) = 256. That is,
the number of coefficients is equal to the
number of samples in the input speech signal.</p>
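      <p>The coefficient bookkeeping described above is easy to verify
numerically. A small sketch (again assuming PyWavelets; the test tone
is an arbitrary choice) prints the coefficient counts per level and the
nominal frequency band of each detail level for an 8 kHz, 256-sample
frame.</p>
      <preformat>
import numpy as np
import pywt

fs, N = 8000, 256
x = np.sin(2 * np.pi * 1000 * np.arange(N) / fs)   # 1 kHz test tone

coeffs = pywt.wavedec(x, 'haar', level=8)          # full 8-level pyramid
print([len(c) for c in coeffs])                    # [1, 1, 2, 4, 8, 16, 32, 64, 128]

# detail level j nominally covers [fs / 2**(j + 1), fs / 2**j] Hz
for j in range(1, 9):
    print(f"level {j}: [{fs / 2 ** (j + 1):.0f}, {fs / 2 ** j:.0f}] Hz")
      </preformat>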
      <p>
        If the main energy of the signal is
concentrated near the frequency of 1.0 kHz,
then the second-level wavelet coefficients will
be more informative, and the first-level wavelet
coefficients can be neglected [
        <xref ref-type="bibr" rid="ref16">22</xref>
        ].
      </p>
      <p>As a continuation of the development of the
theory of multiple-scale analysis, it is proposed
to improve the Mallat algorithm by additional
processing of the high-frequency components
of the pyramid of the analyzed speech signal.</p>
      <p>Thus, in the improved algorithm, recursive
filtering is also applied to the detail coefficients $d_{m,k}$. This
full decomposition algorithm is called wavelet
packet decomposition. The decomposition
scheme based on wavelet packets is shown in
Fig. 2.</p>
      <p>For the wavelet coefficients $Z_{m,n}(i)$ (Fig. 2),
the index $m$ corresponds to the number of the
decomposition level, the index $n$ corresponds
to the number of the subband at the level $m$,
and $i = 0, 1, \dots, 2^{-m}N - 1$ corresponds to the
number of the wavelet coefficient at the level
$m$. In wavelet packets, several decomposition
bases are used for complete decomposition,
nested in one another, which gave the method
its name. In general, each level of the hierarchy
can use its own specific basis. In contrast to the
Mallat algorithm, the use of wavelet packets
makes it possible to take into account the subtle
structure of the analyzed speech signal in a more
comprehensive way.</p>
      <p>
        Indeed, the absolute values of the
coefficients in the wavelet packet
decomposition are smaller than those of the
Mallat algorithm. Therefore, it can be argued
that the approximation with wavelet packets
has a much smaller error [
        <xref ref-type="bibr" rid="ref17">23</xref>
        ].
      </p>
      <p>Since the wavelet basis is a complete
decomposition basis, the wavelet coefficients
contain individual characteristics of the input
speech signal, determined by the properties of
the basis functions to the same extent as the
spectral components of the Fourier series.</p>
      <p>Thus, any wavelet transform, including those
based on the use of wavelet packets, allows you
to uniquely represent a speech signal by an
ordered set of its wavelet coefficients. It is
therefore possible to use these coefficients
as recognition features and thus to make the
calculation of coefficients based on wavelet
packets the basis of the proposed method.</p>
      <p>
        The method of forming speech signal
recognition features based on wavelet packets
is defined as follows. In the wavelet spectrum
formed based on wavelet packets, the power of
the calculated wavelet coefficients within each
subband of the decomposition is averaged. The
averaged coefficients are normalized and,
according to their place in the overall pyramid
of wavelet packets from left to right and from
top to bottom, converted into a vector of
recognition features. Thus, specific values of
the average power of the wavelet coefficients in
each subband of the decomposition will serve
as the primary features of speech signal
recognition. It should be noted that, in general,
the features obtained in this way will be
correlated, so it is advisable to apply an
additional decorrelation transformation to the
vector, which, by the way, will reduce the size
of the secondary recognition feature space
[
        <xref ref-type="bibr" rid="ref18">24</xref>
        ].
      </p>
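      <p>A compact sketch of this feature former is given below. It assumes
PyWavelets ('dmey' standing in for the Meyer basis) and, for
simplicity, normalizes the subband powers to unit sum; the paper
itself normalizes by the average power of the realization, as detailed
further below.</p>
      <preformat>
import numpy as np
import pywt

def wp_features(frame, wavelet='dmey', level=3):
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet,
                            mode='symmetric', maxlevel=level)
    # terminal subbands, ordered by frequency from low to high
    nodes = wp.get_level(level, order='freq')
    power = np.array([np.mean(node.data ** 2) for node in nodes])
    return power / power.sum()       # normalized feature vector

rng = np.random.default_rng(1)
y = wp_features(rng.standard_normal(512))
print(len(y), y.sum())               # 8 subband features summing to 1
      </preformat>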
      <p>Consider the sequence of stages of the
proposed method (Fig. 3).</p>
      <p>Figure 3: Scheme of speech signal recognition features selection for biometric identification of a
person</p>
      <p>
        Initially, the input sequence of discrete samples
of the speech signal $s(t_i)$, of length $N$ equal to a
power of 2, at $i = 0, 1, 2, \dots, (N - 1)$, is
decomposed into $m \le \log_2(N)$ levels as a result
of applying the wavelet packet algorithm. At the
first level, the input array $s(t_i)$ is
decomposed into two sets $Z_{1,0}(i)$ and $Z_{1,1}(i)$ by
convolution of $s(t_i)$ with the sequences $\{h\}$ and $\{g\}$,
which are determined by the characteristics of the
low-pass H and high-pass G frequency filters. At the 2nd
level, the considered convolution procedures
are repeated with each of the obtained subsets
$Z_{1,0}(i)$ and $Z_{1,1}(i)$. The process of full
decomposition, called wavelet packetization,
involves $m$ steps similar to the first one [
        <xref ref-type="bibr" rid="ref19">25</xref>
        ].
      </p>
      <p>The considered procedures can be
represented analytically by the following
expressions:
$$Z_{m,2n}(i) = \sum_{k} Z_{m-1,n}(k)\,h_{m,k}(i),$$
$$Z_{m,2n+1}(i) = \sum_{k} Z_{m-1,n}(k)\,g_{m,k}(i),$$
where $1 \le m \le M$ and $0 \le n \le (2^{m-1} - 1)$.
At the first level of decomposition, the samples
of the speech signal $s(t_i)$ are used as $Z_{0,0}(i)$.
The values of the elements of the sequences $\{h\}$
and $\{g\}$ depend on the choice of the type of
scaling function $\varphi(t)$ and wavelet function
$\psi(t)$ and, according to (1) and (2), are
calculated as follows:
$$h_{m,k}(i) = 2^{-m/2}\varphi(2^{-m}i - k),$$
$$g_{m,k}(i) = 2^{-m/2}\psi(2^{-m}i - k).$$</p>
      <p>
        As a result of the transformations performed
during the decomposition, the sequence of
samples of the speech signal $s(t_i)$
is decomposed into $P = 2 \cdot 2^m - 1$ sequences
(including the input one) of length $N/2^m$, each
of which represents one of the frequency
subbands of the input speech signal [
        <xref ref-type="bibr" rid="ref20">26</xref>
        ].
      </p>
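      <p>The recursive splitting described above can be written out
directly. The sketch below (illustrative; boundary handling of the
filters is simplified) splits every subband of the previous level with
the low-pass and high-pass filters, so that level m holds 2^m subband
sequences Z_{m,n}(i).</p>
      <preformat>
import numpy as np
import pywt

w = pywt.Wavelet('dmey')
h, g = np.array(w.dec_lo), np.array(w.dec_hi)

def split(z):
    a = np.convolve(z, h)[1::2]        # low-pass branch,  Z_(m,2n)
    d = np.convolve(z, g)[1::2]        # high-pass branch, Z_(m,2n+1)
    return a, d

def packet(z, m):
    level = [z]                        # Z_(0,0) is the input sequence
    for _ in range(m):
        level = [part for z_n in level for part in split(z_n)]
    return level                       # 2**m subband sequences

subbands = packet(np.random.default_rng(2).standard_normal(512), 3)
print(len(subbands))                   # 8 subbands at level 3
      </preformat>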
      <p>Different realizations of speech signals will
have different energy distributions over the
frequency subbands since their Fourier spectra
will also be different. If you calculate the
average power of the wavelet coefficients in
each subband, the set of values obtained will
reflect the wavelet content of the speech signal
subbands, similar to the frequency
representation. Moreover, the transition to the
average power will allow the use of relatively
short input realizations for recognition, which
is an important point in the operation of rapid
analysis systems.</p>
      <p>The bandwidth of the
frequencies falling into each of the subbands
will narrow with an increase in the number of
the decomposition level, which follows from
the wavelet packet scheme (Fig. 2). The
average powers of the wavelet coefficients in
each subband, which are used as speech
recognition features, are calculated according
to the following expression:
$$\bar{P}_{m,n} = \frac{1}{N/2^m} \sum_{i=n \cdot N/2^m}^{((n+1) \cdot N/2^m)-1} (Z_{m,n}(i))^2. \quad (3)$$</p>
      <p>
        To eliminate the sensitivity of the features to
changes in the average power of the speech
signal realization, the values of $\bar{P}_{m,n}$ obtained
by (3) are normalized relative to the average
power $\bar{P}_{0,0}$ of the realization $s(t_i)$ [
        <xref ref-type="bibr" rid="ref21">27</xref>
        ]. The recognition feature vector is formed by
ordering the calculated normalized values of $\bar{P}_{m,n}$
from left to right and from top to bottom. The number
of the feature is determined according to the
expression $r = 2^m - 1 + n$ and corresponds to
the ordinal number of the component element of
the vector $Y = \{y_r\}$.</p>
      <p>
        An important point in implementing the
method is the choice of the scaling function $\varphi(t)$
and the wavelet $\psi(t)$. First, the size of the
time-frequency window should be taken into account.
Second, the smoothness and symmetry of the
underlying wavelet. Third, the order of
approximation must be determined (set). Correct
selection of the wavelet basis for the speech signal
significantly reduces the number of non-zero wavelet
coefficients $Z_{m,n}(i)$, which significantly reduces
the size of the recognition features and makes
them much more informative [
        <xref ref-type="bibr" rid="ref22">28</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>4. Results and Discussion</title>
      <p>Practical experiments were conducted to
investigate the contrast of the speech recognition
feature vectors formed based on the proposed
method. In particular, Figs. 4–5 show the feature
vectors of the speech signal calculated in different
wavelet decomposition bases.</p>
      <p>Thus, in the first case (Fig. 4), a wavelet
packet based on the Haar basis was used to
obtain wavelet coefficients, which provides a
relatively coarse approximation of the speech
signal, which accordingly affects the
informativeness of the recognition features. In
the second case (Fig. 5), the speech signal
recognition features are calculated based on a
smoother Meyer function, which makes the
features more informative.</p>
      <p>
        A comparative analysis of the results in Figs.
4–5 shows that when choosing a smoother basis
function, the number of $y_r$ values close to zero in
the feature vector $Y = \{y_r\}$ increases and the
informativeness of the decomposition increases,
unlike the Haar function, where we get less
informative recognition features. Thus, the use of
basis wavelet functions consistent in
smoothness with the studied speech signal
allows us to reduce the size of recognition
features and increase their informativeness.
To confirm the hypothesis that it is expedient
to build speech signal recognition systems
based on wavelet packets using the values of
$Y = \{y_r\}$ obtained by expression (3) as
recognition features, we studied the developed
method of forming recognition features in
comparison with the approach proposed in
[15], which is based on the spectral
components of the classical harmonic Fourier
transform (Fig. 6).
The experiment used realizations of speech
signals with a duration of $N = 512$ samples,
and the decomposition was performed at $m = 5$
levels. This approach allowed us to obtain a
feature vector $Y = \{y_r\}$ of length $R = 32$,
where 16 wavelet coefficients
were averaged in each subband. As for the
recognition features based on the Fourier
transform, the spectrum was divided into 32
bands of 16 coefficients each [
        <xref ref-type="bibr" rid="ref23">29</xref>
        ].
To illustrate more clearly the effectiveness of
the proposed method (Fig. 7), an experiment
was conducted using 30 pre-recorded audio
recordings of the same semantic constructions
by two different speakers, i.e., each speaker
pronounced the words “1”, “2”, “3”, “4”, “5”
30 times each. The average value of the Root
Mean Square Errors (RMSE) over all 30
realizations for each speaker serves as an
objective indicator of the effectiveness of the
developed method:
$$\bar{E} = \frac{1}{30} \sum_{q=1}^{30} E_q,$$
so the result that shows the lowest RMSE error is the
best.
      <p>RMSE is one of many metrics used
to evaluate model performance. To calculate
RMSE, the individual errors are squared, their
average is taken, and the square root of that
mean is computed [30].</p>
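      <p>For concreteness, a minimal RMSE helper (an illustration, not the
authors' code) looks as follows:</p>
      <preformat>
import numpy as np

def rmse(a, b):
    # square the errors, average them, take the square root of the mean
    return np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

print(rmse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))   # ~0.141
      </preformat>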
      <p>Figure 7: Speech signal recognition features
using bases: a) Meyer, b) Haar, c) Fourier</p>
      <p>The results of the pairwise comparison of the
features of the test speech signals obtained using the
Haar and Meyer wavelet-based methods and the
Fourier spectral coefficient-based method are
presented in Table 1.</p>
        <p>This experimental study is needed to
compute an objective measure of the interclass
distance (RMSE) of the recognition features, i.e.,
the scatter of the features when comparing
different realizations of speech signals [31].</p>
        <p>Table 1: Comparative analysis of the existing
and proposed methods</p>
        <p>To test the noise robustness of the feature
vectors formed based on wavelet packets for
the Meyer and Haar bases and based on the
Fourier energy spectrum, several experiments
were conducted with the addition of white
noise with a signal-to-noise ratio of 10 dB to
the speech signal (noise power was measured
in the analysis band) [32]. Fig. 8 shows all
three feature vectors with the same noise
power.</p>
        <p>Figure 8: Thirty realizations of speech
recognition features obtained at a signal-to-noise
ratio of 10 dB based on bases: (a) Meyer,
(b) Haar, and (c) Fourier</p>
        <p>Thus, it was possible to establish that at a
signal-to-noise ratio of 10 dB, the features
obtained based on the developed method give
a very acceptable result, namely, a 1.6–2-fold
increase in stability compared to the features
obtained based on the traditional Fourier
spectrum, for which already at a signal-to-noise
ratio of 20 dB the total deviation of the
obtained features $Y$ is unacceptable.</p>
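        <p>The noise step of this experiment can be sketched as follows
(assumed setup: white Gaussian noise scaled so that the signal-to-noise
ratio in the analysis band equals the 10 dB target):</p>
        <preformat>
import numpy as np

def add_noise(signal, snr_db, rng=None):
    rng = rng or np.random.default_rng(4)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)    # noise power for target SNR
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

x = np.sin(2 * np.pi * np.arange(512) / 16)     # test realization
noisy = add_noise(x, snr_db=10)                 # 10 dB SNR, as in Fig. 8
        </preformat>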
    </sec>
    <sec id="sec-7">
      <title>5. Conclusions</title>
    </sec>
    <sec id="sec-8">
      <title>Research and</title>
    </sec>
    <sec id="sec-9">
      <title>Future</title>
      <p>In this research, the task of extracting speech
signal recognition features for voice
identification of a person in a remote mode
was solved, which imposes several
restrictions, namely: (1) minimum processing
time of the speech signal realization, since the
required recognition reliability is achieved by
statistical processing of the obtained results;
(2) reduction of the dimensionality of
recognition features, since the process of
extracting recognition features and their
classification occurs on the transmitting side of
the communication channel, which in turn
imposes certain constraints related to computing power
and the influence of noise in the
communication channel.</p>
      <p>The studies have shown the ability of the
developed method to form recognition
features based on wavelet packets on the
Meyer basis. The most important indicator of
the effectiveness of the experiment is the
increase in the contrast of recognition features,
i.e., the increase in the interclass distance in the
formed feature system for speech signals with
a similar frequency-temporal structure. Even a
visual analysis of the obtained values $Y =
\{y_r\}$ (Figs. 7–8) reveals significant differences
in the structure of the feature vectors formed
by relatively short implementations, which
proves the potential use of the presented
method for speech signal recognition in rapid
analysis systems. Since the recognition
features are distributed according to the normal
law, the subsequent procedure for deciding
whether speech signal realizations belong to a
particular class is greatly simplified.</p>
      <p>After analyzing the given conditions of the
voice identification system, the question arose
of developing a method for extracting speech
signal recognition features that would provide
more informative spectral characteristics of
the speech signal, which would improve the
efficiency of their further classification under
the influence of noise.</p>
      <p>This paper considers the possibility of
applying the theory of scale-time analysis to
solve this problem, namely, the development of
a method for extracting recognition features
based on the wavelet packet transform using
the orthogonal basis wavelet Meyer function
and subsequent averaging of wavelet
coefficients that are in the frequency band of
the corresponding wavelet packet.
Experimental studies have shown the ability of
the developed method to generate speech
signal recognition features with a close
frequency-temporal structure based on
wavelet packets in the Meyer basis. Namely, it
was found that at a signal-to-noise ratio of 10
dB, the features obtained based on the
developed method give a very acceptable
result, being 1.6–2 times more robust to
noise than the features obtained based on the
traditional Fourier spectrum, where the total
deviation of the root mean square error of the
obtained features is unacceptable at a
signal-to-noise ratio of 20 dB.</p>
      <p>Also, the analysis of the results shows that
the contrast of the recognition features of test
speech signals generated based on the
developed method without the influence of
noise is on average 3.8 times higher than that
of the method using Fourier spectral
coefficients.</p>
      <p>The authors see the further direction of
research in identifying the potential
capabilities of the developed method of speech
signal recognition in person identification
under very difficult conditions of overlapping
voices of several speakers, in particular with
similar acoustic characteristics, as well as in
selecting and justifying the criterion for
implementing recognition procedures. It
should be noted that there have been virtually
no studies of voice identification capabilities
for this most difficult case.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Anand</surname>
          </string-name>
          Babu et al.,
          <article-title>Secure Data Retrieval System using Biometric Identification</article-title>
          ,
          <source>International Conference on Data Science and Information System (ICDSIS)</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICDSIS55133.
          <year>2022</year>
          .
          <volume>9915968</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Romanovskyi</surname>
          </string-name>
          , et al.,
          <article-title>Prototyping Methodology of End-to-End Speech Analytics Software</article-title>
          ,
          <source>in: 4th International Workshop on Modern Machine Learning Technologies and Data Science</source>
          , vol.
          <volume>3312</volume>
          (
          <year>2022</year>
          )
          <fpage>76</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Iosifov</surname>
          </string-name>
          , et al.,
          <source>Transferability Evaluation of Speech Emotion Recognition Between Different Languages, Advances in Computer Science for Engineering and Education</source>
          <volume>134</volume>
          (
          <year>2022</year>
          )
          <fpage>413</fpage>
          -
          <lpage>426</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>031</fpage>
          - 04812-8_
          <fpage>35</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Iosifova</surname>
          </string-name>
          , et al.,
          <source>Analysis of Automatic Speech Recognition Methods, in: Workshop on Cybersecurity Providing in Information and Telecommunication Systems</source>
          , vol.
          <volume>2923</volume>
          (
          <year>2021</year>
          )
          <fpage>252</fpage>
          -
          <lpage>257</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Romanovskyi</surname>
          </string-name>
          , et al.,
          <source>Automated for Dialect Identification, IEEE Access 8 Pipeline for Training Dataset Creation</source>
          (
          <year>2020</year>
          )
          <fpage>174871</fpage>
          -
          <lpage>174879</lpage>
          . doi:
          <volume>10</volume>
          .
          <article-title>1109/ from Unlabeled Audios for Automatic ACCESS</article-title>
          .
          <year>2020</year>
          .
          <volume>3020506</volume>
          .
          <string-name>
            <surname>Speech</surname>
            <given-names>Recognition</given-names>
          </string-name>
          , Advances in [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Affect-Salient Event Computer Science for Engineering and Sequences Modelling for Continuous Education IV</article-title>
          , vol.
          <volume>83</volume>
          (
          <year>2021</year>
          )
          <fpage>25</fpage>
          -
          <lpage>36</lpage>
          . Speech Emotion Recognition Using doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -80472-
          <issue>5</issue>
          _3
          <string-name>
            <given-names>Connectionist</given-names>
            <surname>Temporal</surname>
          </string-name>
          <string-name>
            <surname>Classification</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>O.</given-names>
            <surname>Iosifova</surname>
          </string-name>
          , et al.,
          <source>Techniques 5th International Conference on Signal Comparison for Natural Language and Image Processing (ICSIP)</source>
          (
          <year>2020</year>
          )
          <article-title>Processing</article-title>
          , in: 2nd
          <source>International 773-778. doi: 10.1109/ICSIP49896. Workshop on Modern Machine Learning 2020.9339383. Technologies and Data Science</source>
          , vol. [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hidayat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Winursito</surname>
          </string-name>
          ,
          <source>Analysis of 2631, no. I (</source>
          <year>2020</year>
          )
          <fpage>57</fpage>
          -
          <lpage>67</lpage>
          . Amplitude Threshold on Speech
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Monday</surname>
          </string-name>
          et al.,
          <source>Shared Weighted Recognition System</source>
          ,
          <source>International Continuous Wavelet Capsule Network Seminar on Application for Technology for Electrocardiogram Biometric of Information and Communication Identification</source>
          , 18th
          <source>International (iSemantic)</source>
          (
          <year>2020</year>
          )
          <fpage>449</fpage>
          -
          <lpage>453</lpage>
          . doi: Computer Conference on Wavelet Active
          <volume>10</volume>
          .1109/iSemantic50169.
          <year>2020</year>
          .
          <volume>9234214</volume>
          . Media Technology and Information [16]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhong</surname>
          </string-name>
          , W. Peng,
          <source>Research on Processing (ICCWAMTIP)</source>
          (
          <year>2021</year>
          )
          <fpage>419</fpage>
          - Speech
          <source>Emotion Recognition Technology 425. doi: 10.1109/ICCWAMTIP53232. Based on Machine Learning</source>
          ,
          <year>7th 2021</year>
          .9674078. International Conference on Information
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , et al.,
          <source>An Efficient and Privacy- Science and Control Engineering Preserving Biometric Identification (ICISCE)</source>
          (
          <year>2020</year>
          )
          <fpage>1220</fpage>
          -
          <lpage>1223</lpage>
          . doi: Scheme in Cloud Computing,
          <source>IEEE Access 10.1109/ICISCE50968</source>
          .
          <year>2020</year>
          .
          <volume>00247</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <volume>6</volume>
          (
          <year>2018</year>
          )
          <fpage>19025</fpage>
          -
          <lpage>19033</lpage>
          . doi: [17]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kashyap</surname>
          </string-name>
          , et al.,
          <source>Machine Learning10.1109/ACCESS</source>
          .
          <year>2018</year>
          .
          <volume>2819166</volume>
          .
          <article-title>Based Scoring System to Predict the Risk</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Upadhyay</surname>
          </string-name>
          et al.,
          <article-title>Biometric and Severity of Ataxic Speech Using Identification using Gait Analysis by Different Speech Tasks, IEEE Deep Learning</article-title>
          ,
          <source>Pune Section Transactions on Neural Systems and International Conference (PuneCon) Rehabilitation Engineering</source>
          <volume>31</volume>
          (
          <year>2023</year>
          ) (
          <year>2020</year>
          )
          <fpage>152</fpage>
          -
          <lpage>156</lpage>
          . doi:
          <volume>10</volume>
          .1109/PuneCon 4839-
          <fpage>4850</fpage>
          . doi:
          <volume>10</volume>
          .1109/TNSRE.
          <year>2023</year>
          .
          <volume>50868</volume>
          .
          <year>2020</year>
          .
          <volume>9362402</volume>
          . 3334718.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          , et al.,
          <source>An Efficient Biometric</source>
          [18]
          <string-name>
            <given-names>H.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Deep Neural Identification in Cloud Computing with Networks-based Classification Enhanced Privacy Security, IEEE Access Methodologies of Speech, Audio and 7 (</article-title>
          <year>2019</year>
          )
          <fpage>105363</fpage>
          -
          <lpage>105375</lpage>
          . doi: Music,
          <article-title>and its Integration for Audio 10</article-title>
          .1109/ACCESS.
          <year>2019</year>
          .
          <volume>2931881</volume>
          .
          <string-name>
            <surname>Metadata</surname>
            <given-names>Tagging</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>J. Web</given-names>
            <surname>Eng</surname>
          </string-name>
          .
          <volume>22</volume>
          (
          <issue>1</issue>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Attallah</surname>
          </string-name>
          <article-title>, Multi-tasks Biometric System (</article-title>
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          . doi:
          <volume>10</volume>
          .13052/jwe1540- for Personal Identification,
          <source>International</source>
          <volume>9589</volume>
          .2211. Conference on Computational Science [19]
          <string-name>
            <given-names>O.</given-names>
            <surname>Lavrynenko</surname>
          </string-name>
          , et al.,
          <source>Method of Semantic and Engineering</source>
          (CSE) and
          <source>International Coding of Speech Signals based on Conference on Embedded and Empirical Wavelet Transform, 4th Ubiquitous Computing (EUC)</source>
          (
          <year>2019</year>
          ) International Conference on Advanced 110-
          <fpage>114</fpage>
          . doi:
          <volume>10</volume>
          .1109/CSE/EUC.
          <year>2019</year>
          .
          <article-title>Information and Communication 00030</article-title>
          .
          <string-name>
            <surname>Technologies</surname>
          </string-name>
          (AICT) (
          <year>2021</year>
          )
          <fpage>18</fpage>
          -
          <lpage>22</lpage>
          . doi:
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliaskar</surname>
          </string-name>
          et al.,
          <source>Human Voice 10.1109/AICT52120</source>
          .
          <year>2021</year>
          .
          <volume>9628985</volume>
          .
          <article-title>Identification Based on the Detection of [20]</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Dutt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gader</surname>
          </string-name>
          ,
          <source>Wavelet Fundamental Harmonics, 7th Multiresolution Analysis Based Speech International Energy Conference Emotion Recognition System Using 1D (ENERGYCON)</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . doi: CNN LSTM Networks,
          <source>Transactions on 10.1109/energycon53164</source>
          .
          <year>2022</year>
          .
          <volume>9830471</volume>
          .
          <string-name>
            <surname>Audio</surname>
            , Speech,
            <given-names>and Language</given-names>
          </string-name>
          <string-name>
            <surname>Proces</surname>
          </string-name>
          .
          <volume>31</volume>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kethireddy</surname>
          </string-name>
          , et al.,
          <string-name>
            <surname>Mel-Weighted</surname>
          </string-name>
          (
          <year>2023</year>
          )
          <fpage>2043</fpage>
          -
          <lpage>2054</lpage>
          . doi: Single Frequency Filtering Spectrogram
          <volume>10</volume>
          .1109/TASLP.
          <year>2023</year>
          .
          <volume>3277291</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <source>Research on Extracting Transformation Optimized by ConvoAlgorithm of Speech Eigenvalue Based lutional Autoencoders</source>
          ,
          <source>Transact. Neural on Wavelet Packet Transform and Netw. Learn. Syst</source>
          .
          <volume>34</volume>
          (
          <issue>3</issue>
          ) (
          <year>2023</year>
          )
          <fpage>1395</fpage>
          -
          <string-name>
            <surname>Gammatone</surname>
            <given-names>Filter</given-names>
          </string-name>
          ,
          <source>3rd Information 1405. doi: 10</source>
          .1109/TNNLS.
          <year>2021</year>
          .31053 Technology, Networking,
          <source>Electronic and 67. Automation Control Conference (ITNEC)</source>
          [30]
          <string-name>
            <given-names>O.</given-names>
            <surname>Lavrynenko</surname>
          </string-name>
          , et al.,
          <source>Remote Voice User</source>
          (
          <year>2019</year>
          )
          <fpage>165</fpage>
          -
          <lpage>169</lpage>
          . doi:
          <volume>10</volume>
          .1109/ITNEC. Verification System for Access to IoT
          <source>2019.8729292. Services Based on 5G Technologies</source>
          , 12th
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>O.</given-names>
            <surname>Lavrynenko</surname>
          </string-name>
          , et al.,
          <source>A Method for International Conference on Intelligent Extracting the Semantic Features of Data Acquisition and Advanced Speech Signal Recognition Based on Computing Systems: Technology and Empirical Wavelet Transform</source>
          ,
          <string-name>
            <surname>Applications</surname>
          </string-name>
          (
          <year>2023</year>
          )
          <fpage>1042</fpage>
          -
          <lpage>1048</lpage>
          . doi: Radioelectron.
          <source>Comput. Syst</source>
          .
          <volume>107</volume>
          (
          <issue>3</issue>
          )
          <fpage>10</fpage>
          .1109/IDAACS58523.
          <year>2023</year>
          .
          <volume>10348955</volume>
          . (
          <year>2023</year>
          )
          <fpage>101</fpage>
          -
          <lpage>124</lpage>
          . doi:
          <volume>10</volume>
          .32620/reks. [31]
          <string-name>
            <given-names>O.</given-names>
            <surname>Veselska</surname>
          </string-name>
          , et al.,
          <source>A Wavelet-Based</source>
          <year>2023</year>
          .
          <volume>3</volume>
          .09. Steganographic Method for Text Hiding
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>G.</given-names>
            <surname>Frusque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Fink</surname>
          </string-name>
          ,
          <source>Learnable Wavelet in an Audio Signal, Sensors</source>
          <volume>22</volume>
          (
          <issue>15</issue>
          )
          <article-title>Packet Transform for Data-Adapted (</article-title>
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>25</lpage>
          . doi:
          <volume>10</volume>
          .3390/s22155832. Spectrograms, International Conference [32]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kuzmin</surname>
          </string-name>
          , et al.,
          <article-title>Empirical Data on Acoustics, Speech and Signal Approximation Using Three-DimenProcessing (</article-title>
          <year>2022</year>
          )
          <fpage>3119</fpage>
          -
          <lpage>3123</lpage>
          . doi: sional
          <string-name>
            <surname>Two-Segmented</surname>
            <given-names>Regression</given-names>
          </string-name>
          , 3rd
          <volume>10</volume>
          .1109/ICASSP43922.
          <year>2022</year>
          .9747491. KhPI Week on Advanced Technology
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , et al.,
          <string-name>
            <given-names>A Spectrum</given-names>
            <surname>Adaptive</surname>
          </string-name>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . doi:
          <volume>10</volume>
          .1109/KhPIWeek Segmentation Empirical Wavelet 57572.
          <year>2022</year>
          .
          <volume>9916335</volume>
          .
          <article-title>Transform for Noisy and Nonstationary Signal Processing</article-title>
          ,
          <source>IEEE Access 9</source>
          (
          <year>2021</year>
          )
          <fpage>106375</fpage>
          -
          <lpage>106386</lpage>
          . doi:
          <volume>10</volume>
          .1109/ACCESS.
          <year>2021</year>
          .
          <volume>3099500</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>R.</given-names>
            <surname>Odarchenko</surname>
          </string-name>
          , et al.,
          <article-title>Empirical Wavelet Transform in Speech Signal Compression Problems</article-title>
          , 8th International Conference on Problems of Infocommunications, Science and Technology (
          <year>2021</year>
          )
          <fpage>599</fpage>
          -
          <lpage>602</lpage>
          . doi:
          <volume>10</volume>
          .1109/PICST54195.
          <year>2021</year>
          .
          <volume>9772156</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <source>Multiple Vowels Repair Based on Pitch Extraction and Line Spectrum Pair Feature for Voice Disorder, J. Biomedical Health Inform</source>
          .
          <volume>24</volume>
          (
          <issue>7</issue>
          ) (
          <year>2020</year>
          )
          <fpage>1940</fpage>
          -
          <lpage>1951</lpage>
          . doi:
          <volume>10</volume>
          .1109/JBHI.
          <year>2020</year>
          .
          <volume>2978103</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>F.</given-names>
            <surname>Costa</surname>
          </string-name>
          , et al.,
          <source>Wavelet-Based Harmonic Magnitude Measurement in the Presence of Interharmonics, Transactions on Power Delivery</source>
          <volume>38</volume>
          (
          <issue>3</issue>
          ) (
          <year>2023</year>
          )
          <fpage>2072</fpage>
          -
          <lpage>2087</lpage>
          . doi:
          <volume>10</volume>
          .1109/TPWRD.
          <year>2022</year>
          .
          <volume>3233583</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>A Framework of Adaptive Multiscale Wavelet Decomposition for Signals on Undirected Graphs</article-title>
          ,
          <source>Transactions on Signal Proces</source>
          .
          <volume>67</volume>
          (
          <issue>7</issue>
          ) (
          <year>2019</year>
          )
          <fpage>1696</fpage>
          -
          <lpage>1711</lpage>
          . doi:
          <volume>10</volume>
          .1109/TSP.
          <year>2019</year>
          .
          <volume>2896246</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Saniie</surname>
          </string-name>
          ,
          <source>Massive Ultrasonic Data Compression Using Wavelet Packet</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>