<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ways to increase the probability of correct recognition of noisy speech commands by their cross-correlation portraits</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ekaterina Galitskaya</string-name>
          <email>katrisa@yandex.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viktor Krasheninnikov</string-name>
          <email>kvrulstu@mail.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Applied mathematics and computer science, Ulyanovsk State Technical University</institution>
          ,
          <addr-line>Ulyanovsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>66</fpage>
      <lpage>69</lpage>
      <abstract>
        <p>Currently, the field of application of voice information-control systems, for which recognition of speech commands (SCs) is necessary, is expanding intensively. Such recognition is very difficult in the presence of intense acoustic noise. We consider a method for recognizing noisy SCs by cross-correlation portraits (CCPs), which is used for speaker-dependent recognition over a limited vocabulary of commands. In this method, SCs are converted into CCPs, which are special images. The probability of correct recognition depends directly on the choice of command standards. The standards should accurately represent the entire class of commands, for which the library of standards is optimized. The standards are stored as CCPs. A recognized SC is converted into a CCP, and the closest portrait is found among the standard portraits. This requires sufficiently accurate matching of the standard and recognized SC portraits. For this, two methods are proposed: phonemic alignment and variation of the boundaries of the SC, given that its boundary estimates can be ahead of or behind the true boundaries. The experiments showed that the proposed modernization of the algorithm significantly increases the probability of correct recognition.</p>
      </abstract>
      <kwd-group>
        <kwd>speech command</kwd>
        <kwd>recognition</kwd>
        <kwd>standard</kwd>
        <kwd>cross-correlation portrait</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        At present, the management of many technical systems is impossible without the participation of a human operator, despite the significant success of robotization. In this case, it is desirable to facilitate the operator's work using a voice information management system (VIMS), in which it is possible to obtain information about the state of the system and to control it by SCs. However, this requires SC recognition. To date, many speech recognition systems have been developed that are used to enter information into a computer, control robots, etc. [
        <xref ref-type="bibr" rid="ref5">1-7</xref>
        ]. However, most of these systems are
inoperative in the presence of noise.
      </p>
      <p>
        At the same time, there is a need for VIMS operating in conditions of very strong acoustic noise, for example, in aviation or noisy production. Installing a VIMS in the cockpit can help reduce the workload of the pilot. Honeywell has tested a VIMS on its Embraer 170 aircraft (the recognition accuracy of this VIMS is 90%) [
        <xref ref-type="bibr" rid="ref7">8</xref>
        ]. A VIMS is used in the Eurofighter Typhoon military aircraft. Lockheed Martin also developed the F-35 cockpit with speech recognition. Airbus Defence and Space considered adding a cockpit assistance system with voice recognition technology to its recently developed Sferion helicopter [9]. The TOUCH-FLIGHT 2 project is exploring the use of voice control as an alternative mode of interaction between pilots and cockpit avionics [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These programs are developed in
English [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14">11-17</xref>
        ], which makes them impossible to use at Russian facilities whose pilots communicate in Russian. There are no open data on the use of Russian-language VIMS on aircraft. Thus, the problem of creating methods and algorithms for recognizing SCs in the presence of strong interference remains relevant.
      </p>
      <p>
        Various features of speech signals are used in recognition algorithms: spectral analysis, wavelets, hidden Markov chains, cepstral analysis, artificial neural networks, etc. Typically, SC recognition systems for severe interference conditions work with a limited dictionary. Standards of the SCs from this dictionary are constructed, and the recognized SC is assigned to the closest of these standards. In this paper, we consider a method for recognizing highly noisy SCs by cross-correlation portraits (CCPs). In this method, SCs are converted into CCPs, which are special images that reflect the acoustic features of the SCs [
        <xref ref-type="bibr" rid="ref15 ref16 ref17">18-20</xref>
        ]. This makes it possible to apply image processing methods to SC recognition. There is an extensive literature on image processing, for example, [
        <xref ref-type="bibr" rid="ref18 ref19 ref20 ref21 ref22">21-25</xref>
        ].
      </p>
      <p>Standard SCs are stored in the computer memory in the form of CCPs (the standard CCPs). The recognized SC is also converted into a CCP, and the nearest of the standard CCPs is found. The probability of correct recognition depends essentially on the choice of standard SCs; therefore, the library of standards is to be optimized. Sufficiently accurate matching of the portraits of the standard and the recognized SC is required to find the nearest standard CCP. For this, two methods of refining the alignment are proposed: phonemic alignment and variation of the boundaries of the SC, given that its boundary estimates can be ahead of or behind the true boundaries. The experiments showed that the proposed modernization of the method significantly increases the probability of correct recognition.</p>
      <p>II. RECOGNITION OF SPEECH COMMANDS BY THEIR AUTOCORRELATION AND CROSS-CORRELATION PORTRAITS</p>
      <p>
        The use of autocorrelation portraits (ACPs) was proposed
in [
        <xref ref-type="bibr" rid="ref15 ref16">18,19</xref>
        ] for recognizing SCs against a background of strong noise. Let X = {x_1, x_2, ..., x_N} be an SC consisting of N samples. The ACP of X is a two-dimensional array (image) R. We divide X into M + 1 segments of length L = [N/(M + 1)], where [u] denotes the integer part of the number u. Each row of R is a sequence of sample correlation coefficients r(t, k) between the segment X_t = (x_{(t-1)L+1}, ..., x_{tL}) and the segment X_{t,k} = (x_{(t-1)L+1+k}, ..., x_{tL+k}) of X shifted by k samples:
      </p>
      <p>r(t, k) = ( (1/L) Σ_{j=1}^{L} x_{(t-1)L+j} x_{(t-1)L+j+k} − μ_t μ_{t,k} ) / (σ_t σ_{t,k}),   (1)
where t = 1, ..., M; k = 1, ..., K; μ_t and μ_{t,k} are the sample means, and σ_t² and σ_{t,k}² are the sample variances of X_t and X_{t,k}. Thus, the ACP is an M × K array (image) of the sample autocorrelation coefficients of one SC, converted into the brightness range [0; 255] in Fig. 1. An image row reflects the change of correlation between the values of the speech signal at shifts of k = 1, ..., K samples, that is, local correlations. The sequence of rows reflects how these correlations change over time and, for example, characterizes the sequence of phonemes.</p>
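      <p>As an illustration, equation (1) can be computed directly. The following sketch (in Python with NumPy; function and parameter names are ours, and it assumes K does not exceed the segment length L so the shifted segments stay inside the signal) builds the M × K ACP array from a signal and converts it to brightness values:</p>
      <preformat>
```python
import numpy as np

def acp(x, M=100, K=50):
    """Sketch of an autocorrelation portrait per Eq. (1): an M x K array
    of sample correlation coefficients between each segment X_t and its
    k-sample shifts."""
    x = np.asarray(x, dtype=float)
    L = len(x) // (M + 1)                       # segment length, [N/(M+1)]
    R = np.zeros((M, K))
    for t in range(1, M + 1):
        seg = x[(t - 1) * L : t * L]            # segment X_t
        for k in range(1, K + 1):
            sh = x[(t - 1) * L + k : t * L + k] # X_{t,k}, shifted by k
            cov = np.mean(seg * sh) - seg.mean() * sh.mean()
            R[t - 1, k - 1] = cov / (seg.std() * sh.std())
    return R

def to_image(R):
    """Convert a portrait into the brightness range [0, 255]."""
    return np.uint8(255 * (R - R.min()) / (R.max() - R.min()))
```
      </preformat>
      <p>Applying acp to a test signal yields an image whose rows can then be compared between portraits, as described below for the recognition step.</p>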
      <p>It turned out that the ACPs are individual, resistant to noise, and weakly sensitive to the pronunciation volume. The main advantage of ACPs for SC recognition is the strong correlation between rows, which makes it possible to use image processing methods for filtering, recognition, etc. The standard SCs are stored in the computer's memory as ACPs. The recognized SC is also converted into an ACP, which is assigned to the nearest of the standard ACPs according to some metric. The distance between two ACPs is defined as the sum of the distances between the corresponding rows. Any metric that determines the distance between two rows as vectors can be used: Euclidean, squared, the angle between the vectors, etc. When constructing the ACP, the SC is divided into M + 1 segments, each containing some part of the SC. Due to the variability of the pronunciation rate, the same phonemes of an SC can have different row numbers in the ACPs of the standard and the recognized SC. As a result, the distance between these portraits will be distorted; therefore, the rows of the portraits should be matched. A dynamic programming algorithm was used for this matching according to the criterion of the minimum distance between the matched portraits.</p>
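      <p>The distance computation described above can be sketched as follows (a minimal Python illustration using the Euclidean row metric; the dynamic-programming row matching is omitted for brevity, and the function names are ours):</p>
      <preformat>
```python
import numpy as np

def portrait_distance(P, Q):
    """Distance between two portraits: the sum of Euclidean distances
    between corresponding rows."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    return float(sum(np.linalg.norm(p - q) for p, q in zip(P, Q)))

def nearest_standard(candidate, standards):
    """Index of the standard portrait nearest to the candidate."""
    dists = [portrait_distance(candidate, S) for S in standards]
    return int(np.argmin(dists))
```
      </preformat>
      <p>Any other row metric mentioned above (squared distance, angle between row vectors) could be substituted for np.linalg.norm without changing the overall scheme.</p>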
      <p>However, there is a significant drawback: an ACP reflects the features of one pronunciation of the SC. This is noticeable in the two portraits of the SC, "Air Conditioning 1" and "Air Conditioning 2" in Fig. 1, built from pronunciations obtained at different times. During this time, the speaker's voice timbre, state of health, etc. could change significantly. The standards thus seemed to "age", so the ACPs of the standard SCs and the portraits of the same recognized SC could differ significantly, which reduced the quality of recognition. Therefore, the standards need to be updated from time to time.</p>
      <p>
        More complete properties of SCs are presented in their CCPs, which are built using two pronunciations [
        <xref ref-type="bibr" rid="ref17">20</xref>
        ]. Let X and Y be two pronunciations of the same SC by one speaker at different times. They are divided into the same number M + 1 of segments, with lengths L_X and L_Y, respectively. Each CCP row is a sequence of sample correlation coefficients r(t, k) between the segment X_t = (x_{(t-1)L_X+1}, ..., x_{tL_X}) of SC X and the segments Y_{t,k} = (y_{(t-1)L_Y+1+k}, ..., y_{tL_Y+k}) of SC Y:
      </p>
      <p>r(t, k) = ( (1/L_X) Σ_{j=1}^{L_X} x_{(t-1)L_X+j} y_{(t-1)L_Y+j+k} − μ_{X,t} μ_{Y,t,k} ) / (σ_{X,t} σ_{Y,t,k}),   (2)
where t = 1, ..., M; k = 1, ..., K; μ_{X,t} and μ_{Y,t,k} are the sample means, and σ²_{X,t} and σ²_{Y,t,k} are the corresponding sample variances. Thus, the CCP is an M × K array (image) of the sample cross-correlation coefficients of two SCs X and Y. If Y = X, then the CCP coincides with the ACP. Fig. 2 shows the CCPs of SCs built from two of their pronunciations, with the number of split segments (i.e., rows) M = 100 and the number of shifts (i.e., columns) K = 50. It is noticeable that the CCPs of the various SCs are individual, which makes them a good basis for recognition. At the same time, they reflect the variability of pronunciation to a greater extent, as they are built from two pronunciations, which it is advisable to take at different times. It is noticeable that the CCPs "Air Conditioning 1 + Air Conditioning 3" and "Air Conditioning 2 + Air Conditioning 3" in Fig. 2 differ less than the ACPs "Air Conditioning 1" and "Air Conditioning 2" in Fig. 1.</p>
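      <p>A direct computation of equation (2) can be sketched in the same way as for the ACP (Python with NumPy; names are ours, and we assume the two recordings are of comparable length so that the k-shifted segments of Y stay inside the signal):</p>
      <preformat>
```python
import numpy as np

def ccp(x, y, M=100, K=50):
    """Sketch of a cross-correlation portrait per Eq. (2) for two
    pronunciations x and y of the same command."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    Lx, Ly = len(x) // (M + 1), len(y) // (M + 1)
    R = np.zeros((M, K))
    for t in range(1, M + 1):
        xs = x[(t - 1) * Lx : t * Lx]      # segment X_t, length Lx
        for k in range(1, K + 1):
            base = (t - 1) * Ly + k        # start of Y_{t,k}
            ys = y[base : base + Lx]       # Lx samples of Y
            cov = np.mean(xs * ys) - xs.mean() * ys.mean()
            R[t - 1, k - 1] = cov / (xs.std() * ys.std())
    return R
```
      </preformat>
      <p>Passing the same recording for both x and y reproduces the ACP case noted above.</p>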
      <p>The standard SCs are stored in the computer's memory as CCPs. The recognized SC is also converted into a CCP in a pair with some pre-recorded pronunciation, for example, from the standards. This CCP is assigned to the nearest of the standard CCPs according to some metric. The distance between two CCPs is defined as the sum of the distances between the corresponding rows, similarly to the ACP case.</p>
    </sec>
    <sec id="sec-2">
      <title>METHODS TO INCREASE THE PROBABILITY OF CORRECT COMMAND RECOGNITION</title>
      <p>
        The described recognition method gives an almost perfect probability of correct recognition in the absence of noise. The presence of strong noise significantly reduces it for a number of reasons. Let us consider some of the interfering factors and methods to reduce their influence. Some of these methods were applied to improve the recognition of SCs by their ACPs [
        <xref ref-type="bibr" rid="ref15 ref16 ref23 ref24">18,19,26,27</xref>
        ].
      </p>
      <p>Varying the boundaries of the recognized SC. To compare the standards with the recognized command P, it is first necessary to determine its beginning (start) and ending (end). Due to strong noise, errors are inevitable: the estimated boundary may be ahead of or behind the true one. It is especially difficult to find the ending of an SC, as it is usually pronounced more quietly than the beginning. To mitigate the influence of these errors, trial additions and deletions of Δt signal samples at the estimated boundaries were applied. The value of the parameter Δt was chosen empirically, taking into account that too large a value can change the command itself. In the process of recognition, the command P(start, end) is converted into 9 variants: P(start, end), P(start − Δt, end), P(start + Δt, end), P(start, end − Δt), P(start, end + Δt), P(start − Δt, end − Δt), P(start − Δt, end + Δt), P(start + Δt, end − Δt), P(start + Δt, end + Δt), where "start" and "end" are the estimated bounds of the command. For each of the 9 variants, its own CCP is built. The variant that has the smallest distance to the standard CCPs is taken as the true CCP of the recognized SC.</p>
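      <p>The nine trial variants can be enumerated mechanically; a small Python sketch (the function name and the sample value of Δt are illustrative):</p>
      <preformat>
```python
def boundary_variants(start, end, dt):
    """All 9 combinations of keeping or shifting each estimated
    boundary by dt samples: P(start +/- dt, end +/- dt)."""
    return [(s, e)
            for s in (start, start - dt, start + dt)
            for e in (end, end - dt, end + dt)]
```
      </preformat>
      <p>Each returned (start, end) pair would then be converted into its own CCP, and the variant nearest to the standard CCPs kept, as described above.</p>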
      <p>Optimization of the CCP width. The width of the CCP (the number of columns in the portrait) K is chosen empirically. However, as practice has shown, the optimal value of the parameter K depends on the length of the SC. Therefore, all dictionary commands were divided into groups of approximately the same length, and each group used its own value of this parameter.</p>
      <p>Phoneme matching. When building CCPs, the SCs are divided into M + 1 segments, each containing some part of a phoneme. Due to the variability of the pronunciation rate, corresponding segments can begin with different phonemes, so the correlation coefficient can take a "false" value and the CCP will be distorted. To avoid this distortion, a dynamic phoneme-matching algorithm was used. As a result, the beginning of a segment of one SC is shifted so that this segment is maximally correlated with the corresponding segment of the second SC.</p>
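      <p>The idea of shifting a segment's start to maximize correlation can be illustrated as follows (a simplified Python sketch, not the authors' dynamic-programming implementation; the function name and the search window are ours):</p>
      <preformat>
```python
import numpy as np

def best_start_shift(seg_x, y, y_start, max_shift):
    """Shift the start of the second SC's segment within a window of
    max_shift samples so that it correlates best with seg_x."""
    L = len(seg_x)
    best_s, best_r = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        seg_y = y[y_start + s : y_start + s + L]
        r = np.corrcoef(seg_x, seg_y)[0, 1]
        if r > best_r:
            best_s, best_r = s, r
    return best_s
```
      </preformat>
      <p>Applied segment by segment, such a search aligns the phoneme boundaries of the two pronunciations before the cross-correlation coefficients are computed.</p>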
      <p>
        Optimization of the standards library. The quality of recognition directly depends on how well the standard CCPs present the features of pronouncing the commands. In this regard, an additional problem arises of choosing the "best" standards. To do this, several standards of each command are first built, and a directed selection is applied to achieve the best library of standard CCPs [
        <xref ref-type="bibr" rid="ref23">26</xref>
        ]. To perform this operation, it is desirable to have a large number of pronunciations of the recognized SCs, which requires considerable time expenditure from the speakers. In
[
        <xref ref-type="bibr" rid="ref22">25,28</xref>
        ], methods for obtaining realizations of quasiperiodic processes in the form of autoregressive models of cylindrical images are described. Phonemes of speech signals are also quasiperiodic processes, which makes it possible to simulate many variants of pronouncing an SC even from a single real pronunciation by a speaker.
      </p>
      <p>Adding noise to the standards. The standards are usually built in advance from pronunciations in the absence of noise. The recognized SC contains significant noise; therefore, its CCP inevitably differs from the standard CCPs, so the distances between the CCPs are distorted and the quality of recognition is reduced. To correct the distances, noise was added to the standard SCs before their conversion into CCPs. In this case, the noise for the standards came from an additional microphone far from the operator's mouth, recorded while the recognized SC was being pronounced, which ensured the similarity of the noise characteristics in the compared CCPs. The disadvantage of this method is the need to compute all the noisy standards for each incoming recognized SC.</p>
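      <p>The noise-mixing step can be sketched as follows (Python; the amplitude-based definition of the signal-to-noise ratio is an assumption on our part, chosen to match the ratio of 4 reported in the experiments):</p>
      <preformat>
```python
import numpy as np

def add_noise(standard, noise, snr=4.0):
    """Mix a noise recording into a clean standard SC so that the ratio
    of signal to noise standard deviations equals snr."""
    standard = np.asarray(standard, float)
    n = np.asarray(noise, float)[:len(standard)]
    scale = np.std(standard) / (np.std(n) * snr)
    return standard + scale * n
```
      </preformat>
      <p>The noisy standards produced this way are then converted into CCPs and compared with the CCP of the recognized SC.</p>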
    </sec>
    <sec id="sec-3">
      <title>THE RESULTS OF THE EXPERIMENTS</title>
      <p>The following experiment was conducted to assess the significance of the considered methods for increasing the probability of correct recognition. A dictionary of 41 SCs on aviation topics was used. The dictionary was divided into 4 groups containing 10, 5, 8, and 19 SCs, respectively. Each SC was pronounced 30 times (in total, 1230 SCs participated in recognition). The SCs were additively mixed with aircraft engine noise at a signal-to-noise ratio of 4. When constructing the CCPs, the first two pronunciations were chosen as the standard ones. As a result of recognition without applying the methods described above, 158 SCs were not recognized. Using these methods, 67 of the unrecognized SCs were recognized. At the same time, the SCs recognized correctly in the first case were also recognized by the improved method. As a result, the probability of correct recognition increased from 87% to 93% (the significance was tested by Student's t-test at a significance level of 0.05).</p>
    </sec>
    <sec id="sec-4">
      <title>CONCLUSIONS</title>
      <p>The paper proposes converting SCs into CCPs for command recognition against a background of strong noise. The CCP of two SCs is a two-dimensional image whose rows consist of the cross-correlation coefficients between these SCs. The use of two pronunciations in the CCP makes it possible to take into account the variability of pronunciation. A standard CCP is constructed for each SC. Recognition is carried out by comparing the CCP of the recognized SC with the standard CCPs. The performed experiments showed that the use of several modifications of this method significantly increases the probability of correct recognition.</p>
    </sec>
    <sec id="sec-5">
      <title>ACKNOWLEDGMENT</title>
      <p>The reported study was funded by the RFBR, project number 20-01-00613.</p>
      <p>[15] N. McKeegan, “Speech recognition technology allows voice control of aircraft systems,” 2020 [Online]. URL: https://newatlas.com/speech-recognition-technology-allows-voice-control-of-aircraft-systems/7484/.</p>
      <p>[17] Speech recognition technology for air traffic controllers, 2020 [Online]. URL: https://www.internationalairportreview.com/news/75900/voice-recognition-air-traffic/.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhdanov</surname>
          </string-name>
          , “
          <article-title>Speech input as an alternative to keyboard input,” 2020 [Online]</article-title>
          . URL: https://compress.ru/article.aspx?id=
          <fpage>11907</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>M.V.</given-names>
            <surname>Mikhaylyuk</surname>
          </string-name>
          , “
          <article-title>Ergonomic voice control interface for anthropomorphic robot</article-title>
          ,”
          <year>2020</year>
          [Online]. URL: https://cyberleninka.ru/article/n/ergonomichnyy
          <article-title>-golosovoy-interfeysupravleniya-antropomorfnym-robotom/viewer.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Gerasimov</surname>
          </string-name>
          , “
          <article-title>Smart home from Apple, Google</article-title>
          and Yandex - voice control,”
          <year>2020</year>
          [Online]. URL: https://voiceapp.ru/articles/smarthome.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>SpeechKit - Yandex</surname>
          </string-name>
          speech technology,
          <year>2020</year>
          [Online]. URL: https://yandex.ru/company/technologies/speech_technologies/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>Geer</surname>
          </string-name>
          , “
          <article-title>5 impacts of speech recognition system in various fields</article-title>
          ,”
          <year>2020</year>
          [Online]. URL: https://thenextweb.com/contributors/ 2017/09/05/5
          <article-title>-impacts-speech-recognition-system-various-fields/.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Rustamov</surname>
          </string-name>
          , E. Gasimov,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hasanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jahangirli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mustafayev</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Usikov</surname>
          </string-name>
          , “
          <article-title>Speech recognition in flight simulator</article-title>
          ,”
          <year>2020</year>
          [Online]. URL: https://www.researchgate.net/publication/329485063_ Speech_recognition_in_flight_simulator.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>8 Innovative Ways to Use Speech Recognition for Business, 2020 [Online]</article-title>
          . URL: https://www.transcribeme.com/blog/8
          <article-title>-innovativeways-to-use-speech-recognition-for-business.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Savvides</surname>
          </string-name>
          , “
          <article-title>Hey Siri, take off! Get ready for more-advanced planes</article-title>
          ,”
          <year>2020</year>
          [Online]. URL: https://www.cnet.com/news/ honeywell-tests
          <article-title>-gear-for-even-more-high-tech-planes/.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Woodrow Bellamy III. Rockwell Collins Rapidly Advancing Cockpit Voice Recognition Technology</surname>
          </string-name>
          ,
          <year>2020</year>
          [Online]. URL: https://www.aviationtoday.com/
          <year>2014</year>
          /11/13/rockwell-collins
          <article-title>-rapidlyadvancing-cockpit-voice-recognition-technology/.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gauci</surname>
          </string-name>
          , “
          <article-title>Aircraft control through the use voice commands</article-title>
          ,”
          <year>2020</year>
          [Online]. URL: https://www.um.edu.mt/newspoint/news/features/ 2019/07/aircraftcontrolthroughtheuseofvoicecommands.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Crist</surname>
          </string-name>
          , “
          <article-title>Talk to your house with these voice-activated smart-home systems</article-title>
          ,”
          <year>2020</year>
          [Online]. URL: https://www.cnet.com/news/talk-toyour
          <article-title>-house-with-these-voice-activated-smart-home-systems/.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Talking to Your With Telligence Voice Control</surname>
          </string-name>
          ,
          <year>2020</year>
          [Online]. URL: https://www.transcribeme.com/blog/8
          <article-title>-innovative-ways-to-usespeech-recognition-for-business.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Speech</given-names>
            <surname>Recognition Interfaces Improve Flight Safety</surname>
          </string-name>
          ,
          <year>2020</year>
          [Online]. URL: https://spinoff.nasa.gov/Spinoff2012/t_4.html.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Pilot</given-names>
            <surname>Speech</surname>
          </string-name>
          <string-name>
            <surname>Recognition</surname>
          </string-name>
          , http://www.voiceflight.com/.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>V.R.</given-names>
            <surname>Krasheninnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.I.</given-names>
            <surname>Armer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.A.</given-names>
            <surname>Krasheninnikova</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.V.</given-names>
            <surname>Hvostov</surname>
          </string-name>
          , “
          <article-title>Recognition of noisy speech command by autocorrelation portraits,” Naukoemkie tekhnologii</article-title>
          , vol.
          <volume>9</volume>
          , pp.
          <fpage>65</fpage>
          -
          <lpage>74</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>V.R.</given-names>
            <surname>Krasheninnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.I.</given-names>
            <surname>Armer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.V.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Yu</surname>
          </string-name>
          . Lebedeva, “
          <article-title>Cross-Correlation Portraits of Voice Signals in the Problem of Recognizing Voice Commands According to Patterns,” Pattern Recognition and Image Analysis</article-title>
          , vol.
          <volume>21</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>185</fpage>
          -
          <lpage>187</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>V.A.</given-names>
            <surname>Soifer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.B.</given-names>
            <surname>Popov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.V.</given-names>
            <surname>Mysnikov</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.V.</given-names>
            <surname>Sergeev</surname>
          </string-name>
          , “
          <article-title>Computer image processing. Part I: Basic concepts and theory</article-title>
          ,” VDM Verlag Dr. Muller,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>R.C.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.E.</given-names>
            <surname>Woods</surname>
          </string-name>
          , “Digital image processing,” Pearson, Prentice-Hall, New York,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.G.</given-names>
            <surname>Magdeev</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.G.</given-names>
            <surname>Tashlinskii</surname>
          </string-name>
          , “
          <article-title>Efficiency of object identification for binary images</article-title>
          ,”
          <source>Computer Optics</source>
          , vol.
          <volume>43</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>277</fpage>
          -
          <lpage>281</lpage>
          ,
          <year>2019</year>
          . DOI: 10.18287/2412-6179-2019-43-2-277-281.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>V.V.</given-names>
            <surname>Myasnikov</surname>
          </string-name>
          , “
          <article-title>Description of images using a configuration equivalence relation</article-title>
          ,”
          <source>Computer Optics</source>
          , vol.
          <volume>42</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>998</fpage>
          -
          <lpage>1007</lpage>
          ,
          <year>2018</year>
          . DOI: 10.18287/2412-6179-2018-42-6-998-1007.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>V.R.</given-names>
            <surname>Krasheninnikov</surname>
          </string-name>
          and
          <string-name>
            <surname>K.K. Vasil</surname>
          </string-name>
          <article-title>'ev, “Multidimensional Image Models</article-title>
          and Processing,”
          <source>Computer Vision in Control Systems-3. Intelligent Systems Reference Library 135</source>
          , Springer International Publishing, pp.
          <fpage>11</fpage>
          -
          <lpage>64</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>V.R.</given-names>
            <surname>Krasheninnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.A.</given-names>
            <surname>Krasheninnikova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.V.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Yu</surname>
          </string-name>
          . Lebedeva, “
          <article-title>Optimization of dictionary and model library for recognition of speech commands,” Pattern Recognition and Image Analysis</article-title>
          , vol.
          <volume>21</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>505</fpage>
          -
          <lpage>507</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>V.R.</given-names>
            <surname>Krasheninnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.V.</given-names>
            <surname>Khvostov</surname>
          </string-name>
          and
          <string-name>
            <surname>A.I. Armer</surname>
          </string-name>
          , “
          <article-title>Preparation of Templates in Speech Command Recognition by Single-</article-title>
          and DoubleChannel Scheme in Background Noise,”
          <article-title>Pattern Recognition and Image Analysis</article-title>
          , vol.
          <volume>18</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>580</fpage>
          -
          <lpage>583</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>V.R.</given-names>
            <surname>Krasheninnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.I.</given-names>
            <surname>Armer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.A.</given-names>
            <surname>Krasheninnikova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.R.</given-names>
            <surname>Derevyankin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.I.</given-names>
            <surname>Kozhevnikov</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.N.</given-names>
            <surname>Makarov</surname>
          </string-name>
          , “
          <article-title>Autoregressive Models of Speech Signal Variability in the Speech Commands Statistical Distinction,”</article-title>
          <source>International Conference on Computational Science and Its Applications</source>
          , Springer-Verlag: Berlin Heidelberg, pp.
          <fpage>974</fpage>
          -
          <lpage>982</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>