=Paper=
{{Paper
|id=Vol-2590/short4
|storemode=property
|title=An Improved MVDR Filter Using Speech Presence Probability
|pdfUrl=https://ceur-ws.org/Vol-2590/short4.pdf
|volume=Vol-2590
|authors=Trong The Quan
|dblpUrl=https://dblp.org/rec/conf/micsecs/Quan19
}}
==An Improved MVDR Filter Using Speech Presence Probability==
<pdf width="1500px">https://ceur-ws.org/Vol-2590/short4.pdf</pdf>
<pre>
                An Improved MVDR Filter
             Using Speech Presence Probability

                       Quan Trong The [0000-0002-2456-9598]
                       University ITMO, St. Petersburg, Russia
                                  quantrongthe@itmo.ru


        Abstract. This paper describes an improved minimum variance
        distortionless response (MVDR) filter for a two-microphone speech
        enhancement system. A dual-microphone system, one of the most basic
        forms of microphone array, offers easy implementation, low
        computational cost, and the ability to exploit a priori spatial
        information. The proposed algorithm uses the current estimate of
        target speech activity to calculate the auto and cross power spectral
        densities more precisely. Because the conventional algorithm still
        distorts speech, the author introduces a new adaptive signal
        processing technique suited to a dual-microphone system. The proposed
        technique is evaluated in noisy environments and compared with the
        conventional algorithm. The results show that target speech
        suppression is reduced by about 8 dB and that the quality of the
        estimated speech, in terms of signal-to-noise ratio, increases by
        about 15.2 dB. This improvement indicates that the suggested
        algorithm can be incorporated into multi-microphone signal processing
        systems. Furthermore, the speech presence probability can be combined
        with various pre-filtering and post-filtering techniques to obtain a
        certain degree of noise reduction. The rest of the paper is organized
        as follows. Section 2 introduces the dual-microphone scenario and its
        combination with the speech presence probability. Section 3 presents
        the experiments and discusses the results achieved by the proposed
        technique. Finally, Section 4 outlines future development of the
        algorithm under different noise conditions.

        Keywords: noise reduction, microphone array, dual-microphone,
        minimum variance distortionless response, speech presence probability


1     Introduction

Most single-channel algorithms use spectral subtraction to reduce background
noise while preserving the useful speech component. Many speech applications
that are part of everyday life, such as speech coding, communication systems,
and distant conferencing, require high speech quality without speech
distortion or delay. In real environments, the target speaker is always
interfered with by coherent, incoherent, and diffuse noise and by other
unwanted acoustic sounds. In highly noisy environments, the recorded signals
can be corrupted and their speech intelligibility is affected. The
single-channel approach is limited by suppression of the original speech and
by musical noise. Microphone arrays [1] have been studied in many research
articles. Because they exploit crucial spatial information, many unresolved
problems, including attenuation of the desired signal and residual noise, can
be addressed more easily. Microphone array signal processing therefore offers
more advantages than a mono system. In such a scenario, the most important
factor is the spatial diversity, which is obtained from the geometric
distribution of the microphones. This diversity is combined with appropriate
signal processing techniques to improve the captured signals, which contain
unknown interferences and additive noise.

    Copyright © 2019 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).
     Minimum Variance Distortionless Response (MVDR) [2-6] is one of the most
effective algorithms in terms of noise reduction while preserving the target
speaker. The MVDR filter exploits the input diversity, namely the direction of
arrival (DOA) of the useful signal, and is based on a constraint that
minimizes the total output noise power while leaving the desired signal
unaffected.
     However, in real applications, the performance of the MVDR filter is
limited by rapid changes in undetermined types of noise and by complex
surroundings. In this paper, the author proposes to incorporate the speech
presence probability (SPP) [7-8] into the MVDR filter to reduce speech
distortion and increase speech quality. An objective measure is used for
comparison with the conventional MVDR filter. The promising preliminary
results suggest that the algorithm can be considered as a pre-filtering method
in various complex devices.


2    Combination of MVDR filter and Speech Presence
     Probability

In this section, the author presents the signal processing principles behind
the combination of the conventional MVDR filter and the speech presence
probability.
    A dual-microphone system (MA2), consisting of two omnidirectional
microphones, is the basic form of microphone array. The received diversity,
such as the coherence between the two noisy signals, the direction of arrival,
the phase difference, and the power level difference, can be readily processed
to achieve significant noise reduction compared with single-channel methods.
The scheme of digital signal processing by MA2 is shown in Fig. 1. The desired
target speaker is located in the same workspace as the dual microphone, at an
angle θs relative to the axis of MA2.
    Hereafter, d denotes the distance between the microphones, c the speed of
sound (343 m/s), and τ0 = d/c the sound delay. The algorithm is considered in
the frequency domain with current frame k and frequency f, desired signal
X(f, k), two recorded noisy signals Y1(f, k), Y2(f, k), and additive noise
V1(f, k), V2(f, k).


                             Fig. 1. The scheme of MA2.


The short-time Fourier transforms of the two microphone signals can be
expressed as:


                        Y_1(f,k) = X(f,k) e^{j\Phi_s} + V_1(f,k)                    (1)
                        Y_2(f,k) = X(f,k) e^{-j\Phi_s} + V_2(f,k)                   (2)

    Let Y(f, k) = [Y1(f, k) Y2(f, k)]^T, V(f, k) = [V1(f, k) V2(f, k)]^T, and
D(f, θs) = [e^{jΦs} e^{−jΦs}]^T, where (·)^T denotes the transpose operator and
D(f, θs) is the phase-shift vector with Φs = πf τ0 cos(θs). Equations (1)-(2)
can then be rewritten as:

                       Y (f, k) = X(f, k)D(f, θs ) + V (f, k)                       (3)
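
   The signal model of equations (1)-(3) can be illustrated with a short
NumPy sketch for a single frequency bin. The function name, the 1 kHz bin, the
60° angle, and the numeric values of X and V are illustrative assumptions, not
values taken from the paper.

```python
import numpy as np

SOUND_SPEED = 343.0  # m/s, as stated in the text


def steering_vector(freq_hz, mic_distance_m, theta_s_rad, c=SOUND_SPEED):
    """Phase-shift vector D(f, theta_s) = [exp(j*Phi_s), exp(-j*Phi_s)]^T."""
    tau0 = mic_distance_m / c                          # sound delay tau0 = d / c
    phi_s = np.pi * freq_hz * tau0 * np.cos(theta_s_rad)
    return np.array([np.exp(1j * phi_s), np.exp(-1j * phi_s)])


# Illustrative values (assumptions, not measurements from the paper).
d_vec = steering_vector(freq_hz=1000.0, mic_distance_m=0.05,
                        theta_s_rad=np.deg2rad(60.0))
x = 0.8 * np.exp(1j * 0.3)                     # desired-signal bin X(f, k)
v = np.array([0.05 + 0.02j, -0.03 + 0.04j])    # noise vector V(f, k)
y = x * d_vec + v                              # equation (3)
```

   Both entries of D(f, θs) have unit magnitude, so the model changes only the
phase of the desired signal between the two microphones.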
   Every such signal processing algorithm seeks an optimal solution W(f, k)
that ensures noise reduction while preserving the target speaker. The output
signal is obtained by applying the coefficients of the solution to the input
signal vector Y(f, k). The estimated signal X̂(f, k) is given by:

                            X̂(f, k) = W^H(f, k) Y(f, k)                            (4)
where (·)^H denotes Hermitian (conjugate) transposition.
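
   As a minimal sketch of equation (4), the following NumPy code applies a set
of weights to every frequency bin and frame at once. The array layout
(microphones, frequencies, frames), the function name, and the fixed
delay-and-sum weights D/2 used for the check are assumptions chosen for
illustration; they are not the adaptive MVDR weights derived below.

```python
import numpy as np


def apply_beamformer(weights, y_stft):
    """X_hat(f, k) = W^H(f, k) Y(f, k) for all bins and frames.

    Both arrays have shape (microphones, frequencies, frames).
    """
    return np.einsum("mfk,mfk->fk", weights.conj(), y_stft)


rng = np.random.default_rng(1)
n_freq, n_frames = 4, 3
x = (rng.standard_normal((n_freq, n_frames))
     + 1j * rng.standard_normal((n_freq, n_frames)))    # desired signal X(f, k)
phi = np.linspace(0.1, 0.4, n_freq)[:, None]            # hypothetical phases Phi_s per bin
d = np.stack([np.exp(1j * phi), np.exp(-1j * phi)])     # shape (2, n_freq, 1)
y = x[None] * d                                         # noise-free observations
w = np.broadcast_to(d / 2.0, y.shape)                   # delay-and-sum weights
x_hat = apply_beamformer(w, y)
```

   In the noise-free case, weights that satisfy W^H D = 1 return the desired
signal unchanged, which is the distortionless property the MVDR filter also
enforces.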
    The output signal is transformed back into the time domain with the
inverse short-time Fourier transform and overlap-add.
    The purpose of the dual-microphone system is to extract the signal of
interest. Exploiting a priori diversity is the main advantage of the MVDR
filter. MVDR minimizes the total noise power while keeping the desired signal
from the given direction θs undistorted. This constrained problem leads to the
optimal solution, which can be expressed as the following coefficient vector:

    W(f,k) = \frac{P_{VV}^{-1}(f,k) D_s(f)}{D_s^H(f) P_{VV}^{-1}(f,k) D_s(f)}       (5)

  where D_s(f) = D(f, θs) is the phase-shift vector for the target direction
and P_{VV}(f, k) = E{V(f, k) V^H(f, k)} is the cross-spectral matrix of the
noise signals.
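
   Equation (5) can be checked numerically for one frequency bin. The sketch
below is an illustration under assumed values: the phase Φs = 0.4 rad and the
2×2 noise matrix are made up, and the function name is hypothetical.

```python
import numpy as np


def mvdr_weights(p_matrix, d_vec):
    """W = P^{-1} D / (D^H P^{-1} D) for one frequency bin."""
    p_inv_d = np.linalg.solve(p_matrix, d_vec)   # P^{-1} D without an explicit inverse
    return p_inv_d / (d_vec.conj() @ p_inv_d)


phi_s = 0.4                                      # assumed phase shift for this bin
d_vec = np.array([np.exp(1j * phi_s), np.exp(-1j * phi_s)])
p_vv = np.array([[1.0, 0.3 + 0.2j],
                 [0.3 - 0.2j, 1.5]])             # Hermitian, positive definite (made up)

w = mvdr_weights(p_vv, d_vec)
response = w.conj() @ d_vec                      # W^H D, should equal 1

# Output noise power W^H P W, compared with delay-and-sum weights D / 2.
w_ds = d_vec / 2.0
noise_mvdr = (w.conj() @ p_vv @ w).real
noise_ds = (w_ds.conj() @ p_vv @ w_ds).real
```

   Both weight vectors pass the target direction with unit gain, but the MVDR
weights give an output noise power that is no larger than that of the fixed
delay-and-sum weights.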

   Unfortunately, the noise spectral matrix P_{VV}(f, k) is always difficult
to calculate, so the cross-spectral matrix of the observed signals,
P_{YY}(f, k) = E{Y(f, k) Y^H(f, k)}, is used instead.
   The matrix P_{YY}(f, k) can be computed as:

    P_{YY}(f,k) = \begin{pmatrix} 1.001 \cdot P_{Y_1 Y_1}(f,k) & P_{Y_1 Y_2}(f,k) \\ P_{Y_2 Y_1}(f,k) & 1.001 \cdot P_{Y_2 Y_2}(f,k) \end{pmatrix}       (6)
where P_{Y_i Y_i}(f, k) and P_{Y_i Y_j}(f, k) are the smoothed auto- and
cross-spectra:

    P_{Y_i Y_j}(f,k) = \alpha P_{Y_i Y_j}(f,k-1) + (1-\alpha) Y_i^*(f,k) Y_j(f,k),   i, j \in \{1, 2\}   (7)

where α is the smoothing parameter, in the range 0 to 1.
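
   The recursive estimate of equation (7) and the scaled diagonal of equation
(6) can be sketched as follows. Two points are assumptions of this sketch
rather than statements from the paper: the instantaneous term is built as the
outer product Y Y^H, following the matrix definition E{Y Y^H}, and the factor
1.001 is interpreted as light diagonal loading that keeps the matrix
invertible. The function names and α = 0.9 are illustrative.

```python
import numpy as np


def update_psd_matrix(p_prev, y, alpha):
    """One recursive-averaging step in the spirit of equation (7)."""
    inst = np.outer(y, y.conj())            # instantaneous estimate Y Y^H
    return alpha * p_prev + (1.0 - alpha) * inst


def load_diagonal(p, factor=1.001):
    """Scale the auto-spectra on the diagonal as in equation (6)."""
    loaded = p.copy()
    loaded[np.diag_indices_from(loaded)] *= factor
    return loaded


# A single coherent source from one direction gives a rank-one matrix.
rng = np.random.default_rng(0)
d_vec = np.array([np.exp(0.4j), np.exp(-0.4j)])
p_yy = np.zeros((2, 2), dtype=complex)
for _ in range(200):
    x = rng.standard_normal() + 1j * rng.standard_normal()
    p_yy = update_psd_matrix(p_yy, x * d_vec, alpha=0.9)

p_loaded = load_diagonal(p_yy)
det_raw = np.linalg.det(p_yy)               # numerically zero
det_loaded = np.linalg.det(p_loaded)        # clearly positive
```

   Without the scaling, the matrix in this example cannot be inverted
reliably; with it, the determinant is strictly positive and the inverse needed
by the MVDR solution exists.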
   In the conventional MVDR filter, the coefficients therefore become:

    W(f,k) = \frac{P_{YY}^{-1}(f,k) D_s(f)}{D_s^H(f) P_{YY}^{-1}(f,k) D_s(f)}       (8)

    In practical implementations, the target speaker may not stay in exactly
the same position, and the captured signals can be influenced by unwanted
interference, so the direction of arrival of the useful signal cannot be
determined accurately. Furthermore, differences in microphone sensitivity,
spatial location, and frequency response, microphone mismatch, or errors in
calculating the steering vector can degrade the desired speech at the output
of the system. This may lead to both poor interference reduction and target
speech distortion, and hence to performance degradation [2].
    Knowledge of the speech presence probability is therefore essential for
estimating and controlling the update rate.


                           Fig. 2. The scheme of combination.


   The author proposes using the current estimate of the speech presence
probability to adjust the auto and cross power spectral densities of the
observed signals:


    P_{Y_i Y_j}(f,k) = SPP(f,k) P_{Y_i Y_j}(f,k-1) + (1 - SPP(f,k)) Y_i^*(f,k) Y_j(f,k),   i, j \in \{1, 2\}   (9)
   The new adaptive algorithm updates the estimates accurately and immediately
according to the presence or absence of speech components. This approach
decreases speech distortion compared with the conventional MVDR filter.
Experiments have confirmed the effectiveness of the proposed solution.
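
   The effect of equation (9) can be seen in a small sketch in which the
speech presence probability replaces the fixed smoothing parameter. The SPP
value is treated here as a given input; its estimator (see [7-8]) is outside
this sketch. The function name, the use of the outer product Y Y^H, and the
test vectors are assumptions for illustration.

```python
import numpy as np


def update_psd_matrix_spp(p_prev, y, spp):
    """Equation (9): the speech presence probability acts as the smoothing weight."""
    inst = np.outer(y, y.conj())             # instantaneous estimate Y Y^H
    return spp * p_prev + (1.0 - spp) * inst


p_prev = np.eye(2, dtype=complex)            # previous estimate (made up)
y = np.array([1.0 + 1.0j, 2.0 - 1.0j])       # current observation (made up)

p_speech = update_psd_matrix_spp(p_prev, y, spp=1.0)   # speech certainly present
p_noise = update_psd_matrix_spp(p_prev, y, spp=0.0)    # speech certainly absent
p_half = update_psd_matrix_spp(p_prev, y, spp=0.5)     # uncertain frame
```

   When speech is certainly present the previous estimate is kept unchanged,
and when speech is certainly absent the estimate follows the current frame, so
the matrix is adapted mostly on noise-dominated frames.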


3    Experiments and results

In this section, the suggested algorithm (MVDR-SPP) is applied to speech
enhancement and to the reduction of speech distortion in an anechoic chamber.
The dual microphone was placed on a table at the center of the room, with a
distance of 5 cm between the two microphones; a speaker stood at a distance of
2 m from the dual microphone. The purpose of the experiment was to test the
MVDR-SPP algorithm on real signals and to verify the reduction in speech
suppression compared with the conventional MVDR filter (MVDR-CONV). The
objective measure NIST STNR [9] was used to measure the signal-to-noise ratio
(SNR). The scheme of the experiment is shown in Fig. 3. The two noisy recorded
signals were sampled at 16 kHz. For PSD estimation, the following parameters
were set: 512-point FFT, Hamming window, 50% overlap.
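
   The quantities that follow from these settings can be worked out directly.
The sketch below assumes that the frame length equals the FFT size, which the
text does not state explicitly; the variable names are illustrative.

```python
SAMPLE_RATE_HZ = 16_000
FFT_SIZE = 512            # assumed to equal the frame length
OVERLAP = 0.5
MIC_DISTANCE_M = 0.05
SOUND_SPEED = 343.0       # m/s

hop = int(FFT_SIZE * (1 - OVERLAP))            # samples between frame starts
frame_ms = 1000 * FFT_SIZE / SAMPLE_RATE_HZ    # frame duration in milliseconds
bin_hz = SAMPLE_RATE_HZ / FFT_SIZE             # frequency resolution
n_bins = FFT_SIZE // 2 + 1                     # non-negative frequency bins
tau0 = MIC_DISTANCE_M / SOUND_SPEED            # sound delay tau0 = d / c in seconds
delay_samples = tau0 * SAMPLE_RATE_HZ          # the same delay in samples

print(hop, frame_ms, bin_hz, n_bins)
print(f"tau0 = {tau0 * 1e6:.1f} us, {delay_samples:.2f} samples")
```

   Under this assumption each frame covers 32 ms with a hop of 256 samples,
and the inter-microphone delay is a little over two sampling periods.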


                        Fig. 3. The scheme of experiments


   The target direction was set to the direction of the speaker (θs = −30°).
   Figures 4 and 5 show the amplitude and spectrogram of the original signal.
The signal-to-noise ratio measured by NIST STNR was −0.1 dB. The amplitude and
spectrogram of the signal estimated by the proposed algorithm are shown in
Figures 6 and 7.
    Figure 8 shows the RMS of the original signal and of the signal processed
by MVDR-SPP.


                      Fig. 4. Amplitude of original signal.


                     Fig. 5. Spectrogram of original signal.


                     Fig. 6. Amplitude of processed signal.


   The adaptive MVDR-SPP algorithm suppresses nonstationary noise, which
demonstrates its capability. The noise reduction was about 33.5 dB, and the
target speech was preserved.


                     Fig. 7. Spectrogram of processed signal.


 Fig. 8. RMS of microphone signal and MVDR SPP output signal (Φv = 0°...60°).


        Fig. 9. Comparison between MVDR SPP and conventional MVDR


   As Figure 9 shows, the MVDR-SPP algorithm preserves the target speech
because, in these frames, the auto and cross spectra are updated according to
the speech presence probability, which the conventional MVDR filter does not
take into account. The advantage of MVDR-SPP is that it preserves up to 8 dB
more of the target speech. The improvement in speech quality is presented in
Table 1; the increase in SNR from 26.8 to 42 dB demonstrates the capability of
the suggested algorithm.

                    Table 1. Signal-to-noise ratio, SNR (dB)

           Estimation method   Original signal   MVDR-CONV   MVDR-SPP
           NIST STNR                 4.0            26.8        42
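
   The gains implied by Table 1 can be computed directly from its three
entries. The sketch uses only the table values; the dictionary keys and
variable names are illustrative.

```python
snr_db = {"original": 4.0, "MVDR-CONV": 26.8, "MVDR-SPP": 42.0}

# Differences in dB between the table entries.
gain_over_conv = round(snr_db["MVDR-SPP"] - snr_db["MVDR-CONV"], 1)
gain_over_input = round(snr_db["MVDR-SPP"] - snr_db["original"], 1)

# The same improvement over MVDR-CONV expressed as a linear power ratio.
power_ratio = 10 ** (gain_over_conv / 10)

print(gain_over_conv, gain_over_input, round(power_ratio, 1))
```

   The difference between the two filters is the 15.2 dB quoted in the
abstract, which corresponds to a power ratio of roughly 33.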


4   Conclusions

This paper addresses the problem of enhancing a speech signal corrupted by
additive noise when observations from two microphones are available. The
experimental results indicate that when the spectral components of the noisy
speech change rapidly, information about the speech presence probability is
needed to calculate the auto and cross power spectral densities accurately.
The algorithm achieves better noise cancellation and less speech distortion,
with an improvement of up to 8 dB, and can be used as an efficient front end
for speech applications. Because time-varying environments remain a challenge,
the author will continue to use other a priori spatial diversities to enhance
the Minimum Variance Distortionless Response filter under different types of
noise.


References
 1. Brandstein, M., Ward, D. (Eds.): Microphone Arrays: Signal Processing
    Techniques and Applications. Springer, 2001.
 2. Ehrenberg, L. et al.: Sensitivity Analysis of MVDR and MPDR Beamformers.
    IEEE 26th Convention of Electrical and Electronics Engineers in Israel,
    2010, pp. 416-420.
 3. Lockwood, M. et al.: Performance of time- and frequency-domain binaural
    beamformers based on recorded signals from real rooms. J. Acoust. Soc. Am.
    115 (1), pp. 379-391, 2004.
 4. Stolbov, M., The, Q.: Study of MVDR dual-microphone algorithm for speech
    enhancement in coherent noise presence. Scientific and Technical Journal
    of Information Technologies, Mechanics and Optics, 2019, vol. 19, no. 1,
    pp. 180–183 (in Russian).
 5. Souden, M., Benesty, J., Affes, S.: A study of the LCMV and MVDR noise
    reduction filters. IEEE Trans. Signal Process., vol. 58, pp. 4925–4935,
    Sept. 2010.
 6. Stolbov, M., Quan Trong The: Dual-Microphone Speech Enhancement System
    Attenuating both Coherent and Diffuse Background Noise. In: A. A. Salah
    et al. (Eds.), Proc. SPECOM 2019.
 7. Gerkmann, T.: Unbiased MMSE-Based Noise Power Estimation with Low
    Complexity and Low Tracking Delay. IEEE TASL, 2012.
 8. Gerkmann, T., Hendriks, R.: Noise Power Estimation Based on the
    Probability of Speech Presence. WASPAA 2011.
 9. https://labrosa.ee.columbia.edu/projects/snreval/

</pre>