=Paper= {{Paper |id=Vol-3396/paper15 |storemode=property |title=Differential Microphone Array Speech Enhancement Based on Post - Filtering |pdfUrl=https://ceur-ws.org/Vol-3396/paper15.pdf |volume=Vol-3396 |authors=Quan Trong The |dblpUrl=https://dblp.org/rec/conf/colins/The23 }} ==Differential Microphone Array Speech Enhancement Based on Post - Filtering== https://ceur-ws.org/Vol-3396/paper15.pdf
Differential Microphone Array Speech Enhancement Based on
Post - Filtering
Quan Trong The
Digital Agriculture Cooperative, Cau Giay, Ha Noi, Viet Nam.


                 Abstract
                 Noise suppression has become an essential requirement for numerous acoustic devices, mobile
                 phones and surveillance equipment. A standard criterion for almost all digital signal
                 processing is to preserve the target speech component while eliminating background noise and
                 interference. The dual-microphone system is one of the most basic configurations and is
                 widely installed in speech applications. However, extracting a directional sound source with
                 such a system remains challenging. In this paper, the author proposes an additional
                 post-filter to improve the performance of the author's previous method. The experiment
                 confirms a noise reduction of up to 5.5 dB and an improvement in speech quality, measured
                 by the signal-to-noise ratio, of 3.2 dB to 5.7 dB.

                 Keywords
                 microphone array, signal-to-noise ratio, speech enhancement, noise reduction,
                 dual-microphone system, post-filtering

1. Introduction




 Figure 1: The difficulty of preserving the target speaker in real-life conditions

   Speech processing applications such as speech recognition, hearing aids, cell phones, surveillance
equipment and smart-home devices are expected to work anytime and anywhere, which requires them to
reduce the effect of annoying disturbances such as background noise, interference, third-party
talkers and surrounding traffic. To limit the degradation of the target talker's speech quality and
intelligibility, single- and multi-microphone noise reduction approaches are commonly applied.
Single-channel techniques mostly rely on spectral estimation, for example the subspace method, the
Wiener filter and spectral subtraction. They are based on estimating the noise power under the
assumption that the background noise is stationary, or in situations with ambient speech noise.

COLINS-2023: 7th International Conference on Computational Linguistics and Intelligent Systems, April 20–21, 2023, Kharkiv, Ukraine
EMAIL: quantrongthe1984@gmail.com
ORCID: 0000-0002-2456-9598
              ©️ 2020 Copyright for this paper by its authors.
              Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
              CEUR Workshop Proceedings (CEUR-WS.org)




 Figure 2: Using a microphone array to extract the desired talker

   Microphone array (MA) beamforming [1-20] exploits a priori spatial information: the direction of
the speaker, the properties of the recording environment, the direction of arrival of the signal of
interest and the geometry of the MA configuration. Because it can perform spatial filtering, an MA
algorithm is better able to lower the noise level while preserving the speech component.
Multi-microphone noise suppression uses this spatial information to preserve the target signal
while suppressing the noisy environment. In addition, MA beamforming can be combined with
pre-processing and post-filtering methods to achieve further noise reduction and lower speech
distortion.




 Figure 3: Implementation of a microphone array in the time-frequency domain
    The dual-microphone system (DMA2) has numerous advantages for signal processing: it is compact
and makes MA digital signal processing easy to implement. In previous work [23], the author
suggested an effective method for separating two targets located in opposite directions. In
real-life conditions, however, the recording situation is not known in advance and the performance
degrades. In this contribution, the author proposes an additional post-filter to suppress the noise
component that remains after applying [23]. The numerical results show a noise suppression of up to
5.5 dB and an increase in speech quality of 3.2 dB to 5.7 dB in terms of the signal-to-noise ratio.
The purpose of this article is to demonstrate a new, effective speech enhancement technique based
on DMA2.
    The remainder of this article is organized as follows. Section 2 describes the signal model of
DMA2 and the author's previous method. Section 3 presents the additional post-filter, which uses an
MMSE estimator. Section 4 describes the experiment and Section 5 concludes the paper.

2. The signal model
    The working principle of a differential microphone array (DMA2) [24-30] is illustrated in
Figure 4. DMA2 offers a highly directional beampattern and high noise reduction, and it is easy to
steer the beampattern toward the sound source. Owing to its compact size, DMA2 is well suited to
microphone array techniques and digital signal processing methods that suppress the surrounding
noise while preserving the desired target speaker.




 Figure 4: A differential microphone array

   Let $f$ and $k$ denote the frequency index and the current frame, respectively. Let
$c = 343$ m/s be the speed of sound, $d$ the distance between the two microphones,
$\tau_0 = d/c$ the inter-microphone sound delay, $\theta$ the direction of arrival of the useful
signal, and $\Phi_s = \pi f \tau_0 \cos(\theta)$. With the angular frequency $\omega = 2\pi f$,
this is equivalent to $\Phi_s = \omega\tau_0\cos(\theta)/2$. The two microphone signals can be
expressed in the frequency domain as:

   \[ X_1(f,k) = S(f,k)\,e^{j\Phi_s} \tag{1} \]
   \[ X_2(f,k) = S(f,k)\,e^{-j\Phi_s} \tag{2} \]

    A time delay $\tau$ is introduced, and its value determines the directivity pattern of the
processed signal. DMA2 is used to extract two speakers located in opposite directions. The DMA
outputs, obtained by subtracting the delayed signal of one microphone from the signal of the other,
are:
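   \[ Y_{1DIF}(f,k) = \frac{X_1(f,k) - X_2(f,k)\,e^{-j\omega\tau}}{2} \tag{3} \]
   \[ = j\,S(f,k)\,\sin\left(\frac{\omega\tau_0}{2}\left(\cos\theta + \frac{\tau}{\tau_0}\right)\right) \tag{4} \]

   \[ Y_{2DIF}(f,k) = \frac{X_2(f,k) - X_1(f,k)\,e^{-j\omega\tau}}{2} \tag{5} \]
   \[ = -j\,S(f,k)\,\sin\left(\frac{\omega\tau_0}{2}\left(\cos\theta - \frac{\tau}{\tau_0}\right)\right) \tag{6} \]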
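
   The step from Eq. (3) to Eq. (4) can be spelled out as follows. This is a sketch that takes
$\omega = 2\pi f$, so that $\Phi_s = \omega\tau_0\cos\theta/2$, and substitutes Eqs. (1) and (2)
into the numerator of Eq. (3):

```latex
\begin{aligned}
X_1 - X_2\,e^{-j\omega\tau}
  &= S\left(e^{j\Phi_s} - e^{-j\Phi_s}\,e^{-j\omega\tau}\right) \\
  &= S\,e^{-j\omega\tau/2}
     \left(e^{j(\Phi_s + \omega\tau/2)} - e^{-j(\Phi_s + \omega\tau/2)}\right) \\
  &= 2j\,S\,e^{-j\omega\tau/2}\,
     \sin\left(\frac{\omega\tau_0}{2}\left(\cos\theta + \frac{\tau}{\tau_0}\right)\right)
\end{aligned}
```

   Dividing by 2 gives Eq. (4) up to the factor $e^{-j\omega\tau/2}$, which has unit modulus and
therefore does not change the magnitude of the output. The same steps with $X_1$ and $X_2$
exchanged lead to Eq. (6).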




 Figure 5: Shapes of the beampattern, d = 5 cm, f = 1500 Hz


   The resulting beampatterns toward the two talkers are highly directional and can be expressed
as:

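   \[ B_1(f,\theta) = \left|\frac{Y_{1DIF}(f,k)}{S(f,k)}\right| = \left|\sin\left(\frac{\omega\tau_0}{2}\left(\cos\theta + \frac{\tau}{\tau_0}\right)\right)\right| \tag{7} \]
   \[ B_2(f,\theta) = \left|\frac{Y_{2DIF}(f,k)}{S(f,k)}\right| = \left|\sin\left(\frac{\omega\tau_0}{2}\left(\cos\theta - \frac{\tau}{\tau_0}\right)\right)\right| \tag{8} \]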
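
   To make Eqs. (7) and (8) concrete, the following minimal sketch evaluates both beampatterns for
the parameters of Figure 5 (d = 5 cm, f = 1500 Hz). The function name `beampatterns` and the
choice $\tau = \tau_0$ are assumptions made for illustration; the text does not state the value of
$\tau$ that was used.

```python
import math

C = 343.0  # speed of sound in m/s, the value used in the text


def beampatterns(f, theta, d, tau):
    """Evaluate Eqs. (7) and (8) for frequency f (Hz) and angle theta (rad)."""
    tau0 = d / C
    half_phase = math.pi * f * tau0  # equals omega * tau0 / 2
    b1 = abs(math.sin(half_phase * (math.cos(theta) + tau / tau0)))
    b2 = abs(math.sin(half_phase * (math.cos(theta) - tau / tau0)))
    return b1, b2


d = 0.05       # microphone spacing in metres (5 cm, as in Figure 5)
f = 1500.0     # frequency in Hz (as in Figure 5)
tau = d / C    # assumption: tau = tau0

for deg in range(0, 181, 30):
    b1, b2 = beampatterns(f, math.radians(deg), d, tau)
    print(f"{deg:3d} deg  B1 = {b1:.3f}  B2 = {b2:.3f}")
```

   Under this assumption the first pattern has its null at 180 degrees and the second at 0 degrees,
so each output faces one of the two opposite talkers.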

   Previous research [10] proposed the use of an additional equalizer:

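   \[ H_{eq}(f) = \begin{cases}
        6, & 0\ \text{Hz} < f < 200\ \text{Hz} \\
        \dfrac{1}{\sin\left(\dfrac{\pi f}{2 F_c}\right)}, & 200\ \text{Hz} < f \le F_c \\
        1, & F_c < f \le 2F_c \\
        0, & 2F_c < f
      \end{cases} \tag{9} \]

   where $F_c = \dfrac{1}{4\tau_0}$.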
   The value of $H_{eq}(f)$ is limited to a threshold of 12 dB. The equalizer compensates the
frequency response of the differential output so that the desired signal is recovered.
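
   The piecewise definition in Eq. (9) can be sketched as a small function. The name
`equalizer_gain` and the optional `limit_db` argument are hypothetical. The text does not say
whether the 12 dB limit is an amplitude or a power ratio, so the sketch assumes the amplitude
convention and leaves the limit off by default. The point f = 200 Hz, which Eq. (9) leaves
unassigned, is placed in the inverse-sine branch here.

```python
import math

C = 343.0  # speed of sound in m/s, the value used in the text


def equalizer_gain(f, tau0, limit_db=None):
    """Equalizer of Eq. (9) for a frequency f in Hz and delay tau0 = d / c."""
    fc = 1.0 / (4.0 * tau0)
    if f <= 0.0 or f > 2.0 * fc:
        gain = 0.0                      # outside the equalized band
    elif f < 200.0:
        gain = 6.0                      # fixed low-frequency gain
    elif f <= fc:
        gain = 1.0 / math.sin(math.pi * f / (2.0 * fc))
    else:
        gain = 1.0                      # Fc < f <= 2 Fc
    if limit_db is not None:
        # Assumption: the limit is an amplitude ratio (20 * log10 convention).
        gain = min(gain, 10.0 ** (limit_db / 20.0))
    return gain


tau0 = 0.05 / C  # d = 5 cm
for f in (100.0, 400.0, 800.0, 1715.0, 3000.0, 4000.0):
    print(f"{f:7.1f} Hz  H_eq = {equalizer_gain(f, tau0):.3f}")
```

   Since $\pi f / (2F_c) = 2\pi f \tau_0$, the inverse-sine branch is the reciprocal of the
on-axis value of Eq. (7) when $\tau = \tau_0$, which is why the equalized response is flat in that
band under this assumption.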
   Finally, the output signals are:

   \[ Y_1(f,k) = Y_{1DIF}(f,k) \times H_{eq}(f) \tag{10} \]
   \[ Y_2(f,k) = Y_{2DIF}(f,k) \times H_{eq}(f) \tag{11} \]



3. The suggested post-filtering
    The central idea of the suggested post-filter is the estimation of the noise power. The MMSE
estimator [21] is used to compute a spectral gain:

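   \[ G_{H_1}(f,k) = \frac{\sqrt{v(f,k)}}{\gamma(f,k)}
      \left[\Gamma\left(1 + \frac{\alpha}{2}\right)
      M\left(-\frac{\alpha}{2};\, 1;\, -v(f,k)\right)\right]^{\frac{1}{\alpha}} \tag{12} \]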

   where $v(f,k) \triangleq \gamma(f,k)\,\dfrac{\xi(f,k)}{\xi(f,k)+1}$ and $M(a; c; x)$ is the
confluent hypergeometric function. The author proposes computing the a priori SNR $\xi(f,k)$ and
the a posteriori SNR $\gamma(f,k)$ as follows:

   \[ \xi(f,k) = \frac{E\left[|Y_1(f,k)|^2\right]}{E\left[|Y_{2DIF}(f,k)|^2\right]} \tag{13} \]
   \[ \gamma(f,k) = \frac{E\left[|X_1(f,k)|^2\right]}{E\left[|Y_{2DIF}(f,k)|^2\right]} \tag{14} \]
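
   Eqs. (13) and (14) involve expectations that have to be approximated from the data. The text
does not say how this is done, so the sketch below assumes first-order recursive averaging over
frames with a hypothetical smoothing factor `beta`. The function names and the small constant
`eps` that guards against division by zero are likewise assumptions. All inputs are STFT arrays of
shape (frames, bins).

```python
import numpy as np


def smoothed_power(spec, beta=0.9):
    """Recursive average of |spec|^2 along the frame axis (axis 0)."""
    power = np.abs(spec) ** 2
    out = np.empty_like(power)
    out[0] = power[0]
    for k in range(1, power.shape[0]):
        out[k] = beta * out[k - 1] + (1.0 - beta) * power[k]
    return out


def snr_estimates(x1, y1, y2dif, beta=0.9, eps=1e-12):
    """Return (xi, gamma) following Eqs. (13) and (14)."""
    reference = smoothed_power(y2dif, beta) + eps   # denominator of both ratios
    xi = smoothed_power(y1, beta) / reference       # Eq. (13)
    gamma = smoothed_power(x1, beta) / reference    # Eq. (14)
    return xi, gamma
```

   In this formulation the opposite-facing differential output $Y_{2DIF}$ acts as the noise
reference for both ratios.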


   With an appropriately chosen value of $\alpha$, the gain function obtained from the MMSE
estimator can serve as an effective post-filter. As in single-channel approaches,
$G_{H_1}(f,k)$ is applied to retain the desired speech component while suppressing the noise
level. The next section shows that this post-filter preserves the target speaker while reducing
the background noise, which improves the overall performance.
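
   The gain of Eq. (12) can be evaluated with the standard library alone. The sketch below is
one possible way to do so, not the author's implementation: the helper `kummer_m` sums the
power series of the confluent hypergeometric function after Kummer's transformation
$M(a; b; x) = e^{x} M(b - a; b; -x)$, and `mmse_gain` switches to the large-$v$ limit
$\xi/(\xi + 1)$ above a hypothetical threshold `v_max` to avoid overflow. Both function names and
the default $\alpha = 1$ are assumptions.

```python
import math


def kummer_m(a, b, x, tol=1e-15, max_terms=5000):
    """Confluent hypergeometric function M(a; b; x) for x <= 0.

    Uses M(a; b; x) = exp(x) * M(b - a; b; -x), so that the series is
    summed at a non-negative argument.
    """
    if x > 0.0:
        raise ValueError("this sketch only handles x <= 0")
    z = -x
    aa = b - a
    term = 1.0
    total = 1.0
    for n in range(max_terms):
        term *= (aa + n) / (b + n) * z / (n + 1)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return math.exp(x) * total


def mmse_gain(xi, gamma, alpha=1.0, v_max=500.0):
    """Spectral gain of Eq. (12) for scalar a priori / a posteriori SNRs."""
    v = gamma * xi / (xi + 1.0)
    if v > v_max:
        return xi / (xi + 1.0)  # limit of Eq. (12) for large v
    m = kummer_m(-alpha / 2.0, 1.0, -v)
    return math.sqrt(v) / gamma * (math.gamma(1.0 + alpha / 2.0) * m) ** (1.0 / alpha)


for xi_db in (-10, 0, 10, 20):
    xi = 10.0 ** (xi_db / 10.0)
    print(f"xi = {xi_db:4d} dB  gain = {mmse_gain(xi, gamma=xi + 1.0):.3f}")
```

   The usage loop sets $\gamma = \xi + 1$ purely as an illustrative operating point. In practice
the two SNRs would come from Eqs. (13) and (14) for every time-frequency bin.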

4. Experiments




 Figure 6: The experimental setup with DMA2
    In this section, the author describes an experiment that evaluates the noise reduction
performance of DMA2. The experiment was conducted in a living room in the presence of annoying
background noise and a diffuse noise field. The speaker stood at a distance of $L = 2$ m from the
DMA2. To rate the performance, an objective measure [22] was used to compute the speech quality of
the previous work and of the additional post-filter. The two microphone signals were sampled at
$F_s = 16$ kHz and transformed into the frequency domain with the following parameters:
$N_{FFT} = 512$, 50% overlap.
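
   With the stated parameters, each frame covers 512 / 16000 = 32 ms, the hop is 256 samples
(16 ms) and the frequency resolution is 16000 / 512 = 31.25 Hz. A minimal framing-and-FFT sketch
is shown below. The Hann window and the function name `stft` are assumptions, since the text
gives only the sampling rate, the FFT size and the overlap.

```python
import numpy as np

FS = 16000        # sampling rate in Hz
NFFT = 512        # frame length and FFT size
HOP = NFFT // 2   # 50% overlap


def stft(signal, nfft=NFFT, hop=HOP):
    """Return an array of shape (frames, nfft // 2 + 1) with one spectrum per frame."""
    window = np.hanning(nfft)  # assumption: the window type is not given in the text
    n_frames = 1 + (len(signal) - nfft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + nfft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


# Example: one second of a 1 kHz tone.
t = np.arange(FS) / FS
spectrum = stft(np.sin(2.0 * np.pi * 1000.0 * t))
print(spectrum.shape)
```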
    The waveform of the original microphone array signal is shown in Figure 7. From 0 to 1.4 s
the speech of the desired talker is present, and from 1.6 to 3 s only the directional noise source
is present.




 Figure 7: The waveform of the microphone array signal

   The waveform obtained using [23] is shown in Figure 8.




 Figure 8: The waveform of the signal processed by the author's previous method [23]

   Applying the post-filter provides further noise reduction. The processed signal is shown in
Figure 9.

 Figure 9: The waveform of the signal processed by the additional post-filter

   The overall energy of the microphone array signal, of the signal processed by [23] and of the
signal processed by the post-filter is depicted in Figure 10.




 Figure 10: The energy of the microphone array signal, of the signal processed by the author's
 previous work and of the signal processed by the additional post-filter

   As can be seen, the post-filter is advantageous. The achieved noise reduction is up to 5.5 dB,
and the speech quality in terms of the signal-to-noise ratio (SNR) is increased by 3.2 dB to
5.7 dB relative to the author's previous work, depending on the estimation method (Table 1).

   Table 1.
   The signal-to-noise ratio (dB)
    Estimation method    Microphone array signal    The author's previous work    The additional post-filter
    NIST STNR                     4.8                          21.0                          24.2
    WADA SNR                      2.8                          16.1                          21.8
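
   The improvement figures quoted in the text follow directly from the rows of Table 1. The short
sketch below recomputes them. The dictionary layout and key names are chosen for illustration
only, while the numbers are taken from the table.

```python
# SNR values in dB, copied from Table 1.
snr_db = {
    "NIST STNR": {"microphone": 4.8, "previous": 21.0, "post_filter": 24.2},
    "WADA SNR": {"microphone": 2.8, "previous": 16.1, "post_filter": 21.8},
}

improvement = {
    name: row["post_filter"] - row["previous"] for name, row in snr_db.items()
}

for name, gain in improvement.items():
    print(f"{name}: +{gain:.1f} dB over the previous work")
```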

   An essential problem in almost all signal processing is achieving more robust noise reduction.
DMA2 is one of the most widely installed configurations in speech applications such as mobile
phones, surveillance equipment, smart homes and teleconferencing because of its compactness.
Therefore, an effective post-filter for DMA2 is an attractive research direction. In this section,
the improvement in noise reduction has been confirmed: the numerical results show that the
proposed post-filter yields better performance with DMA2.

5. Conclusion
   DMA2 allows compact, small-size arrangements and promises a highly directional beampattern and
a high output signal gain in comparison with other MA beamformers. However, DMA has the drawback
that a complex noisy environment can degrade its performance. For many speech applications,
reducing speech distortion and suppressing noise are always relevant problems. In this research,
the author has presented and demonstrated a post-filter for enhancing the performance of DMA2 in a
realistic recording scenario. The author has shown how the noise component from a certain
direction can be suppressed and that the speech quality obtained with DMA2 increased in comparison
with the author's previous work. This post-filter can be integrated into multi-microphone systems
that use other MA beamformers.

6. Acknowledgements
   This research was supported by Digital Agriculture Cooperative. The author thanks colleagues
from Digital Agriculture Cooperative, who provided insight and expertise that greatly assisted the
research.

7. References
[1] Albertini D., Bernardini A., Borra F., Antonacci F., Sarti A. Two-Stage Beamforming With
    Arbitrary Planar Arrays of Differential Microphone Array Units. IEEE/ACM Transactions on
    Audio, Speech, and Language Processing. Pp(s): 590 - 602. DOI: 10.1109/TASLP.2022.3231719.
[2] Huang W., Feng J. Robust Steerable Differential Beamformer for Concentric Circular Array With
    Directional Microphones // Proc 2022 Asia-Pacific Signal and Information Processing Association
    Annual Summit and Conference (APSIPA ASC). DOI: 10.23919/APSIPAASC55919.2022.9980184.
[3] Wang J., Yang F., Yang J. Insights Into the MMSE-Based Frequency-Invariant Beamformers for
    Uniform Circular Arrays. IEEE Signal Processing Letters. Pp(s): 2432 – 2436.
    DOI: 10.1109/LSP.2022.3224687.
[4] Ueno N., Kameoka H. Multiple Sound Source Localization Based on Stochastic Modeling of Spatial
    Gradient Spectral // Proc 2022 30th European Signal Processing Conference (EUSIPCO).
    DOI: 10.23919/EUSIPCO55093.2022.9909524.
[5] Yan L., Huang W., Kleijn W.B., Abhayapala T. D. Phase Error Analysis for First-Order
    Linear Differential Microphone Arrays // Proc 2022 International Workshop on Acoustic Signal
    Enhancement (IWAENC). DOI: 10.1109/IWAENC53105.2022.9914748.
[6] Itzhak G., Cohen I. Differential and Constant-Beamwidth Beamforming with Uniform
    Rectangular Arrays. 2022 International Workshop on Acoustic Signal Enhancement (IWAENC).
    DOI: 10.1109/IWAENC53105.2022.9914769.
[7] Huang G., Benesty J., Chen J. Fundamental Approaches to Robust Differential Beamforming With
    High Directivity Factors. IEEE/ACM Transactions on Audio, Speech, and Language Processing.
    Pp(s): 3074 - 3088. DOI: 10.1109/TASLP.2022.3209935.
[8] Jin J., Benesty J., Huang H., Chen J. On Differential Beamforming With Nonuniform
    Linear Microphone Arrays. IEEE/ACM Transactions on Audio, Speech, and Language Processing.
    Pp(s): 1840 - 1852. DOI: 10.1109/TASLP.2022.3178229.
[9] Yang X., Wei J. DMANET: Deep Learning-Based Differential Microphone Arrays for Multi-
    Channel Speech Separation. ICASSP 2022 - 2022 IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP). DOI: 10.1109/ICASSP43922.2022.9747725.
[10] Wang X., Cohen I., Benesty J., Chen J. Study of the Null Directions on The Performance
     of Differential Beamformers. ICASSP 2022 - 2022 IEEE International Conference on Acoustics,
     Speech and Signal Processing (ICASSP). DOI: 10.1109/ICASSP43922.2022.9746462.
[11] Tu Q., Chen H. Theoretical Lower Bounds on the Performance of the First-
     Order Differential Microphone Arrays With Sensor Imperfections. IEEE/ACM Transactions on
     Audio, Speech, and Language Processing. Pp(s): 785 – 801. DOI: 10.1109/TASLP.2022.3145317.
[12] Shi C., Du F., Liu C., Li H. Differential Error Feedback Active Noise Control With the Auxiliary
     Filter Based Mapping Method. IEEE Signal Processing Letters. Pp(s): 573 – 577.
     DOI: 10.1109/LSP.2022.3144839.
[13] Itzhak G., Cohen I., Benesty J. Robust Differential Beamforming with Rectangular Arrays // Proc
     2021 29th European Signal Processing Conference (EUSIPCO).
     DOI: 10.23919/EUSIPCO54536.2021.9616085.
[14] Zhang P., Gao J., Bian Y., Huang Y. A Preliminary on the Sound Pickup for Unmanned Aerial
     Vehicles Using Differential Microphone Arrays. 2021 4th International Conference on
     Information Communication and Signal Processing (ICICSP).
     DOI: 10.1109/ICICSP54369.2021.9611979.
[15] Chen Z., Chen H., Tu Q. Sensor Imperfection Tolerance Analysis of Robust
     Linear Differential Microphone Arrays. IEEE/ACM Transactions on Audio, Speech, and
     Language Processing. Pp(s): 2915 – 2929. DOI: 10.1109/TASLP.2021.3110136.
[16] Yu G., Qiu Y., Wang N. A Robust Wavenumber-Domain Superdirective Beamforming for
     Endfire Arrays. IEEE Transactions on Signal Processing. Pp(s): 4890 – 4905.
     DOI: 10.1109/TSP.2021.3105754.
[17] Huang G., Wang Y., Benesty J., Cohen I., Chen J. Combined Differential Beamforming With
     Uniform Linear Microphone Arrays. ICASSP 2021 - 2021 IEEE International Conference on
     Acoustics, Speech and Signal Processing (ICASSP). DOI: 10.1109/ICASSP39728.2021.9414189.
[18] Zhao X., Huang G., Benesty J., Chen J., Cohen I. On the Design of
     Square Differential Microphone Arrays with a Multistage Structure. ICASSP 2021 - 2021 IEEE
     International Conference on Acoustics, Speech and Signal Processing (ICASSP).
     DOI: 10.1109/ICASSP39728.2021.9413759.
[19] Borra F., Bernardini A., Bertuletti I., Antonacci F., Sarti A. Arrays of First-Order
     Steerable Differential Microphones. ICASSP 2021 - 2021 IEEE International Conference on
     Acoustics, Speech and Signal Processing (ICASSP). DOI: 10.1109/ICASSP39728.2021.9413476.
[20] Wang X., Huang G., Cohen I., Benesty J., Chen J. Robust Steerable Differential Beamformers with
     Null Constraints for Concentric Circular Microphone Arrays. ICASSP 2021 - 2021 IEEE
     International Conference on Acoustics, Speech and Signal Processing (ICASSP).
     DOI: 10.1109/ICASSP39728.2021.9414119.
[21] Ephraim Y., Malah D. Speech enhancement using minimum mean-square error log-spectral
     amplitude estimator. IEEE Trans. ASSP, vol. ASSP-33, no. 2, pp. 443–445, 1985.
     DOI: 10.1109/TASSP.1985.1164550.
[22] https://labrosa.ee.columbia.edu/projects/snreval/
[23] Stolbov M., Tatarnikova M., The Q.T. (2018) Using Dual-Element Microphone Arrays for
     Automatic Keyword Recognition // Karpov A., Jokisch O., Potapova R. (eds) Speech and
     Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham.
     https://doi.org/10.1007/978-3-319-99579-3-68.
[24] Ma D., Wang Y., He L., Jin M., Su D., Yu D. DP-DWA: Dual-Path Dynamic Weight Attention
     Network With Streaming Dfsmn-San For Automatic Speech Recognition. ICASSP 2022 - 2022
     IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
     DOI: 10.1109/ICASSP43922.2022.9746328.
[25] Xu X., Gu R., Zou Y. Improving Dual-Microphone Speech Enhancement by Learning Cross-
     Channel Features with Multi-Head Attention. ICASSP 2022 - 2022 IEEE International Conference
     on Acoustics, Speech and Signal Processing (ICASSP).
     DOI: 10.1109/ICASSP43922.2022.9746359.
[26] Xiao Z., Chen T., Liu Y., Li J., Li Z. Keystroke Recognition with the Tapping Sound Recorded by
     Mobile Phone Microphones. IEEE Transactions on Mobile Computing.
     DOI: 10.1109/TMC.2021.3137229.
[27] Tan K., Zhang X., Wang D.L. Deep Learning Based Real-Time Speech Enhancement for Dual-
     Microphone Mobile Phones. IEEE/ACM Transactions on Audio, Speech, and Language
     Processing. DOI: 10.1109/TASLP.2021.3082318.
[28] Tan K., Zhang X., Wang D.L. Real-Time Speech Enhancement for Mobile Communication Based
     on Dual-Channel Complex Spectral Mapping. ICASSP 2021 - 2021 IEEE International
     Conference on Acoustics, Speech and Signal Processing (ICASSP).
     DOI: 10.1109/ICASSP39728.2021.9414346.
[29] Shankar N., Bhat G.S., Panahi M.S. Real-time dual-channel speech enhancement by VAD assisted
     MVDR beamformer for hearing aid applications using smartphone. 2020 42nd Annual
     International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC).
     DOI: 10.1109/EMBC44109.2020.9175212.
[30] Li H., Zhang X., Gao G. Beamformed Feature for Learning-based Dual-channel Speech
     Separation. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal
     Processing (ICASSP). DOI: 10.1109/ICASSP40776.2020.9054049.