=Paper= {{Paper |id=Vol-3695/p08 |storemode=property |title=Techniques for Recognising and Classifying Environmental Noise Using Deep Learning |pdfUrl=https://ceur-ws.org/Vol-3695/p08.pdf |volume=Vol-3695 |authors=Ludovica Beritelli,Maria Grazia Borzì,Cristian Randieri,Roberta Avanzato,Francesco Beritelli |dblpUrl=https://dblp.org/rec/conf/system/BeritelliBRAB23 }} ==Techniques for Recognising and Classifying Environmental Noise Using Deep Learning== https://ceur-ws.org/Vol-3695/p08.pdf
Techniques for Recognising and Classifying Environmental Noise Using Deep Learning
Ludovica Beritelli¹, Maria Grazia Borzì¹, Cristian Randieri², Roberta Avanzato¹ and Francesco Beritelli¹
¹ Department of Electrical, Electronic and Computer Engineering, University of Catania, Catania, Italy
² Università degli Studi e-Campus, Novedrate (CO), Italy


                                             Abstract
                                             Increasing urbanisation poses new challenges in mitigating noise pollution and preserving quality of life. In this study, we present an
                                             innovative approach for the classification of environmental noise, exploiting advanced Deep Learning (DL) techniques. By merging
                                             three different public datasets, we created a unified corpus to train and test a convolutional neural network (CNN), with the aim
                                             of efficiently recognising and classifying various noise events. The proposed approach overcomes the limitations of conventional
                                             methodologies, avoiding the need for data pre-processing that could alter sound characteristics. The experimental results demonstrate
                                             a significant improvement in classification accuracy, reaching 96.93% with the test set and 100% by applying a post-processing filter.
                                             These results emphasise the potential of DL in the treatment of environmental noise, offering new perspectives for signal processing
                                             and telecommunications.

                                             Keywords
                                             Environmental Noise Classification, Convolutional Neural Networks, Signal Processing, Noise Pollution



SYSYEM 2023: 9th Scholar’s Yearly Symposium of Technology, Engineering and Mathematics, Rome, December 3-6, 2023
beritelli.ludovica@gmail.com (L. Beritelli); borzi.m@studium.unict.it (M. G. Borzì); cristian.randieri@uniecampus.it (C. Randieri); roberta.avanzato@unict.it (R. Avanzato); francesco.beritelli@unict.it (F. Beritelli)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

The search for sustainable solutions to mitigate the impact of environmental noise has become crucial to preserving the quality of life in our increasingly urban society. In this context, the recognition and classification of environmental noise emerge as key challenges in the field of signal processing and telecommunications, where noise can significantly degrade the quality and intelligibility of transmitted signals [1]. Recently, the advancement of machine learning (ML) and deep learning (DL) techniques has opened new frontiers in the accuracy of noise classification.
   Pioneering studies, such as that of Couvreur et al. [2], have demonstrated the effective use of hidden Markov models (HMMs) for the recognition of sound events, offering detailed analysis of sound signals in time and frequency. Despite their effectiveness, these techniques require considerable computational resources, posing challenges in practical implementation [3, 4, 5, 6, 7, 8, 9]. In parallel, Alsouda et al. [10] present a machine-learning-based method for urban noise identification that combines an inexpensive IoT unit, Mel-frequency cepstral coefficient extraction of audio features, and supervised classification algorithms (such as support vector machine, k-nearest neighbours, bootstrap aggregation and random forest). This approach achieved noise classification accuracy in the range of 88% to 94%. The integration of HMM, fuzzy logic and neural networks proposed by Beritelli et al. [11] further emphasised the importance of combining different methodologies to improve classification accuracy on large noise databases. Furthermore, a study conducted by Aksoy et al. [12] used advanced deep learning models, including VGG-13BN, ResNet-50 and DenseNet-121, to classify sounds according to their environmental relevance. The results demonstrated high accuracy in classifying sounds, with correctness rates of over 95%, highlighting in particular the VGG-13BN model, which achieved 99.72% accuracy. These results underline the significant potential of the deep learning approach in identifying sounds harmful to the environment. Another contribution is made by Jeon et al. [13], who propose a multi-channel indoor noise database for the development and evaluation of speech processing algorithms. This database includes noise signals generated by physical actions and loudspeakers placed in various locations within an apartment building, allowing for a wide range of noise conditions. A further study, conducted by Ramli et al. [14], proposes a mechanism to reduce background noise in voice communications through the use of a two-sensor adaptive noise canceller. This system demonstrated high convergence rates, significant improvements in the signal-to-noise ratio, and a 65% reduction in computational power compared to traditional methods. The study by Tsai et al. [15] analyses the spatial characteristics of urban noise using noise maps and emphasises the importance of noise maps for a better understanding and management of urban noise.
   This study demonstrates how the application of DL techniques can offer effective solutions to the challenges of environmental noise classification, with potential significant benefits for the telecommunication sector and society at large. Our research opens new perspectives for the use of artificial intelligence in urban noise mitigation, promoting a more sustainable environment and a better quality of life.
   In section 2 we discuss the importance of developing effective noise classification strategies, which are essential for improving the quality of communication and, consequently, the quality of life in urban areas.
   In section 3, we present our innovative approach, which exploits advanced DL techniques for analysing and classifying environmental noise. We will illustrate how, through the use of Convolutional Neural Networks (CNNs), our model works directly with the raw audio data, avoiding the loss of significant information that could result from pre-processing steps.
   In section 4, we present the results obtained from our study, demonstrating the effectiveness of the proposed model in classifying environmental noise. The results show a significant improvement in classification accuracy, achieving remarkable performance in the tests performed. We will also discuss the impact of a post-processing filter [16] in



CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
Ludovica Beritelli et al. CEUR Workshop Proceedings                                                                            62–67



further increasing the accuracy of the model.

2. Environmental Noise

Environmental noise, defined as any unwanted sound generated by the surrounding environment, is a major source of noise pollution. These sounds may come from natural sources such as sea waves or from man-made sources such as vehicle traffic, alarms, voices and electronic devices. Effective management of such noise requires methods that go beyond simply measuring sound pressure levels (dB), including characterising and identifying the type of noise [17]. In the field of telecommunications, environmental noise introduces significant challenges, degrading signal quality and compromising communication efficiency. Research has highlighted the importance of developing advanced noise reduction strategies, through the use of machine learning (ML) and deep learning (DL) techniques, aimed at improving the accuracy of noise classification [18]. The studies in [19, 20] have contributed greatly to the understanding of environmental noise by providing innovative approaches for its analysis and classification. These works emphasise the need for authentic and versatile databases to test and develop signal processing algorithms capable of handling the complexity of real acoustic environments [21]. The accurate identification and classification of environmental noise not only improves the performance of telecommunication systems but also contributes to the health and well-being of individuals by reducing exposure to harmful levels of noise. Therefore, research in this area is crucial to advance the design of more resilient communication systems and to promote a more sustainable sound environment.

3. Proposed Method

Advances in Machine Learning (ML) and Deep Learning (DL) techniques have radically transformed the approach to data analysis, allowing us to discover unexpected complex patterns in audio data. In this study, we adopted an innovative methodology that exploits neural networks to directly process audio signals in .wav format. The aim is to evaluate the ability of these networks to accurately classify different sound events without resorting to pre-processing techniques that could compromise data integrity. Subsection 3.1 describes the datasets used and how they are split for training, validation and testing of the neural network; subsection 3.2 describes the CNN used for the classification of ambient noise.

3.1. Dataset

The dataset used in this research was composed by merging three distinct public databases: UrbanSound [18], Demand [17] and Noisex-92 [19]. This fusion created a heterogeneous dataset that includes a wide range of sound classes, specifically excluding the dog bark class from UrbanSound, but incorporating common ambient noise classes from Noisex-92 and Demand. A minimal pre-processing phase is carried out before the data are given as input to the CNN: all recordings were divided into 2-second sub-sequences and sampled at 22050 Hz. The dataset was randomly divided into two different sets, one used for network training and validation and the other for network testing, ensuring an equal distribution of sound classes between the two. The learning dataset includes classes such as “air_conditioner”, “children_playing”, and “traffic”, with a variable number of sound sequences per class. Similarly, the test dataset maintains a representative proportion of each class, ensuring a valid evaluation of network performance.

3.1.1. Learning dataset
     • “air_conditioner”: 1271 audio sequences,
     • “children_playing”: 704 audio sequences,
     • “babble”: 259 audio sequences,
     • “car_horn”: 307 audio sequences,
     • “drilling”: 622 audio sequences,
     • “engine_idling”: 704 audio sequences,
     • “jackhammer”: 917 audio sequences,
     • “metro”: 1800 audio sequences,
     • “office”: 1800 audio sequences,
     • “river”: 1800 audio sequences,
     • “siren”: 956 audio sequences,
     • “square”: 1800 audio sequences,
     • “street_music”: 850 audio sequences,
     • “traffic”: 1800 audio sequences.

3.1.2. Testing dataset
     • “air_conditioner”: 543 audio sequences,
     • “children_playing”: 299 audio sequences,
     • “babble”: 15 audio sequences,
     • “car_horn”: 129 audio sequences,
     • “drilling”: 268 audio sequences,
     • “engine_idling”: 303 audio sequences,
     • “jackhammer”: 398 audio sequences,
     • “metro”: 600 audio sequences,
     • “office”: 600 audio sequences,
     • “river”: 600 audio sequences,
     • “siren”: 412 audio sequences,
     • “square”: 600 audio sequences,
     • “street_music”: 362 audio sequences,
     • “traffic”: 600 audio sequences.

3.2. Application of CNNs

Artificial intelligence (AI) represents a vast and evolving field of study that aims to emulate human cognitive capabilities through the development of autonomous hardware and software systems. This ambition to reflect human intelligence in machines has led to the development of technologies capable of autonomous learning, adaptation, reasoning and planning. At the heart of AI are advanced algorithms and computational techniques, which make it possible to replicate typically human behaviours, such as interaction with the environment and decision-making. The applications of artificial intelligence span different fields, from industrial to domestic, demonstrating its potential to improve both the activities of businesses and public administrations and the everyday lives of people.
   Convolutional Neural Networks (CNNs) stand out for their effectiveness in analysing visual and sound data due to their ability to identify complex patterns through the use of convolutional filters.
   Our CNN architecture follows a structured model starting with the input layer, proceeding through convolutional and







          Figure 1: Accuracy and loss trends during training and validation.
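The loss plotted in Figure 1 is not named in the text. As a minimal sketch, assuming the LogSoftMax output mentioned in Section 3.2 is paired with a negative log-likelihood loss (a common pairing, not confirmed by the source), the loss and the predicted class index for a single 14-class output could be computed as follows. The function names and the example scores are illustrative only.

```python
import math

N_CLASSES = 14  # N_C in the paper


def log_softmax(scores):
    """Numerically stable log-softmax of a list of raw class scores."""
    peak = max(scores)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_norm for s in scores]


def nll_loss(log_probs, target):
    """Negative log-likelihood of the true class (an assumed loss)."""
    return -log_probs[target]


def predict(log_probs):
    """Index of the most probable class."""
    return max(range(len(log_probs)), key=lambda i: log_probs[i])


# Hypothetical raw scores for one 2-second sequence: class 3 dominates.
scores = [0.0] * N_CLASSES
scores[3] = 4.0
log_probs = log_softmax(scores)

print("predicted class index:", predict(log_probs))
print("loss if the true class is 3:", round(nll_loss(log_probs, 3), 4))
print("loss if the true class is 0:", round(nll_loss(log_probs, 0), 4))
```

A confident correct prediction yields a small loss and a confident wrong one a large loss, which is why a falling loss curve normally accompanies a rising accuracy curve.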



activation (ReLU) layers, pooling, and culminating in a fully connected layer for final classification. This design allows the network to process audio features from the simplest to the most complex, facilitating deep and robust data learning. The detailed configuration of convolutional, pooling, and fully connected layers provides a powerful means to extract and interpret sound features, making CNNs particularly suitable for the recognition and classification of complex sound events. Our research aims to demonstrate the effectiveness of this approach in the field of acoustic analysis, contributing significantly to the field of signal processing and audio classification.
   The neural network used in this study is based on the architecture of 1D convolutional neural networks and, in particular, on the “M5 (0.5M)” model described in [16]. This network consists of five layers, the first four of which are convolutional blocks (1D convolution, batch normalisation, ReLU and pooling layers) and the last of which is the output layer (Softmax).
   The neural network’s input is a vector containing sequences of audio waveforms, each with a duration of $W_{in} = 2$ seconds. The CNN determines the index associated with one of $N_C = 14$ different classes $C_i$ ($i = 1, \ldots, N_C$) using the LogSoftMax function. The network is trained by feeding it raw sequences representing different environmental noises.

4. Experimental Results

The validation process of our approach was carried out through a rigorous experiment involving the direct input








          Figure 2: Confusion matrix for the validation dataset.
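A confusion matrix such as the one in Figure 2 can be summarised numerically. The sketch below assumes rows are true classes and columns are predicted classes (the figure's orientation is not stated in the text) and uses a small made-up 3-class matrix rather than the paper's values; the function names are ours.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Recall per true class, assuming rows hold the true labels."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


# Illustrative counts only; not taken from the paper.
toy_labels = ["metro", "office", "river"]
toy_matrix = [
    [48, 2, 0],   # true "metro"
    [1, 47, 2],   # true "office"
    [0, 0, 50],   # true "river"
]

accuracy = overall_accuracy(toy_matrix)
recalls = dict(zip(toy_labels, per_class_recall(toy_matrix)))

print(f"accuracy = {accuracy:.4f}")
for label, recall in recalls.items():
    print(f"{label}: recall = {recall:.2f}")
```

Per-class recall is worth reading alongside the overall accuracy here, because the class sizes in Section 3.1 are uneven and a small class contributes little to the overall figure.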



of raw audio data, in .wav format, into the convolutional neural network (CNN). Below, we present a detailed analysis of the performance obtained during the different phases of training, validation and testing of the network.

4.1. Training and Validation

During the training phase, we observed a progressive improvement in network performance, as illustrated in Fig. 1. This graph shows an increase in accuracy and a decrease in the loss function as the epochs progress, highlighting the effectiveness of the learning process. The dataset was split into a proportion of 70% for training and 30% for validation, as illustrated in Section 3.1.
   Fig. 2 presents the confusion matrix obtained from the validation of the model, providing a clear indication of its classification capability across the different sound categories.

4.2. Testing

The effectiveness of the model was further verified through testing on a separate dataset, achieving an impressive accuracy of 96.93%. Fig. 3 illustrates the confusion matrix for this phase, confirming the network’s high accuracy in recognising environmental sounds.

4.3. Post-Processing and Time Window Analysis

The introduction of a post-processing filter, called the “recurrence filter” [16], further improved the performance of the model. As demonstrated in Fig. 4, the accuracy of the system increases significantly by extending the analysis time window. In particular, it can be seen that by extending the analysis beyond 28 seconds, the accuracy reaches 100%.
   The results underline the effectiveness of our approach based on the use of convolutional neural networks for analysing environmental sound, highlighting the potential of deep learning techniques in overcoming the challenges of accurately recognising complex sound events.

5. Conclusion

This study introduced a new approach for the classification of environmental noise, exploiting the potential of Deep Learning techniques to address one of the most pressing challenges in signal processing and telecommunications. Through the use of a CNN trained on a unified dataset derived from three different public sources, it is shown that high accuracy in the classification of environmental noise events can be achieved without the need for complex pre-processing. The results obtained reveal a marked improvement in classification accuracy, highlighting the effectiveness of our model both in the testing phase and in the application of post-processing techniques. These results not only confirm the value of convolutional neural networks in acoustic analysis, but also open the way for future research to explore the applicability of such methods in broader areas, including urban noise monitoring and the improvement of telecommunication systems. In conclusion, our study contributes significantly to the body of research on signal processing, proposing an effective and efficient model for the classification of ambient noise, with direct implications for environmental sustainability and quality of life in urban areas.

References

 [1] F. Beritelli, A. Gallotta, C. Rametta, A dual streaming approach for speech quality enhancement of voip
     service over 3g networks, in: 2013 18th International Conference on Digital Signal Processing (DSP), IEEE,
     2013, pp. 1–5.
 [2] C. Couvreur, V. Fontaine, P. Gaunard, C. G. Mubikangiey, Automatic classification of environmental noise








          Figure 3: Confusion matrix for the testing dataset.
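The number of sequences behind each row of the test confusion matrix follows from the class counts listed in Sections 3.1.1 and 3.1.2. The sketch below only re-tabulates those published counts to show the size of each set and the share of every class that is held out for testing; the dictionary and function names are ours.

```python
# Class counts copied from Sections 3.1.1 (learning) and 3.1.2 (testing).
LEARNING = {
    "air_conditioner": 1271, "children_playing": 704, "babble": 259,
    "car_horn": 307, "drilling": 622, "engine_idling": 704,
    "jackhammer": 917, "metro": 1800, "office": 1800, "river": 1800,
    "siren": 956, "square": 1800, "street_music": 850, "traffic": 1800,
}
TESTING = {
    "air_conditioner": 543, "children_playing": 299, "babble": 15,
    "car_horn": 129, "drilling": 268, "engine_idling": 303,
    "jackhammer": 398, "metro": 600, "office": 600, "river": 600,
    "siren": 412, "square": 600, "street_music": 362, "traffic": 600,
}


def held_out_share(label):
    """Fraction of a class's sequences that fall in the testing set."""
    return TESTING[label] / (LEARNING[label] + TESTING[label])


learning_total = sum(LEARNING.values())
testing_total = sum(TESTING.values())

print(f"learning: {learning_total} sequences, testing: {testing_total} sequences")
for label in LEARNING:
    print(f"{label:17s} held-out share = {held_out_share(label):.1%}")
```

Printing the per-class share makes it easy to check how evenly the classes are represented in the two sets.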




          Figure 4: Effect of post-processing filter on accuracy as a function of time window.
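The recurrence filter is defined in [16] and is not restated in this paper. As a rough sketch, assuming it assigns to each analysis window the class that recurs most often among the per-sequence predictions inside that window (a majority vote, which may differ from the filter actually used), a 28-second window covers 14 consecutive 2-second sequences. All names and the example prediction stream below are hypothetical.

```python
from collections import Counter

SEQUENCE_SECONDS = 2  # duration of each classified sequence (Section 3.1)


def recurrence_filter(predictions, window_seconds):
    """Majority-vote each non-overlapping window of per-sequence predictions."""
    per_window = window_seconds // SEQUENCE_SECONDS
    if per_window < 1:
        raise ValueError("window must cover at least one sequence")
    filtered = []
    for start in range(0, len(predictions) - per_window + 1, per_window):
        window = predictions[start:start + per_window]
        # most_common(1) returns the label with the highest count; ties go
        # to the label encountered first in the window.
        filtered.append(Counter(window).most_common(1)[0][0])
    return filtered


# Hypothetical stream of 28 per-sequence predictions with three errors.
stream = ["siren"] * 14 + ["traffic"] * 14
stream[3] = "car_horn"
stream[20] = "metro"
stream[25] = "siren"

print(recurrence_filter(stream, window_seconds=28))
```

With a 2-second window the filter returns the predictions unchanged, so isolated errors survive; widening the window lets the dominant class outvote them, which is consistent with the trend reported in Fig. 4.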



     events by hidden markov models, Applied Acoustics 54 (1998) 187–206.
 [3] G. Capizzi, C. Napoli, L. Paternò, An innovative hybrid neuro-wavelet method for reconstruction of missing
     data in astronomical photometric surveys, in: Artificial Intelligence and Soft Computing: 11th International
     Conference, ICAISC 2012, Zakopane, Poland, April 29-May 3, 2012, Proceedings, Part I 11, Springer,
     2012, pp. 21–29.
 [4] N. Brandizzi, S. Russo, R. Brociek, A. Wajda, First studies to apply the theory of mind theory to green
     and smart mobility by using gaussian area clustering, volume 3118, 2021, pp. 71–76.
 [5] F. Bonanno, G. Capizzi, G. Lo Sciuto, A neuro wavelet-based approach for short-term load forecasting in
     integrated generation systems, in: 2013 International Conference on Clean Electrical Power (ICCEP), IEEE,
     2013, pp. 772–776.
 [6] V. Ponzi, S. Russo, A. Wajda, R. Brociek, C. Napoli, Analysis pre and post covid-19 pandemic rorschach
     test data of using em algorithms and gmm models, volume 3360, 2022, pp. 55–63.
 [7] G. Capizzi, G. L. Sciuto, C. Napoli, M. Woźniak, G. Susi, A spiking neural network-based long-term
     prediction system for biogas production, Neural Networks 129 (2020) 271–279.
 [8] G. De Magistris, M. Romano, J. Starczewski, C. Napoli, A novel dwt-based encoder for human pose
     estimation, volume 3360, 2022, pp. 33–40.
 [9] F. Bonanno, G. Capizzi, G. L. Sciuto, C. Napoli, Wavelet recurrent neural network with semi-parametric input
     data preprocessing for micro-wind power forecasting






     in integrated generation systems, 2015, pp. 602–609.
     doi:10.1109/ICCEP.2015.7177554.
[10] Y. Alsouda, S. Pllana, A. Kurti, Iot-based urban noise
     identification using machine learning: performance
     of svm, knn, bagging, and random forest, in: Proceed-
     ings of the international conference on omni-layer
     intelligent systems, 2019, pp. 62–67.
[11] F. Beritelli, R. Grasso, A pattern recognition system
     for environmental sound classification based on mfccs
     and neural networks, in: 2008 2nd International Con-
     ference on Signal Processing and Communication Sys-
     tems, IEEE, 2008, pp. 1–4.
[12] B. Aksoy, U. Uygar, G. Karadağ, A. R. Kaya, Ö. Melek,
     Classification of environmental sounds with deep
     learning, Advances in Artificial Intelligence Research
     2 (2022) 20–28.
[13] K. M. Jeon, N. K. Kim, M. J. Jo, H. K. Kim, Design of
     multi-channel indoor noise database for speech pro-
     cessing in noise, in: 2017 20th Conference of the
     Oriental Chapter of the International Coordinating
     Committee on Speech Databases and Speech I/O Sys-
     tems and Assessment (O-COCOSDA), IEEE, 2017, pp.
     1–4.
[14] R. M. Ramli, A. O. A. Noor, S. Abdul Samad, Noise
     cancellation using selectable adaptive algorithm for
     speech in variable noise environment, International
     Journal of Speech Technology 20 (2017) 535–542.
[15] K.-T. Tsai, M.-D. Lin, Y.-H. Chen, Noise mapping in ur-
     ban environments: A taiwan study, Applied Acoustics
     70 (2009) 964–972.
[16] R. Avanzato, F. Beritelli, Heart sound multiclass anal-
     ysis based on raw data and convolutional neural net-
     work, IEEE Sensors Letters 4 (2020) 1–4.
[17] J. Thiemann, N. Ito, E. Vincent, The diverse envi-
     ronments multi-channel acoustic noise database (de-
     mand): A database of multichannel environmental
     noise recordings, in: Proceedings of Meetings on
     Acoustics, volume 19, AIP Publishing, 2013.
[18] J. Salamon, C. Jacoby, J. P. Bello, A dataset and taxon-
     omy for urban sound research, in: Proceedings of the
     22nd ACM international conference on Multimedia,
     2014, pp. 1041–1044.
[19] A. Varga, H. J. Steeneken, Assessment for automatic
     speech recognition: Ii. noisex-92: A database and an
     experiment to study the effect of additive noise on
     speech recognition systems, Speech communication
     12 (1993) 247–251.
[20] J. Salamon, J. P. Bello, Unsupervised feature learning
     for urban sound classification, in: 2015 IEEE Inter-
     national Conference on Acoustics, Speech and Signal
     Processing (ICASSP), IEEE, 2015, pp. 171–175.
[21] K. J. Piczak, Esc: Dataset for environmental sound
     classification, in: Proceedings of the 23rd ACM inter-
     national conference on Multimedia, 2015, pp. 1015–
     1018.



