=Paper=
{{Paper
|id=Vol-3695/p08
|storemode=property
|title=Techniques for Recognising and Classifying Environmental Noise Using Deep Learning
|pdfUrl=https://ceur-ws.org/Vol-3695/p08.pdf
|volume=Vol-3695
|authors=Ludovica Beritelli,Maria Grazia Borzì,Cristian Randieri,Roberta Avanzato,Francesco Beritelli
|dblpUrl=https://dblp.org/rec/conf/system/BeritelliBRAB23
}}
==Techniques for Recognising and Classifying Environmental Noise Using Deep Learning==
Ludovica Beritelli¹, Maria Grazia Borzì¹, Cristian Randieri², Roberta Avanzato¹ and Francesco Beritelli¹
¹ Department of Electrical, Electronic and Computer Engineering, University of Catania, Catania, Italy
² Università degli Studi e-Campus, Novedrate (CO), Italy
Abstract
Increasing urbanisation poses new challenges in mitigating noise pollution and preserving quality of life. In this study, we present an
innovative approach for the classification of environmental noise, exploiting advanced Deep Learning (DL) techniques. By merging
three different public datasets, we created a unified corpus to train and test a convolutional neural network (CNN), with the aim
of efficiently recognising and classifying various noise events. The proposed approach overcomes the limitations of conventional
methodologies, avoiding the need for data pre-processing that could alter sound characteristics. The experimental results demonstrate
a significant improvement in classification accuracy, reaching 96.93% with the test set and 100% by applying a post-processing filter.
These results emphasise the potential of DL in the treatment of environmental noise, offering new perspectives for signal processing
and telecommunications.
Keywords
Environmental Noise Classification, Convolutional Neural Networks, Signal Processing, Noise Pollution
SYSTEM 2023: 9th Scholar's Yearly Symposium of Technology, Engineering and Mathematics, Rome, December 3-6, 2023
beritelli.ludovica@gmail.com (L. Beritelli); borzi.m@studium.unict.it (M. G. Borzì); cristian.randieri@uniecampus.it (C. Randieri); roberta.avanzato@unict.it (R. Avanzato); francesco.beritelli@unict.it (F. Beritelli)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
Ludovica Beritelli et al., CEUR Workshop Proceedings, pp. 62–67

1. Introduction

The search for sustainable solutions to mitigate the impact of environmental noise has become crucial to preserving the quality of life in our increasingly urban society. In this context, the recognition and classification of environmental noise emerge as key challenges in the field of signal processing and telecommunications, where noise can significantly degrade the quality and intelligibility of transmitted signals [1]. Recently, the advancement of machine learning (ML) and deep learning (DL) techniques has opened new frontiers in the accuracy of noise classification.

Pioneering studies, such as that of Couvreur et al. [2], have demonstrated the effective use of hidden Markov models (HMMs) for the recognition of sound events, offering detailed analysis of sound signals in time and frequency. Despite their effectiveness, these techniques require considerable computational resources, posing challenges in practical implementation [3, 4, 5, 6, 7, 8, 9]. In parallel, Alsouda et al. [10] present a machine-learning-based method for urban noise identification that uses an inexpensive IoT unit, Mel-frequency cepstral coefficients (MFCCs) as audio features, and supervised classification algorithms (support vector machine, k-nearest neighbours, bootstrap aggregation and random forest). This approach achieved noise classification accuracy in the range of 88% to 94%. The integration of HMM, fuzzy logic and neural networks proposed by Beritelli et al. [11] further emphasised the importance of combining different methodologies to improve classification accuracy on large noise databases. Furthermore, a study conducted by Aksoy et al. [12] used advanced deep learning models, including VGG-13BN, ResNet-50 and DenseNet-121, to classify sounds according to their environmental relevance. The results demonstrated high accuracy in classifying sounds, with correctness rates of over 95%; the VGG-13BN model in particular achieved 99.72% accuracy. These results underline the significant potential of the deep learning approach in identifying sounds harmful to the environment. Another contribution is made by Jeon et al. [13], who propose a multi-channel indoor noise database for the development and evaluation of speech processing algorithms. This database includes noise signals generated by physical actions and loudspeakers placed in various locations within an apartment building, allowing for a wide range of noise conditions. A further study, conducted by Ramli et al. [14], proposes a mechanism to reduce background noise in voice communications through the use of a two-sensor adaptive noise canceller. This system demonstrated high convergence rates, significant improvements in the signal-to-noise ratio, and a 65% reduction in computational power compared to traditional methods. The study by Tsai et al. [15] analyses the spatial characteristics of urban noise using noise maps and emphasises their importance for a better understanding and management of urban noise.

This study demonstrates how the application of DL techniques can offer effective solutions to the challenges of environmental noise classification, with potentially significant benefits for the telecommunication sector and society at large. Our research opens new perspectives for the use of artificial intelligence in urban noise mitigation, promoting a more sustainable environment and a better quality of life.

In Section 2 we discuss the importance of developing effective noise classification strategies, which are essential for improving the quality of communication and, consequently, the quality of life in urban areas.

In Section 3, we present our innovative approach, which exploits advanced DL techniques for analysing and classifying environmental noise. We illustrate how, through the use of Convolutional Neural Networks (CNNs), our model works directly with the raw audio data, avoiding the loss of significant information that could result from pre-processing steps.

In Section 4, we present the results obtained from our study, demonstrating the effectiveness of the proposed model in classifying environmental noise. The results show a significant improvement in classification accuracy, achieving remarkable performance in the tests performed. We will also discuss the impact of a post-processing filter [16] in
further increasing the accuracy of the model.

2. Environmental Noise

Environmental noise, defined as any unwanted sound generated by the surrounding environment, is a major source of noise pollution. These sounds may come from natural sources such as sea waves or from man-made sources such as vehicle traffic, alarms, voices and electronic devices. Effective management of such noise requires methods that go beyond simply measuring sound pressure levels (dB), including characterising and identifying the type of noise [17]. In the field of telecommunications, environmental noise introduces significant challenges, degrading signal quality and compromising communication efficiency. Research has highlighted the importance of developing advanced noise reduction strategies, through the use of machine learning (ML) and deep learning (DL) techniques, aimed at improving the accuracy of noise classification [18]. The studies in [19, 20] have contributed greatly to the understanding of environmental noise by providing innovative approaches for its analysis and classification. These works emphasise the need for authentic and versatile databases to test and develop signal processing algorithms capable of handling the complexity of real acoustic environments [21]. The accurate identification and classification of environmental noise not only improves the performance of telecommunication systems but also contributes to the health and well-being of individuals by reducing exposure to harmful levels of noise. Therefore, research in this area is crucial to advance the design of more resilient communication systems and to promote a more sustainable sound environment.

3. Proposed Method

Advances in Machine Learning (ML) and Deep Learning (DL) techniques have radically transformed the approach to data analysis, allowing us to discover unexpected complex patterns in audio data. In this study, we adopted an innovative methodology that exploits neural networks to directly process audio signals in .wav format. The aim is to evaluate the ability of these networks to accurately classify different sound events without resorting to pre-processing techniques that could compromise data integrity. In subsection 3.1 we describe the datasets used and how they are split for training, validation and testing of the neural network; in subsection 3.2 we describe the CNN used for the classification of ambient noise.

3.1. Dataset

The dataset used in this research was composed by merging three distinct public databases: UrbanSound [18], Demand [17] and Noisex-92 [19]. This fusion created a heterogeneous dataset that includes a wide range of sound classes, specifically excluding the dog bark class from UrbanSound, but incorporating common ambient noise classes from Noisex-92 and Demand. A preprocessing phase is carried out before giving the data as input to the CNN. Specifically, the recordings were all divided into 2-second sub-sequences and sampled at 22050 Hz. The dataset was randomly divided into two different sets, one used for network training and validation and the other for network testing, ensuring an equal distribution of sound classes between the two. The learning dataset includes classes such as "air_conditioner", "children_playing", and "traffic", with a variable number of sound sequences per class. Similarly, the test dataset maintains a representative proportion of each class, ensuring a valid evaluation of network performance.

3.1.1. Learning dataset

• "air_conditioner": 1271 audio sequences,
• "children_playing": 704 audio sequences,
• "babble": 259 audio sequences,
• "car_horn": 307 audio sequences,
• "drilling": 622 audio sequences,
• "engine_idling": 704 audio sequences,
• "jackhammer": 917 audio sequences,
• "metro": 1800 audio sequences,
• "office": 1800 audio sequences,
• "river": 1800 audio sequences,
• "siren": 956 audio sequences,
• "square": 1800 audio sequences,
• "street_music": 850 audio sequences,
• "traffic": 1800 audio sequences.

3.1.2. Testing dataset

• "air_conditioner": 543 audio sequences,
• "children_playing": 299 audio sequences,
• "babble": 15 audio sequences,
• "car_horn": 129 audio sequences,
• "drilling": 268 audio sequences,
• "engine_idling": 303 audio sequences,
• "jackhammer": 398 audio sequences,
• "metro": 600 audio sequences,
• "office": 600 audio sequences,
• "river": 600 audio sequences,
• "siren": 412 audio sequences,
• "square": 600 audio sequences,
• "street_music": 362 audio sequences,
• "traffic": 600 audio sequences.

3.2. Application of CNNs

Artificial intelligence (AI) represents a vast and evolving field of study that aims to emulate human cognitive capabilities through the development of autonomous hardware and software systems. This ambition to reflect human intelligence in machines has led to the development of technologies capable of autonomous learning, adaptation, reasoning and planning. At the heart of AI are advanced algorithms and computational techniques, which make it possible to replicate typically human behaviours, such as interaction with the environment and decision-making. The applications of artificial intelligence span different fields, from industrial to domestic, demonstrating its potential to improve both the activities of businesses and public administrations and the everyday lives of people.

Convolutional Neural Networks (CNNs) stand out for their effectiveness in analysing visual and sound data due to their ability to identify complex patterns through the use of convolutional filters.

Our CNN architecture follows a structured model starting with the input layer, proceeding through convolutional and
activation (ReLU) layers, pooling, and culminating in a fully connected layer for final classification. This design allows the network to process audio features from the simplest to the most complex, facilitating deep and robust data learning. The detailed configuration of convolutional, pooling, and fully connected layers provides a powerful means to extract and interpret sound features, making CNNs particularly suitable for the recognition and classification of complex sound events. Our research aims to demonstrate the effectiveness of this approach in the field of acoustic analysis, contributing significantly to the field of signal processing and audio classification.

The neural network used in this study is based on the architecture of 1D convolutional neural networks and, in particular, on the "M5 (0.5M)" model described in [16]. This network consists of five layers, the first four of which are convolutional blocks (1D Convolution Layer, Batch Normalisation Layer, ReLU Layer and Pooling Layer) and the last of which is the output layer (Softmax).

The neural network's input is a vector containing sequences of audio waveforms, each with a duration of $W_{in} = 2$ seconds. The CNN determines the index associated with one of $N_C = 14$ different classes $C_i$ ($i = 1, \dots, N_C$) using the LogSoftMax function. The network is trained by feeding it raw audio sequences representing different environmental noises.

Figure 1: Accuracy and loss trends during training and validation.

4. Experimental Results

The validation process of our approach was carried out through a rigorous experiment involving the direct input
of raw audio data, in .wav format, into the convolutional neural network (CNN). Below, we present a detailed analysis of the performance obtained during the different phases of training, validation and testing of the network.

4.1. Training and Validation

During the training phase, we observed a progressive improvement in network performance, as illustrated in Fig. 1. This graph shows an increase in accuracy and a decrease in the loss function as the epochs progress, highlighting the effectiveness of the learning process. The dataset was split into a proportion of 70% for training and 30% for validation, as described in Section 3.1.

Fig. 2 presents the confusion matrix obtained from the validation of the model, providing a clear indication of its classification capability across the different sound categories.

Figure 2: Confusion matrix for the validation dataset.

4.2. Testing

The effectiveness of the model was further verified through testing on a separate dataset, achieving an accuracy of 96.93%. Fig. 3 illustrates the confusion matrix for this phase, confirming the network's high accuracy in recognising environmental sounds.

Figure 3: Confusion matrix for the testing dataset.

4.3. Post-Processing and Time Window Analysis

The introduction of a post-processing filter, called the "recurrence filter" [16], further improved the performance of the model. As demonstrated in Fig. 4, the accuracy of the system increases significantly as the analysis time window is extended. In particular, when the analysis is extended beyond 28 seconds, the accuracy reaches 100%.

Figure 4: Effect of post-processing filter on accuracy as a function of time window.

The results underline the effectiveness of our approach based on the use of convolutional neural networks for analysing environmental sound, highlighting the potential of deep learning techniques in overcoming the challenges of accurately recognising complex sound events.

5. Conclusion

This study introduced a new approach for the classification of environmental noise, exploiting the potential of Deep Learning techniques to address one of the most pressing challenges in signal processing and telecommunications. Through the use of a CNN trained on a unified dataset derived from three different public sources, it is shown that high accuracy in the classification of environmental noise events can be achieved without the need for complex pre-processing. The results obtained reveal a marked improvement in classification accuracy, highlighting the effectiveness of our model both in the testing phase and in the application of post-processing techniques. These results not only confirm the value of convolutional neural networks in acoustic analysis, but also open the way for future research to explore the applicability of such methods in broader areas, including urban noise monitoring and the improvement of telecommunication systems. In conclusion, our study contributes to the body of research on signal processing, proposing an effective and efficient model for the classification of ambient noise, with direct implications for environmental sustainability and quality of life in urban areas.

References

[1] F. Beritelli, A. Gallotta, C. Rametta, A dual streaming approach for speech quality enhancement of VoIP service over 3G networks, in: 2013 18th International Conference on Digital Signal Processing (DSP), IEEE, 2013, pp. 1–5.
[2] C. Couvreur, V. Fontaine, P. Gaunard, C. G. Mubikangiey, Automatic classification of environmental noise events by hidden Markov models, Applied Acoustics 54 (1998) 187–206.
[3] G. Capizzi, C. Napoli, L. Paternò, An innovative hybrid neuro-wavelet method for reconstruction of missing data in astronomical photometric surveys, in: Artificial Intelligence and Soft Computing: 11th International Conference, ICAISC 2012, Zakopane, Poland, April 29-May 3, 2012, Proceedings, Part I 11, Springer, 2012, pp. 21–29.
[4] N. Brandizzi, S. Russo, R. Brociek, A. Wajda, First studies to apply the theory of mind theory to green and smart mobility by using gaussian area clustering, volume 3118, 2021, pp. 71–76.
[5] F. Bonanno, G. Capizzi, G. Lo Sciuto, A neuro wavelet-based approach for short-term load forecasting in integrated generation systems, in: 2013 International Conference on Clean Electrical Power (ICCEP), IEEE, 2013, pp. 772–776.
[6] V. Ponzi, S. Russo, A. Wajda, R. Brociek, C. Napoli, Analysis pre and post COVID-19 pandemic Rorschach test data of using EM algorithms and GMM models, volume 3360, 2022, pp. 55–63.
[7] G. Capizzi, G. L. Sciuto, C. Napoli, M. Woźniak, G. Susi, A spiking neural network-based long-term prediction system for biogas production, Neural Networks 129 (2020) 271–279.
[8] G. De Magistris, M. Romano, J. Starczewski, C. Napoli, A novel DWT-based encoder for human pose estimation, volume 3360, 2022, pp. 33–40.
[9] F. Bonanno, G. Capizzi, G. L. Sciuto, C. Napoli, Wavelet recurrent neural network with semi-parametric input data preprocessing for micro-wind power forecasting
in integrated generation systems, 2015, pp. 602–609. doi:10.1109/ICCEP.2015.7177554.
[10] Y. Alsouda, S. Pllana, A. Kurti, IoT-based urban noise identification using machine learning: performance of SVM, KNN, bagging, and random forest, in: Proceedings of the International Conference on Omni-Layer Intelligent Systems, 2019, pp. 62–67.
[11] F. Beritelli, R. Grasso, A pattern recognition system for environmental sound classification based on MFCCs and neural networks, in: 2008 2nd International Conference on Signal Processing and Communication Systems, IEEE, 2008, pp. 1–4.
[12] B. Aksoy, U. Uygar, G. Karadağ, A. R. Kaya, Ö. Melek, Classification of environmental sounds with deep learning, Advances in Artificial Intelligence Research 2 (2022) 20–28.
[13] K. M. Jeon, N. K. Kim, M. J. Jo, H. K. Kim, Design of multi-channel indoor noise database for speech processing in noise, in: 2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), IEEE, 2017, pp. 1–4.
[14] R. M. Ramli, A. O. A. Noor, S. Abdul Samad, Noise cancellation using selectable adaptive algorithm for speech in variable noise environment, International Journal of Speech Technology 20 (2017) 535–542.
[15] K.-T. Tsai, M.-D. Lin, Y.-H. Chen, Noise mapping in urban environments: A Taiwan study, Applied Acoustics 70 (2009) 964–972.
[16] R. Avanzato, F. Beritelli, Heart sound multiclass analysis based on raw data and convolutional neural network, IEEE Sensors Letters 4 (2020) 1–4.
[17] J. Thiemann, N. Ito, E. Vincent, The diverse environments multi-channel acoustic noise database (DEMAND): A database of multichannel environmental noise recordings, in: Proceedings of Meetings on Acoustics, volume 19, AIP Publishing, 2013.
[18] J. Salamon, C. Jacoby, J. P. Bello, A dataset and taxonomy for urban sound research, in: Proceedings of the 22nd ACM International Conference on Multimedia, 2014, pp. 1041–1044.
[19] A. Varga, H. J. Steeneken, Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems, Speech Communication 12 (1993) 247–251.
[20] J. Salamon, J. P. Bello, Unsupervised feature learning for urban sound classification, in: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2015, pp. 171–175.
[21] K. J. Piczak, ESC: Dataset for environmental sound classification, in: Proceedings of the 23rd ACM International Conference on Multimedia, 2015, pp. 1015–1018.
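
Supplementary sketch (segmentation, Section 3.1). The paper states that every recording is divided into 2-second sub-sequences sampled at 22050 Hz, which corresponds to 44100 samples per sub-sequence. The following is a minimal illustration of that step, not the authors' code. The function name `split_into_windows`, the use of non-overlapping windows, and the choice to drop a trailing remainder shorter than one window are assumptions made here for illustration only.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 22050   # sampling rate stated in Section 3.1
WINDOW_SECONDS = 2       # sub-sequence duration stated in Section 3.1


def split_into_windows(samples: Sequence[float],
                       sample_rate: int = SAMPLE_RATE_HZ,
                       window_seconds: int = WINDOW_SECONDS) -> List[List[float]]:
    """Cut a mono waveform into consecutive, non-overlapping windows.

    Assumption (not from the paper): a trailing remainder shorter than
    one full window is dropped rather than zero-padded.
    """
    window_len = sample_rate * window_seconds  # 22050 * 2 = 44100 samples
    n_windows = len(samples) // window_len
    return [list(samples[i * window_len:(i + 1) * window_len])
            for i in range(n_windows)]


if __name__ == "__main__":
    # Hypothetical 5-second silent recording, used only to show the shapes.
    recording = [0.0] * (5 * SAMPLE_RATE_HZ)
    windows = split_into_windows(recording)
    print(len(windows), len(windows[0]))  # prints: 2 44100
```

Each returned window has the fixed length that a 1D CNN operating on raw waveforms would expect as input; loading and resampling the .wav files is left out of the sketch on purpose.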
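
Supplementary sketch (post-processing, Section 4.3). The paper does not spell out how the "recurrence filter" of [16] is defined. The sketch below assumes it behaves like a majority vote over the most recent window-level predictions, which is one plausible reading and should not be taken as the authors' definition. Under that assumption, a 28-second analysis window would span 14 of the 2-second sub-sequences. The function name `recurrence_filter` and its parameters are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def recurrence_filter(window_labels: Sequence[str], n_windows: int) -> List[str]:
    """Smooth per-window class labels with a trailing majority vote.

    For each position, the output is the most frequent label among the
    last `n_windows` predictions (fewer at the start of the stream).
    Ties go to the label encountered first within the span.
    Assumption: this majority-vote form is an illustrative stand-in for
    the recurrence filter cited in the paper.
    """
    if n_windows < 1:
        raise ValueError("n_windows must be at least 1")
    smoothed = []
    for end in range(1, len(window_labels) + 1):
        span = window_labels[max(0, end - n_windows):end]
        smoothed.append(Counter(span).most_common(1)[0][0])
    return smoothed


if __name__ == "__main__":
    # Hypothetical CNN outputs for seven consecutive 2-second windows.
    raw = ["traffic", "traffic", "siren", "traffic",
           "traffic", "metro", "traffic"]
    print(recurrence_filter(raw, n_windows=3))
```

With a 3-window vote, the two isolated misclassifications in the hypothetical stream are outvoted by their neighbours, which illustrates why accuracy can rise as the analysis window grows, at the cost of a longer decision delay.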