=Paper= {{Paper |id=Vol-2351/paper12 |storemode=property |title=Real-Time Detection of Impulsive Sounds for Audio Surveillance Systems |pdfUrl=https://ceur-ws.org/Vol-2351/paper_49.pdf |volume=Vol-2351 |authors=Faycal Ykhlef,Sarah Ahmed Hamada,Farid Ykhlef,Abdeladhim Derbal,Djamel Bouchaffra |dblpUrl=https://dblp.org/rec/conf/jeri/YkhlefHYDB19 }} ==Real-Time Detection of Impulsive Sounds for Audio Surveillance Systems== https://ceur-ws.org/Vol-2351/paper_49.pdf

Real-Time Detection of Impulsive Sounds for Audio
Surveillance Systems

Faycal Ykhlef1, Sarah Ahmed Hamada2, Farid Ykhlef2, Abdeladhim Derbal1 and
Djamel Bouchaffra1
1 Centre de Développement des Technologies Avancées, Division ASM, Algiers, Algeria
2 University of BLIDA 1, LATSI and FUNDAPL Laboratories, Blida, Algeria

{fykhlef, aderbal, dbouchaffra}@cdta.dz1
{sarah.medhamada, ykhlefarid}@gmail.com2

Abstract. The monitoring of dangerous audio events is very important in surveillance
systems. One of the most significant phases in audio surveillance is the detection of
impulsive sounds (IS), which is considered a preprocessing stage prior to the
recognition phase. In this paper, we propose indoor audio monitoring software that
detects IS in real time. It is composed of three main stages: (i) audio acquisition,
(ii) a preprocessing module and (iii) a sound detector. We have used a MEMS microphone
to acquire the audio data. The preprocessing stage aims at tuning the microphone
sensitivity: it masks the undesired frequency components of the environment by adding
white noise. The detection of IS is conducted using a thresholding scheme based on a
normalized form of power sequences. The proposed prototype runs under Windows 7 on an
ordinary laptop. The results we have obtained are very promising.

Keywords: Impulsive sounds detection; power; real-time audio surveillance.

1 Introduction

The security of citizens in public environments is an important issue facing all the
countries of the world. Therefore, setting up efficient surveillance systems has become
essential in urban environments. In addition to video data, the third generation of
surveillance systems includes additional sensors to provide extra information about
anomalous events. Several types of sensors can be exploited, such as temperature meters,
movement detectors, infrared sensors, seismometers, and microphones [1]. In particular,
the audio data captured by the microphones can be used to track down dangerous events
that happen outside the range of the camera view. In addition, audio data can be useful
when the video information captured by the camera does not contain enough clues to
identify dangerous events, especially when the climatic conditions become unfavorable.
Acoustic events that may be identified as indicators of dangerous situations include,
but are not limited to, gunshots, screams, dogs barking, car accidents, alarms and
glass breaking [2]. Theoretically, the main feature shared by all these acoustic events
is a sudden energy increase. The detection and recognition of these events is a key
phase for the implementation of an
efficient surveillance system. The detection step consists in identifying the special
acoustical events that happen in the environment, typically impulsive sounds (IS).
Sound recognition, on the other hand, consists in distinguishing between the different
types of impulsive waves [3]. The sound detection module has to be permanently
activated to ensure continuous monitoring of environmental events. The techniques used
to achieve this goal have to be of low complexity, capable of robust detection in noisy
conditions, and able to operate in real time. Sound recognition methods, by contrast,
exploit more complex schemes which are generally based on advanced machine learning
paradigms [1], [4], [5]. Once an IS has been detected, the recognition stage is invoked
to identify its exact type. Basically, the issue of sound detection can be addressed in
two different ways: (i) thresholding methods and (ii) detection by classification [5].
Most thresholding methods are based on the comparison of a significant feature with a
fixed threshold. For instance, one can mention power measures [3], the Teager Energy
Operator (TEO) [6], and the Chi-square distribution [7]. Detection by classification
uses the same scheme as the recognition task. In fact, it is composed of two main steps:
(i) feature extraction and (ii) classification. It can be considered as a two-class
problem where the positive class is “IS” and the negative one is “non-IS” [14]. Sound
detection based on thresholding is less computationally demanding than detection by
classification [5].
In this paper, we focus only on the detection of IS. The recognition problem will be
addressed in our future work. As far as we are aware, many solutions reported in the
literature for IS detection focus on the algorithmic aspects [2], [3], [4], [7]. The
performance of these methods is generally evaluated offline using local databases. Few
contributions tackle the issue of real-time IS detection. We can mention the studies
reported in references [5] and [6]. K. Lopatka [5] proposes a system for the recognition
of threatening acoustic events using a supercomputing cluster. The detection stage uses
an adaptive thresholding method. The recognition is based on support vector machines. A
parallel processing scheme is introduced to tackle latency, delays and online decisions.
The developed solution can be regarded as nearly real-time since the time needed to
recognize the acoustic events is about 0.2 s. The sound detection scheme has not been
evaluated separately in this study.
The detection and recognition system of acoustic events reported by R. Levorato [6] is
developed in the Network-Integrated Multimedia Middleware and operates in real time. It
uses a computer equipped with a sound card and a wired acoustical microphone. The TEO
was exploited to detect impulsive events. The recognition is based on Gaussian mixture
models. The entire system (detection and recognition) has been evaluated using four
types of IS: gunshots, screams, broken glass and barking dogs. The detection phase has
not been evaluated separately since the main purpose of this study was the recognition
of environmental sounds. Real-time IS detection is an important issue in environmental
sound recognition. It can be considered as a preprocessing phase prior to the
recognition stage. In fact, the design of an audio surveillance system does not rely
only on the efficiency of the algorithms; the software and hardware constraints have to
be taken into account to achieve better performance.

Fig. 1. Design methodology and realization of the software

In addition, the sensitivity of acoustical sensors and the quality of audio data may
play a large part in strengthening sound detectors. Therefore, our concern in this
paper is to design a detector of IS regardless of their exact type. We have adopted a
thresholding scheme. We have used an algorithm based on a normalized version of power
sequences to detect sudden changes of acoustical power. Special attention was given to
the microphone sensitivity in the design of our prototype. We have therefore proposed a
preprocessing module that tunes the microphone sensitivity by using Gaussian white
noise. The software we have developed runs under Windows 7 on a laptop equipped with a
Core i5 processor and 6 GB of RAM. The audio data are acquired wirelessly using a MEMS
audio sensor. The software can detect IS under noisy conditions in an indoor
environment. The detector includes: (i) offline and (ii) real-time processing (Fig. 1).
Our main contributions in this paper are twofold: (i) the optimization of the detector
parameters for real-time operation in an indoor environment and (ii) the design of a
preprocessing stage for microphone sensitivity tuning.
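
As a rough illustration of the masking idea behind the preprocessing stage, the sketch below adds zero-mean Gaussian white noise to a block of samples. It is only a minimal sketch: the function name `add_white_noise`, the parameter `sigma` and the chosen noise level are assumptions made for illustration and do not come from the prototype described here.

```python
import random

def add_white_noise(samples, sigma, seed=None):
    """Return a copy of `samples` with zero-mean Gaussian white noise added.

    `sigma` (hypothetical parameter) is the standard deviation of the noise;
    a larger value masks more of the low-level background components.
    """
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, sigma) for s in samples]

# Usage: mask a silent block with low-level noise (values are illustrative).
quiet = [0.0] * 8
masked = add_white_noise(quiet, sigma=0.01, seed=0)
```

Raising `sigma` makes weak background components harder to distinguish from the noise floor, which is the effect the tuning stage aims for; the appropriate value would have to be chosen experimentally.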

2 Design methodology
2.1 Offline processing
The goals of offline processing are fourfold: (i) the choice of an adequate audio
sensor, (ii) the construction of an audio database, (iii) the implementation of the IS
detector and (iv) the optimization of its algorithmic parameters.

Audio sensor (microphone): The audio sensor used to acquire data plays an important
role in the detection process. Several specifications need to be taken into account,
including: decibel scale, frequency response, signal-to-noise ratio, polar response,
noise level, sensitivity, dynamic range, and sound pressure level (SPL) capability [8].
In sound detection, special attention should be paid to dynamic range, sensitivity and
SPL capability. Among the most appropriate sensors for audio monitoring are those
produced by Brüel & Kjær Sound & Vibration [9]. Unfortunately, we were not able to
purchase such microphones because of their high price. To cope with this problem, we
have used another type of sensor, namely Micro-Electro-Mechanical Systems (MEMS)
microphones. These sensors are usually embedded in smartphones and smart electronic
devices. They are suitable for systems that require a very high dynamic range and tight
sensitivity matching [10]. As far as we are aware, the exact type of microphone embedded
in a smartphone is unfortunately not given in the smartphone technical guide. However,
an overview of the specifications can be found in [10]. We have used a Samsung Galaxy
Ace III smartphone in our experiments. It is equipped with an omnidirectional
microphone and offers high quality, sensitivity and maximum SPL capability (around
94 dBSPL). We have used the Wo-Mic software to turn our smartphone into a wireless
microphone for our computer [11]. MyPublicWiFi has been used to connect the smartphone
to the laptop [12].
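
For readers less familiar with the dBSPL unit used above, the standard definition of sound pressure level is recalled below, together with a worked value. The symbols p_rms and p_0 are not used elsewhere in this paper and are introduced here only for illustration.

```latex
L_p = 20 \log_{10}\!\left(\frac{p_{\mathrm{rms}}}{p_0}\right)\ \text{dB SPL},
\qquad p_0 = 20\ \mu\text{Pa}
% Worked example: p_rms = 1 Pa gives
% L_p = 20 \log_{10}(1 / (20 \times 10^{-6})) = 20 \log_{10}(5 \times 10^{4}) \approx 94 dB SPL
```

An RMS pressure of 1 Pa therefore corresponds to about 94 dBSPL, the figure quoted above for the smartphone microphone.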
Database: The procedure proposed to optimize the parameters that influence the IS
detector requires a benchmark composed of multiple sequences that contain impulsive
waveforms. The sounds need to be recorded using the same microphone that we plan to use
in the real-time processing phase. We have downloaded 200 audio files of gunshots from
the Sounddogs website [13]. The sampling frequency (Fs) of these files is 11025 Hz. The
dynamic range of all these files has then been adjusted, and silent sections have been
removed from the audio waveforms. We have used a desktop computer equipped with
high-quality loudspeakers to play the audio files. The microphone (integrated in the
smartphone) has been placed at a distance of 4 m from the loudspeakers in an indoor
environment (the area of the room is about 30 m²) and connected wirelessly to a laptop.
The experiment proceeds as follows. The audio files saved on the desktop computer are
played one after the other using MATLAB. The silence duration between consecutive files
has been fixed to 3 s. The acoustic waveforms generated by the loudspeakers are
simultaneously recorded using the microphone. The obtained audio sequence is saved on
the hard disk of the laptop. This experiment is repeated 3 times to obtain sequences of
IS recorded at 70, 80 and 90 dBSPL, respectively. These audio sequences are saved
separately as seq(1), seq(2) and seq(3). The sound pressure is varied by changing the
volume of the loudspeakers and measuring the resulting SPL using a professional sound
level meter. The three audio sequences are manually tagged to pinpoint the starting
instants of impulsive events (Marks) (Fig. 2). These instants are saved as ref(1),
ref(2) and ref(3), respectively, and will be exploited later to compute the detection
errors.
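
The playback schedule described above (files played one after the other, separated by 3 s of silence) can be sketched as follows. This is only an illustrative sketch: the function `build_sequence` and the dummy clips are hypothetical, and in the experiment itself the starting instants are tagged manually on the recorded sequences, not computed from the playlist.

```python
FS = 11025          # sampling frequency of the source files (Hz)
GAP_SECONDS = 3     # silence inserted between consecutive files (s)

def build_sequence(clips, fs=FS, gap_seconds=GAP_SECONDS):
    """Concatenate clips with silence gaps.

    Returns the full sample sequence and the nominal starting instant
    (in seconds) of each clip within that sequence.
    """
    gap = [0.0] * int(gap_seconds * fs)
    sequence, onsets = [], []
    for index, clip in enumerate(clips):
        if index > 0:
            sequence.extend(gap)            # silence before every clip but the first
        onsets.append(len(sequence) / fs)   # nominal start of this clip
        sequence.extend(clip)
    return sequence, onsets

# Usage with two dummy clips lasting 1 s and 2 s.
clips = [[0.5] * FS, [0.8] * (2 * FS)]
sequence, onsets = build_sequence(clips)
```

The nominal instants returned here ignore the acoustic propagation and wireless transmission delays of the real setup, which is one reason why manual tagging of the recordings is still needed.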

Fig. 2. Starting instants of impulsive events (selected section: 70 dBSPL)

IS Detector: The method we have used to detect IS was originally proposed by Dufaux
[3]. It is based on the power sequences of audio waveforms. The author used this scheme
as a preprocessing stage to recognize environmental audio events.
As far as we are aware, this method has never been employed before for real-time
detection of audio events. The detection and recognition tasks in reference [3] were
conducted offline. We have optimized the algorithmic parameters of this method in order
to use it for real-time detection of IS. It was implemented in MATLAB. The detection
process is summarized as follows.
Algorithm 1:
1. Computation of the kth power block e(k)
$$e(k) = \frac{1}{N} \sum_{n=0}^{N-1} x^2(n + kN) \qquad (1)$$
x(n): nth sample of the audio waveform which is sampled at Fs,
k: is the index of blocks. It varies from 0 to +∞,
N: is the length of power blocks.
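
Equation (1) averages the squared samples over consecutive, non-overlapping blocks of N samples. A minimal Python sketch of this step is given below; the authors' implementation is in MATLAB, so the function name `block_powers` and the choice to discard a trailing incomplete block are assumptions made for illustration.

```python
def block_powers(x, block_length):
    """Return e(k) = (1/N) * sum_{n=0}^{N-1} x(n + k*N)^2 for every complete block.

    `block_length` plays the role of N; samples left over after the last
    complete block are ignored (an assumption of this sketch).
    """
    n_blocks = len(x) // block_length
    return [
        sum(v * v for v in x[k * block_length:(k + 1) * block_length]) / block_length
        for k in range(n_blocks)
    ]

# Usage: a silent block followed by a block of unit-amplitude samples.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0]
powers = block_powers(signal, block_length=4)   # [0.0, 1.0]
```

The jump from 0.0 to 1.0 between the two blocks is the kind of sudden power increase that the detector is designed to pick up.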

2. Framing of the power sequence e_win(j/k)
This step consists in creating a power sequence e_win of length L:
$$e_{win}(j) = e(i) \qquad (2)$$
The ranges of the indexes ‘i’ and ‘j’ depend on the value of ‘k’. According to ‘k’, we
can distinguish two states:
2.1. Transient state (k