=Paper=
{{Paper
|id=Vol-2351/paper12
|storemode=property
|title=Real-Time Detection of Impulsive Sounds for Audio Surveillance Systems
|pdfUrl=https://ceur-ws.org/Vol-2351/paper_49.pdf
|volume=Vol-2351
|authors=Faycal Ykhlef,Sarah Ahmed Hamada,Farid Ykhlef,Abdeladhim Derbal,Djamel Bouchaffra
|dblpUrl=https://dblp.org/rec/conf/jeri/YkhlefHYDB19
}}
==Real-Time Detection of Impulsive Sounds for Audio Surveillance Systems==
Faycal Ykhlef¹, Sarah Ahmed Hamada², Farid Ykhlef², Abdeladhim Derbal¹ and Djamel Bouchaffra¹

¹ Centre de Développement des Technologies Avancées, Division ASM, Algiers, Algeria, {fykhlef, aderbal, dbouchaffra}@cdta.dz

² University of BLIDA 1, LATSI and FUNDAPL Laboratories, Blida, Algeria, {sarah.medhamada, ykhlefarid}@gmail.com

Abstract. The monitoring of dangerous audio events is very important in surveillance systems. One of the most significant phases of audio surveillance is the detection of impulsive sounds (IS), which is considered a preprocessing stage prior to the recognition phase. In this paper, we propose indoor audio monitoring software that detects IS in real time. It is composed of three main stages: (i) audio acquisition, (ii) a preprocessing module and (iii) a sound detector. We used a MEMS microphone to acquire the audio data. The preprocessing stage tunes the microphone sensitivity: it masks the undesired frequency components of the environment by adding white noise. IS are detected with a thresholding scheme based on a normalized form of power sequences. The proposed prototype runs under Windows 7 on an ordinary laptop. The results we have obtained are very promising.

Keywords: impulsive sound detection; power; real-time audio surveillance.

1 Introduction

The security of citizens in public environments is an important issue for every country in the world. Setting up efficient surveillance systems has therefore become essential in urban environments. In addition to video data, third-generation surveillance systems include additional sensors that provide extra information about anomalous events. Several types of sensors can be exploited, for example temperature sensors, motion detectors, infrared sensors, seismometers and microphones [1].
In particular, the audio data captured by microphones can be used to track down dangerous events that happen outside the field of view of the cameras. Audio data is also useful when the video captured by a camera does not contain enough clues to identify dangerous events, especially under unfavorable weather conditions. Acoustic events that may indicate dangerous situations include, but are not limited to, gunshots, screams, dogs barking, car accidents, alarms and glass breaking [2]. In theory, the main feature shared by all these acoustic events is a sudden increase in energy. The detection and recognition of these events is a key phase in the implementation of an efficient surveillance system. The detection step consists of identifying special acoustic events occurring in the environment, typically impulsive sounds (IS), whereas sound recognition consists of distinguishing between the different types of impulsive waves [3]. The sound detection module has to be permanently active to ensure continuous monitoring of environmental events. The techniques used to achieve this goal must therefore be of low complexity, capable of robust detection in noisy conditions and able to operate in real time. Sound recognition methods, on the other hand, exploit more complex schemes that are generally based on advanced machine learning paradigms [1], [4], [5]. Once an IS has been detected, the recognition stage is invoked to identify its exact type.

Basically, sound detection can be addressed in two different ways: (i) thresholding methods and (ii) detection by classification [5]. Most thresholding methods compare a significant feature with a fixed threshold. Examples of such features include power measures [3], the Teager Energy Operator (TEO) [6] and the Chi-square distribution [7].
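
The fixed-threshold principle can be illustrated with a minimal Python/NumPy sketch. It computes two of the features cited above, the mean power of non-overlapping blocks of N samples and the discrete Teager Energy Operator Ψ[x(n)] = x²(n) − x(n−1)x(n+1), and reports where a feature first rises above a fixed threshold. The function names, the block length N = 256, the threshold 0.05 and the synthetic test signal are illustrative assumptions; they are not the parameters used in [3], [6] or in this paper.

```python
import numpy as np


def block_power(x, N):
    """Mean power of consecutive, non-overlapping blocks of N samples."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // N                       # an incomplete trailing block is ignored
    blocks = x[: n_blocks * N].reshape(n_blocks, N)
    return np.mean(blocks ** 2, axis=1)


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]**2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    # Output index m corresponds to input sample n = m + 1 (edges are dropped).
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def fixed_threshold_onsets(feature, threshold):
    """Indices where a 1-D feature rises above a fixed threshold (rising edges only)."""
    above = np.asarray(feature) > threshold
    previous = np.concatenate(([False], above[:-1]))  # state at the previous index
    return np.flatnonzero(above & ~previous)


if __name__ == "__main__":
    fs = 11025                                   # sampling rate of the paper's database
    rng = np.random.default_rng(0)
    x = 0.01 * rng.standard_normal(fs)           # 1 s of low-level background noise
    x[5000:5400] += rng.standard_normal(400)     # synthetic impulsive burst
    N = 256                                      # hypothetical block length
    onsets = fixed_threshold_onsets(block_power(x, N), threshold=0.05)
    print("onset blocks:", onsets, "onset times (s):", onsets * N / fs)
```

With these assumed values, the burst starting at sample 5000 is flagged in block 19 (samples 4864 to 5119), which shows that the time resolution of a block-power detector is one block of N samples.
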
Detection by classification uses the same scheme as recognition: it consists of two main steps, (i) feature extraction and (ii) classification. It can be regarded as a two-class problem in which the positive class is "IS" and the negative one is "non-IS" [14]. Detection based on thresholding is less computationally demanding than detection by classification [5].

In this paper, we focus only on IS detection; the recognition problem will be addressed in future work. As far as we are aware, many solutions reported in the literature for IS detection focus on the algorithmic aspects [2], [3], [4], [7], and their performance is generally evaluated offline on local databases. Few contributions tackle the issue of real-time IS detection; we can mention the studies reported in [5] and [6]. K. Lopatka [5] proposes a system for the recognition of threatening acoustic events using a supercomputing cluster. The detection stage uses an adaptive thresholding method, and recognition is based on support vector machines. A parallel processing scheme is introduced to handle latency, delays and online decisions. The developed solution can be regarded as nearly real-time, since the time needed to recognize an acoustic event is about 0.2 s. The sound detection scheme was not evaluated separately in that study. The acoustic event detection and recognition system reported by R. Levorato [6] was developed within the Network-Integrated Multimedia Middleware and operates in real time. It uses a computer equipped with a sound card and a wired acoustic microphone. The TEO was exploited to detect impulsive events, and recognition is based on Gaussian mixture models. The entire system (detection and recognition) was evaluated using four types of IS: gunshots, screams, broken glass and barking dogs.
The detection phase was not evaluated separately, since the main purpose of that study was the recognition of environmental sounds.

Real-time IS detection is an important issue in environmental sound recognition, as it can be considered a preprocessing phase prior to the recognition stage. Designing an audio surveillance system does not rely only on the efficiency of the algorithms: software and hardware constraints must also be taken into account to achieve better performance.

Fig. 1. Design methodology and realization of the software

In addition, the sensitivity of the acoustic sensors and the quality of the audio data can play a major role in making sound detectors robust. Our concern in this paper is therefore to design a detector of IS regardless of their exact type. We adopted a thresholding scheme: an algorithm based on a normalized version of power sequences detects sudden changes of acoustic power. Special attention was given to the microphone sensitivity in the design of our prototype; we therefore propose a preprocessing module that tunes the microphone sensitivity using Gaussian white noise. The software runs under Windows 7 on a laptop equipped with a Core i5 processor and 6 GB of RAM. Audio data is acquired wirelessly using a MEMS audio sensor. The software can detect IS under noisy conditions in an indoor environment. The detector includes (i) offline and (ii) real-time processing (Fig. 1). Our main contributions in this paper are twofold: (i) the optimization of the detector parameters for real-time operation in an indoor environment and (ii) the design of a preprocessing stage for tuning the microphone sensitivity.
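
The preprocessing idea described above, adding Gaussian white noise so that weak, undesired components of the environment are buried under a controlled noise floor, can be sketched in a few lines of Python/NumPy. This is only an illustration under stated assumptions: the noise level is expressed in dB relative to a full-scale amplitude of 1.0, and the function names, the 50 Hz background component and the -40 dBFS value are hypothetical; the noise level actually used by the authors is not given here.

```python
import numpy as np


def add_masking_noise(x, noise_dbfs, seed=None):
    """Return x plus zero-mean Gaussian white noise.

    noise_dbfs is the RMS level of the added noise in dB relative to a
    full-scale amplitude of 1.0 (an assumed convention, not from the paper).
    """
    rng = np.random.default_rng(seed)
    sigma = 10.0 ** (noise_dbfs / 20.0)      # RMS of zero-mean noise = its standard deviation
    x = np.asarray(x, dtype=float)
    return x + sigma * rng.standard_normal(x.shape)


def rms_dbfs(x):
    """RMS level of a signal in dB relative to full scale (1.0)."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(x))))


if __name__ == "__main__":
    fs = 11025
    t = np.arange(fs) / fs
    hum = 0.001 * np.sin(2 * np.pi * 50 * t)     # hypothetical weak background component
    y = add_masking_noise(hum, noise_dbfs=-40.0, seed=0)
    print(f"background: {rms_dbfs(hum):.1f} dBFS, after masking: {rms_dbfs(y):.1f} dBFS")
```

Raising the floor this way means that only events well above the added noise change the block power noticeably, which is one plausible reading of "tuning the sensitivity"; the trade-off is that genuinely weak impulsive sounds are masked as well.
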
2 Design methodology

2.1 Offline processing

The goals of offline processing are fourfold: (i) the choice of an adequate audio sensor, (ii) the construction of an audio database, (iii) the implementation of the IS detector and (iv) the optimization of its algorithmic parameters.

Audio sensor (microphone): The audio sensor used to acquire data plays an important role in the detection process. Several specifications need to be taken into account, for example the decibel scale, frequency response, signal-to-noise ratio, polar response, noise level, sensitivity, dynamic range and sound pressure level (SPL) capability [8]. Special attention should be paid to dynamic range, sensitivity and SPL capability for sound detection. Some of the most appropriate sensors for audio monitoring are those produced by Brüel & Kjær Sound & Vibration [9]. Unfortunately, we were not able to purchase such microphones due to their high price. To cope with this problem, we exploited another type of sensor: Micro-Electro-Mechanical Systems (MEMS) microphones. These sensors are usually embedded in smartphones and smart electronic devices, and they are suitable for systems that require a very high dynamic range and tight sensitivity matching [10]. As far as we are aware, the exact type of microphone embedded in a smartphone is unfortunately not provided in the smartphone's technical guide. However, an overview of the specifications can be found on the website [10]. We used a Samsung Galaxy Ace III smartphone in our experiments. It is equipped with an omnidirectional microphone and offers high quality, high sensitivity and a maximum SPL capability of around 94 dB SPL. We used the Wo-Mic software to turn the smartphone into a wireless microphone for our computer [11], and Mypublic WIFI to connect the smartphone to the laptop [12].
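
Several of the specifications above, as well as the 70, 80 and 90 dB SPL levels used to build the database, are expressed in dB SPL, i.e. 20·log10(p/p0) with the standard reference pressure p0 = 20 µPa in air. The small Python helper below converts between SPL and RMS pressure; it is a standard-acoustics illustration, not part of the authors' software, and the function names are hypothetical.

```python
import math

P_REF = 20e-6  # reference RMS pressure for SPL in air: 20 µPa


def spl_to_pascal(spl_db):
    """RMS sound pressure (Pa) corresponding to a sound pressure level in dB SPL."""
    return P_REF * 10.0 ** (spl_db / 20.0)


def pascal_to_spl(p_rms):
    """Sound pressure level (dB SPL) of an RMS pressure given in Pa."""
    return 20.0 * math.log10(p_rms / P_REF)


if __name__ == "__main__":
    # Levels quoted in this paper: the three database recordings and the maximum SPL.
    for level in (70, 80, 90, 94):
        print(f"{level} dB SPL -> {spl_to_pascal(level):.4f} Pa")
```

Because a 20 dB step is a tenfold change in pressure, the 90 dB SPL sequence carries ten times the RMS pressure of the 70 dB SPL one, and 94 dB SPL corresponds to roughly 1 Pa.
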
Database: The procedure proposed to optimize the parameters of the IS detector requires a benchmark composed of multiple sequences that contain impulsive waveforms. The sounds need to be recorded with the same microphone that we plan to use in the real-time processing phase. We downloaded 200 audio files of gunshots from the Sounddogs website [13]. The sampling frequency (Fs) of these files is 11025 Hz. The dynamic range of all these files was then adjusted, and silent sections were removed from the audio waveforms. We used a desktop computer equipped with high-quality loudspeakers to play the audio files. The microphone (integrated in the smartphone) was placed at a distance of 4 m from the loudspeakers in an indoor environment (a room of about 30 m²) and connected wirelessly to a laptop. The experiment proceeds as follows. The audio files saved on the desktop computer are played one after the other using MATLAB, with a 3 s silence between consecutive files. The acoustic waveforms generated by the loudspeakers are simultaneously recorded by the microphone, and the resulting audio sequence is saved on the hard disk of the laptop. This experiment is repeated three times to obtain sequences of IS recorded at 70, 80 and 90 dB SPL, respectively. These audio sequences are saved separately as seq(1), seq(2) and seq(3). The sound pressure is varied by changing the volume of the loudspeakers and measuring the resulting SPL with a professional sound level meter. The three audio sequences are manually tagged to pinpoint the starting instants of the impulsive events (Marks) (Fig. 2). These instants are saved as ref(1), ref(2) and ref(3), respectively, and will be used later to compute the detection errors.

Fig. 2. Instants of beginning of impulsive events (selected section: 70 dB SPL)

IS Detector: The method we used to detect IS was originally proposed by Dufaux [3]. It is based on the power sequences of audio waveforms. The author used this scheme as a preprocessing stage to recognize environmental audio events. As far as we are aware, this method has never been employed for real-time detection of audio events; the detection and recognition of sounds in [3] were conducted offline. We optimized the algorithmic parameters of this method in order to use it for real-time IS detection. It was implemented in MATLAB. The detection process is summarized as follows.

Algorithm 1:

1. Computation of the kth power block e(k):

<math>e(k) = \frac{1}{N} \sum_{n=0}^{N-1} x^2(n + kN) \qquad (1)</math>

where x(n) is the nth sample of the audio waveform sampled at Fs, k is the block index, which varies from 0 to +∞, and N is the length of the power blocks.

2. Framing of the power sequence <math>e_{win}(j/k)</math>: this step creates a power sequence <math>e_{win}</math> of length L:

<math>e_{win}(j) = e(i) \qquad (2)</math>

The variations of the indexes i and j are related to the values of k. Depending on k, two states can be distinguished:

2.1. Transient state (k