=Paper=
{{Paper
|id=Vol-2498/short31
|storemode=property
|title=Indoor localization using multiple stereo speakers for smartphones
|pdfUrl=https://ceur-ws.org/Vol-2498/short31.pdf
|volume=Vol-2498
|authors=Masanari Nakamura,Hiroshi Kameda
|dblpUrl=https://dblp.org/rec/conf/ipin/NakamuraK19
}}
==Indoor localization using multiple stereo speakers for smartphones==
Masanari Nakamura and Hiroshi Kameda

Information Technology R&D Center, Mitsubishi Electric Corp., Kanagawa, Japan

Nakamura.Masanari@db.MitsubishiElectric.co.jp

Abstract. In this paper, we propose an acoustic indoor localization method for the embedded microphone of a smartphone using multiple stereo speakers installed indoors. In this configuration, the two speakers of a stereo pair are synchronized with each other, but speakers belonging to different stereo pairs are not. Therefore, a moving microphone may receive the acoustic signals required for localization at different locations, which causes a bias error in the conventional method. To address this issue, we propose a method that uses an asynchronous tracking filter and compensates for the differences in the locations of the received signals using motion modelling. Through experiments, we verify that the proposed method can effectively reduce the bias error.

Keywords: Acoustic Signal, Asynchronous Tracking Filter, Smartphone, Time Difference of Arrival

===1 Introduction===
Mobile devices such as smartphones are now highly popular, and localization using such devices is gaining attention [1]. Although the global navigation satellite system (GNSS) is a common localization method, it cannot be used indoors because receiving GNSS signals is difficult in such environments. Hence, other approaches employing the embedded sensors of smartphones are required.

Acoustic signals are suitable for accurate indoor localization using smartphones. In related work [2-5], such signals were produced by speakers and received by a microphone; subsequently, the microphone's location was calculated. The time difference of arrival (TDoA) between the received signals defines an elliptic hyperboloid, which indicates the region where the microphone lies.
The location is estimated by calculating the intersection of multiple elliptic hyperboloids.

These methods require highly precise time synchronization of all the speakers. Certain audio players that can synchronize several speakers are available. However, they are more expensive than commercial off-the-shelf (COTS) stereo speakers, which have only two channels owing to their limited application.

Therefore, we utilize COTS stereo speakers in this study. These speakers can produce two acoustic signals simultaneously, one from each side. The receiver obtains an elliptic hyperboloid from the TDoA of the two signals of one stereo speaker. Using the TDoAs of multiple stereo speakers, the receiver's location can be calculated. However, when the microphone is moving, the TDoAs are observed at different locations, which causes the intersection of the elliptic hyperboloids to drift away from the true location. This is because speakers belonging to different stereo pairs cannot produce signals simultaneously. We herein propose an asynchronous tracking filter that compensates for the difference in observation locations and thereby reduces the bias error.

The rest of this paper is organized as follows. Section 2 describes the conventional method and its problem. Section 3 presents the proposed method for dealing with the above-mentioned bias error. Section 4 evaluates the proposed method through simulation experiments. Section 5 concludes the paper.

===2 Related work===
In indoor environments, some acoustic localization methods use the embedded speakers of smartphones and microphones installed in the environment [6], whereas others use the smartphone's embedded microphone and speakers installed in the environment [2-5]. Herein, these two types of methods are referred to as active-tag systems and passive-tag systems, respectively. In the former, multiple signals produced by the speakers of multiple smartphones collide at the microphone.
This collision is avoidable if the transmission times of the speakers are precisely synchronized, although such synchronization is difficult. Therefore, the latter, being preferable for the localization of multiple devices, is the focus of this section.

Acoustic indoor localization methods are of two types: ranging-based and TDoA-based. The former uses range measurements between speakers and microphones. The latter uses the differences in signal reception times. Ranging-based localization is typically more precise than TDoA-based localization. However, it requires highly accurate time synchronization (e.g., on the order of microseconds) between speakers and microphones and is therefore used in systems employing dedicated devices [7]. TDoA-based localization does not require such synchronization. As accurate time synchronization is difficult on smartphones, TDoA-based localization is preferable.

For convenience, we describe localization in a two-dimensional space. The relationship between the location and the received times is

<math>\left.\begin{aligned} \operatorname{norm}(\mathbf{s}_1 - \mathbf{p}(t_1)) - \operatorname{norm}(\mathbf{s}_2 - \mathbf{p}(t_2)) &= c \cdot (t_1 - t_2) \\ \operatorname{norm}(\mathbf{s}_1 - \mathbf{p}(t_1)) - \operatorname{norm}(\mathbf{s}_3 - \mathbf{p}(t_3)) &= c \cdot (t_1 - t_3) \end{aligned}\right\} \qquad (1)</math>

where <math>\mathbf{s}_i = (x_i, y_i)</math> denotes the position vector of speaker <math>i</math>, <math>\mathbf{p}(t)</math> denotes the position vector of the microphone at time <math>t</math>, <math>t_i</math> (<math>i = 1, 2, 3</math>) denotes the time at which the signal produced by speaker <math>i</math> is received, <math>c</math> is the speed of sound, and <math>\operatorname{norm}(\cdot)</math> is the Euclidean distance. The equations in (1) represent hyperbolic curves on which the microphone position <math>\mathbf{p}(t)</math> lies.

When the smartphone is stationary, the positions <math>\mathbf{p}(t_1)</math>, <math>\mathbf{p}(t_2)</math>, and <math>\mathbf{p}(t_3)</math> are the same, and the location can be calculated by solving (1). When the smartphone is moving, <math>\mathbf{p}(t_1)</math>, <math>\mathbf{p}(t_2)</math>, and <math>\mathbf{p}(t_3)</math> are not the same. However, the differences are insignificant because all the speakers are synchronized. Therefore, the location can be obtained without a large bias error.
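
As a rough illustration of the stationary case of (1), the sketch below simulates three synchronized speakers and recovers the microphone position by a brute-force grid search. The speaker layout, the speed of sound (340 m/s), the grid resolution, and the function names are all assumptions made for this example; the text does not say how (1) is solved in practice.

```python
import math

C = 340.0  # speed of sound in m/s (assumed; the text does not give a value)


def dist(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def residuals(p, speakers, times):
    """Left-hand side minus right-hand side of the two equations in (1)."""
    s1, s2, s3 = speakers
    t1, t2, t3 = times
    r12 = dist(s1, p) - dist(s2, p) - C * (t1 - t2)
    r13 = dist(s1, p) - dist(s3, p) - C * (t1 - t3)
    return r12, r13


def locate_stationary(speakers, times, size_m=4, cells_per_m=20):
    """Search a square grid [0, size_m] x [0, size_m] for the point that
    minimises the squared residuals of (1)."""
    best, best_cost = None, float("inf")
    n = size_m * cells_per_m
    for ix in range(n + 1):
        for iy in range(n + 1):
            p = (ix / cells_per_m, iy / cells_per_m)
            r12, r13 = residuals(p, speakers, times)
            cost = r12 * r12 + r13 * r13
            if cost < best_cost:
                best, best_cost = p, cost
    return best


# Hypothetical layout: three synchronized speakers, one stationary microphone.
speakers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
true_p = (1.0, 1.5)
# All speakers emit at the same instant, so only the propagation delays differ.
times = [dist(s, true_p) / C for s in speakers]
estimate = locate_stationary(speakers, times)
print(estimate)
```

A grid search is used here only because it is easy to follow; an iterative least-squares solver would be the more usual choice when noise is present.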
To produce signals from all speakers simultaneously, a special device such as a multi-channel audio player is required. As such devices are expensive, we instead use multiple COTS stereo speakers, each of which is hereinafter referred to as a unit. In this configuration, speakers in the same unit are synchronized, but those in different units are not. The relationship between the location and the received times is

<math>\left.\begin{aligned} \operatorname{norm}(\mathbf{s}_R^1 - \mathbf{p}(t_R^1)) - \operatorname{norm}(\mathbf{s}_L^1 - \mathbf{p}(t_L^1)) &= c \cdot (t_R^1 - t_L^1) \\ \operatorname{norm}(\mathbf{s}_R^2 - \mathbf{p}(t_R^2)) - \operatorname{norm}(\mathbf{s}_L^2 - \mathbf{p}(t_L^2)) &= c \cdot (t_R^2 - t_L^2) \end{aligned}\right\} \qquad (2)</math>

where <math>\mathbf{s}_R^u</math> and <math>\mathbf{s}_L^u</math> are the locations of speakers R and L belonging to unit <math>u</math> (<math>u = 1, 2</math>), respectively, and <math>t_R^u</math> and <math>t_L^u</math> are the times at which the signals produced by R and L are received, respectively.

When the smartphone is stationary, the microphone positions <math>\mathbf{p}(t_R^1)</math>, <math>\mathbf{p}(t_L^1)</math>, <math>\mathbf{p}(t_R^2)</math>, and <math>\mathbf{p}(t_L^2)</math> are the same, and the location can be estimated without bias error. When the smartphone is moving, <math>\mathbf{p}(t_R^1) \approx \mathbf{p}(t_L^1)</math> and <math>\mathbf{p}(t_R^2) \approx \mathbf{p}(t_L^2)</math> still hold owing to the synchronization within each unit. However, the difference between <math>\mathbf{p}(t_R^1)</math> and <math>\mathbf{p}(t_R^2)</math> can become large once the smartphone is moving. In this case, the intersection of the hyperbolic curves drifts away from the true position (see Figure 1-(a)), and the location estimated by solving (2) incurs a bias error.

Fig. 1. Localization of moving smartphone using multiple stereo speakers

Takabayashi [8] proposed a bias error reduction method for single active-tag systems. We herein apply this method to a passive-tag system with multiple COTS stereo speakers.

===3 Proposed method===
====3.1 System architecture====
In the proposed method, the two speakers of a unit produce acoustic signals simultaneously.
If the transmission time of each unit is not controlled, signals from different units could collide at the microphone, resulting in large errors. Therefore, the transmission time of each unit should be controlled using general communication systems such as Wi-Fi and Bluetooth.

In conventional methods, time lags between speakers produce bias errors, which necessitates a multi-channel audio player to synchronize all speakers [2].

In the proposed method, the smartphone's current position is estimated using an asynchronous tracking filter each time a TDoA is measured. The estimate is obtained as the intersection between the predicted position and the hyperbolic curve of the TDoA, as shown in Figure 1-(b), which reduces the bias error shown in Figure 1-(a). The details are described in the next section.

====3.2 Asynchronous tracking filter====
TDoA observation model and motion model. The state vector <math>\mathbf{x}</math> comprises position and velocity, <math>[x \;\; y \;\; \dot{x} \;\; \dot{y}]^{\mathsf{T}}</math>, where <math>x, y</math> are positions, <math>\dot{x}, \dot{y}</math> are velocities, and the superscript <math>\mathsf{T}</math> denotes the transpose. The relationship between the state and the position is

<math>\mathbf{p} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \mathbf{x} = \mathbf{H}\mathbf{x} \qquad (3)</math>

In the asynchronous tracking filter, estimation is conducted sequentially at every TDoA measurement, each of which comprises <math>t_R^u</math> and <math>t_L^u</math>. We herein define the lesser of <math>t_R^u</math> and <math>t_L^u</math> as the TDoA observation time, and <math>t_k</math> denotes the observation time of the <math>k</math>-th TDoA. The TDoA measurement <math>z_k^u</math> received at <math>t_k</math>, corresponding to unit <math>u</math>, is

<math>z_k^u = \left(t_R^u + n_{t_R^u}\right) - \left(t_L^u + n_{t_L^u}\right) = h^u(\mathbf{x}_k) + \left(n_{t_R^u} - n_{t_L^u}\right) \qquad (4)</math>

where <math>n_{t_R^u}</math> and <math>n_{t_L^u}</math> are the measurement noises of the received times of the signals produced by R and L of unit <math>u</math>, respectively, and <math>\mathbf{x}_k</math> is the state vector <math>[x_k \;\; y_k \;\; \dot{x}_k \;\; \dot{y}_k]^{\mathsf{T}}</math> at time <math>t_k</math>.
From equation (2), <math>h^u(\mathbf{x}_k)</math> is given by

<math>h^u(\mathbf{x}_k) = \frac{\operatorname{norm}(\mathbf{s}_R^u - \mathbf{H}\mathbf{x}_k) - \operatorname{norm}(\mathbf{s}_L^u - \mathbf{H}\mathbf{x}_k)}{c} \qquad (5)</math>

The observation noises <math>n_{t_R^u}</math> and <math>n_{t_L^u}</math> are white Gaussian noise with zero mean and known variances <math>\sigma_{t_R^u}^2</math> and <math>\sigma_{t_L^u}^2</math>, respectively. The measurement error of the TDoA, <math>n_k^u</math>, is <math>n_{t_R^u} - n_{t_L^u}</math>. We assume that <math>n_{t_R^u}</math> and <math>n_{t_L^u}</math> are independent. In this case, <math>n_k^u</math> is white Gaussian noise with zero mean and variance <math>\sigma_{t_R^u}^2 + \sigma_{t_L^u}^2</math>, because the Gaussian distribution has the reproductive property.

As the motion model of the microphone, we use the constant velocity model

<math>\mathbf{x}_k = \begin{bmatrix} 1 & 0 & \Delta t_k & 0 \\ 0 & 1 & 0 & \Delta t_k \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \mathbf{x}_{k-1} + \begin{bmatrix} \Delta t_k & 0 \\ 0 & \Delta t_k \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \mathbf{w}_k = \mathbf{F}(\Delta t_k)\,\mathbf{x}_{k-1} + \mathbf{G}\,\mathbf{w}_k \qquad (6)</math>

where <math>\mathbf{w}_k</math> is the process noise vector <math>[w_k^x \;\; w_k^y]^{\mathsf{T}}</math>, which represents the ambiguity of the motion. <math>w_k^x</math> and <math>w_k^y</math> are white Gaussian noise with zero mean and variance <math>\sigma_w^2</math>. <math>\Delta t_k</math> is the difference of the observation times, <math>t_k - t_{k-1}</math>.

Algorithm. We herein describe the algorithm used to estimate the microphone state. When the first measurement is input to the asynchronous tracking filter (<math>k = 1</math>), particles <math>\mathbf{x}_k^j</math> representing the microphone state are generated (<math>j = 1, \dots, J</math>). Each element of the state <math>\mathbf{x}_k^j = [x_k^j \;\; y_k^j \;\; \dot{x}_k^j \;\; \dot{y}_k^j]^{\mathsf{T}}</math> is drawn from a uniform distribution. The weight <math>\alpha_k^j</math> of the state <math>\mathbf{x}_k^j</math> is set to <math>\alpha_k^j = 1/J</math>. When <math>\mathbf{x}_k^j</math> and <math>\alpha_k^j</math> have already been generated (<math>k \geq 2</math>), the prediction based on the motion model (6) is performed for each particle. In the updating step, the likelihood of each particle is calculated using <math>z_k^u</math>, and the weight <math>\alpha_k^j</math> is updated.
<math>\alpha_k^j = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\left(z_k^u - h^u(\mathbf{x}_k^j)\right)^2}{2\sigma^2}\right) \qquad (7)</math>

The estimated state <math>\hat{\mathbf{x}}_k</math> is obtained by the weighted sum <math>\sum_{j=1}^{J} \alpha_k^j \cdot \mathbf{x}_k^j</math>. When the next measurement is input, the above-mentioned process is repeated with the resampled particles [9].

===4 Numerical result===
====4.1 Scenarios====
We conducted the following simulation experiments to evaluate the proposed method. Figure 2 shows the microphone route and the four speaker locations. The microphone moves at a speed of 1 m/s and receives its first TDoA measurement at the route's starting position (-3.0, 1.5).

Fig. 2. Simulation scenario

In indoor environments, the transmission interval is designed according to the signal length and the reverberation time of the multipath. Regarding the signal length, a short signal with a large amplitude is preferable for the precision and the update rate of localization. However, the amplitude is generally restricted to keep the speaker output inaudible. To obtain the desired signal-to-noise ratio (SNR) with the restricted amplitude, a longer signal can be used. A reverberation time of 100 ms is sufficient to attenuate the multipath [4]. For these reasons, the transmission intervals in these simulations are set to 100 ms and 300 ms for the short and long signal cases, respectively.

The observation noise <math>\sigma</math> of the received time depends on the bandwidth and the SNR. The bandwidth and SNR herein are set to 1 kHz and 30 dB, respectively. These parameters were determined in consideration of the characteristics of microphones embedded in smartphones [4-5]. In this case, the observation noise <math>\sigma</math> is <math>1.58 \times 10^{-5}</math> s. Hence, the observation noise of the TDoA is <math>2 \times 1.58 \times 10^{-5} = 3.16 \times 10^{-5}</math> s.

The conventional method numerically solves equation (2) using the current and previous measurements.
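
Before turning to the parameter values, the filter of Section 3.2 (prediction with (6), weighting with (7), weighted-sum estimate, resampling) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the speed of sound (340 m/s), the normalization of the weights, the log-domain evaluation of (7), multinomial resampling through `random.choices`, and all function names are choices made for this example.

```python
import math
import random

C = 340.0  # speed of sound in m/s (assumed; the text does not give a value)


def h(state, s_r, s_l):
    """Predicted TDoA of one unit for a state (x, y, vx, vy), equation (5)."""
    x, y = state[0], state[1]
    d_r = math.hypot(s_r[0] - x, s_r[1] - y)
    d_l = math.hypot(s_l[0] - x, s_l[1] - y)
    return (d_r - d_l) / C


def predict(state, dt, sigma_w, rng):
    """Constant-velocity prediction with process noise, equation (6)."""
    x, y, vx, vy = state
    wx = rng.gauss(0.0, sigma_w)
    wy = rng.gauss(0.0, sigma_w)
    return (x + dt * (vx + wx), y + dt * (vy + wy), vx + wx, vy + wy)


def update(particles, z, s_r, s_l, sigma):
    """Weights from equation (7), normalised to sum to one (added step).

    The exponent is evaluated in the log domain and shifted by its maximum
    so that very small likelihoods do not all underflow to zero.
    """
    log_w = [-((z - h(p, s_r, s_l)) ** 2) / (2.0 * sigma ** 2) for p in particles]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def estimate(particles, weights):
    """Weighted sum of the particle states."""
    return tuple(sum(w * p[i] for p, w in zip(particles, weights)) for i in range(4))


def resample(particles, weights, rng):
    """Multinomial resampling (one of several schemes; an assumption here)."""
    return rng.choices(particles, weights=weights, k=len(particles))


def init_particles(n, bounds, rng):
    """Draw n states uniformly; bounds holds one (low, high) pair per element."""
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]


def step(particles, z, dt, s_r, s_l, sigma, sigma_w, rng):
    """One filter cycle for a TDoA z of the unit with speakers s_r and s_l."""
    predicted = [predict(p, dt, sigma_w, rng) for p in particles]
    weights = update(predicted, z, s_r, s_l, sigma)
    return estimate(predicted, weights), resample(predicted, weights, rng)
```

In use, `init_particles` would be called once at the first measurement, and `step` would then be called for each new TDoA with `dt` set to the time elapsed since the previous observation and with the speaker positions of whichever unit produced that measurement.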
The observation noise, process noise, and number of particles of the proposed method were set to <math>3.16 \times 10^{-5}</math> s, 0.5 m/s, and 5000, respectively. The ranges of the uniform random numbers used to generate the particles <math>[x_k^j \;\; y_k^j \;\; \dot{x}_k^j \;\; \dot{y}_k^j]^{\mathsf{T}}</math> at <math>k = 1</math> were [-3, 3] m, [-1, 3] m, [-1.5, 1.5] m/s, and [-1.5, 1.5] m/s, respectively.

Root mean square error (RMSE) was used for the evaluation, with the number of trials set to 100. The starting position (<math>k = 1</math>) was excluded from this evaluation because the conventional method cannot estimate it.

====4.2 Results and Discussion====
Figures 3-(a) and 3-(b) show the results of the conventional and proposed methods with the transmission interval set to 100 ms and 300 ms, respectively. The horizontal axis shows the signal receiving position. The figures show the relationship between the RMSE and the receiving position, as well as the effectiveness of the proposed method in reducing the RMSE.

Fig. 3. RMSE of evaluation result

Figures 4-(a) and 4-(b) show examples of results obtained by the conventional and proposed methods, respectively. In Figure 4-(a), the estimated locations lie around the intersections of the hyperbolic curves, far from the true positions; this offset is the bias error. Figure 4-(b) indicates that the proposed method can reduce these errors using the asynchronous tracking filter.

Fig. 4. Example of estimated locations (transmission interval: 300 ms)

In Figures 3-(a) and 3-(b), the RMSE of the proposed method is smallest around <math>x = 0</math> m and becomes slightly worse for <math>x > 0</math> m. To clarify this, the predicted particles at <math>x = 0</math> m and 2.5 m are plotted in Figures 5-(a) and 5-(b), with the transmission interval set to 300 ms. In these figures, the particles are scattered in an ellipse whose major axis is roughly parallel to the previously plotted hyperbolic curve. This is because these particles were generated by extrapolating the particles that had been resampled on the basis of the previously obtained hyperbolic curve.

Fig. 5.
Predicted particles distribution

When the microphone's position was <math>x = 0</math> m (Figure 5-(a)), the current hyperbolic curve crossed the major axis of the ellipse nearly at right angles. In this case, the area of the resampled particles was narrow, leading to high precision. When the microphone's position was <math>x = 2.5</math> m (Figure 5-(b)), the current hyperbolic curve crossed the minor axis of the ellipse at roughly right angles, and the area of the resampled particles was relatively broad, leading to a reduction in precision.

===5 Conclusion===
In this paper, we described an acoustic indoor localization system with multiple COTS stereo speakers. We proposed an asynchronous tracking filter to reduce the bias error that arises, when the smartphone is moving, from the asynchrony between speakers of different stereo units. The simulation experiments showed that the proposed method can effectively reduce the bias error.

===References===
1. J. Xiao, Z. Zhou, Y. Yi and L. M. Ni, "A survey on wireless indoor localization from the device perspective," ACM Computing Surveys, vol. 49, no. 2, pp. 25:1-25:31, (2016).

2. K. Liu, X. Liu and X. Li, "Guoguo: Enabling Fine-Grained Smartphone Localization via Acoustic Anchors," IEEE Trans. on Mobile Computing, vol. 15, no. 5, pp. 1144-1156, (2016).

3. F. J. Álvarez, T. Aguilera and R. L. Valcarce, "CDMA-based acoustic local positioning system for portable devices with multipath cancellation," Digital Signal Processing, vol. 62, pp. 38-51, (2017).

4. T. Akiyama, M. Nakamura, M. Sugimoto and H. Hashizume, "Smart phone localization method using dual-carrier acoustic waves," Proc. of IPIN 2013, pp. 1-9, (2013).

5. M. Nakamura, T. Akiyama, M. Sugimoto and H. Hashizume, "3D FDM-PAM: rapid and precise indoor 3D localization using acoustic signal for smartphone," Proc. of ACM Ubicomp 2014, pp. 123-126, (2014).

6. F. Höflinger, R. Zhang, J. Hoppe, A. Bannoura, A. Reindl, J. Wendeberg, M. Buhrer and C.
Schindelhauer, "Acoustic Self-calibrating System for Indoor Smartphone Tracking (ASSIST)," Proc. of IEEE IPIN 2012, pp. 1-9, (2012).

7. N. Priyantha, A. Chakraborty and H. Balakrishnan, "The Cricket Location Support System," Proc. of ACM MobiCom 2000, pp. 32-43, (2000).

8. Y. Takabayashi, T. Matsuzaki, H. Kameda and M. Ito, "Target tracking using TDOA/FDOA measurements in the distributed sensor network," Proc. of SICE 2008, pp. 3441-3446, (2008).

9. S. M. Arulampalam, S. Maskell, N. Gordon and T. Clapp, "A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking," IEEE Trans. on Signal Processing, vol. 50, no. 2, pp. 174-188, (2002).

10. M. A. Richards, J. A. Scheer and W. A. Holm, "Principles of Modern Radar: Basic Principles," SciTech Publishing, (2010).