    Indoor Localization Using Multiple Stereo Speakers
                     for Smartphones

                         Masanari Nakamura1 and Hiroshi Kameda1
         1 Information Technology R&D Center, Mitsubishi Electric Corp., Kanagawa, Japan
               Nakamura.Masanari@db.MitsubishiElectric.co.jp



         Abstract. In this paper, we propose an acoustic indoor localization method for
         the embedded microphone of a smartphone using multiple stereo speakers
         installed indoors. In this configuration, although the two speakers of a stereo
         pair are synchronized, speakers that do not produce stereo sound together are
         not synchronized. Therefore, a moving microphone can receive the acoustic
         signals required for localization at different locations. This causes bias error in
         the conventional method. To address this issue, we propose a method that
         utilizes an asynchronous tracking filter and compensates for the differences in
         the locations of the received signals using motion modelling. Through
         experiments, we verify that our proposed method can effectively reduce the
         bias error.

         Keywords: Acoustic Signal, Asynchronous Tracking Filter, Smartphone, Time
         Difference of Arrival


1        Introduction

   These days, mobile devices such as smartphones are highly popular, and localization
using such devices is gaining attention [1]. Although the global navigation satellite
system (GNSS) is a common localization method, it cannot be used indoors because
receiving GNSS signals is difficult in such environments. Hence, other approaches
employing the embedded sensors of smartphones are required.
    Acoustic signals are suitable for accurate indoor localization using smartphones.
In related work [2-5], such signals were produced by speakers and received by a
microphone, and the microphone's location was then calculated. The time difference
of arrival (TDoA) between the received signals defines a hyperboloid, which indicates
the surface on which the microphone lies. The location is estimated by calculating the
intersection of multiple hyperboloids.
   In these methods, highly precise time synchronization of all the speakers is re-
quired. Certain audio players that can synchronize several speakers are available.
However, they are more expensive than commercial off-the-shelf (COTS) stereo
speakers, which have only two channels owing to their limited application.
   Therefore, we utilize COTS stereo speakers in this study. These speakers can pro-
duce two acoustic signals simultaneously from two sides. The receiver obtains a
hyperboloid from the TDoA of the speaker signals. Using the TDoAs of
multiple stereo speakers, the receiver's location can be calculated. However, when the
microphone is moving, the TDoAs are observed at different locations, causing the
intersection of the hyperboloids to drift away from the true location. This is because
speakers that do not produce stereo sound together cannot produce signals
simultaneously.
    We herein propose an asynchronous tracking filter to compensate for the differ-
ences in observation locations and thereby reduce the bias error.
    The rest of this paper is organized as follows. Section 2 describes the conventional
method and its problem. Section 3 presents the proposed method for dealing with the
above-mentioned bias error. Section 4 evaluates the proposed method through
simulation experiments. Section 5 concludes this paper.


2      Related work

    In indoor environments, some acoustic localization methods utilize the embedded
speakers of smartphones and microphones installed in the environment [6], whereas
other methods use the smartphone's embedded microphone and speakers installed in
the environment [2-5]. Herein, these two types of methods are referred to as active-tag
and passive-tag systems, respectively. In the former, multiple signals produced by the
speakers of multiple smartphones collide at the microphone. This collision is avoida-
ble if the transmission times of the speakers are precisely synchronized, although
such synchronization is difficult. Therefore, the latter, being preferable for the
localization of multiple devices, is the focus of this section.
    Acoustic indoor localization methods are of two types: ranging-based and TDoA-
based. The former utilizes range measurements between speakers and microphones;
the latter utilizes the differences in signal received times. Ranging-based localization
is typically more precise than TDoA-based localization. However, it requires highly
accurate time synchronization (e.g., on the order of microseconds) between speakers
and microphones and is therefore used for systems employing designated devices [7].
TDoA-based localization does not require such synchronization. As accurate time
synchronization is difficult in smartphones, TDoA-based localization is preferable.
We herein describe the localization for a two-dimensional space for convenience.
    The relationship between the location and received time is represented as

\[
\begin{cases}
d_{euc}\left(\boldsymbol{r}_1 - \boldsymbol{r}(t_1)\right) - d_{euc}\left(\boldsymbol{r}_2 - \boldsymbol{r}(t_2)\right) = c \cdot (t_1 - t_2)\\
d_{euc}\left(\boldsymbol{r}_1 - \boldsymbol{r}(t_1)\right) - d_{euc}\left(\boldsymbol{r}_3 - \boldsymbol{r}(t_3)\right) = c \cdot (t_1 - t_3)
\end{cases}
\tag{1}
\]

   where $\boldsymbol{r}_i = (x_i, y_i)$ denotes the position vector of speaker $i$, $\boldsymbol{r}(t)$ denotes the position
vector of the microphone at time $t$, $t_i$ $(i = 1, 2, 3)$ denotes the received time of the
signal produced by speaker $i$, $c$ is the sound speed, and $d_{euc}(\cdot)$ is the Euclidean
distance. The equations in (1) represent hyperbolic curves of the microphone's
position $\boldsymbol{r}(t)$.
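As an illustration of how (1) constrains the position, the following sketch solves the stationary case by a coarse grid search over candidate positions. The speaker layout, sound speed, and search bounds are illustrative assumptions, not values from the paper:

```python
import math

C = 343.0  # assumed speed of sound [m/s]

def d(a, b):
    # Euclidean distance d_euc between two 2-D points
    return math.hypot(a[0] - b[0], a[1] - b[1])

def residual(p, spk, t):
    # sum of squared residuals of equation (1) at candidate position p
    r12 = (d(spk[0], p) - d(spk[1], p)) - C * (t[0] - t[1])
    r13 = (d(spk[0], p) - d(spk[2], p)) - C * (t[0] - t[2])
    return r12 * r12 + r13 * r13

def locate(spk, t, lo=-5.0, hi=5.0, step=0.05):
    # coarse grid search for the position minimizing the TDoA residual
    best, best_p = float("inf"), None
    steps = int((hi - lo) / step) + 1
    for i in range(steps):
        for j in range(steps):
            p = (lo + i * step, lo + j * step)
            r = residual(p, spk, t)
            if r < best:
                best, best_p = r, p
    return best_p

# synthetic check: three synchronized speakers, stationary microphone
spk = [(-3.0, 3.0), (3.0, 3.0), (0.0, -3.0)]
mic = (1.0, 2.0)
t = [d(s, mic) / C for s in spk]  # noise-free received times
est = locate(spk, t)
```

In practice the intersection would be found with a nonlinear least-squares solver rather than a grid, and the ambiguity of hyperbola branches (two curves can intersect in more than one point) must be handled.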
   When the smartphone is stationary, the positions $\boldsymbol{r}(t_1), \boldsymbol{r}(t_2), \boldsymbol{r}(t_3)$ are the same,
and the location can be calculated by solving equation (1). When the smartphone is
moving, $\boldsymbol{r}(t_1), \boldsymbol{r}(t_2), \boldsymbol{r}(t_3)$ are not the same. However, the differences are insignificant as
all the speakers are synchronized. Therefore, location can be obtained without large
bias error.
   To simultaneously produce signals from all speakers, a special device such as a
multi-channel audio player is required. As such devices are expensive, we instead use
multiple COTS stereo speakers; hereinafter, each stereo speaker is referred to as a unit.
   In this configuration, although the speakers of the same unit are synchronized,
those of different units are not. The relationship between the location and the received
times is
\[
\begin{cases}
d_{euc}\left(\boldsymbol{r}_R^1 - \boldsymbol{r}(t_R^1)\right) - d_{euc}\left(\boldsymbol{r}_L^1 - \boldsymbol{r}(t_L^1)\right) = c \cdot (t_R^1 - t_L^1)\\
d_{euc}\left(\boldsymbol{r}_R^2 - \boldsymbol{r}(t_R^2)\right) - d_{euc}\left(\boldsymbol{r}_L^2 - \boldsymbol{r}(t_L^2)\right) = c \cdot (t_R^2 - t_L^2)
\end{cases}
\tag{2}
\]

    where $\boldsymbol{r}_R^u, \boldsymbol{r}_L^u$ are the locations of speakers R and L belonging to unit $u$ ($u = 1, 2$),
respectively, and $t_R^u, t_L^u$ are the received times of the signals produced by R and L,
respectively.
     When the smartphone is stationary, the microphone positions $\boldsymbol{r}(t_R^1), \boldsymbol{r}(t_L^1), \boldsymbol{r}(t_R^2)$,
and $\boldsymbol{r}(t_L^2)$ are the same, and the location can be easily estimated without bias error.
When the smartphone is moving, $\boldsymbol{r}(t_R^1) \approx \boldsymbol{r}(t_L^1)$ and $\boldsymbol{r}(t_R^2) \approx \boldsymbol{r}(t_L^2)$ are satisfied
owing to the speaker synchronization within each unit. However, the difference
between $\boldsymbol{r}(t_R^1)$ and $\boldsymbol{r}(t_R^2)$ can increase when the smartphone moves. In this case, the
intersection of the hyperbolic curves drifts away from the true position (see Figure
1-(a)), and the location estimated by solving (2) incurs bias error.
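The source of the bias can be seen numerically: if the microphone moves between the two units' transmissions, the second unit's TDoA is measured at a position a stationary solver does not account for. A small sketch, in which the unit placements, sound speed, and displacement are illustrative assumptions:

```python
import math

C = 343.0  # assumed speed of sound [m/s]

def d(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tdoa(unit, mic):
    # noise-free TDoA of one stereo unit, as in equation (2): (d_R - d_L) / c
    rR, rL = unit
    return (d(rR, mic) - d(rL, mic)) / C

# hypothetical unit placements and microphone motion
unit1 = ((-2.0, 3.0), (-3.0, 3.0))
unit2 = ((3.0, 3.0), (2.0, 3.0))
mic_at_t1 = (0.0, 0.0)  # where unit 1's signals are received
mic_at_t2 = (0.3, 0.0)  # the mic has moved by the time unit 2's signals arrive

# a stationary solver pairs tdoa(unit1, mic_at_t1) with tdoa(unit2, mic_at_t2)
# as if both were observed at one point; the mismatch below is what shifts
# the intersection of the hyperbolic curves away from the true position
mismatch = abs(tdoa(unit2, mic_at_t2) - tdoa(unit2, mic_at_t1))
```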




           Fig. 1. Localization of moving smartphone using multiple stereo speakers

   Takabayashi [8] proposed a bias error reduction method for single active-tag sys-
tems. We herein apply this method to a passive-tag system with multiple COTS stereo
speakers.


3       Proposed method

3.1     System architecture
   In the proposed method, two speakers of a unit simultaneously produce acoustic
signals. If the transmission time of each unit is not controlled, numerous signals could
collide with each other at the microphone, thereby resulting in large errors. Therefore,
the transmission time of each unit should be controlled using general communication
systems such as Wi-Fi and Bluetooth.
   In conventional methods, time lags between speakers produce bias errors, necessi-
tating a multi-channel audio player to synchronize all the speakers [2].
   The smartphone's current position is estimated using an asynchronous tracking fil-
ter whenever a TDoA is measured. The estimation is done by calculating the intersec-
tion between the predicted position and the hyperbolic curve of the TDoA, as shown
in Figure 1-(b); this reduces the bias error shown in Figure 1-(a). The details are
described in the next section.

3.2      Asynchronous tracking filter
TDoA observation model and motion model.
 The state vector $\boldsymbol{x}$ comprises position and velocity, $[x\;\; y\;\; \dot{x}\;\; \dot{y}]^T$, where $x, y$ are
positions, $\dot{x}, \dot{y}$ are velocities, and $T$ denotes the transpose. The relationship between
the state and the position is represented as
\[
\boldsymbol{r} = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 \end{bmatrix} \boldsymbol{x} = \boldsymbol{H}\boldsymbol{x}
\tag{3}
\]
  In the asynchronous tracking filter, the estimation is conducted sequentially at every
TDoA measurement, which comprises $t_R^u$ and $t_L^u$. We define the lesser of $t_R^u$ and $t_L^u$ as
the TDoA observation time, and denote by $t_k$ the observation time of the $k$-th TDoA.
 The TDoA measurement $z_k^u$ received at $t_k$, corresponding to unit $u$, is represented as

\[
z_k^u = \left(t_R^u + n_{t_R^u}\right) - \left(t_L^u + n_{t_L^u}\right) = h^u(\boldsymbol{x}_k) + \left(n_{t_R^u} - n_{t_L^u}\right)
\tag{4}
\]

   where $n_{t_R^u}, n_{t_L^u}$ are the measurement noises of the received times of the signals
produced by R and L of unit $u$, respectively, and $\boldsymbol{x}_k$ is the state vector
$[x_k\;\; y_k\;\; \dot{x}_k\;\; \dot{y}_k]^T$ at time $t_k$. From equation (2), $h^u(\boldsymbol{x}_k)$ is
\[
h^u(\boldsymbol{x}_k) = \frac{d_{euc}\left(\boldsymbol{r}_R^u - \boldsymbol{H}\boldsymbol{x}_k\right) - d_{euc}\left(\boldsymbol{r}_L^u - \boldsymbol{H}\boldsymbol{x}_k\right)}{c}
\tag{5}
\]
   The observation noises $n_{t_R^u}, n_{t_L^u}$ are white Gaussian noises with zero mean and
known variances $\sigma_{t_R^u}^2, \sigma_{t_L^u}^2$, respectively. The measurement error of the TDoA, $n_k^u$, is
$n_{t_R^u} - n_{t_L^u}$. Notably, we assume that $n_{t_R^u}$ and $n_{t_L^u}$ are independent. In this case, $n_k^u$
is white Gaussian noise with zero mean and variance $\sigma_{t_R^u}^2 + \sigma_{t_L^u}^2$ because the
Gaussian distribution has the reproductive property.
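This variance addition can be checked empirically. A quick Monte Carlo sketch, using the per-channel noise level quoted later in Section 4 (the sample count is arbitrary):

```python
import random
import statistics

random.seed(0)
sigma_r = sigma_l = 1.58e-5  # per-channel received-time noise std [s] (Section 4)

# n_k = n_R - n_L for many independent draws
n = [random.gauss(0.0, sigma_r) - random.gauss(0.0, sigma_l)
     for _ in range(200_000)]

# reproductive property: Var[n_R - n_L] = sigma_r^2 + sigma_l^2
empirical = statistics.pvariance(n)
expected = sigma_r ** 2 + sigma_l ** 2
```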
  As the motion model of the microphone, we utilize the constant-velocity model:
\[
\boldsymbol{x}_k =
\begin{bmatrix}
1 & 0 & \Delta t_k & 0\\
0 & 1 & 0 & \Delta t_k\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
\boldsymbol{x}_{k-1}
+
\begin{bmatrix}
\Delta t_k & 0\\
0 & \Delta t_k\\
1 & 0\\
0 & 1
\end{bmatrix}
\boldsymbol{w}_k
= \boldsymbol{F}(\Delta t_k)\boldsymbol{x}_{k-1} + \boldsymbol{\Gamma}\boldsymbol{w}_k
\tag{6}
\]
    where $\boldsymbol{w}_k$ is the process noise vector $[w_k^x\;\; w_k^y]^T$, which represents the ambiguity
of the motion. $w_k^x$ and $w_k^y$ are white Gaussian noises with zero mean and variance $\sigma_w^2$.
$\Delta t_k$ is the difference of the observation times, $t_k - t_{k-1}$.

Algorithm
  We herein describe the algorithm for estimating the microphone state. When the first
measurement is input to the asynchronous tracking filter ($k = 1$), particles $\boldsymbol{x}_k^j$
representing the microphone state are generated ($j = 1, \ldots, J$). Each element of the
state $\boldsymbol{x}_k^j = [x_k^j\;\; y_k^j\;\; \dot{x}_k^j\;\; \dot{y}_k^j]^T$ is generated from uniform random numbers. The
weight $\alpha_k^j$ of the state $\boldsymbol{x}_k^j$ is given as $\alpha_k^j = 1/J$. When $\boldsymbol{x}_k^j$ and $\alpha_k^j$ have already been
generated ($k \geq 2$), the prediction based on the motion model (6) is conducted for
each particle.
   In the updating step, the likelihood of each particle is calculated using $z_k^u$, and the
weight $\alpha_k^j$ is updated:
\[
\alpha_k^j = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\left(z_k^u - h^u(\boldsymbol{x}_k^j)\right)^2}{2\sigma^2}\right)
\tag{7}
\]
                    οΏ½π‘˜π‘˜ is obtained by the weighted sum βˆ‘π½π½π‘—π‘—=1 π›Όπ›Όπ‘˜π‘˜π‘—π‘— β‹… π’™π’™π‘˜π‘˜π‘—π‘— .
The estimated state 𝒙𝒙
  When the next measurement is inputted, the above-mentioned process is repeated
with the resampled particles [9].
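The filter cycle described above (initialize, predict, weight by (7), estimate, resample) can be sketched as follows. The unit placement in the test at the bottom is an illustrative assumption; the noise constant is taken from Section 4:

```python
import math
import random

C = 343.0        # assumed speed of sound [m/s]
SIGMA = 3.16e-5  # TDoA noise std [s] (Section 4)

def h(unit, p):
    # equation (5): predicted TDoA of one stereo unit at position p = (x, y)
    rR, rL = unit
    return (math.hypot(rR[0] - p[0], rR[1] - p[1])
            - math.hypot(rL[0] - p[0], rL[1] - p[1])) / C

def init(J, xr, yr, vr):
    # k = 1: draw J particles from uniform ranges, weights set to 1/J
    ps = [(random.uniform(*xr), random.uniform(*yr),
           random.uniform(*vr), random.uniform(*vr)) for _ in range(J)]
    return ps, [1.0 / J] * J

def update(ps, ws, unit, z, dt, sigma_w):
    # predict each particle with the constant-velocity model (6),
    # then reweight it with the Gaussian likelihood of equation (7)
    new_ps, new_ws = [], []
    for (x, y, vx, vy), w in zip(ps, ws):
        wx, wy = random.gauss(0.0, sigma_w), random.gauss(0.0, sigma_w)
        x, y, vx, vy = x + dt * (vx + wx), y + dt * (vy + wy), vx + wx, vy + wy
        r = z - h(unit, (x, y))
        new_ps.append((x, y, vx, vy))
        new_ws.append(w * math.exp(-r * r / (2.0 * SIGMA ** 2)))
    s = sum(new_ws) or 1.0  # guard against total weight collapse
    return new_ps, [w / s for w in new_ws]

def estimate(ps, ws):
    # weighted sum of the particles gives the state estimate
    return tuple(sum(w * p[k] for p, w in zip(ps, ws)) for k in range(4))

def resample(ps, ws):
    # multinomial resampling [9]; weights reset to uniform
    return random.choices(ps, weights=ws, k=len(ps)), [1.0 / len(ps)] * len(ps)
```

Each incoming TDoA $z_k^u$ triggers one update/estimate/resample cycle with $\Delta t_k$ equal to the time since the previous measurement; alternating measurements from the two units is what makes the position observable.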


4       Numerical result

4.1     Scenarios
  We conducted the following simulation experiments to evaluate the proposed
method. Figure 2 shows the microphone route and the four speaker locations. The
microphone moves at a speed of 1 m/sec and receives its first TDoA measurement at
the route's starting position (-3.0, 1.5).




                                        Fig. 2. Simulation scenario

 In indoor environments, the transmission interval is designed according to the sig-
nal length and the reverberation time of multipath. Regarding the signal length, a
short signal with a large amplitude is preferable for the precision and the updating
rate of localization. However, the amplitude is generally restricted to keep the signal
inaudible. To obtain the desired signal-to-noise ratio (SNR) with the restricted ampli-
tude, a longer signal can be utilized. A reverberation time of 100 ms is sufficient to
attenuate the multipath [4]. For these reasons, the transmission intervals in these
simulations are set to 100 ms and 300 ms for the short- and long-signal cases,
respectively.
  The observation noise of the received time, $\sigma$, depends on the bandwidth and SNR,
which herein are set to 1 kHz and 30 dB, respectively. These parameters were deter-
mined in consideration of the characteristics of microphones embedded in smart-
phones [4-5]. In this case, the observation noise $\sigma$ is $1.58 \times 10^{-5}$ sec. Hence, the
observation noise of the TDoA is $2 \times 1.58 \times 10^{-5} = 3.16 \times 10^{-5}$ sec.
      The conventional method numerically solves equation (2) using the current and
previous measurements. The observation noise, process noise, and number of parti-
cles of the proposed method were set to $3.16 \times 10^{-5}$ sec, 0.5 m/sec, and 5000, respec-
tively. The ranges of the uniform random numbers used to generate the particles
$[x_k^j\;\; y_k^j\;\; \dot{x}_k^j\;\; \dot{y}_k^j]^T$ at $k = 1$ were [-3, 3] m, [-1, 3] m, [-1.5, 1.5] m/sec, and
[-1.5, 1.5] m/sec, respectively.
      Root mean square error (RMSE) was used to evaluate each trial, with the number
of trials set to 100. The starting position ($k = 1$) was excluded from this evaluation
because the conventional method cannot estimate it.


4.2     Results and Discussion
  Figures 3-(a) and 3-(b) show the results of the conventional and proposed methods
with transmission intervals set to 100 ms and 300 ms, respectively. The horizontal
axis shows the signal-receiving position; the plots indicate the relationship between
the RMSE and the receiving position, as well as the effectiveness of the proposed
method in reducing the RMSE.




                               Fig. 3. RMSE of evaluation result

 Figures 4-(a) and 4-(b) show examples of results obtained by the conventional and
proposed methods, respectively. In Figure 4-(a), estimated locations were around the
intersections of hyperbolic curves, far from true positions, which represent the bias
errors. Figure 4-(b) indicates that the proposed method can reduce these errors using
the asynchronous tracking filter.




            Fig. 4. Example of estimated locations (transmission interval: 300 ms)

 In Figures 3-(a) and 3-(b), the RMSE of the proposed method was minimum around
$x = 0$ m and became slightly worse for $x > 0$ m. For clarification, the predicted
particles at $x = 0$ m and $x = 2.5$ m are plotted in Figures 5-(a) and 5-(b), with the
transmission interval set to 300 ms. In these figures, the particles are scattered in an
ellipse whose major axis is roughly parallel to the previously obtained hyperbolic
curve. This is because the particles were generated by extrapolating the resampled
particles based on the previously obtained hyperbolic curve.




                            Fig. 5. Predicted particles distribution

   When the microphone's position was $x = 0$ m (Figure 5-(a)), the current hyperbolic
curve crossed the major axis of the ellipse nearly at right angles. In this case, the area
of the resampled particles was narrow, leading to high precision. When the micro-
phone's position was $x = 2.5$ m (Figure 5-(b)), the minor axis of the ellipse crossed
the current hyperbolic curve at roughly right angles, and the area of the resampled
particles was relatively broad, reducing the precision.
5      Conclusion

   In this paper, we described an acoustic indoor localization system with multiple
COTS stereo speakers. We proposed the asynchronous tracking filter to reduce the
bias error caused by the asynchrony of speakers that do not produce stereo sound
when the smartphone is moving. The simulation experiments showed that the pro-
posed method can effectively reduce the bias error.


References
 1. J. Xiao, Z. Zhou, Y. Yi and L. M. Ni, "A survey on wireless indoor localization from the
    device perspective," ACM Computing Surveys, vol. 49, no. 2, pp. 25:1-25:31, (2016).
 2. K. Liu, X. Liu and X. Li, "Guoguo: Enabling Fine-Grained Smartphone Localization via
    Acoustic Anchors," IEEE Trans. on Mobile Computing, vol. 15, no. 5, pp. 1144-1156,
    (2016).
 3. F. J. Álvarez, T. Aguilera and R. L. Valcarce, "CDMA-based acoustic local positioning
    system for portable devices with multipath cancellation," Digital Signal Processing, vol.
    62, pp. 38-51, (2017).
 4. T. Akiyama, M. Nakamura, M. Sugimoto and H. Hashizume, "Smart phone localization
    method using dual-carrier acoustic waves," Proc. of IPIN 2013, pp. 1-9, (2013).
 5. M. Nakamura, T. Akiyama, M. Sugimoto and H. Hashizume, "3D FDM-PAM: rapid and
    precise indoor 3D localization using acoustic signal for smartphone," Proc. of ACM
    Ubicomp 2014, pp. 123-126, (2014).
 6. F. Höflinger, R. Zhang, J. Hoppe, A. Bannoura, A. Reindl, J. Wendeberg, M. Buhrer and
    C. Schindelhauer, "Acoustic Self-calibrating System for Indoor Smartphone Tracking
    (ASSIST)," Proc. of IEEE IPIN 2012, pp. 1-9, (2012).
 7. N. Priyantha, A. Chakraborty and H. Balakrishnan, "The Cricket Location Support Sys-
    tem," Proc. of ACM MobiCom 2000, pp. 32-43, (2000).
 8. Y. Takabayashi, T. Matsuzaki, H. Kameda and M. Ito, "Target tracking using
    TDOA/FDOA measurements in the distributed sensor network," Proc. of SICE 2008, pp.
    3441-3446, (2008).
 9. S. M. Arulampalam, S. Maskell, N. Gordon and T. Clapp, "A Tutorial on Particle Filters
    for Online Nonlinear/Non-Gaussian Bayesian Tracking," IEEE Trans. on Signal Pro-
    cessing, vol. 50, no. 2, pp. 174-188, (2002).
10. M. A. Richards, J. A. Scheer and W. A. Holm, "Principles of Modern Radar: Basic Prin-
    ciples," SciTech Publishing, (2010).