Self-Synchronizing Acoustic Positioning System Based on TDoA
Shuai Cao 1, Ya P. Liu 1 , Jian Xu 1, and Sheng B. Pei 2
1 Institutes of Physical Science and Information Technology, Anhui University, Hefei 230039, China
2 School of Computer Science and Technology, Anhui University, Hefei 230039, China

                 Abstract
                 A self-synchronizing acoustic positioning system built from low-cost acoustic devices is
                 proposed in this paper. It includes a master base station and multiple slave base stations.
                 During a positioning cycle, the master base station first transmits an audio signal that
                 serves both synchronization and positioning. Then, after detecting the audio signal
                 transmitted by the master base station, each slave base station waits for a preset delay and
                 transmits its own positioning audio signal. Finally, the target estimates its own position by
                 detecting the arrival times of the audio signals transmitted by the master base station and
                 each slave base station. The proposed system achieves self-synchronization by using known
                 information such as the distances between base stations and the delay time of each slave
                 base station. After synchronization, the arrival times of the audio from each base station, as
                 measured by the target, are used for position estimation. Compared with previous
                 synchronization methods, the adopted self-synchronization method avoids complicated
                 wiring and radio interference and greatly reduces the cost of synchronization. The
                 simulation results show that, under the test conditions, when the detection noise level of
                 the target is less than 2.5 ms, the localization accuracy of the proposed system is better
                 than 1 m.

                 Keywords
                 acoustic positioning, self-synchronizing, master base station, slave base station


1. Introduction
    Location-aware technology has important applications in smart cities, the Internet of Things,
medical monitoring, etc. Based on the type of signal used for positioning, location-aware technology
is mainly divided into radio-based, motion-signal-based, geomagnetic-based, image-based, and
audio-based methods. In radio positioning, Bluetooth and Wi-Fi systems based on received signal
strength indication (RSSI) usually rely on ranging with a signal attenuation model or on
fingerprinting. These approaches have low positioning accuracy (meter level) [1]-[2], fingerprinting
requires a large workload for offline collection and updating, and the signal attenuation model is
highly susceptible to environmental interference. Angle of arrival (AoA) based on antenna arrays [3],
Wi-Fi-based round-trip time (RTT) [4]-[6] and channel state information (CSI) [7], and ranging-based
UWB technology [8] can provide centimeter-level to sub-meter-level positioning, but wireless base
stations and smart terminals are required to support the related protocols, the deployment
complexity is high, and wide-area coverage is extremely costly. Pedestrian dead reckoning (PDR)
based on motion signals [9] requires no additional infrastructure and is compatible with mobile
phones, but its errors accumulate, so accurate positioning cannot be maintained over long periods.
Geomagnetic-based positioning [10] can use the mobile phone's magnetic sensor to achieve positioning
without additional infrastructure but requires offline collection of the indoor geomagnetic field
distribution. Image-based localization [11] can achieve high-precision localization of the target, but it
needs to establish an image feature library in advance. Besides, the hardware cost is high, the
calculation is complex, and the performance is easily affected by environmental textures and shooting
conditions.

IPIN 2022 WiP Proceedings, September 5–7, 2022, BEIJING, CHINA
EMAIL: caoshuai@ustc.edu.cn (Shuai Cao); 306694534@qq.com (Ya P. Liu); 3381964790@qq.com (Jian Xu); shengbingpei@ahu.edu.cn (Sheng B. Pei)
© 2022 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

    Audio-based positioning technology [12] uses acoustic devices such as microphones and
speakers to achieve positioning. To support high concurrency, acoustic positioning systems (APS)
usually adopt a passive architecture, in which the base stations only transmit audio and the targets
receive and process the audio. This architecture has the following characteristics. 1) Low system
building complexity. Since the propagation speed of audio is much lower than that of radio,
high-precision acoustic positioning places only modest requirements on clock synchronization
accuracy. 2) Low system cost. On the one hand, commercial audio components are cheap, so although
acoustic positioning requires the deployment of acoustic base stations, location service providers
can be expected to offer high-precision location services with a low infrastructure investment.
Furthermore, if smart terminals are positioned with the help of the broadcast speakers already
installed in large shopping malls, stations, and airports, no additional base stations need to be
deployed. On the other hand, microphones and speakers are standard components of hand-held mobile
terminals, so users incur no additional overhead. 3) Good privacy. The targets can position
themselves only by receiving signals, without exchanging information with the acoustic base
stations. Without the user's permission, the user's location cannot be obtained by others, which
suits occasions with high privacy requirements. 4) Suitability for environments with severe
electromagnetic interference and metal objects. Unlike radio signals, audio signals are not disturbed
by electromagnetic waves, nor are they affected by metal objects.
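As a rough illustration of characteristic 1), a timing error maps to a ranging error of approximately the propagation speed multiplied by that timing error. The sketch below compares sound and radio for the same timing error. The 0.1 ms figure and the function name are illustrative assumptions, not parameters of the proposed system.

```python
SPEED_OF_SOUND = 340.0          # m/s, the value used in the simulations of this paper
SPEED_OF_LIGHT = 299_792_458.0  # m/s, propagation speed of radio waves in vacuum


def ranging_error(speed_m_per_s: float, timing_error_s: float) -> float:
    """Ranging error (m) caused by a timing error: error = speed * time."""
    return speed_m_per_s * timing_error_s


timing_error = 1e-4  # 0.1 ms, an assumed synchronization error
acoustic_error = ranging_error(SPEED_OF_SOUND, timing_error)  # a few centimeters
radio_error = ranging_error(SPEED_OF_LIGHT, timing_error)     # tens of kilometers
```

The same 0.1 ms error that would make radio ranging unusable costs an acoustic system only a few centimeters, which is why relaxed clock synchronization is acceptable here.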
    Because synchronization between base stations and targets is not required, time difference of
arrival (TDoA) is a common measurement type used in high-precision acoustic positioning systems.
Accurate acquisition of TDoA is the key to ensuring the performance of the acoustic positioning
system, which not only requires the target to accurately detect the audio arrival time but also requires
accurate synchronization between acoustic base stations. The acoustic positioning systems involved in
previous studies mainly use wireless or wired methods to achieve synchronization [13]-[16]. The
wireless method uses radio to achieve synchronization between the base stations, which may cause
radio interference. The wired method uses connection lines to achieve synchronization between the
base stations, which is difficult to deploy. Both of these synchronization methods are costly.
    In this paper, a novel self-synchronizing acoustic positioning system is proposed, which consists of
a master base station and multiple slave base stations. In a positioning cycle, the master base station
first transmits an audio signal with synchronization and positioning functions. Then, each slave base
station waits for a preset delay after detecting the audio signal transmitted by the master base
station and then transmits its positioning audio signal. Finally, the target estimates its position in
real time by detecting the arrival times of the audio signals transmitted by the master base station and
each slave base station. The proposed acoustic positioning system utilizes the acoustic devices
themselves to realize self-synchronization. Compared with the previous synchronization methods,
complicated wiring and radio interference are avoided, and the synchronization cost is greatly reduced.
    The rest of the paper is organized as follows. The basic principle of the proposed
self-synchronizing acoustic positioning system is presented in Section 2. The numerical simulations
are described in Section 3, and a conclusion is drawn in Section 4.

2. Self-synchronizing Acoustic Positioning
   The basic principle described below takes two-dimensional positioning as an example and is also
applicable to three-dimensional situations. As shown in Figure 1, it is assumed that there are four
acoustic base stations in the positioning area, which are represented by A, B, C, and D respectively.
The coordinates of the acoustic base stations are known, and each acoustic base station contains a
speaker and a microphone. To ensure high concurrency, the target only uses its microphone to
passively receive audio signals and realizes position calculation by detecting the arrival time of the
audio transmitted by each acoustic base station.
   The proposed self-synchronizing acoustic positioning system consists of a master base station and
multiple slave base stations. In a positioning cycle, the master base station first transmits an audio
signal that will be received by the target and each slave base station. Then, after detecting the arrival
time of the audio signal transmitted by the master base station, each slave base station waits for a
preset delay and then emits its positioning audio signal. Finally, the target detects the arrival
times of the audio transmitted by all base stations, calculates the TDoA, and estimates the position
using a TDoA-based localization algorithm. Using the timing diagram in Figure 2 as an example, the
positioning process of the self-synchronizing acoustic positioning system is as follows:


Figure 1: Schematic diagram of the self-synchronizing acoustic positioning system


Figure 2: Timing diagram of the self-synchronizing acoustic positioning system

    1. In a positioning period, such as 1 second, the master base station A transmits an audio signal
       at time TA. To suppress background noise, this audio signal is modulated, for example as a
       chirp signal or an orthogonal-code-modulated signal.
    2. After receiving the audio signal transmitted by base station A, base stations B, C, and D
       detect the arrival times of the audio, which are represented by RB, RC, and RD respectively.
       The target also detects the arrival time of the audio signal transmitted by base station A,
       which is represented by RAT. In Figure 2, tAB, tAC, and tAD represent the time taken for the
       audio transmitted by base station A to reach base stations B, C, and D, respectively, so that:

       \[
       \begin{cases}
       t_{AB} = R_B - T_A \\
       t_{AC} = R_C - T_A \\
       t_{AD} = R_D - T_A
       \end{cases}
       \tag{1}
       \]
    3. After a delay of tdelayB from moment RB, namely at moment TB, base station B transmits its
       positioning audio signal, which is then received by the target at moment RBT. tBT is the
       flight time for the positioning audio signal transmitted by base station B to reach the
       target. Similarly, after a delay of tdelayC from moment RC, namely at moment TC, base station
       C transmits its positioning audio signal, which is then received by the target at moment RCT.
       tCT is the flight time for the positioning audio signal transmitted by base station C to reach
       the target. Base station D is handled in the same way, and the corresponding time parameters
       are tdelayD, TD, RDT, and tDT.
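    4. According to Figure 2, formula (2) can be obtained:

       \[
       \begin{cases}
       R_{BT} - R_{AT} = t_{AB} + t_{\mathrm{delay}B} + t_{BT} - t_{AT} \\
       R_{CT} - R_{AT} = t_{AC} + t_{\mathrm{delay}C} + t_{CT} - t_{AT} \\
       R_{DT} - R_{AT} = t_{AD} + t_{\mathrm{delay}D} + t_{DT} - t_{AT}
       \end{cases}
       \tag{2}
       \]

       Formula (3) is derived from formula (2):

       \[
       \begin{cases}
       t_{BT} - t_{AT} = (R_{BT} - R_{AT}) - (t_{AB} + t_{\mathrm{delay}B}) \\
       t_{CT} - t_{AT} = (R_{CT} - R_{AT}) - (t_{AC} + t_{\mathrm{delay}C}) \\
       t_{DT} - t_{AT} = (R_{DT} - R_{AT}) - (t_{AD} + t_{\mathrm{delay}D})
       \end{cases}
       \tag{3}
       \]

       Multiplying both sides of formula (3) by the speed of sound v (unit: m/s), so that each
       flight time becomes the corresponding distance, gives formula (4):

       \[
       \begin{cases}
       d_{BT} - d_{AT} = v \cdot (R_{BT} - R_{AT}) - (d_{AB} + v \cdot t_{\mathrm{delay}B}) \\
       d_{CT} - d_{AT} = v \cdot (R_{CT} - R_{AT}) - (d_{AC} + v \cdot t_{\mathrm{delay}C}) \\
       d_{DT} - d_{AT} = v \cdot (R_{DT} - R_{AT}) - (d_{AD} + v \cdot t_{\mathrm{delay}D})
       \end{cases}
       \tag{4}
       \]
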
         where dAT, dBT, dCT, and dDT are the distances between the target and base stations A, B, C,
         and D, respectively, and dAB, dAC, and dAD are the distances between base station A and base
         stations B, C, and D, respectively. RAT, RBT, RCT, and RDT are the arrival times extracted by
         the target from the received audio signal using the audio detection algorithm [17]. On the
         right side of formula (4), dAB, dAC, dAD, v, tdelayB, tdelayC, and tdelayD are known, while
         (RBT - RAT), (RCT - RAT), and (RDT - RAT) are time intervals measured by the target, which can
         be obtained from the number of sampling points and the sampling period. Therefore, the
         distance differences on the left side of formula (4) are determined. That is, with base
         station A as the reference point, the distance difference (dBT - dAT) between the target
         and base stations B and A, the distance difference (dCT - dAT) between the target and base
         stations C and A, and the distance difference (dDT - dAT) between the target and base stations
         D and A are obtained.
     5. Substitute the distance difference information in formula (4) into the TDoA-based positioning
         algorithm to estimate the target's position.
    For each positioning cycle, the acoustic positioning system repeats the above steps to achieve
target positioning. In the whole process, the target only receives the audio signals transmitted by
the base stations and does not interact with them, which allows the system to support high
concurrency, i.e., the number of users is not limited. The self-synchronizing acoustic positioning
system is realized entirely through the transmitted audio signals. There are no connection lines
between the base stations, which simplifies installation and layout, and the radio interference
problem of traditional radio synchronization is avoided. Furthermore, the proposed synchronization
method is low cost, because acoustic components such as commercial loudspeakers and microphones
are cheap.
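The target-side computation in steps 2 to 4 can be sketched as follows. This is a minimal illustration of formula (4) only. The function names, the dictionary layout, the preset delays, and the position of station B at (10 m, 0 m) are assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s


def interval_from_samples(n_samples: int, sample_rate_hz: float) -> float:
    """Time interval (s) spanned by n_samples at the given sampling rate."""
    return n_samples / sample_rate_hz


def distance_differences(arrival_times, station_positions, delays, v=SPEED_OF_SOUND):
    """Apply formula (4): return d_XT - d_AT for every slave station X.

    arrival_times: {"A": R_AT, "B": R_BT, ...} measured by the target (s)
    station_positions: {"A": (x, y), ...} known coordinates (m)
    delays: {"B": t_delayB, ...} preset slave delays (s)
    """
    r_at = arrival_times["A"]
    diffs = {}
    for name, delay in delays.items():
        d_ax = math.dist(station_positions["A"], station_positions[name])
        diffs[name] = v * (arrival_times[name] - r_at) - (d_ax + v * delay)
    return diffs


# Noise-free check with an assumed square layout and hypothetical delays.
stations = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (10.0, 10.0), "D": (0.0, 10.0)}
delays = {"B": 0.1, "C": 0.2, "D": 0.3}  # hypothetical preset delays (s)
target = (1.0, 5.0)

d = {k: math.dist(p, target) for k, p in stations.items()}
arrivals = {"A": d["A"] / SPEED_OF_SOUND}  # T_A = 0
for k, delay in delays.items():
    d_ax = math.dist(stations["A"], stations[k])
    arrivals[k] = (d_ax + d[k]) / SPEED_OF_SOUND + delay

diffs = distance_differences(arrivals, stations, delays)
```

In the noise-free case each entry of `diffs` equals the true distance difference between the target and the pair of stations. `interval_from_samples` shows how an arrival-time interval would follow from a sample count. The sampling rate is left to the caller because the paper does not specify one.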

3. Numerical Simulations
    To analyze the positioning performance of the proposed system, simulations were performed to
investigate the influence of the detection accuracy of the target on the position estimation accuracy.
    The factors affecting the positioning accuracy of the proposed system include the following.
(1) The errors in the arrival times of the master's audio signal as detected by the slave base
stations, i.e., the errors of RB, RC and RD in Figure 2, represented by 𝒆𝐁, 𝒆𝐂 and 𝒆𝐃 respectively.
Since the coordinates of the base stations are fixed, the paths between the master and slave base
stations are line-of-sight (LOS), and the acoustic channels are stable, this noise will be at a low
level once the system is calibrated. (2) The errors in the delays tdelayB, tdelayC, and tdelayD
caused by the system clock deviations of the slave base stations. These are relatively small, so
this simulation ignores them. (3) The detection errors of the audio detection algorithm used by the
target, corresponding to the four base stations A, B, C, and D and represented by 𝒆𝐀𝐓, 𝒆𝐁𝐓, 𝒆𝐂𝐓, and
𝒆𝐃𝐓, respectively. Due to strong multipath and non-line-of-sight (NLOS) propagation, these errors
are a major factor in degrading system performance, and this simulation mainly examines them.
According to Figure 2, the noisy audio arrival times RAT, RBT, RCT, and RDT are given by formula (5).
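\[
\begin{cases}
R_{AT} = T_A + \dfrac{d_{AT}}{v} + e_{AT} \\
R_{BT} = T_A + \dfrac{d_{AB}}{v} + e_{B} + t_{\mathrm{delay}B} + \dfrac{d_{BT}}{v} + e_{BT} \\
R_{CT} = T_A + \dfrac{d_{AC}}{v} + e_{C} + t_{\mathrm{delay}C} + \dfrac{d_{CT}}{v} + e_{CT} \\
R_{DT} = T_A + \dfrac{d_{AD}}{v} + e_{D} + t_{\mathrm{delay}D} + \dfrac{d_{DT}}{v} + e_{DT}
\end{cases}
\tag{5}
\]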
    where dAT , dBT , dCT , and dDT are the distances between the base stations and the target, and v is
the speed of sound and is set to 340 m/s.
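One way to draw a single set of noisy arrival times according to formula (5) is sketched below. The function name, argument layout, and the use of Python's `random.gauss` as the Gaussian noise source are assumptions for illustration. The clock error of the slave delays is ignored, as in the simulation.

```python
import math
import random

SPEED_OF_SOUND = 340.0  # m/s, as set in the simulation


def noisy_arrival_times(target, stations, delays, sigma_slave, sigma_target,
                        t_a=0.0, v=SPEED_OF_SOUND, rng=random):
    """Draw one set of arrival times R_AT, R_BT, ... following formula (5).

    sigma_slave:  std. dev. (s) of e_B, e_C, e_D (slave detection of the master)
    sigma_target: std. dev. (s) of e_AT, e_BT, ... (detection at the target)
    """
    master = stations["A"]
    times = {"A": t_a + math.dist(master, target) / v + rng.gauss(0.0, sigma_target)}
    for name, delay in delays.items():
        pos = stations[name]
        times[name] = (t_a
                       + math.dist(master, pos) / v + rng.gauss(0.0, sigma_slave)
                       + delay
                       + math.dist(pos, target) / v + rng.gauss(0.0, sigma_target))
    return times
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible. Setting both standard deviations to zero reduces the function to the ideal timing of Figure 2.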
    In this simulation, formula (5) is substituted into formula (4) to calculate the noisy distance
differences, which are then substituted into the TDoA-based combined weighted (COM-W)
positioning algorithm [18] to obtain the estimated position. The influence of the target's detection
accuracy on the system positioning accuracy is evaluated through the localization error (LE) between
the true position Pr(xr, yr) and the estimated position Pe(xe, ye). LE is the Euclidean distance,
which is calculated using (6).

\[
\mathrm{LE} = \sqrt{(x_e - x_r)^2 + (y_e - y_r)^2}
\tag{6}
\]

   To avoid the influence of a single abnormal noise sample, noise is usually added multiple times
to the measurements at a specific test point. Each time noise is added, the COM-W algorithm estimates
a position whose LE is obtained using (6), so repeated noise additions yield multiple LEs.
The mean positioning error (MPE) is calculated using (7).

\[
\mathrm{MPE} = \sqrt{\frac{\sum_{i=1}^{L} \mathrm{LE}_i^{2}}{L}}
\tag{7}
\]

where L represents the number of noise additions, and LE𝑖 is the LE of the position estimated by the
positioning algorithm under the i-th noise addition. In this simulation, L is set to 1000.
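Formulas (6) and (7) translate directly into two small helper functions. The names below are illustrative assumptions. Note that, as written, formula (7) is the square root of the mean of the squared LEs.

```python
import math


def localization_error(estimate, truth):
    """Formula (6): Euclidean distance between estimated and true positions."""
    return math.hypot(estimate[0] - truth[0], estimate[1] - truth[1])


def mean_positioning_error(errors):
    """Formula (7): square root of the mean of the squared LEs."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

For example, `localization_error((3.0, 4.0), (0.0, 0.0))` evaluates to 5.0.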


Figure 3: Base stations and test points
   As shown in Figure 3, four base stations enclose a positioning area, and their coordinates are
A(0 m, 0 m), B(10 m, 0 m), C(10 m, 10 m), and D(0 m, 10 m), respectively. Three representative test
points were selected, namely TP1 (1.0 m, 1.0 m) near a base station, TP2 (1.0 m, 5.0 m) near the edge
of the positioning area, and TP3 (5.1 m, 5.1 m) near the central area. In formula (5), it is assumed
that 𝒆𝐁, 𝒆𝐂, and 𝒆𝐃 are Gaussian white noise with a mean of zero and a standard deviation of 0.1
millisecond (ms), and that 𝒆𝐀𝐓, 𝒆𝐁𝐓, 𝒆𝐂𝐓, and 𝒆𝐃𝐓 are Gaussian white noise with a mean of zero
and a standard deviation of σ (ms).


Figure 4: MPE versus standard deviation of target detection noise

   Figure 4 shows the relationship between the MPE at the 3 test points and the standard deviation of
the detection noise of the target. It can be seen that as the detection noise of the target increases, the
MPE also increases, i.e., the accuracy of the position estimation of the target decreases. At low noise
levels, the MPEs at the three test points are relatively close, but at high noise levels they differ
considerably, which is caused by the difference in geometric conditions [18]. Figure 4 also shows
that, under the current simulation conditions, the detection error of the target should be less than
2.5 ms if the positioning accuracy of the self-synchronizing acoustic positioning system is to be
better than 1 m.
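The Monte Carlo procedure behind a curve like Figure 4 can be sketched as below. This is not the COM-W algorithm of [18], which is not described here. A simple stand-in solver (coarse grid search followed by damped Gauss-Newton refinement) is used instead, so the numbers it produces need not match Figure 4. The station layout assumes a 10 m square with B at (10 m, 0 m), and all function names are hypothetical.

```python
import math
import random

V = 340.0  # speed of sound (m/s)
# Assumed layout: stations at the corners of a 10 m x 10 m square.
STATIONS = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (10.0, 10.0), "D": (0.0, 10.0)}
SLAVES = ("B", "C", "D")


def true_differences(p):
    """Noise-free distance differences d_XT - d_AT for X in B, C, D."""
    d_a = math.dist(p, STATIONS["A"])
    return [math.dist(p, STATIONS[k]) - d_a for k in SLAVES]


def cost(p, measured):
    """Sum of squared mismatches between modelled and measured differences."""
    return sum((t - m) ** 2 for t, m in zip(true_differences(p), measured))


def solve_tdoa(measured, iterations=20, damping=1e-6):
    """Stand-in solver (not COM-W): coarse grid search, then Gauss-Newton."""
    grid = [(0.5 * i, 0.5 * j) for i in range(21) for j in range(21)]
    x, y = min(grid, key=lambda p: cost(p, measured))
    ax, ay = STATIONS["A"]
    for _ in range(iterations):
        res = [t - m for t, m in zip(true_differences((x, y)), measured)]
        d_a = max(math.dist((x, y), (ax, ay)), 1e-9)
        a11 = a22 = damping
        a12 = g1 = g2 = 0.0
        for k, r in zip(SLAVES, res):
            sx, sy = STATIONS[k]
            d_k = max(math.dist((x, y), (sx, sy)), 1e-9)
            jx = (x - sx) / d_k - (x - ax) / d_a  # d(residual)/dx
            jy = (y - sy) / d_k - (y - ay) / d_a  # d(residual)/dy
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            g1 += jx * r
            g2 += jy * r
        det = a11 * a22 - a12 * a12
        dx = -(a22 * g1 - a12 * g2) / det
        dy = -(a11 * g2 - a12 * g1) / det
        # Backtracking: shrink the step until the cost decreases.
        c0, step = cost((x, y), measured), 1.0
        while step > 1e-3 and cost((x + step * dx, y + step * dy), measured) >= c0:
            step /= 2
        if step <= 1e-3:
            break
        x, y = x + step * dx, y + step * dy
    return x, y


def mpe_at(point, sigma_target, sigma_slave=1e-4, trials=1000, seed=0):
    """Estimate the MPE of formula (7) at one test point by Monte Carlo."""
    rng = random.Random(seed)
    truth = true_differences(point)
    sq_sum = 0.0
    for _ in range(trials):
        e_at = rng.gauss(0.0, sigma_target)
        # Formula (5) substituted into (4): d + v * (e_X + e_XT - e_AT).
        measured = [d + V * (rng.gauss(0.0, sigma_slave)
                             + rng.gauss(0.0, sigma_target) - e_at)
                    for d in truth]
        ex, ey = solve_tdoa(measured)
        sq_sum += (ex - point[0]) ** 2 + (ey - point[1]) ** 2
    return math.sqrt(sq_sum / trials)
```

A call such as `mpe_at((1.0, 5.0), sigma_target=2.5e-3)` gives the MPE at TP2 for one noise level. Repeating it over a range of `sigma_target` values and the three test points would trace out curves of the same kind as Figure 4. Standard deviations are in seconds, so 0.1 ms is written as `1e-4`.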

4. Conclusion
    This paper proposes a self-synchronizing acoustic positioning system. During a positioning period,
a master base station transmits an audio signal for synchronization and positioning, and multiple
slave base stations receive this signal and then send their positioning audio signals after preset
delays. Finally, the target receives the audio signals transmitted by the master base station and all
slave base stations, detects their arrival times, and estimates its position. The proposed system
does not need to be synchronized by wire or radio, which reduces the cost of system layout and avoids
radio interference. The simulation results verify that the proposed system can achieve positioning
accuracy better than 1 m if the detection error of the target is less than 2.5 ms.

5. References
[1] C. Wu, Z. Yang, Z. Zhou, Y. Liu, and M. Liu, “Mitigating Large Errors in WiFi-Based Indoor
    Localization for Smartphones,” IEEE Trans. Veh. Technol., vol. 66, no. 7, pp. 6246-6257, Jul.
    2017.
[2] J. Luo, Z. Zhang, C. Wang, C. Liu, and D. Xiao, “Indoor Multifloor Localization Method Based
     on WiFi Fingerprints and LDA,” IEEE Trans. Ind. Informat., vol. 15, no. 9, pp. 5225-5234, Sep.
     2019.
[3] X. Qiu, B. Wang, J. Wang, and Y. Shen, “AOA-Based BLE Localization with Carrier Frequency
     Offset Mitigation,” in Proc. IEEE Int. Conf. Commun. Workshops, (ICC Workshops), 2020, pp.
     1-5.
[4] C. Gentner, M. Ulmschneider, I. Kuehner, and A. Dammann, “WiFi-RTT Indoor Positioning,” in
     Proc. IEEE/ION Position, Locat. Navig. Symp., (PLANS), 2020, pp. 1029-1035.
[5] O. Hashem, M. Youssef and K. A. Harras, “WiNar: RTT-based Sub-meter Indoor Localization
     using Commercial Devices,” in Proc. IEEE Int. Conf. Pervasive Comput. Commun., (PerCom),
     2020, pp. 1-10.
[6] H. Cao, Y. Wang, and J. Bi, “Smartphones: 3D Indoor Localization Using Wi-Fi RTT,” IEEE
     Commun. Lett., vol. 25, no. 4, pp. 1201-1205, Apr. 2021.
[7] X. Wang, L. Gao, S. Mao, and S. Pandey, “CSI-Based Fingerprinting for Indoor Localization: A
     Deep Learning Approach,” IEEE Trans. Veh. Technol., vol. 66, no. 1, pp. 763-776, Jan. 2017.
[8] F. Mazhar, M. G. Khan, and B. Sallberg, “Precise Indoor Positioning Using UWB: A Review of
     Methods, Algorithms and Implementations,” Wireless Pers. Commun., vol. 97, no. 3, pp. 4467-
     4491, Dec. 2017.
[9] L. Ciabattoni, G. Foresi, A. Monteriu, L. Pepa, D. P. Pagnotta, L. Spalazzi, and F. Verdini, “Real
     time indoor localization integrating a model based pedestrian dead reckoning on smartphone and
     BLE beacons,” J. Ambient Intell. Humanized Comput., vol. 10, no. 1, pp. 1-12, Jan. 2019.
[10] S. -C. Yeh, W. -H. Hsu, W. -Y. Lin, and Y. -F. Wu, “Study on an Indoor Positioning System
     Using Earth’s Magnetic Field,” IEEE Trans. Instrum. Meas., vol. 69, no. 3, pp. 865-872, Mar.
     2020.
[11] M. Zhao, M. Yan, and T. Li, “Vision-Based Positioning: Related Technologies, Applications,
     and Research Challenges,” in Proc. IEEE 9th Int. Conf. Software Eng. Serv. Sci., (ICSESS),
     2018, pp. 531-535.
[12] M. N. Liu, L. S. Cheng, K. Qian, J. L. Wang, J. Wang, and Y. H. Liu, “Indoor acoustic
     localization: a survey,” Hum.-Centric Comput. Inf. Sci., vol. 10, no. 1, Jan. 2020.
[13] S. I. Lopes, J. M. N. Vieira, J. Reis, D. Albuquerque, and N. B. Carvalho, “Accurate smartphone
     indoor positioning using a WSN infrastructure and non-invasive audio for TDoA estimation,”
     Pervas. Mobile Comput., vol. 20, pp. 29-46, Jul. 2015.
[14] J. Urena, A. Hernandez, J. J. Garcia, J. M. Villadangos, M. C. Perez, D. Gualda, F. J. Alvarez,
     and T. Aguilera, “Acoustic Local Positioning with Encoded Emission Beacons,” Proc. IEEE, vol.
     106, no. 6, pp. 1042-1062, Jun. 2018.
[15] P. Pajuelo, M. C. Perez, J. M. Villadangos, E. Garcia, D. Gualda, J. Urena, and A. Hernandez,
     “Implementation of indoor positioning algorithms using Android smartphones,” in Proc. IEEE
     20th Conf. Emerging Technol. Factory Autom., (ETFA), 2015, pp. 1-4.
[16] L. Zhang, M. L. Chen, X. H. Wang, and Z. Wang, “TOA Estimation of Chirp Signal in Dense
     Multipath Environment for Low-Cost Acoustic Ranging,” IEEE Trans. Instrum. Meas., vol. 68,
     no. 2, pp. 355-367, Feb. 2019.
[17] S. Cao, X. Chen, X. Zhang, and X. Chen, “Effective Audio Signal Arrival Time Detection
     Algorithm for Realization of Robust Acoustic Indoor Positioning,” IEEE Trans. Instrum. Meas.,
     vol. 69, no. 10, pp. 7341-7352, Oct. 2020.
[18] S. Cao, X. Chen, X. Zhang, and X. Chen, “Combined Weighted Method for TDOA-Based
     Localization,” IEEE Trans. Instrum. Meas., vol. 69, no. 5, pp. 1962-1971, May 2020.