<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Wi-Fi, CCTV and PDR Integrated Pedestrian Positioning System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Max J. L. Lee</string-name>
          <email>maxjl.lee@connect.polyu.hk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Meiling Su</string-name>
          <email>meiling.su@connect.polyu.hk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Li-Ta Hsu</string-name>
          <email>lt.hsu@polyu.edu.hk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University</institution>, <addr-line>Hung Hom, Kowloon, Hong Kong, China</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>Navigation inside a closed area with no GNSS-signal accessibility is a highly challenging task. To tackle this problem, Wi-Fi beacon-based methods have grabbed the attention of many researchers. However, they suffer from a lack of accuracy and robustness against environmental changes and dynamic object movements. With the advancement of inference speed in real-time deep learning applications, this paper proposes a multi-modal end-to-end system for large-scale indoor positioning, namely the Wi-Fi, CCTV and PDR integrated pedestrian positioning system, which increases positioning accuracy by overcoming the difficulties of environment changes. Firstly, a user requests positioning and sends the detected access points (APs) and associated signal strengths to a server. A Wi-Fi one-shot model is used to estimate an initial position for the user, then the nearby CCTV cameras are activated to detect and position pedestrians. A CCTV and PDR integrated particle filter is used to determine a valid correspondence to the user, simultaneously collecting Wi-Fi fingerprints from the user to passively update the Wi-Fi one-shot support set in real time. By implementing the proposed system, we achieve highly accurate integrated indoor positioning with a precision level of 0.3 m.</p>
      </abstract>
      <kwd-group>
        <kwd>Indoor Positioning</kwd>
        <kwd>Visual Positioning</kwd>
        <kwd>Particle Filter</kwd>
        <kwd>Wi-Fi Positioning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>
          Indoor positioning systems have attracted great interest from researchers over the past decade. These systems can provide positioning, navigation, and tracking services where global navigation satellite systems (GNSSs) cannot reach [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. To overcome the limitation of GNSS in indoor environments, various indoor positioning systems have been developed using Wi-Fi, Bluetooth (BLE), ultra-wideband (UWB), and radio-frequency identification (RFID). Typically, beacon nodes are a prerequisite to localize in the indoor environment [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Of the above-mentioned radio-frequency-based systems, Wi-Fi is the most popular due to its scalability [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Wi-Fi localization is usually performed by four main techniques: triangulation, received signal strength (RSS), scene analysis (fingerprinting), and proximity. However, radio ranging measurements contain noise on the order of several meters [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. This noise occurs because radio propagation tends to be highly non-uniform: physical obstacles such as walls and furniture reflect and absorb radio waves [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. As a result, distance estimation is deteriorated by diffractions and reflections. The fingerprinting method attempts to overcome this problem by making use of diffractions and reflections as additional features for positioning; however, it suffers when there are changes in the environment, as the features also change. Hence, an alternative positioning method is suggested in this paper to overcome the difficulties of environment changes.
        </p>
        <p>2020 Copyright for this paper by its authors.</p>
        <p>
          In recent years, CCTV cameras have matured to a stage where they play an important role in monitoring and security [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. These CCTV cameras can capture real-time information of the environment, which provides a reliable basis for setting up a digital twin (DT), because they can integrate information ranging from geometric changes in the building layout to the occupancy and use of rooms and spaces. A DT for buildings can be seen as an extension to capture real-world data and feed it back into a 3D model, thus neatly closing the information loop [
          <xref ref-type="bibr" rid="ref7 ref8 ref9">7-9</xref>
          ]. In addition, CCTV cameras also capture the positions of people and assets; hence, the position of an object relative to the camera (and therefore its global position) can be obtained. If the correspondence to the user is found, the person's position can be transmitted directly to the person's smartphone. This position can also be used to collect real-time data for Wi-Fi fingerprinting. Therefore, the main objective of the study presented in this paper is to develop a Wi-Fi, CCTV and PDR integrated pedestrian positioning system that enables navigation of the indoor environment using in-built Wi-Fi access points (APs) and CCTV cameras.
        </p>
      </sec>
      <sec id="sec-1-3">
        <title>The proposed Wi-Fi, CCTV and PDR integrated pedestrian positioning system attempts to make full use of the existing infrastructure of the indoor environment for positioning. The proposed method offers several major advantages over existing Wi-Fi standalone positioning methods.</title>
        <p>• Firstly, the integration of CCTV and PDR can provide 0.3 m positioning accuracy, enough for pedestrian positioning purposes.
• Secondly, the positioning can be used to update the Wi-Fi one-shot support set, eliminating the need for continuous manual Wi-Fi fingerprinting collection.
• Thirdly, as the CCTV is integrated for positioning, the system is less susceptible to environment changes and dynamic object movements, overcoming the poor positioning caused by changes in Wi-Fi features.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. The Proposed Indoor Positioning System</title>
      <sec id="sec-2-1">
        <p>The flowchart of the proposed Wi-Fi and CCTV integrated pedestrian positioning system is shown in Fig. 1. The algorithm can be divided into offline and online processes. The offline stage includes estimating each CCTV camera's intrinsic parameters using a checkerboard, and its extrinsic parameters using perspective-n-point from the known poses of visible landmarks in the indoor environment and the BIM (Building Information Modelling) model (Sec. 4). In the online stage, the smartphone requests a position by performing a Wi-Fi scan, then sends the detected APs and signal strengths to the server. The server calculates an initial position based on a trained one-shot learning model (Sec. 3). The CCTV cameras near the initial position are activated to detect people (Sec. 5). The query pixel positions of each detected person classified as “holding a phone” are corrected (Sec. 4), then their 3D positions are calculated based on inverse perspective projection (Sec. 5). Assuming there are multiple detected persons (candidate positions), the problem is reformulated as estimating the correct correspondence from a list of candidates. We use a particle filter (PF) to estimate the correct correspondence based on the smartphone heading and velocity (Sec. 6). The PF positioning solution is then used to estimate the state, and given a small state covariance, the position is used to update the Wi-Fi one-shot support set (Sec. 6).</p>
      </sec>
      <sec id="sec-2-2">
        <title>Several assumptions were made in the research:</title>
        <p>
          • Firstly, the Wi-Fi one-shot model was pre-trained in another environment, as described in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. This research only updates the support set.
• Secondly, the research requires initial Wi-Fi fingerprints for the Wi-Fi one-shot model support set. However, it can eliminate the need for continuous Wi-Fi fingerprinting collection.
• Thirdly, the world coordinate system was assumed to be the BIM 3D cartesian coordinate system for ease of calculation. The calculated cartesian positions can be converted to WGS84.
• Fourthly, the BIM/3D model used to estimate the extrinsic parameters of the CCTV was assumed to be accurate and up to date.
• Fifthly, the query pixel position correction assumes the query image shares approximately the same scene as the reference image, viewed from a different viewpoint.
• Sixthly, pedestrians that request positioning are assumed to be holding a smartphone, so that the deep learning model can classify pedestrians as “holding a smartphone”.
• Seventhly, the calculated position of a pedestrian is assumed to be the bottom-centre pixel of the detected bounding box.
• Eighthly, the smartphone heading estimation is assumed to provide within ±90° uncertainty.
        </p>
      </sec>
      <sec id="sec-2-3">
        <p>The paper is organized as follows: Section 3 introduces the Wi-Fi one-shot positioning training and inference. Section 4 introduces the BIM and the CCTV cameras' intrinsic and extrinsic parameter estimation. Section 5 presents the person detection and position estimation. Section 6 presents the PF. Section 7 tests the proposed method with data obtained in a laboratory. Finally, conclusions and future perspectives are presented in Sections 8 and 9, respectively.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Wi-Fi One-shot Positioning Training and Inference</title>
      <p>
        A key challenge to machine learning-based fingerprinting approaches for wireless indoor localization
is data labeling [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. This includes the need to collect timely labeled data in one environment as the
environment dynamics may change over time (e.g., for real-time localization), and the need to collect a
new set of labeled data for localization in each new environment (e.g., for multi-environment
localization). This time-sensitive and environment-dependent nature of wireless data incurs significant
data collection and maintenance cost for supervised learning which requires a large amount of labeled
data. The recently introduced few-shot transfer learning overcomes this problem by reformulating
classification as a similarity measurement [
        <xref ref-type="bibr" rid="ref11">11</xref>
          ]. More specifically, a graph neural network (GNN) model can be trained to measure the similarity of a query datum to each of the classes in the support set. Inspired by recent advances in few-shot learning, the proposed Wi-Fi, CCTV and PDR integrated pedestrian positioning system attempts to eliminate the need for data collection by passively collecting Wi-Fi fingerprints from users, while obtaining the approximate ground-truth location estimated from the PF (detailed in Section 6). The estimated position and uncertainty using Wi-Fi signals are detailed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
and described as Eq. (1) in this paper.
      </p>
      <p>(x̂_WiFi, σ_WiFi) = h(q, S) (1)</p>
      <p>where x̂_WiFi is the estimated Wi-Fi 2D state (x, y) in the BIM 3D cartesian coordinate system, and σ_WiFi is the estimated uncertainty. q refers to the query data, a list of detected access points and signal strengths from an unknown location. S refers to the support set, where each class corresponds to one known location and holds a list of the access points and signal strengths detected at that known location. h(q, S) is the function that accepts query data and calculates the similarity to each class in the support set; the class with the highest similarity gives x̂_WiFi.</p>
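<p>The interface of Eq. (1) can be illustrated with a minimal sketch. This is not the GNN similarity model of [10]; as a stand-in, a cosine similarity over RSS vectors is assumed, and the AP names and values are hypothetical:</p>

```python
import math

def similarity(query, fingerprint):
    # Cosine similarity over the APs seen in either scan.
    # APs missing from a scan are treated as very weak (-100 dBm) signals.
    aps = set(query) | set(fingerprint)
    q = [query.get(ap, -100.0) + 100.0 for ap in aps]
    f = [fingerprint.get(ap, -100.0) + 100.0 for ap in aps]
    dot = sum(a * b for a, b in zip(q, f))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in f))
    return dot / norm if norm else 0.0

def one_shot_position(query, support_set):
    # support_set: {location (x, y): {ap_id: rss_dbm}} -- one example per class.
    # Returns the location whose fingerprint is most similar to the query,
    # mirroring h(q, S) in Eq. (1).
    return max(support_set, key=lambda loc: similarity(query, support_set[loc]))

support = {
    (0.0, 0.0): {"ap1": -40, "ap2": -70},
    (5.0, 0.0): {"ap1": -75, "ap2": -45},
}
print(one_shot_position({"ap1": -42, "ap2": -68}, support))  # → (0.0, 0.0)
```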
      <sec id="sec-3-1">
        <p>Once an accurate position is estimated from the PF (Sec. 6), the collected Wi-Fi fingerprints will be used to update the support set (S) of the Wi-Fi one-shot model.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. CCTV Intrinsic and Extrinsic Estimation</title>
      <p>
        To calculate the position of pedestrians based on the cameras, the intrinsic and extrinsic parameters of the cameras must be known. To estimate the intrinsic parameters (K), a checkerboard can be used to calibrate each CCTV camera. To estimate the extrinsic parameters (T), we made use of visible landmarks in the indoor environment to infer the pose of the camera. A BIM model stores information on objects in the indoor environment, including each object's class, shape, appearance, size and pose in the world coordinate frame [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Therefore, we can use perspective-n-point with the BIM to estimate the pose of each CCTV camera. Once this is complete, we can obtain an intrinsic- and extrinsic-associated reference image for each CCTV camera, as described in Eq. (2).
      </p>
      <p>ᶜp = (ʷ_cT)⁻¹ · ʷp;  ⁱp = ᶜK · ᶜp (2)</p>
      <p>where w and c are the world and camera coordinate frames, in which each coordinate is expressed as a 3D cartesian vector ᶜp, ʷp ∈ (x, y, z), and i is the image coordinate frame, where each coordinate is expressed as a pixel vector ⁱp ∈ (u, v). ᶜK is the 3×3 intrinsic matrix of camera k, and ʷ_cT is the 4×4 extrinsic matrix (pose) of camera k in the world coordinate frame. We can therefore take a point p in the world frame w and express it as a pixel vector ⁱp in the image frame i for camera k. The expanded equation is expressed in Eq. (3):</p>
      <p>z_c · [u, v, 1]ᵀ = ᶜK · [r11, r12, r13, t1 | r21, r22, r23, t2 | r31, r32, r33, t3] · [x, y, z, 1]ᵀ (3)</p>
      <p>
        It is assumed that each camera's intrinsic and extrinsic parameters may change over time. The former can be caused by optical zoom, whereas the latter can be caused by camera tilt and panning. We denote c as the reference camera, and the query (changed) camera as c′. The query camera's intrinsic and extrinsic parameters are difficult to estimate simultaneously from the reference camera due to the many unknowns, leading to local minima [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. However, in this paper, the position of the pedestrian is assumed and calculated based on the 3D position of a pixel, as described in Sec. 5.
      </p>
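<p>The projection chain of Eqs. (2)-(3) can be sketched as follows; the intrinsic matrix K and pose T_wc are illustrative placeholders, not calibration values from the paper:</p>

```python
import numpy as np

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def project(K, T_wc, p_world):
    # Eq. (2): p_cam = (w_c T)^-1 . p_world, then pixel = K . p_cam.
    p_h = np.append(p_world, 1.0)        # homogeneous world point
    p_cam = np.linalg.inv(T_wc) @ p_h    # world -> camera frame
    uvw = K @ p_cam[:3]                  # camera frame -> image plane
    return uvw[:2] / uvw[2]              # dehomogenize by depth (Eq. 3)

T_wc = np.eye(4)  # camera at the world origin, axes aligned (placeholder pose)
# A point on the optical axis projects to the principal point:
print(project(K, T_wc, np.array([0.0, 0.0, 5.0])))  # → [320. 240.]
```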
      <sec id="sec-4-1">
        <p>
          Therefore, rather than estimating the query camera parameters to calculate the 3D position of a pixel, we estimate the pixel-to-pixel correspondence between the new and reference images. Since the 3D position is known for each pixel in the reference image, once a pixel is detected in the query image, it can be matched to a pixel in the reference image to estimate its 3D position. State-of-the-art dense matching was used to estimate the pixel-to-pixel correspondence, as described in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and shown in Fig. 2. We assume the Percentage of Correct Key-points (PCK) is above 70%, as evaluated in [14].
        </p>
      </sec>
      <sec id="sec-4-2">
        <p>
          The detailed equation can be found in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and is described as Eq. (4) in this paper:
        </p>
        <p>ⁱp = ⁱT_{i′} · ⁱ′p (4)</p>
        <p>where ⁱT_{i′} is the correspondence matrix that transforms pixel coordinates in the new image coordinate frame i′ to the reference image coordinate frame i.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Person Detection and Position Estimation</title>
      <sec id="sec-5-1">
        <title>In the online process, pedestrians are first detected using a primary PeopleNet network [15, 16].</title>
        <p>
          To reduce the number of candidate positions (hypotheses), we also assumed that pedestrians requesting positioning are holding a smartphone. Therefore, we employed a secondary classification model that classifies a detected person as either “holding a phone” or “not holding a phone”. Given that a person is detected and is “holding a phone”, we calculate the person's position using the center-bottom pixel of the bounding box. It is assumed that the detection accuracy is 80% or above, as stated in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>The 3D position of the pixel is the intersection between the ray emanating from the normalized pixel and the 3D ground plane. We define the ground plane in the world frame in Eq. (5):</p>
        <p>ʷπ: a·x + b·y + c·z + d = 0, π ∈ (a, b, c, d) (5)</p>
        <p>where π denotes the ground plane. Since the world coordinate ground plane is known from the BIM model, we can transform it into the camera coordinate frame in Eq. (6):</p>
        <p>ᶜπ = (ʷ_cT)ᵀ · ʷπ (6)</p>
        <p>The candidate pixel is normalized by multiplying by the inverse intrinsic parameters. We can express the normalized pixel in the camera coordinate frame in Eq. (7):</p>
        <p>ᶜp̂ = (ᶜK)⁻¹ · ⁱp (7)</p>
        <p>The camera coordinate of the normalized pixel is ᶜp̂ ∈ (ᶜx̂, ᶜŷ, ᶜẑ). The camera coordinate of the intersection between the normalized pixel ray and the ground plane is calculated in Eq. (8):</p>
        <p>ᶜp = s · ᶜp̂, with s = −d / (a·ᶜx̂ + b·ᶜŷ + c·ᶜẑ) (8)</p>
        <p>Then the camera coordinate of the intersection can be expressed in the world coordinate frame as Eq. (9):</p>
        <p>ʷp = ʷ_cT · ᶜp (9)</p>
        <p>The calculated positions are then filtered based on the Wi-Fi initial-guess search area. For the remainder, as the positions are located on the ground plane, they will be expressed as a sequence of measurements in the 2D Cartesian world coordinate frame as z, for ease of notation, in Eq. (10):</p>
        <p>z_{k,j} ∈ (x_{k,j}, y_{k,j}); z_k ∈ (z_{k,1}, z_{k,2}, …) (10)</p>
        <p>where k is the index of the camera, j is the index of the filtered positions from camera k that are within the Wi-Fi search area, z_k are the measurement inputs of camera k, and z_{1…K} are the measurement inputs of all cameras.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Particle Filter</title>
      <sec id="sec-6-1">
        <p>We formulated a recursive state estimation problem by using a PF. The goal is to estimate a 2D state vector x_t. More specifically, the goal is to track the hidden state sequence {x_t} of a dynamical system, where t is a discrete time step. The process model that encodes prior knowledge on how the state is expected to evolve over time can be written as Eq. (11):</p>
        <p>x_t = x_{t−1} + v_{t−1}·cos(θ_{t−1})·Δt; y_t = y_{t−1} + v_{t−1}·sin(θ_{t−1})·Δt (11)</p>
        <p>
          where v is the pedometer velocity and θ is the magnetometer heading from the client smartphone [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The cumulative uncertainty is assumed to be within ±0.2 m every second. The measurements z_{1…K,t} are defined in Sec. 5. The particle filter is described in Alg. 1.
        </p>
      </sec>
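<p>The pixel-to-ground pipeline of Eqs. (5)-(9) can be sketched as follows, assuming points transform by (ʷ_cT)⁻¹ and plane coefficients by (ʷ_cT)ᵀ; the intrinsics, camera pose and plane in the example are hypothetical:</p>

```python
import numpy as np

# Hypothetical intrinsics (focal length 800 px, principal point (320, 240)).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def pixel_to_ground(K, T_wc, pixel, plane_w):
    # plane_w = (a, b, c, d) with a*x + b*y + c*z + d = 0 in the world frame.
    # Eq. (6): transform the plane into the camera frame via the transpose.
    plane_c = T_wc.T @ np.asarray(plane_w, dtype=float)
    a, b, c, d = plane_c
    # Eq. (7): normalize the pixel with the inverse intrinsics.
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    # Eq. (8): scale the normalized ray so that it hits the plane.
    s = -d / (a * ray[0] + b * ray[1] + c * ray[2])
    p_cam = s * ray
    # Eq. (9): express the intersection back in the world frame.
    return (T_wc @ np.append(p_cam, 1.0))[:3]

# Hypothetical pose: camera 2 m above the world origin, looking straight down
# (180-degree rotation about the camera x-axis).
T_wc_down = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, -1.0, 0.0, 0.0],
                      [0.0, 0.0, -1.0, 2.0],
                      [0.0, 0.0, 0.0, 1.0]])
# The principal point maps to the nadir point on the ground plane z = 0:
print(pixel_to_ground(K, T_wc_down, (320.0, 240.0), (0.0, 0.0, 1.0, 0.0)))
# → approximately [0, 0, 0]
```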
      <sec id="sec-6-4">
        <title>Algorithm 1 Particle Filter</title>
        <p>Input: prior particle set {X_{t−1} = &lt;x_{t−1,i}, w_{t−1,i}&gt;, u_t, z_{1…K,t}}
Output: x_t
1. X_t = ∅, η = 0
2. For i = 1 … N
3.   Sample a particle x(i) from the discrete distribution given by X_{t−1}
4.   Sample x_{t,i} from p(x_t | x(i)_{t−1}, u_t)
5.   w_{t,i} = p(z_{1…K,t} | x_{t,i})
6.   η = η + w_{t,i}
7.   X_t = X_t ∪ {&lt;x_{t,i}, w_{t,i}&gt;}
8. For i = 1 … N
9.   w_{t,i} = w_{t,i} / η</p>
        <p>Where X_{t−1} is the set of prior particles (x_{t−1,i}) with corresponding weights (w_{t−1,i}), u_t is the input, and z_{1…K,t} are the measurements. η is the normalization constant. The hidden state x_t is then calculated based on the particle with the highest weight (w_{t,i}). To calculate the weight (w_{t,i}) of each particle, we need to understand the observation uncertainty (σ_{k,j,t}) using the known height of the camera (h_{c′,k}) and the measured 3D distance from the camera to the detected person d(c′_k, z_{k,j,t}). A Monte-Carlo simulation was used to sample the correlation.</p>
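<p>Alg. 1 together with the process model of Eq. (11) can be sketched as below; the Gaussian process noise of 0.2 m and the likelihood function passed in are stand-ins for the paper's camera-derived likelihood (Alg. 2):</p>

```python
import math, random

def particle_filter_step(particles, control, likelihood, n=None):
    """One step of Alg. 1.

    particles:  list of ((x, y), weight) pairs from time t-1
    control:    (v, theta, dt) pedometer velocity, magnetometer heading, time step
    likelihood: function p(z | (x, y)) combining the camera measurements (Alg. 2)
    """
    n = n or len(particles)
    v, theta, dt = control
    weights = [w for _, w in particles]
    new_particles, eta = [], 0.0
    for _ in range(n):
        # Line 3: sample a prior particle proportionally to its weight.
        (x, y), _ = random.choices(particles, weights=weights)[0]
        # Line 4 / Eq. (11): propagate with the PDR process model plus noise
        # (the 0.2 m standard deviation mirrors the paper's stated uncertainty).
        x = x + v * math.cos(theta) * dt + random.gauss(0.0, 0.2)
        y = y + v * math.sin(theta) * dt + random.gauss(0.0, 0.2)
        # Lines 5-7: weight by the measurement likelihood, accumulate eta.
        w = likelihood((x, y))
        eta += w
        new_particles.append(((x, y), w))
    # Lines 8-9: normalize the weights.
    return [(p, w / eta) for p, w in new_particles]

def estimate(particles):
    # The PF state estimate is the particle with the highest weight.
    return max(particles, key=lambda pw: pw[1])[0]
```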
      </sec>
      <sec id="sec-6-5">
        <p>A lowess (linear) smoothing regression was used to fit the data samples. It can be seen that when the height of the camera is low and the measured distance increases, the positioning error also increases. This is likely because each pixel represents a greater distance when the height of the camera is low.</p>
      </sec>
      <sec id="sec-6-6">
        <p>The fit was then used to estimate the positioning error in real time for each detected person in Eq. (12):</p>
        <p>σ_{k,j,t} = f(h_{c′,k}, d(c′_k, z_{k,j,t})) (12)</p>
        <p>We assumed a multivariate Gaussian distribution of the positioning errors, where the covariance matrix variance is equal to σ_{k,j,t}, to generate a likelihood for a given position p in Eq. (13):</p>
        <p>μ_{k,j,t} = z_{k,j,t};  Σ_{k,j,t} = [σ_{k,j,t}, 0 | 0, σ_{k,j,t}];  ℓ_{k,j,t,p} = (1 / (2π·√|Σ_{k,j,t}|)) · exp(−½·(p − μ_{k,j,t})ᵀ·Σ_{k,j,t}⁻¹·(p − μ_{k,j,t})) (13)</p>
        <p>Where μ_{k,j,t} is the mean and Σ_{k,j,t} is the covariance matrix of the multivariate Gaussian distribution, and ℓ_{k,j,t,p} is the calculated likelihood of a given position p from camera k and person j. All distributions were compared such that the highest likelihood at each position was used to combine them into a global distribution, which was then normalized such that the total area under the distribution equals 1. It is important to note that there is no minimum distance required between two pedestrians, as each pedestrian is represented as a distribution. The final weighting (w_{t,i}) in Alg. 1 of a given position can be calculated based on Alg. 2.</p>
        <p>Algorithm 2 Calculation of w_{t,i}
Input: z_{1…K,t}, x_{t,i}
Output: w_{t,i}
1. Calculate ℓ_{k,j,t,p} for every position p
2. Find the maximum ℓ_{k,j,t,p} over all (k, j) for every position and create a new distribution ℓ_{t,p}
3. Normalize the distribution such that ℓ̂_{t,p} sums to 1 over every position
4. w_{t,i} = ℓ̂_{t,p} for the specific position index (x_{t,i})</p>
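<p>Eq. (13) and Alg. 2 can be sketched over a discretized position grid as follows; the grid and detection values are hypothetical, and the isotropic covariance diag(σ, σ) follows Eq. (13):</p>

```python
import numpy as np

def gaussian_likelihood(p, mu, sigma):
    # Eq. (13) with covariance diag(sigma, sigma): isotropic 2D Gaussian.
    cov = np.diag([sigma, sigma])
    diff = np.asarray(p) - np.asarray(mu)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

def particle_weight(grid, detections, particle_index):
    """Alg. 2 sketch over a discretized position grid.

    grid:       list of candidate (x, y) positions
    detections: list of (mu, sigma) per detected person, over all cameras
    """
    # Step 2: keep the highest per-person likelihood at each position.
    ell = [max(gaussian_likelihood(p, mu, s) for mu, s in detections) for p in grid]
    # Step 3: normalize so the distribution sums to 1 over the grid.
    total = sum(ell)
    ell_hat = [v / total for v in ell]
    # Step 4: the weight is the normalized likelihood at the particle's position.
    return ell_hat[particle_index]
```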
      </sec>
      <sec id="sec-6-7">
        <p>The weighting calculation is visualized in Fig. 5, where the likelihood distribution of each person is combined into a normalized global distribution (Figure 5: Example of a Likelihood Function Heatmap of 3 Pedestrians Captured by Two CCTV Cameras).</p>
        <p>The particle with the highest weighting, as described in Alg. 1, will be the PF estimated state x_t.</p>
      </sec>
      <sec id="sec-6-8">
        <title>Given that the particle estimated state covariance is less than 1m, it will be used to update the support set described in Sec. 3.</title>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Experimental Results</title>
      <sec id="sec-7-1">
        <p>
          In this study, the experimental trajectory was conducted in a laboratory, as shown in Fig. 8. The laboratory is a small indoor environment which contains numerous dynamic objects. Three commercial CCTV cameras were placed across the indoor environment to test the feasibility of the proposed positioning system. The ground truth positions of the trajectory were recorded, and the positioning quality of the proposed method was analyzed against other positioning methods, including:
1. Ground Truth, provided by markers labelled on the floor and timestamps from the CCTV camera.
2. Pedestrian Dead Reckoning (PDR), provided by a Samsung Galaxy Note 20 smartphone and the PDR app in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
      </sec>
      <sec id="sec-7-2">
        <p>3. CCTV, provided by commercial Tapo smart cameras [18].
4. Proposed Particle Filter (CCTV &amp; PDR).</p>
        <p>Methods 5 and 6 are obtained through a succession of fixed points, whereas the other methods, integrated with PDR, are obtained during movement.</p>
      </sec>
      <sec id="sec-7-5">
        <p>To compare with traditional Wi-Fi ML positioning, Wi-Fi data were collected one month in advance at each location, with 2-meter separation, as shown in Fig. 6. 200 samples were collected at each location to train an Extended Naive Bayes positioning model that classifies new Wi-Fi fingerprints to a location (Figure 6: Positions That Collected Wi-Fi Signals to Train a Wi-Fi Extended Naïve Bayes Model).</p>
      </sec>
      <sec id="sec-7-6">
        <p>To compare with the Wi-Fi One-Shot positioning, two Samsung Galaxy smartphones were used to collect Wi-Fi data simultaneously during the experiment, where one smartphone collects data for the support-set input and the other acts as the query input for the one-shot model.</p>
      </sec>
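<p>The Extended Naive Bayes baseline can be approximated with a standard Gaussian Naive Bayes over RSS vectors; this is a minimal stand-in, not the paper's exact model, and all AP names and values are hypothetical:</p>

```python
import math
from collections import defaultdict

def train_nb(samples):
    # samples: list of (location, {ap: rss}). Fit per-location Gaussian
    # statistics (mean, variance) for each AP's RSS.
    raw = defaultdict(lambda: defaultdict(list))
    for loc, scan in samples:
        for ap, rss in scan.items():
            raw[loc][ap].append(rss)
    model = {}
    for loc, aps in raw.items():
        model[loc] = {}
        for ap, values in aps.items():
            mean = sum(values) / len(values)
            var = sum((x - mean) ** 2 for x in values) / len(values)
            model[loc][ap] = (mean, max(1.0, var))  # floor the variance
    return model

def classify_nb(model, scan, floor_rss=-100.0):
    # Pick the location maximizing the Gaussian log-likelihood of the scan;
    # APs missing from the scan are treated as very weak signals.
    def log_post(loc):
        total = 0.0
        for ap, (mu, var) in model[loc].items():
            rss = scan.get(ap, floor_rss)
            total += -0.5 * math.log(2 * math.pi * var) - (rss - mu) ** 2 / (2 * var)
        return total
    return max(model, key=log_post)
```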
      <sec id="sec-7-7">
        <p>In addition, to mimic a real indoor environment with multiple pedestrians, in this study we had a pedestrian walk around the client to create a false-positive dynamic likelihood in the CCTV measurements, as shown in Fig. 7. A false-positive static likelihood at a single location was also added post-experiment to the CCTV measurements. The former represents a pedestrian walking and the latter a pedestrian standing, both commonly seen in indoor environments. This was performed to test whether the correct correspondence to the client can be found using the proposed method.</p>
        <p>The positioning results are plotted onto a bird's-eye view of the laboratory in Fig. 8. Two performance metrics are used: the mean and standard deviation (SD) of the 2D positioning error.</p>
        <p>The PDR requires an initial position, which was set to the ground-truth initial position. The results show that the PDR (yellow marker) begins with high positioning accuracy but accumulates drift error as it progresses, as shown after time epoch 20. This leads to a significant cumulative mean positioning error of 2.4 m. Therefore, it needs to be integrated with an absolute positioning measurement to correct its cumulative errors.</p>
      </sec>
      <sec id="sec-7-8">
        <p>The Wi-Fi Extended Naïve Bayes point positioning (cyan marker) is on average 3.81 m from the ground truth (black marker), as analyzed in Figs. 8, 9 and Table II. This is mainly due to changes in the physical environment from dynamic objects, which are common in many indoor environments, creating NLOS and multipath signals different from those captured previously at each location. The integration of the Wi-Fi Extended Naïve Bayes and PDR (blue marker) is relatively accurate from 0 to 14 seconds, perhaps due to more unique Wi-Fi signatures there compared to the open-space counterpart from 20 seconds onward, leading to an overall 3.64 m positioning error.</p>
      </sec>
      <sec id="sec-7-9">
        <p>The collected Wi-Fi signals from the CCTV &amp; PDR PF were then used as the support set of the Wi-Fi One-Shot positioning, improving the accuracy to a 1.89 m mean error and 1.56 m standard deviation. It was then integrated with PDR to provide 1.98 m positioning accuracy with a smaller 1.11 m standard deviation (Figure 9: Positioning Error of the Proposed Positioning System and Other Positioning Systems).</p>
      </sec>
      <sec id="sec-7-11">
        <p>The CCTV positioning solution is shown with the green marker, where a correct correspondence is required for positioning. The proposed method (red marker) integrates the PDR with the CCTV positioning solution via the PF. Consequently, the PDR can estimate the correspondence for the CCTV, whereas the CCTV can correct the positioning error of the PDR. The proposed method provides accurate and continuous positioning with a 0.29 m mean error. The collected Wi-Fi fingerprints were then used as the support set of the Wi-Fi One-Shot positioning, improving its accuracy to 1.89 m.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusions</title>
      <sec id="sec-8-1">
        <p>This paper proposes a Wi-Fi and CCTV integrated indoor positioning solution for 2D position estimation. In short, pedestrians were detected, and a particle filter was used to estimate the correct correspondence to the user. The estimated state can then be used to update a Wi-Fi one-shot model. The potential advantages of the proposed method are:
• The integration of Wi-Fi, CCTV and PDR improves positioning accuracy compared to Wi-Fi standalone positioning.
• The Wi-Fi one-shot positioning can improve upon traditional Wi-Fi positioning due to the use of real-time Wi-Fi data.</p>
      </sec>
      <sec id="sec-8-4">
        <title>In this paper, a laboratory indoor environment was used to test the proposed method; however, in the real world there would be additional challenges that need to be addressed, including:</title>
        <p>• An up-to-date BIM model that reflects the building, to calculate accurate extrinsic parameters of the CCTV cameras.
• Areas with no Wi-Fi or CCTV coverage.
• False negatives in pedestrian detection, in other words failure to detect pedestrians.</p>
      </sec>
      <sec id="sec-8-8">
        <title>Considering the preliminary results presented in this paper, we believe the proposed method can provide accurate positioning and to support various indoor applications, that can be extended to digital twin.</title>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>9. Future Works</title>
      <p>We will conduct a real-time experiment in a large-scale mall in Hong Kong to validate the proposed method. Several potential future developments of the proposed method are suggested:</p>
      <list list-type="bullet">
        <list-item><p>To make use of state-of-the-art visual trackers that can assign higher weighting to correspondences.</p></list-item>
        <list-item><p>To make use of 3D object detection algorithms, which are more robust for position estimation than 2D object detection [20].</p></list-item>
      </list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] A. Hameed and H. A. Ahmed, "Survey on indoor positioning applications based on different technologies," in 2018 12th International Conference on Mathematics, Actuarial Science, Computer Science and Statistics (MACS), 24-25 Nov. 2018, pp. 1-5, doi: 10.1109/MACS.2018.8628462.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] V. Renaudin et al., "Evaluating Indoor Positioning Systems in a Shopping Mall: The Lessons Learned From the IPIN 2018 Competition," IEEE Access, vol. 7, pp. 148594-148628, 2019, doi: 10.1109/ACCESS.2019.2944389.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] M. Xue et al., "Locate the Mobile Device by Enhancing the WiFi-Based Indoor Localization Model," IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8792-8803, 2019, doi: 10.1109/JIOT.2019.2923433.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Q. Yang, S. Zheng, M. Liu, and Y. Zhang, "Research on Wi-Fi indoor positioning in a smart exhibition hall based on received signal strength indication," EURASIP Journal on Wireless Communications and Networking, vol. 2019, no. 1, p. 275, 2019, doi: 10.1186/s13638-019-1601-3.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] E. R. Magsino, I. W. H. Ho, and Z. Situ, "The effects of dynamic environment on channel frequency response-based indoor positioning," in 2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), 8-13 Oct. 2017, pp. 1-6, doi: 10.1109/PIMRC.2017.8292442.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] D. Acharya, K. Khoshelham, and S. Winter, "Real-time detection and tracking of pedestrians in CCTV images using a deep convolutional neural network," 2017.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] D. Jones, C. Snider, A. Nassehi, J. Yon, and B. Hicks, "Characterising the Digital Twin: A systematic literature review," CIRP Journal of Manufacturing Science and Technology, vol. 29, pp. 36-52, 2020, doi: 10.1016/j.cirpj.2020.02.002.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] M. J. L. Lee et al., "BIPS: Building Information Positioning System," presented at the 2021 International Conference on Indoor Positioning and Indoor Navigation (IPIN), Online, 2021.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] M. J. L. Lee, L.-T. Hsu, H.-F. Ng, and S. Lee, "Semantic-Based VPS for Smartphone Localization in Challenging Urban Environments," 2020, doi: 10.1109/TRO.2021.3075644.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] B.-J. Chen and R. Y. Chang, "Few-Shot Transfer Learning for Device-Free Fingerprinting Indoor Localization," arXiv preprint, 2022.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] O. Vinyals, C. Blundell, T. Lillicrap, K. Kavukcuoglu, and D. Wierstra, "Matching Networks for One Shot Learning," arXiv preprint, 2016.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] S. Alizadehsalehi, A. Hadavi, and J. C. Huang, "From BIM to extended reality in AEC industry," Automation in Construction, vol. 116, p. 103254, 2020, doi: 10.1016/j.autcon.2020.103254.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] S. Tang, C. Tang, R. Huang, S. Zhu, and P. Tan, "Learning Camera Localization via Dense Scene Matching," arXiv preprint, 2021.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] P. Truong, M. Danelljan, L. V. Gool, and R. Timofte, "Learning Accurate Dense Correspondences and When to Trust Them," arXiv preprint, 2021.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] NVIDIA, "PeopleNet." https://catalog.ngc.nvidia.com/orgs/nvidia/models/tlt_peoplenet (accessed 2022).</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," arXiv preprint, 2020.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] N. Patel, "Dead Reckoning, a location tracking app for Android smartphones." GitHub. https://github.com/nisargnp/DeadReckoning (accessed 2022).</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] Tapo, "Pan/Tilt Home Security Wi-Fi Camera," 2022. [Online]. Available: https://www.tapo.com/en/product/smart-camera/tapo-c210/.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] Find3, "Framework for Internal Navigation and Discovery." https://github.com/schollz/find3 (accessed 2022).</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>