AW-FLoc: An Adaptive Weighted Fusion Localization
Algorithm Based on CSI and Multidirectional Image
Wen Liu, Mingjie Jia, Zhongliang Deng, Shude Li and Yingeng Zhang
School of Electronics Engineering, Beijing University of Posts and Telecommunications, Beijing 100089, China


Abstract
Localization algorithms play an important role in various location-based services (LBS), including production, public services, daily life, entertainment, and social networks. To improve the accuracy of indoor positioning and overcome the limitations of single-sensor positioning, this paper proposes a fusion positioning method that combines channel state information (CSI) and multi-azimuth images using adaptive weights. To reduce the image feature dimension and weaken the interference of low-texture images on the fused features, an image feature fusion algorithm based on texture degree evaluation (TDE-IFF) is proposed. In TDE-IFF, a texture evaluation algorithm assigns weights to images of different orientations, and the features of images with opposite orientations are fused according to these weights to obtain a vectorized representation of the multi-orientation images. On this basis, this paper proposes an adaptive weight-based fusion localization framework (AW-FLoc), which uses a gating network to weigh CSI features and multi-directional image features and adaptively enhance the effective signal features at different fingerprint points, thereby improving the accuracy of fusion positioning. Experiments show that fusion positioning based on AW-FLoc achieves better positioning performance, with an average error of 0.61 m in a comprehensive office scene.

Keywords
Indoor localization, adaptive weighted, fusion, localization


1. Introduction
Localization algorithms play an important role in a variety of location-based services (LBS), including production, public services, daily life, entertainment, and social networks. With the intelligent development of the information industry, LBS has been increasingly applied in rescue and relief, logistics tracking, and high-end manufacturing [1, 2]. There is therefore an urgent need for highly reliable, high-precision indoor positioning technology that can operate in complex indoor environments [3].
   In terms of improving the accuracy of indoor positioning algorithms, it has been shown that multi-source fusion is more advantageous than single-source positioning. Multiple sources provide a richer set of signal characteristics, whereas a single source inevitably suffers from feature loss, signal distortion, and signal-to-noise ratio degradation caused by interference, occlusion, and non-line-of-sight (NLOS) propagation in complex indoor environments. Consequently, single-source positioning algorithms can achieve a certain degree of positioning accuracy, but it is difficult for them to guarantee stable positioning. Multi-source localization methods provide richer signals and can further improve localization accuracy and stability. Positioning methods based on multi-source signal fusion are therefore becoming a key focus of positioning research.

IPIN 2022 WiP Proceedings, September 5-7, 2022, Beijing, China
liuwen@bupt.edu.cn (W. Liu); jiamingjie@bupt.edu.cn (M. Jia); dengzhl@bupt.edu.cn (Z. Deng); lishude123654@126.com (S. Li); zyg99@126.com (Y. Zhang)
ORCID: 0000-0002-6450-1969 (W. Liu); 0000-0002-6862-7905 (M. Jia)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org
   As the frequency response of the wireless channel, CSI reflects signal changes in the indoor environment, including reflection, diffraction, and power variations, at a finer granularity than the received signal strength, and it is a fundamental description of how wireless signals propagate through space. Although CSI-based localization has been reported [4] to achieve meter-level accuracy, it still suffers from insufficient stability and mismatching in complex indoor environments. Images, on the other hand, are widely used in positioning and navigation because they are cheap to acquire. Although image-based indoor positioning offers rich location features and high stability, images collected at different locations often contain repetitive regions, which lowers feature discriminability. In summary, both image and CSI signals contain rich location features and can complement each other in indoor positioning. It is therefore theoretically possible to build a fusion algorithm that uses both signals to provide stable, highly accurate positioning.
   Existing multi-source fusion localization methods usually rely on model integration or feature fusion. The former lacks interaction between different features and therefore fuses them only shallowly. Although the latter increases the interaction between features, current feature fusion methods are relatively simple and struggle to exploit the positioning strengths of different signal sources. Moreover, the fused features do not evaluate the individual signals, so noisy signals are easily introduced, which limits the positioning accuracy of the algorithm.
   The main contribution of this paper is an algorithmic framework for multi-source fusion positioning, applied to the fusion of CSI and image signals. The specific contributions are as follows:

   1. Multi-directional image features are high-dimensional, and low-texture images introduce extra noise. To address this, this paper proposes a multi-directional image fusion method based on texture degree evaluation (TDE-IFF). The method first scores the texture degree of each image and uses the score as the weight of its features. It then fuses the features of images with opposite orientations according to these texture scores, enhancing the network's spatial perception along each coordinate axis.
   2. To address the lack of contribution evaluation for multi-source signals, this paper proposes a fusion localization model with adaptive weights (AW-FLoc). The model first analyzes the contribution of different signals to positioning and then applies an adaptive weighting strategy based on a gating mechanism to combine the complementary strengths of the signals, thereby generating more effective location features.
   3. The effectiveness of both methods is verified through comparative and ablation experiments.
2. Related works
In the data representation of CSI, we use a representation method based on a multi-head self-attention mechanism and efficient CSI fusion (MHSA-EC) [5]. The method uses a multi-head attention model to strengthen the aggregation of non-adjacent CSI features and combines the extracted features with effective CSI. Effective CSI adds distance-constraint information to the CSI representation and increases the stability of the positioning algorithm. Since this paper focuses on feature-level fusion for localization, the output of MHSA-EC is used directly as the CSI feature.
   Indoor images contain rich location information, so the way images are represented has a significant impact on fusion positioning accuracy. For image feature representation, Xiao et al. [6] used the Faster R-CNN algorithm to identify static targets in images uploaded from a mobile phone and established collinearity equations relating control points on the static targets to the observer's position, from which the photographer's final position is solved. The authors of [7] applied transfer learning to image-based indoor localization to generate robust image features: images collected by a robot were used as the training set for the VGG-16 network, and after training, features from the penultimate layer were used for matching, achieving good results in the positioning task. Deep-learning-based image representations are now widely used in image localization. However, indoor localization based on multi-directional images still suffers from high feature dimensionality and from noise introduced by low-quality images, and there is no efficient unified vector representation for multi-directional images. This paper proposes the TDE-IFF algorithm, which effectively reduces the interference of low-quality images on the fused features and facilitates subsequent multi-sensor fusion.
   Compared with decision-level fusion, feature-level fusion allows deeper interaction between features, which benefits the final model decision, and it is widely used in multi-source fusion localization. Localization methods based on feature-level fusion often map the interrelated features of different signal sources into a common space to form a richer feature representation for the final location decision. Chiou et al. [8] built a graph network to aggregate multi-directional image features and experimentally verified that this improves localization performance; Jiao et al. [9] fused HOG and PAC features extracted from images and verified that the fused features improve the stability of the positioning system. These methods use only image features, and the absence of other information limits their positioning accuracy. Huang et al. [10] proposed a new fingerprint database construction method that fuses geomagnetic signals and CSI features at the feature level, and on this basis proposed a multi-scale KNN matching algorithm (MDS-KNN) for localization, achieving an accuracy of 1.7 m in tests. Although this method introduces multi-source signals, its fusion is relatively simple: it does not analyze the individual signals further, and the fused features do not weigh each signal, which makes it difficult to improve their discriminability.
   In addition, some researchers, notably in time-series signal processing, have constrained fused features with gating signals to make them more effective. The Gated Recurrent Unit (GRU) network [11] uses gating signals to perform a weighted fusion of long- and short-term information, enhancing the network's ability to extract features from time series. The MMoE algorithm [12] controls multiple expert networks through a gating mechanism and fuses their output features according to the gate weights, improving recommendation performance. The gating-enhanced deep network for click-through rate prediction (GateNet) [13] introduces a gating mechanism to weight different feature embeddings, improving click-through rate prediction. These works show that gating helps improve the discriminability and effectiveness of fused features. Inspired by them, this paper applies the gating mechanism to multi-source signal localization and proposes a fusion framework with adaptive weights. By evaluating the contribution of different signals, the framework enhances effective signals and suppresses the influence of ineffective ones, thereby increasing positioning accuracy.


3. Fusion and localization based on AW-FLoc
In this paper, a multi-source signal fusion localization framework (AW-FLoc) is proposed to achieve high-precision indoor localization. To ensure effective fusion, the algorithm first builds a separate feature representation for the multi-directional image signal and for the CSI signal: the TDE-IFF algorithm is proposed to fuse multi-directional image features, and the MHSA-EC network is used to extract CSI features. AW-FLoc then fuses and fine-tunes the CSI and image features. The framework is shown in Figure 1.


[Figure 1 diagram. Stages: multi-source signal; feature extraction and aggregation; fusion & positioning. Multidirectional image → TDE-IFF → image feature; CSI amplitude → MHSA-EC → CSI feature; image feature and CSI feature → AW-FLoc → localization.]

Figure 1: Framework of fusion indoor localization based on AW-FLoc.
3.1. TDE-IFF algorithm
The TDE-IFF algorithm produces a fused representation of multi-directional image features. In this paper, images facing four directions (east, west, north, and south) are taken as input, and TDE-IFF outputs the fused image features.

3.1.1. Problem analysis
To motivate the TDE-IFF algorithm, we first analyze two aspects of the multi-directional image feature fusion problem.
   (1) Low-texture images introduce noise
   In image feature representation, low-texture images can introduce noise. As shown in Figure 2, two images facing opposite directions are collected at a reference point close to a wall in an indoor scene. Because the point is close to the wall, the south-facing image shows only the wall and lacks texture and position information, whereas the north-facing image has a wider view and captures richer scene texture and location information. Directly fusing these two image features adds little information and instead introduces noise into the texture-rich feature. An algorithm is therefore needed to score the texture degree of each image and reduce the interference of low-texture signals on the fused features.

[Figure 2 panels: indoor scene map (north arrow; legend: camera, fingerprint point); multidirectional images: image of the north, image of the south.]

Figure 2: Differences in texture between different images


   (2) The position along a coordinate axis is strongly correlated with the image features along that orientation
   In indoor scenes, image features oriented along a given axis (x or y) tend to be strongly correlated with the position along that axis. Figure 3 shows how images facing the same direction change across fingerprint points:
[Figure 3 panels: indoor scene map with x- and y-axes and points 3 and 5 (legend: camera, fingerprint point); north image of point 3, north image of point 5.]

Figure 3: Image variation of different fingerprint points along the x-axis


   Figure 3 shows that as the camera moves away from a landmark, objects such as doors and cabinets appear significantly smaller in the image. To describe the relationship between this image feature and physical location, we model the problem as shown in Figure 4.


Figure 4: The relationship between the distance of the object in the image and the actual position


   In Figure 4, O is the focal point (optical center) of the camera; A and B are the two endpoints of the actual object, and a and b are their positions in the image; G is the midpoint of AB, and g is the midpoint of ab; OG is the distance between the camera and the midpoint of the object. Because triangles Oab and OAB are similar (the object is parallel to the image plane), the following relation holds:

$$OG = \frac{og}{ab} \cdot AB$$

For a given object AB, the camera-to-object distance OG and the projected height ab of the object in the image satisfy the above formula, so OG is inversely proportional to ab. For images taken along the x-axis, when the object AB is at the center of the image, OG reflects the position of the camera on the x-axis. Hence the features of images taken along the x-axis are strongly correlated with the position along the x-axis, and the same holds for the south-facing images along the x-axis. Because the image features in both opposite directions along the x-axis are strongly correlated with the x-coordinate, fusing the features of these two opposite orientations (north and south) can, in theory, further improve the positioning capability of the algorithm on the x-axis. The same holds for the y-axis.
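The inverse relation between OG and ab can be checked numerically. The sketch below is a hypothetical illustration, not part of the method: it assumes og and ab are measured in pixels and, for an object at the image center, treats og as roughly the focal length in pixels; the 2.0 m object size and 800 px value are made-up numbers.

```python
def camera_to_object_distance(og_px: float, ab_px: float, object_size_m: float) -> float:
    """Similar triangles Oab ~ OAB give OG / og = AB / ab, so OG = (og / ab) * AB.

    og_px: distance from the optical center O to the image midpoint g (pixels).
    ab_px: projected size of the object in the image (pixels).
    object_size_m: real size AB of the object (meters).
    """
    if ab_px <= 0:
        raise ValueError("projected size ab must be positive")
    return og_px / ab_px * object_size_m


# Hypothetical values: a 2.0 m door, og of about 800 px (focal length in pixels).
for ab in (400.0, 200.0, 100.0):
    print(f"ab = {ab:5.0f} px -> OG = {camera_to_object_distance(800.0, ab, 2.0):.1f} m")
```

Halving the projected size doubles the estimated distance, which is why the apparent size of landmarks in images facing along an axis carries information about the camera's coordinate on that axis.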

3.1.2. TDE-IFF algorithm
Based on the above analysis, an algorithm is needed that aggregates the features of images with opposite orientations on the same coordinate axis (x or y) while reducing the interference of low-texture image features on the fused features. The block diagram of the model built for this multi-directional image feature aggregation problem is shown in Figure 5:


[Figure 5 diagram: indoor scene map (x- and y-axes; legend: camera, fingerprint point). TDE-IFF algorithm: image orientation (x-axis: north image, south image; y-axis: west image, east image) → image input → image texture evaluation → image representation → fused image representation (x-axis fusion feature, y-axis fusion feature).]

Figure 5: TDE-IFF algorithm


   First, the images are grouped by physical axis into $[image_x, image_{x'}]$ and $[image_y, image_{y'}]$, where $x$ and $x'$ are the two opposite directions along the x-axis and $y$ and $y'$ are the two opposite directions along the y-axis. The image features $[F_x, F_{x'}]$ and $[F_y, F_{y'}]$ are then obtained through the image representation algorithm, and the features of opposite orientations are aggregated by TDE-IFF:

$$FusionFeat_x = \lambda F_x + (1 - \lambda) F_{x'}$$

$$FusionFeat_y = \beta F_y + (1 - \beta) F_{y'}$$
where $FusionFeat_x$ is the fused image feature of the two directions on the x-axis, and $FusionFeat_y$ is the fused image feature of the two directions on the y-axis. The parameters $\lambda$ and $\beta$ are texture scores that evaluate the images in different orientations and reduce the feature weight of low-texture images. The texture score of an image is the variance of its gray-scale histogram. Taking $image_x$ as an example, the gray-scale histogram of the input image is first computed:
$$p(z_i) = \frac{n_i}{n}$$
where $n$ is the total number of pixels, $n_i$ is the number of pixels with gray level $z_i$, and $p(z_i)$ is the relative frequency of gray level $z_i$. The histogram variance is then computed as:
$$u(image) = \sum_{i=0}^{L-1} (z_i - m)^2 \, p(z_i)$$

$$m = \sum_{i=0}^{L-1} z_i \, p(z_i)$$

where $m$ is the mean gray level of the histogram, $L$ is the total number of gray levels, and $u(image)$ is the histogram variance of the image. In this paper, $u(image)$ is used as the texture score, and the parameters $\lambda$ and $\beta$ are:

$$\lambda = \frac{u(image_x)}{u(image_x) + u(image_{x'})}$$

$$\beta = \frac{u(image_y)}{u(image_y) + u(image_{y'})}$$
With weights computed in this way, the interference of low-texture image features is suppressed to a certain extent, so TDE-IFF can effectively fuse multi-directional image features and improve the localization performance of the fusion algorithm.
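The texture scoring and opposite-orientation fusion of TDE-IFF can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: images are assumed to be 8-bit grayscale arrays given as nested lists, the feature vectors are hypothetical stand-ins for the output of the (unspecified) image representation algorithm, and the equal-weight fallback when both images are perfectly flat is our own choice.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # grayscale image, pixel values in [0, levels - 1]


def histogram_variance(image: Image, levels: int = 256) -> float:
    """Texture score u(image): variance of the gray-level histogram."""
    counts = [0] * levels
    n = 0
    for row in image:
        for z in row:
            counts[z] += 1          # n_i: number of pixels with gray level z_i
            n += 1
    p = [c / n for c in counts]     # p(z_i) = n_i / n
    m = sum(z * pz for z, pz in enumerate(p))              # mean gray level
    return sum((z - m) ** 2 * pz for z, pz in enumerate(p))


def texture_weight(img: Image, img_opposite: Image) -> float:
    """lambda (or beta) = u(img) / (u(img) + u(img_opposite))."""
    u_a, u_b = histogram_variance(img), histogram_variance(img_opposite)
    total = u_a + u_b
    if total == 0:
        return 0.5  # assumption: both images flat, so weight them equally
    return u_a / total


def fuse(feat: List[float], feat_opposite: List[float], w: float) -> List[float]:
    """FusionFeat = w * F + (1 - w) * F', element-wise."""
    return [w * a + (1 - w) * b for a, b in zip(feat, feat_opposite)]


# Hypothetical 2x2 example: a high-contrast north image and a nearly flat south (wall) image.
north = [[0, 255], [255, 0]]
south = [[100, 140], [140, 100]]
lam = texture_weight(north, south)
fused_x = fuse([1.0, 0.0, 2.0], [0.0, 1.0, 0.0], lam)  # made-up feature vectors
print(round(lam, 4), [round(v, 3) for v in fused_x])
```

Here u(north) = 16256.25 and u(south) = 400, so λ is about 0.976 and the fused x-axis feature is dominated by the textured north image, which is the intended effect of down-weighting the wall-facing view.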

3.2. AW-FLoc algorithm
In multi-source fusion, previous algorithms often ignore the fact that different signals contribute differently to each location dimension. Assigning weights per coordinate axis allows signals strongly correlated with the position on that axis to be enhanced, while irrelevant or weakly correlated signals are suppressed.
[Figure 6 diagram: long, narrow space with x- and y-axes and north arrow; transmitter, fingerprint point 9, and distance vector d.]

Figure 6: The relationship between distance and coordinate axis in narrow and long environment


   In the CSI representation, this paper introduces effective CSI as distance-constraint information, and in a long, narrow space this distance information contributes differently to different coordinate axes, as illustrated in Figure 6.
   As shown in Figure 6, fingerprint point 9 is far from the transmitter, and the relationship between the point and the transmitter is represented by the vector $d$, which forms an angle $\theta$ with the x-axis. Because the environment is long and narrow, $\theta$ is small; if the cosine of this angle is used to measure correlation, $\cos\theta$ is close to 1. In other words, the distance $d$ is strongly correlated with the distance along the x-axis. Since this distance information is contained in the fused CSI representation, the representation contributes differently to different coordinate axes, and this difference needs to be measured. Similarly, for image features of different orientations, the features aggregated along the x-axis are more closely related to the x-coordinate, because changes in landmark features along this dimension can be observed; the same holds for the y-axis.
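A quick numeric illustration of why $\cos\theta$ is close to 1 in a long, narrow space. The 10 m distance and 10° angle below are hypothetical values chosen only to show how $d$ splits into its x and y components.

```python
import math
from typing import Tuple


def axis_components(d: float, theta_deg: float) -> Tuple[float, float]:
    """Project a distance d at angle theta (degrees, from the x-axis) onto the x and y axes."""
    theta = math.radians(theta_deg)
    return d * math.cos(theta), d * math.sin(theta)


# Hypothetical long, narrow corridor: d = 10 m, theta = 10 degrees.
dx, dy = axis_components(10.0, 10.0)
print(f"cos(theta) = {math.cos(math.radians(10.0)):.3f}, x-part = {dx:.2f} m, y-part = {dy:.2f} m")
```

Almost all of $d$ falls on the x-axis (about 9.85 m versus 1.74 m), so in such a layout a distance feature derived from effective CSI mainly informs the x-coordinate.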
   This example shows that a signal can contribute differently to positioning along different coordinate axes, and this applies neither only to CSI nor only to distance information. A fusion positioning framework therefore needs a method that adaptively evaluates both the contribution of each signal to positioning and its contribution to positioning on each coordinate axis, which can improve the accuracy and generality of the algorithm.
   Based on the above analysis, this paper designs the multi-source fusion network shown in Figure 7.
   Taking the fused multi-directional image representations and the CSI features as an example, the algorithm first takes the features produced by the trained upstream models as input and performs further feature extraction and dimensionality reduction. Because the algorithm estimates the x and y coordinates separately, a separate gating network and position decider are designed for each coordinate axis. The gating network takes the original features as input, evaluates the contribution of each feature to its dimension, and outputs the corresponding weights. The features are weighted and fused according to these weights and fed to the position decider of that dimension. The position decider consists of multiple linear layers and activation functions and outputs the position on the corresponding coordinate axis from the fused feature.

[Figure 7 diagram, bottom to top: Input (x-axis fusion feature, CSI feature, y-axis fusion feature) → Adaptive Weighted Fusion (x-axis gating layer and y-axis gating layer producing gating signals) → Location Decision (x-axis and y-axis position feature extraction) → Output (x-axis coordinate, y-axis coordinate).]

Figure 7: The AW-FLoc algorithm
   (1) Input feature processing
   The inputs to the network are the x-axis fused image feature, the MHSA-EC-based CSI feature, and the y-axis fused image feature, denoted $[FusionFeat_x, Feat_{csi}, FusionFeat_y]$. To keep the feature dimensions consistent, further feature extraction and compression are applied to the aggregated image features. To unify the scale of the CSI feature, layer normalization is applied to it. The processed features are $[spaceFeat_x, \widehat{Feat}_{csi}, spaceFeat_y]$.
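Input processing can be sketched as a linear compression of each image feature to a common width followed by layer normalization of the CSI feature. This is a hedged, dependency-free illustration: the feature sizes, the single ReLU layer used for compression, the random weights, and the omission of LayerNorm's learnable gain and bias are all assumptions, since the paper does not specify them.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def linear_relu(x: Sequence[float], W: Sequence[Sequence[float]], b: Sequence[float]) -> Vector:
    """One fully connected layer with ReLU: max(0, W x + b)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + bias) for row, bias in zip(W, b)]


def layer_norm(x: Sequence[float], eps: float = 1e-5) -> Vector:
    """Normalize a vector to zero mean and unit variance (no learnable affine terms)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def random_layer(rng: random.Random, out_dim: int, in_dim: int) -> Tuple[List[Vector], Vector]:
    """Hypothetical randomly initialized weights and zero biases."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return W, [0.0] * out_dim


rng = random.Random(0)
img_dim, csi_dim = 12, 8                      # hypothetical feature sizes
fusion_feat_x = [rng.random() for _ in range(img_dim)]
fusion_feat_y = [rng.random() for _ in range(img_dim)]
feat_csi = [rng.gauss(5.0, 3.0) for _ in range(csi_dim)]

Wx, bx = random_layer(rng, csi_dim, img_dim)  # compress image features to the CSI width
Wy, by = random_layer(rng, csi_dim, img_dim)
space_feat_x = linear_relu(fusion_feat_x, Wx, bx)
space_feat_y = linear_relu(fusion_feat_y, Wy, by)
feat_csi_hat = layer_norm(feat_csi)
print(len(space_feat_x), len(feat_csi_hat), len(space_feat_y))
```

Bringing all three features to the same width is what makes the element-wise weighted sum in the next step well defined.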
   (2) Gating network and feature fusion
   Because each feature contributes differently to each spatial coordinate axis, two gating layers are introduced to evaluate these contributions adaptively. Each gating network takes the original features as input and assigns adaptive weights according to them:

$$g_k = \mathrm{softmax}\left(W_{g_k} [FusionFeat_x, Feat_{csi}, FusionFeat_y]\right)$$

$$\mathrm{softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

where $W_{g_k}$ is the weight of the linear layer of the gating network, and $k$ indexes the two tasks of x-axis and y-axis positioning ($k = 0$ for the x-axis task and $k = 1$ for the y-axis task). $g_k$ is a three-dimensional vector $[g_1, g_2, g_3]$ whose entries lie in the interval (0, 1) and sum to 1; each entry is the contribution value of the corresponding feature. The adaptive weighted fusion algorithm combines these contribution values with the features:

$$locfeat_k = g_1 \cdot spaceFeat_x + g_2 \cdot \widehat{Feat}_{csi} + g_3 \cdot spaceFeat_y$$
The decider for each axis takes the corresponding fused feature $locfeat_k$, $k \in \{0, 1\}$, as input and passes it through multiple linear layers and activation functions to fit the position on that axis. The model is optimized with Adam, and the loss function is cross-entropy.
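The two gating branches, the weighted fusion, and a decider scored with cross-entropy can be sketched as follows. This is a minimal, dependency-free illustration under assumptions the paper does not state: each gate is a single bias-free linear layer on the concatenated original features, the decider is a single linear layer producing logits over a hypothetical set of discrete coordinate values (one reading of the cross-entropy loss), all weights and features are random, and training with Adam is omitted.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)                                   # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def gate(W_gk: Matrix, raw_feats: Sequence[Vector]) -> Vector:
    """g_k = softmax(W_gk [FusionFeat_x, Feat_csi, FusionFeat_y]) -> [g1, g2, g3]."""
    concat = [v for f in raw_feats for v in f]
    return softmax(matvec(W_gk, concat))


def weighted_fusion(g: Vector, processed: Sequence[Vector]) -> Vector:
    """locfeat_k = g1 * spaceFeat_x + g2 * Feat_csi_hat + g3 * spaceFeat_y."""
    return [sum(g_i * f[j] for g_i, f in zip(g, processed)) for j in range(len(processed[0]))]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    return -math.log(softmax(logits)[target])


def rand_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.3, 0.3) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(1)
raw = [[rng.random() for _ in range(d)] for d in (12, 8, 12)]      # original features
processed = [[rng.random() for _ in range(8)] for _ in range(3)]    # spaceFeat_x, Feat_csi_hat, spaceFeat_y
n_bins = 10                                                         # hypothetical coordinate classes per axis

for k, axis in enumerate("xy"):                                     # k = 0: x-axis, k = 1: y-axis
    g_k = gate(rand_matrix(rng, 3, 32), raw)                        # 12 + 8 + 12 = 32 inputs
    locfeat_k = weighted_fusion(g_k, processed)
    logits = matvec(rand_matrix(rng, n_bins, 8), locfeat_k)         # single-layer decider
    loss = cross_entropy(logits, target=3)                          # hypothetical ground-truth bin
    print(axis, [round(g, 3) for g in g_k], round(loss, 3))
```

Because the softmax gate sums to 1, each axis branch redistributes a fixed budget of weight across the image and CSI features instead of scaling them independently, which matches the gating formula above.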


4. Experiments
4.1. Experiment environments
This section verifies the fusion localization performance of AW-FLoc. The experimental equipment consists of CSI acquisition devices and image acquisition devices, shown in Figure 8 and Figure 9, respectively. The CSI acquisition setup includes a transmitter and a receiver, both equipped with Intel 5300 network cards, and the CSI data are parsed by modifying the underlying driver. The transmitter uses one antenna and the receiver three antennas, and each device runs Ubuntu 16.04. Each received data packet contains fields such as the number of antennas, noise, CSI, and RSSI. The image acquisition device uses four low-cost cameras fixed on the four sides of a box, facing four directions, each connected to the same laptop through a data cable, so that multi-directional images can be collected with a single operation on one computer.


Figure 8: Transmitter (left) and receiver (right)


   The experiments are conducted in a comprehensive office scene, shown in Figure 10. Its complex layout and rich multipath effects make it a typical comprehensive indoor environment. The scene consists of an office area, a meeting room, and an external corridor, which are separated from each other: the office area and the meeting room are separated by glass, and the corridor is separated from the other two rooms by walls. The office area and meeting room contain many desks, chairs, and computers. The total area of the scene is 152.9 m², of which the meeting room is 8.4 m × 1.8 m and the laboratory is 16.4 m × 4.4 m. Fifty-nine reference points and 59 test points were established in the scene, with a spacing of 1.2 m between reference points. At each point, 2000 CSI packets and 40 multi-directional images were collected.
Figure 9: Four-directional camera: top view (left) and side view (right)


4.2. The evaluation of TDE-IFF
To verify the positioning performance of TDE-IFF, this section compares it with the VGG-16-based positioning method proposed by Wozniak et al. in 2018 [14], in terms of positioning accuracy and stability.
   As figure 11 shows, TDE-IFF achieves better overall positioning accuracy. In the low-error
interval of 0-4.1m, TDE-IFF has a higher confidence (cumulative probability) than the VGG-based
method. The maximum error is 8.5m for TDE-IFF and 10.6m for the VGG-based method. The largest
errors come from samples in the corners of the scene, which are easily mismatched with one
another and thus produce large errors for distant points. TDE-IFF keeps this type of error
better under control.
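The CDF curves in figure 11 plot, for each error value e, the fraction of test points whose positioning error is at most e; the "confidence" at a threshold is that fraction, and the error at which the curve reaches 1 is the maximum error. A minimal sketch of these quantities, assuming the per-test-point errors are available as a list in metres; the function names and the sample errors are illustrative, not the paper's data.

```python
import numpy as np


def empirical_cdf(errors):
    """Return sorted errors and the cumulative fraction of points at or below each."""
    e = np.sort(np.asarray(errors, dtype=float))
    frac = np.arange(1, e.size + 1) / e.size
    return e, frac


def confidence_at(errors, threshold):
    """Fraction of test points whose error does not exceed `threshold` (metres)."""
    e = np.asarray(errors, dtype=float)
    return float(np.mean(e <= threshold))


def error_at_confidence(errors, level):
    """Smallest error e such that at least `level` of the points have error <= e."""
    e, frac = empirical_cdf(errors)
    # Small tolerance so that fractions like k/n are not missed by rounding.
    idx = int(np.searchsorted(frac, level - 1e-12))
    return float(e[min(idx, e.size - 1)])


sample_errors = [0.3, 0.8, 1.2, 2.5, 8.5]        # hypothetical errors in metres
print(confidence_at(sample_errors, 4.1))          # 0.8
print(error_at_confidence(sample_errors, 1.0))    # 8.5 (the maximum error)
```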
   Table 1 compares the mean positioning error and its standard deviation for the two methods.
The mean error of TDE-IFF is 0.94m, lower than the 1.09m of the VGG method, an accuracy
improvement of 16%. In terms of stability, the standard deviation of TDE-IFF is 1.02m, compared
with 1.11m for the VGG method, a stability improvement of 9%. Together with the CDF in figure 11,
these results show that TDE-IFF improves both the accuracy and the stability of positioning.

Table 1
Image Localization Errors of Different Methods
Method            TDE-IFF    VGG
Mean error (m)    0.94       1.09
Std. dev. (m)     1.02       1.11
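A percentage improvement depends on which value the difference is divided by. The short calculation below uses only the rounded entries of Table 1: dividing by TDE-IFF's own values reproduces the quoted 16% (mean) and about 9% (standard deviation), whereas dividing by the VGG baseline gives reductions of about 14% and 8%. The helper `relative_change` is illustrative.

```python
def relative_change(baseline: float, proposed: float, reference: float) -> float:
    """Return (baseline - proposed) / reference as a percentage."""
    return (baseline - proposed) / reference * 100.0


# Rounded values from Table 1, in metres.
table1 = {
    "mean": {"VGG": 1.09, "TDE-IFF": 0.94},
    "std": {"VGG": 1.11, "TDE-IFF": 1.02},
}

for metric, row in table1.items():
    b, p = row["VGG"], row["TDE-IFF"]
    vs_vgg = relative_change(b, p, reference=b)   # reduction relative to the VGG baseline
    vs_tde = relative_change(b, p, reference=p)   # same difference relative to TDE-IFF
    print(f"{metric}: {vs_vgg:.1f}% of VGG, {vs_tde:.1f}% of TDE-IFF")
# mean: 13.8% of VGG, 16.0% of TDE-IFF
# std: 8.1% of VGG, 8.8% of TDE-IFF
```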


4.3. The evaluation of AW-FLoc
To verify the performance of the AW-FLoc algorithm, this section conducts an ablation experiment
with and without the gating layer and compares AW-FLoc with the MDS-KNN algorithm [10].
  (1) Adaptive weight comparison experiment
[Figure 10 image: floor plan of the Laboratory and the Meeting room, with north arrows; legend: Transmitter, Reference point, Test point]


Figure 10: Comprehensive office scene


   To verify the adaptive weights, this paper compares AW-FLoc with a fusion localization
algorithm without adaptive weights (AW-FLoc w/o gate), in which the weight of each signal is set
to 1. The experimental results are shown in figure 12(a). In the interval of 0-5.3m, the fusion
positioning algorithm with adaptive weights has higher confidence. Although the network without
adaptive weights is slightly ahead in the interval of 5.3m-6m, AW-FLoc performs better overall
and has smaller positioning errors, which supports the effectiveness of the adaptive weights.
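The ablation replaces the learned weights with a constant weight of 1 for each signal. As an illustration only, the sketch below shows a generic element-wise sigmoid gate of the kind used in gated networks [11]-[13]: gate weights are computed from the concatenated CSI and image features of each sample and multiplied onto them, while the ablated variant uses all-ones weights. The gate form, the feature sizes, and the names `gated_fusion`, `W`, and `b` are assumptions, not the AW-FLoc architecture itself.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(csi_feat, img_feat, W=None, b=None):
    """Fuse CSI and image features with element-wise gate weights.

    csi_feat: (batch, d_csi); img_feat: (batch, d_img).
    W: (d, d) and b: (d,) with d = d_csi + d_img are hypothetical gate parameters
    (learned in a real model). If W is None the gate is disabled and every weight
    equals 1, mirroring the "w/o gate" ablation. Returns (fused, gate).
    """
    x = np.concatenate([csi_feat, img_feat], axis=1)   # plain concatenation
    if W is None:
        gate = np.ones_like(x)                          # fixed weight of 1
    else:
        gate = sigmoid(x @ W + b)                       # per-sample weights in (0, 1)
    return gate * x, gate


rng = np.random.default_rng(0)
csi = rng.normal(size=(4, 90))    # e.g. 3 antennas x 30 subcarriers (assumed)
img = rng.normal(size=(4, 64))    # hypothetical image-feature size
d = csi.shape[1] + img.shape[1]
W = rng.normal(scale=0.1, size=(d, d))
b = np.zeros(d)

fused, gate = gated_fusion(csi, img, W, b)   # adaptive weights
plain, ones = gated_fusion(csi, img)         # ablation: all weights fixed to 1
print(fused.shape, bool(np.all(ones == 1)))  # (4, 154) True
```

Because the gate is computed from each sample, different fingerprint points can receive different emphasis on CSI and image features, which a fixed weight of 1 cannot express.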
   This section selects the MDS-KNN-based algorithm proposed by Huang et al. [10] as a baseline
to verify the localization ability of the network. The positioning results are shown in figure
12(b). In the low-error interval of 0-2.2m, AW-FLoc slightly leads MDS-KNN, and in the range of
2.2m-8.5m it has higher confidence and better positioning performance. When the confidence level
reaches 1, the error of AW-FLoc is 8.5m, while that of MDS-KNN is 10.2m. In terms of positioning
accuracy, the experiment shows that the features of AW-FLoc are more effective than the simple
feature fusion used in MDS-KNN: the adaptive weights make the effective information in the fused
features more prominent, so the features of different fingerprint points are more discriminative
and the localization performance is better.

Figure 11: The CDF of TDE-IFF
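The argument above is that more discriminative fused features make fingerprint points easier to tell apart. To make that link concrete, the sketch below shows a generic weighted k-nearest-neighbour (WKNN) matcher over fused feature vectors: the closer a test feature is to a reference fingerprint relative to the others, the more weight that reference point's coordinates receive. This is a standard fingerprinting estimator given for illustration; it is neither the MDS-KNN algorithm of [10] nor the AW-FLoc position estimator, and the names, the 3 × 3 grid, and the random features are hypothetical.

```python
import numpy as np


def wknn_locate(query, fingerprints, positions, k=3, eps=1e-9):
    """Weighted k-nearest-neighbour position estimate.

    query: (d,) fused feature of a test point.
    fingerprints: (n, d) fused features of the reference points.
    positions: (n, 2) reference-point coordinates in metres.
    Neighbours are weighted by inverse feature distance.
    """
    dist = np.linalg.norm(fingerprints - query, axis=1)
    nearest = np.argsort(dist)[:k]
    w = 1.0 / (dist[nearest] + eps)
    return (w[:, None] * positions[nearest]).sum(axis=0) / w.sum()


rng = np.random.default_rng(1)
# Hypothetical 3 x 3 grid of reference points with 1.2 m spacing.
grid = np.array([(x, y) for x in (0.0, 1.2, 2.4) for y in (0.0, 1.2, 2.4)])
features = rng.normal(size=(9, 16))                  # stand-in fused fingerprints
query = features[4] + 0.01 * rng.normal(size=16)     # noisy copy of reference point 4
print(wknn_locate(query, features, grid, k=1))       # about (1.2, 1.2)
```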


Figure 12: The CDFs of different methods: (a) AW-FLoc with and without the gating layer; (b) AW-FLoc and MDS-KNN.


  (2) Comparison with other methods
  To further verify the stability of the AW-FLoc algorithm, Table 2 compares the mean positioning
error and its standard deviation. AW-FLoc has the best accuracy and positioning stability:
compared with the MDS-KNN system in the comprehensive office scenario, the accuracy is improved by
20% and the stability by 17%. The experiments show that AW-FLoc outperforms the compared MDS-KNN
method in localization accuracy. In addition, the adaptive weights and the TDE-IFF algorithm
contribute substantially to the overall positioning performance, which supports the need for both
components in the multi-source heterogeneous signal fusion positioning algorithm.

Table 2
Fusion Localization Errors of Different Methods
Method            AW-FLoc    MDS-KNN
Mean error (m)    0.61       0.73
Std. dev. (m)     0.73       0.85
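The quoted percentages depend on which value the difference is divided by. Using the rounded entries of Table 2, dividing by AW-FLoc's value (left column below) gives about 20% for the mean error, matching the quoted figure, and about 16% for the standard deviation; dividing by the MDS-KNN baseline (right column) gives smaller reductions. The quoted 17% may reflect unrounded results, which this section does not report.

```latex
\begin{aligned}
\text{mean: } & \frac{0.73-0.61}{0.61} \approx 19.7\%, & \qquad \frac{0.73-0.61}{0.73} \approx 16.4\% \\
\text{std: }  & \frac{0.85-0.73}{0.73} \approx 16.4\%, & \qquad \frac{0.85-0.73}{0.85} \approx 14.1\%
\end{aligned}
```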


5. Conclusion
In this paper, we propose AW-FLoc, an adaptive weighted fusion localization framework, to improve
localization accuracy and robustness. This paper first proposes the TDE-IFF algorithm, which
constructs a fused representation of multi-directional images, uses the texture degree evaluation
algorithm to reduce the interference of low-texture images on that representation, and provides
high-quality image features for the subsequent fusion positioning method. Building on TDE-IFF,
this paper then proposes the AW-FLoc fusion localization framework. The algorithm adaptively
weights CSI and image features for different fingerprint points, increasing the proportion of
effective signals in the fused representation, improving its signal-to-noise ratio, and further
improving the localization capability of the algorithm. Experiments in a comprehensive office
scenario show that TDE-IFF effectively improves positioning accuracy, with a mean error of 0.94m,
and that AW-FLoc further reduces the mean error to 0.61m. Compared with the MDS-KNN method, the
accuracy is improved by 20% and the stability by 17%.


6. Acknowledgments
This work was financially supported by the National Natural Science Foundation of China under
Grant No. 61871054.


References
 [1] H. Huang, G. Gartner, Current trends and challenges in location-based services, ISPRS
     International Journal of Geo-Information 7 (2018) 199.
 [2] H. T. Hoi, V. T. Le, Location-based services, ABC Journal of Advanced Research 8 (2019)
     89–94.
 [3] D. Lymberopoulos, R. R. Choudhury, J. Liu, S. Sen, X. Yang, V. Handziski, Microsoft indoor
     localization competition: Experiences and lessons learned, GetMobile: Mobile Computing and
     Communications 18 (2015) 24–31.
 [4] Z. Deng, CSI amplitude fingerprinting for indoor localization with dictionary learning,
     Entropy 23 (2021).
 [5] W. Liu, M. Jia, Z. Deng, C. Qin, MHSA-EC: An indoor localization algorithm fusing the
     multi-head self-attention mechanism and effective CSI, Entropy 24 (2022). URL:
     https://www.mdpi.com/1099-4300/24/5/599. doi:10.3390/e24050599.
 [6] A. Xiao, R. Chen, D. Li, Y. Chen, D. Wu, An indoor positioning system based on static
     objects in large indoor scenes by using smartphone cameras, Sensors 18 (2018) 2229.
 [7] S. Xu, W. Chou, H. Dong, A robust indoor localization system integrating visual localization
     aided by CNN-based image retrieval with Monte Carlo localization, Sensors 19 (2019) 249.
 [8] M.-J. Chiou, Z. Liu, Y. Yin, A.-A. Liu, R. Zimmermann, Zero-shot multi-view indoor
     localization via graph location networks, in: Proceedings of the 28th ACM International
     Conference on Multimedia, 2020, pp. 3431–3440.
 [9] J. Jiao, X. Wang, Z. Deng, Build a robust learning feature descriptor by using a new image
     visualization method for indoor scenario recognition, Sensors 17 (2017) 1569.
[10] X. Huang, S. Guo, Y. Wu, Y. Yang, A fine-grained indoor fingerprinting localization based
     on magnetic field strength and channel state information, Pervasive and Mobile Computing
     41 (2017) 150–165.
[11] R. Dey, F. M. Salem, Gate-variants of gated recurrent unit (GRU) neural networks, in: 2017
     IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), IEEE,
     2017, pp. 1597–1600.
[12] J. Ma, Z. Zhao, X. Yi, J. Chen, L. Hong, E. H. Chi, Modeling task relationships in multi-task
     learning with multi-gate mixture-of-experts, in: Proceedings of the 24th ACM SIGKDD
     International Conference on Knowledge Discovery & Data Mining, 2018, pp. 1930–1939.
[13] T. Huang, Q. She, Z. Wang, J. Zhang, GateNet: Gating-enhanced deep network for
     click-through rate prediction, arXiv preprint arXiv:2007.03519 (2020).
[14] P. Wozniak, H. Afrisal, R. G. Esparza, B. Kwolek, Scene recognition for indoor localization
     of mobile robots using deep CNN, in: International Conference on Computer Vision and
     Graphics, Springer, 2018, pp. 137–147.