<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal of Applied Engineering Research</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">0973-4562</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1109/MIUCC52538.20</article-id>
      <title-group>
        <article-title>Random Forest and XGBoost Based Fingerprinting Using MMSE: An Approach to Data-Centric AI to Enhance Indoor Wi-Fi Localization Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohamed A. El Ghany</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mariame Niang</string-name>
          <email>mariame.niang@gmail.com</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philippe Canalda</string-name>
          <email>philippe.canalda@femto-st.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>François Spies</string-name>
          <email>francois.spies@univ-fcomte.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Massa Ndong</string-name>
          <email>massandong@mail.com</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ibra Dioum</string-name>
          <email>ibra.dioum@esp.sn</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Idy Diop</string-name>
          <email>idy.diop@esp.sn</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of FEMTO-ST Institute/UMR CNRS 6174, Montbéliard</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>German University in Cairo</institution>
          ,
          <addr-line>3611, Cairo</addr-line>
          ,
          <country country="EG">Egypt</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Higher Polytechnic School, Cheikh Anta Diop University of Dakar</institution>
          ,
          <addr-line>5005, Dakar</addr-line>
          ,
          <country country="SN">Senegal</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University Cheikh Anta Diop of Dakar</institution>
          ,
          <addr-line>5005, Dakar</addr-line>
          ,
          <country country="SN">Senegal</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Virtual University of Senegal</institution>
          ,
          <addr-line>Dakar</addr-line>
          ,
          <country country="SN">Senegal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <issue>2017</issue>
      <fpage>4</fpage>
      <lpage>7</lpage>
      <abstract>
        <p>The indoor localization problem consists of identifying the Cartesian coordinates of an object or a personal asset inside buildings such as malls, hospitals, campuses, and factories. To solve this problem, we consider a Wi-Fi-based localization method called fingerprinting. This is a two-step process: a radio map of the monitored area is first constructed by collecting signal strengths at known locations, and an unknown location is then predicted using this radio map as a reference. In this paper, we first propose adapted Random Forest (RF) and Extreme Gradient Boosting (XGB) algorithms. This adaptation, combined with Minimum Mean Square Error (MMSE) estimation, addresses the loss of accuracy caused by changes in the environment. It also extends the concept by adding a signal processing functionality as an edge cloud feature to support dynamic cooperation clustering. By equipping the Wi-Fi Access Point (WAP) with multiple antennas, the signals sent by the Mobile User Equipment (MUE) can be processed to improve the quality of the bootstrap samples. Adding MMSE is a data-centric approach because it provides high-quality input data: the noise inherent in the location data is reduced, which improves the performance of the MMSE-aided RF and XGB. This enhancement is further extended by sharing the MMSE-processed data between WAPs. The performance of these methods is evaluated through extensive experiments in a real indoor environment, with regular and reproducible scenarios. We found that the proposed approach can offer a better time-to-market than traditional, non-machine-learning-based indoor positioning systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Indoor Positioning</kwd>
        <kwd>Wi-Fi signals</kwd>
        <kwd>Fingerprinting approach</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Extreme Gradient Boosting (XGB)</kwd>
        <kwd>Random Forest (RF)</kwd>
        <kwd>Received Signal Strength Indicator (RSSI)</kwd>
        <kwd>Data-Centric Artificial Intelligence</kwd>
        <kwd>Minimum Mean Square Error (MMSE)</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>The rapid growth of the Internet of Things (IoT) has resulted in a wide range of services, including Location-Based Services (LBS). Generally, localization refers to the process of obtaining the region or the geographical location of a user or a device. Enabling accurate location-based services depends on the availability of location information.</p>
      </sec>
      <sec id="sec-1-2">
        <p>2022 Copyright for this paper by its authors.</p>
        <p>Localization systems can be categorized into outdoor localization and indoor localization. The Global Positioning System (GPS) is the main technology used to determine position outdoors. However, its accuracy deteriorates indoors because of the poor penetration of GPS signals inside buildings, its high power consumption, and multipath effects on the propagating signals [1]. Precise indoor localization therefore needs to be addressed urgently. Nowadays, indoor localization is widely used in daily life, for example to track locations inside buildings, malls, hospitals, campuses, and factories.</p>
      </sec>
      <sec id="sec-1-3">
        <p>Several techniques are employed to measure localization parameters, including Time of Arrival (ToA) [2], Time Difference of Arrival (TDoA) [3], Received Signal Strength Indicator (RSSI) [4], Angle of Arrival (AoA), and Time of Flight (ToF) [5]. These approaches face many challenges, including poor accuracy, high computational complexity, multipath effects, shadowing, fading, and delay distortion.</p>
      </sec>
      <sec id="sec-1-4">
        <p>The fingerprinting method has recently attracted considerable attention because of its promising results with various prediction techniques. In fingerprinting, a database is first built during the offline phase from a thorough measurement campaign over the field. Then, in the online phase, the position of a mobile user is estimated by comparing newly received test data with the data stored in the database.</p>
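        <p>The two phases can be illustrated with a minimal K-Nearest Neighbor matching sketch in the spirit of the KNN baselines cited in Section 2 [16]. It is not the RF or XGB method proposed in this paper. The function name <monospace>knn_locate</monospace>, the array layout (one row of RSSI values per reference point), and the tiny radio map are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def knn_locate(map_rssi, map_xy, observed_rssi, k=3):
    """Online phase of a hypothetical KNN fingerprint matcher.

    map_rssi: (N, n_aps) RSSI fingerprints recorded offline at known points.
    map_xy:   (N, 2) coordinates of those reference points.
    observed_rssi: (n_aps,) RSSI vector measured by the mobile user.
    """
    # Euclidean distance in signal space between the observation and every fingerprint
    dist = np.linalg.norm(map_rssi - observed_rssi, axis=1)
    nearest = np.argsort(dist)[:k]
    # Estimated position: average of the k closest reference positions
    return map_xy[nearest].mean(axis=0)


# Offline phase: a tiny, made-up radio map with two APs and three reference points
radio_map = np.array([[-40.0, -70.0], [-55.0, -55.0], [-70.0, -40.0]])
positions = np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]])

print(knn_locate(radio_map, positions, np.array([-50.0, -60.0]), k=2))
```
]]></preformat>
        <p>With k = 2, the two closest fingerprints are those recorded at (5, 0) and (0, 0), so the estimate is their midpoint (2.5, 0). Averaging several neighbours, rather than taking only the closest one, smooths out some of the RSSI fluctuation discussed below.</p>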
      </sec>
      <sec id="sec-1-5">
        <p>Wi-Fi fingerprinting localization is one of the methods based on RSSI [6,7,8,9,10]. Other methods rely on Euclidean distance [11], RSSI ranging [12], and trilateration [13].</p>
      </sec>
      <sec id="sec-1-6">
        <p>Compared to other indoor localization methods, Wi-Fi fingerprinting has several advantages, including low hardware requirements and a wide scope of application. At the same time, it must be combined with more advanced algorithms to ensure higher positioning precision [14]. Moreover, indoor localization using Wireless Local Area Network (WLAN) fingerprinting faces several challenges, including propagation effects, which degrade localization accuracy [15].</p>
      </sec>
      <sec id="sec-1-7">
        <p>The rest of the paper is organized as follows. Section 2 gives a brief overview of the state of the art.</p>
      </sec>
      <sec id="sec-1-8">
        <p>Section 3 presents our proposed localization methods. Section 4 evaluates the localization performance of the algorithms under different scenarios, and Section 5 concludes the work.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. State of the art</title>
      <sec id="sec-2-1">
        <p>With the rapid growth of Machine Learning (ML) systems, dedicated engineering practices need to be developed to handle the unique complexities of practical ML applications. This is the domain of MLOps, a set of standardized processes and technology capabilities for building, deploying, and operationalizing ML systems rapidly and reliably.</p>
      </sec>
      <sec id="sec-2-2">
        <p>In recent years, a number of ML algorithms have been applied to the RSSI fingerprinting positioning technique and have achieved improved location results. These include K-Nearest Neighbor (KNN) [16], Random Forest (RF) [17], XGB [18,19], and Support Vector Machine (SVM) [20,21]; KNN, a rules-based classifier (JRip), Decision Tree (DT), RF, and SVM [22]; and KNN, Weighted KNN (WKNN) [23], RF, and XGB [24].</p>
      </sec>
      <sec id="sec-2-3">
        <p>When the structure and layout of the indoor environment change, the indoor wireless communication environment also changes, which leads to a large gap between the new environment and the established positioning fingerprints. However, building the fingerprint database is very time-consuming and laborious. Updating all positioning fingerprints regularly and frequently is neither economical nor realistic, because it would greatly increase the maintenance cost of an RSSI location fingerprinting system. Several methods to reduce the inaccuracies in location measurements have been proposed in the literature [25]. However, none of the works we reviewed used regular test scenarios. In our previous work [24], the tests were not reproducible, which can bias the experiments. To assess the bias of machine learning methods, carrying out more regular and reproducible tests should make it possible to resolve these questions.</p>
      </sec>
      <sec id="sec-2-4">
        <p>Positioning performance can be improved by using fingerprinting techniques that exploit multipath information in an ML framework operating on a dataset generated in real time using MMSE. In this work, we use the RSSI between the transmitter and the receiver as the localization attribute, because the RSSI-based approach places minimal requirements on the Wi-Fi modules. RF and XGB algorithms combined with MMSE are proposed both to minimize the measurement noise and to resolve the accuracy problem caused by changes in the environment during indoor localization tasks. The method first uses the RF and XGB algorithms to establish an indoor positioning model. When the environment changes, MMSE is additionally used to improve the initial positioning. By contrast, model-centric approaches to solving AI problems have been dominant in applications where large, high-quality datasets are available. Such approaches aim to improve model performance by developing more complex architectures.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <p>The fingerprint map contains data points covering the whole area, which the algorithms use to predict the position. Each data point holds the RSSI values from four fixed APs together with its position. The whole area measures 9.5 m × 9.25 m, as shown in Figure 1. A point was taken every 0.2 m along the x-axis and every 0.5 m along the y-axis, starting from the origin, unless obstacles such as walls or furniture prevented taking the point. This procedure resulted in a fingerprint map of 700 data points covering the whole area. Our approach was to increase the number of data points and decrease the spacing between them in order to increase the accuracy of the location prediction.</p>
      <p>The input is therefore a list of 700 points. For each measurement point, we record 20 RSSI values and use their mean, denoted m, as the RSSI. However, the RSSI values fluctuate strongly, so the mean alone is not enough to characterize the precision. To improve accuracy, the mean (m) and the MMSE are combined.</p>
      <p>We performed a point density analysis by defining scenarios with different training and testing sizes. At 10 %, we kept 1 point out of 2 along x (0.2 m pitch) and 1 out of 5 along y (0.5 m pitch), taking the diagonal so that the pitch remains homogeneous. This gives 70 training points and 630 testing points evenly distributed over the area. At 33 %, we kept 1 point out of 3 along x and every point along y, which gives 233 training and 467 testing points while respecting the step between the x and y coordinates. At 66 %, we used 2/3 of the database, i.e., 467 training and 233 testing points. At 80 %, we used four points out of every five for training and the fifth for testing, resulting in 560 training and 140 testing points.</p>
      <p>We then added a random positioning algorithm as a reference, to compare the quality of our proposal with a random guess. For this, we took a random point among the 700 and computed its distance from the real coordinates, which gives a distance of 7.5 m. We also used the midpoint algorithm as another benchmark. The midpoint is the central point, corresponding to point 350 of our database. We computed the distance from this point to each of the 699 remaining points and averaged these distances, obtaining 3.5 m for the midpoint.</p>
      <p>Finally, we calculated the confidence interval (IC) for each test point, a statistical result obtained from the mean and the standard deviation. If X is a random variable defined on Ω with unknown expectation m and standard deviation σ, and if x̄ is the mean of the values observed on a sample of size n, the IC at the confidence level α for the parameter m is:</p>
      <disp-formula>
        <label>(1)</label>
        <tex-math><![CDATA[IC_{\alpha} = \left[\bar{x} - t\,\frac{\sigma}{\sqrt{n}},\; \bar{x} + t\,\frac{\sigma}{\sqrt{n}}\right], \quad \text{where } \Phi(t) = \frac{\alpha + 1}{2}]]></tex-math>
      </disp-formula>
      <p>and Φ is the cumulative distribution function of the standard normal distribution.</p>
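      <p>Equation (1) can be evaluated with Python's standard library alone. This is a minimal sketch under stated assumptions. The helper name <monospace>confidence_interval</monospace> is hypothetical, and t is taken from the standard normal distribution as in (1). When σ is not supplied, the sample standard deviation is used in its place, which is only an approximation for small n; a Student t quantile would then be more appropriate. The list of localization errors is made up.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, alpha=0.95, sigma=None):
    """Return (low, high) = x_bar -/+ t * sigma / sqrt(n), with Phi(t) = (alpha + 1) / 2."""
    n = len(samples)
    x_bar = mean(samples)
    # Known sigma if given, otherwise the sample standard deviation (approximation)
    s = sigma if sigma is not None else stdev(samples)
    # t such that the standard normal CDF equals (alpha + 1) / 2, about 1.96 for alpha = 0.95
    t = NormalDist().inv_cdf((alpha + 1) / 2)
    half_width = t * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical localization errors (in metres) measured repeatedly at one test point
errors = [1.4, 1.7, 1.5, 1.9, 1.6]
low, high = confidence_interval(errors, alpha=0.95)
print(f"95 % interval for the mean error: [{low:.2f}, {high:.2f}] m")
```
]]></preformat>
      <p>The interval is centred on x̄, and its half-width shrinks as 1/√n. Recording more measurements per test point therefore tightens the interval.</p>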
      <p>In MLOps, the model training capability lets us run powerful algorithms efficiently and cost-effectively to train the RF and XGB models with MMSE. Model training should scale with the size of both the models and the datasets used for training. The model testing capability lets us understand how newly trained models perform. It improves the reliability of our ML releases by helping us decide whether to reject poorly performing models and promote well-performing ones. When serving predictions, once the model is deployed to an indoor environment, the model service starts accepting prediction requests and responding with predictions. The testing data is used to evaluate the predictions generated by the ML model: the predicted locations are compared with the actual positions of the test points to evaluate the performance of the different algorithms.</p>
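      <p>The comparison of predicted and actual positions, together with the random and midpoint benchmarks described above, can be sketched as follows. This is an illustration under several assumptions. Positions are stored as an (N, 2) NumPy array, and the error is the mean Euclidean distance in metres. The random benchmark draws one reference point per test point, seeded for reproducibility. The function names are hypothetical, and the 3 × 3 grid is made up; it is not the paper's 700-point map.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def mean_error(pred_xy, true_xy):
    """Mean Euclidean distance (metres) between predicted and actual positions."""
    return float(np.mean(np.linalg.norm(pred_xy - true_xy, axis=1)))


def random_benchmark(map_xy, test_xy, seed=0):
    """Predict a randomly chosen reference point for every test point."""
    rng = np.random.default_rng(seed)
    guesses = map_xy[rng.integers(0, len(map_xy), size=len(test_xy))]
    return mean_error(guesses, test_xy)


def midpoint_benchmark(map_xy, center_index):
    """Average distance from a fixed central point to all the remaining points."""
    others = np.delete(map_xy, center_index, axis=0)
    return mean_error(np.broadcast_to(map_xy[center_index], others.shape), others)


# Made-up 3 x 3 grid with 1 m spacing; index 4 is the central point (1, 1)
grid = np.array([(x, y) for y in range(3) for x in range(3)], dtype=float)
print(midpoint_benchmark(grid, 4))  # (1 + sqrt(2)) / 2, about 1.21 m
print(random_benchmark(grid, grid, seed=1))
```
]]></preformat>
      <p>A useful localization model should beat both benchmarks by a clear margin, since neither of them uses the RSSI measurements at all.</p>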
      <sec id="sec-3-1">
        <p>The offline phase is divided into several parts. First, the RSSI readings were taken using an Android app called Wi-Fi Fingerprint installed on an HTC One X9. These RSSI values can fluctuate because of the shadowing effect. Adding MMSE, an approach of data-centric AI, at each WAP mitigates the effect of environmental variation by reducing the noise in the data. The resulting fingerprint map was saved in an Excel spreadsheet (CSV file), to be used by the algorithm in Python.</p>
      </sec>
      <sec id="sec-3-2">
        <p>Second, in the online phase, an ESP Wi-Fi module reads the values from the APs and sends them to Firebase. The Firebase database [26] is used because it is easy to integrate with the Wi-Fi module and provides a Python library, which makes it easy to access from Python.</p>
      </sec>
      <sec id="sec-3-3">
        <p>Finally, the Python IDE Spyder was used to access the data in the spreadsheet. The dataset is divided into training and testing sets. The training data is used to train the machine learning model to predict the position, and the testing data is used to evaluate the predictions generated by the model, as shown in Figure 2.</p>
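        <p>The regular splits described at the start of Section 3 can be mimicked with a deterministic, index-based rule. This sketch rests on an explicit simplifying assumption. The paper selects points on the two-dimensional grid (for example, 1 out of 2 along x and 1 out of 5 along y). Here, the 700 points are instead assumed to be stored in scan order, and point i is kept for training when i modulo a period falls in a chosen set of phases. The phase choices below are hypothetical; they are picked only so that the training and testing counts match those reported (70/630, 233/467, 467/233, and 560/140).</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def periodic_split(n_points, period, train_phases):
    """Deterministic split: point i is used for training when i % period is in train_phases."""
    idx = np.arange(n_points)
    is_train = np.isin(idx % period, list(train_phases))
    return idx[is_train], idx[~is_train]


# Hypothetical (period, training phases) for each scenario of the 700-point map
scenarios = {
    "10 %": (10, {0}),
    "33 %": (3, {1}),
    "66 %": (3, {0, 2}),
    "80 %": (5, {0, 1, 2, 3}),
}
for name, (period, phases) in scenarios.items():
    train, test = periodic_split(700, period, phases)
    print(name, len(train), "training /", len(test), "testing")
```
]]></preformat>
        <p>Unlike a random shuffle, this rule places the training points at regular intervals, so every run uses exactly the same split. This is the reproducibility property the experiments rely on.</p>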
      </sec>
      <sec id="sec-3-4">
        <p>A variety of speech enhancement approaches have been proposed. They differ in the statistical model, the distortion measure, and the manner in which the signal estimators are implemented.</p>
      </sec>
      <sec id="sec-3-5">
        <p>Perhaps the simplest scenario is obtained when the signal and the noise are assumed to be statistically independent Gaussian processes and the MSE distortion measure is used. In this case, the optimal estimator of the clean signal is obtained by the Wiener filter. Since speech signals are not strictly stationary, a sequence of Wiener filters is designed and applied to vectors of the noisy signal. MMSE estimation under Gaussian assumptions thus leads to linear estimation in the form of Wiener filtering. Noise reduction using MMSE is applicable because the enhancement of noisy speech signals is essentially an estimation problem, in which the clean signal is estimated from a given sample function of the noisy signal. The goal is to minimize the expected value of some distortion measure between the clean and estimated signals. For this approach to succeed, a perceptually meaningful distortion measure must be used, and a reliable statistical model for the signal and noise must be specified. At present, the best statistical model for the signal and noise, and the most perceptually meaningful distortion measure, are not known.</p>
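        <p>Under the Gaussian assumptions described above, the MMSE estimate reduces to a linear (Wiener) gain. The sketch below applies that idea to the 20 RSSI readings of one AP at one point. It illustrates the Wiener principle only; it is not the computation the paper takes from [28]. The prior mean and variance of the clean RSSI, and the noise variance, are assumed to be known. All names and numbers are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def lmmse_rssi(readings, prior_mean, prior_var, noise_var):
    """Linear MMSE (Wiener) estimate of the clean RSSI from repeated noisy readings.

    Model assumed: reading_k = s + n_k, with s ~ N(prior_mean, prior_var) and
    independent n_k ~ N(0, noise_var). Averaging K readings divides the noise
    variance by K; the Wiener gain then weighs the sample mean against the prior.
    """
    readings = np.asarray(readings, dtype=float)
    k = readings.size
    y_bar = readings.mean()
    gain = prior_var / (prior_var + noise_var / k)  # 0 trusts the prior, 1 trusts the data
    return prior_mean + gain * (y_bar - prior_mean)


# Hypothetical example: 20 readings averaging -70 dBm, prior centred on -60 dBm
readings = [-68.0, -72.0] * 10
print(lmmse_rssi(readings, prior_mean=-60.0, prior_var=4.0, noise_var=80.0))  # -65.0
```
]]></preformat>
        <p>Here the gain is 4 / (4 + 80/20) = 0.5, so the estimate lies halfway between the prior and the plain mean m. A noisier channel (larger noise variance) pulls the estimate towards the prior, while more readings pull it towards the mean.</p>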
      </sec>
      <sec id="sec-3-6">
        <p>Because the shadowing effect deteriorates the localization MSE, MMSE estimation has been investigated for Wireless Sensor Networks (WSN). This MMSE algorithm can be used to locate the coordinates of unknown nodes while minimizing location errors. The simulation results show that the variance of the distances between reference nodes and unknown nodes increases the localization MSE [27]. In this paper, to calculate the MMSE, we use the method proposed in [28]. Let A<sub>1</sub>(x<sub>1</sub>, y<sub>1</sub>), A<sub>2</sub>(x<sub>2</sub>, y<sub>2</sub>), A<sub>3</sub>(x<sub>3</sub>, y<sub>3</sub>), and A<sub>4</sub>(x<sub>4</sub>, y<sub>4</sub>) be the APs with their coordinates, M(x, y) the coordinates of the mobile user, and d<sub>i</sub> the distance between M and A<sub>i</sub>. Then:</p>
        <disp-formula>
          <label>(2)</label>
          <tex-math><![CDATA[\begin{cases} (x - x_1)^2 + (y - y_1)^2 = d_1^2 \\ (x - x_2)^2 + (y - y_2)^2 = d_2^2 \\ (x - x_3)^2 + (y - y_3)^2 = d_3^2 \\ (x - x_4)^2 + (y - y_4)^2 = d_4^2 \end{cases}]]></tex-math>
        </disp-formula>
      </sec>
      <sec id="sec-3-7">
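        <p>Subtracting the second equation of (2) from the first, and then the third from the second, gives the following system:</p>
        <disp-formula>
          <tex-math><![CDATA[\begin{cases} x_1^2 - x_2^2 - 2x(x_1 - x_2) + y_1^2 - y_2^2 - 2y(y_1 - y_2) = d_1^2 - d_2^2 \\ x_2^2 - x_3^2 - 2x(x_2 - x_3) + y_2^2 - y_3^2 - 2y(y_2 - y_3) = d_2^2 - d_3^2 \end{cases}]]></tex-math>
        </disp-formula>
        <p>This can be written as the linear equation bX = a, with</p>
        <disp-formula>
          <label>(3)</label>
          <tex-math><![CDATA[b = \begin{bmatrix} 2(x_1 - x_2) & 2(y_1 - y_2) \\ 2(x_2 - x_3) & 2(y_2 - y_3) \end{bmatrix}, \quad X = \begin{bmatrix} x \\ y \end{bmatrix}, \quad a = \begin{bmatrix} x_1^2 - x_2^2 + y_1^2 - y_2^2 - d_1^2 + d_2^2 \\ x_2^2 - x_3^2 + y_2^2 - y_3^2 - d_2^2 + d_3^2 \end{bmatrix}]]></tex-math>
        </disp-formula>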
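        <p>As a check of (3), the 2 × 2 system can be solved numerically. The AP layout (one AP in each corner of the 9.5 m × 9.25 m area) and the user position are assumptions made for illustration. With exact distances, the solution recovers the position exactly, up to floating-point error.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def solve_three_aps(aps, d):
    """Solve b X = a from (3), built from the first three APs and exact distances d."""
    (x1, y1), (x2, y2), (x3, y3) = aps[:3]
    b = np.array([[2 * (x1 - x2), 2 * (y1 - y2)],
                  [2 * (x2 - x3), 2 * (y2 - y3)]])
    a = np.array([x1**2 - x2**2 + y1**2 - y2**2 - d[0]**2 + d[1]**2,
                  x2**2 - x3**2 + y2**2 - y3**2 - d[1]**2 + d[2]**2])
    return np.linalg.solve(b, a)  # X = (x, y)


# Hypothetical layout: APs in the corners of the 9.5 m x 9.25 m area, user at M(3, 4)
aps = [(0.0, 0.0), (9.5, 0.0), (0.0, 9.25), (9.5, 9.25)]
m = np.array([3.0, 4.0])
d = [float(np.hypot(*(m - np.array(ap)))) for ap in aps]
print(solve_three_aps(aps, d))
```
]]></preformat>
        <p>The three APs used must not be collinear; otherwise b is singular and the system has no unique solution.</p>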
      </sec>
      <sec id="sec-3-8">
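        <p>Distance measurements can be disturbed by noise or obstacles, so the exact distances are not available. Instead, distances with measurement errors are used, and the equation becomes:</p>
        <disp-formula>
          <tex-math><![CDATA[\hat{d}_i = \sqrt{(x_i - \hat{x})^2 + (y_i - \hat{y})^2}, \quad i = 1, \ldots, n]]></tex-math>
        </disp-formula>
        <p>where n is the number of APs and (x̂, ŷ) is the estimated position of the mobile user. Squaring and rearranging these terms yields the following equation for each access point measurement:</p>
        <disp-formula>
          <label>(4)</label>
          <tex-math><![CDATA[\begin{cases} (\hat{x} - x_1)^2 + (\hat{y} - y_1)^2 = \hat{d}_1^2 \\ (\hat{x} - x_2)^2 + (\hat{y} - y_2)^2 = \hat{d}_2^2 \\ \quad\vdots \\ (\hat{x} - x_n)^2 + (\hat{y} - y_n)^2 = \hat{d}_n^2 \end{cases}]]></tex-math>
        </disp-formula>
        <p>Subtracting consecutive equations, as in (3), gives an overdetermined linear system in matrix form:</p>
        <disp-formula>
          <label>(5)</label>
          <tex-math><![CDATA[A\hat{X} = Z, \quad \hat{X} = \begin{bmatrix} \hat{x} \\ \hat{y} \end{bmatrix}]]></tex-math>
        </disp-formula>
        <p>where</p>
        <disp-formula>
          <label>(6)</label>
          <tex-math><![CDATA[A = \begin{bmatrix} 2(x_1 - x_2) & 2(y_1 - y_2) \\ \vdots & \vdots \\ 2(x_{n-1} - x_n) & 2(y_{n-1} - y_n) \end{bmatrix}]]></tex-math>
        </disp-formula>
        <disp-formula>
          <label>(7)</label>
          <tex-math><![CDATA[Z = \begin{bmatrix} x_1^2 - x_2^2 + y_1^2 - y_2^2 - \hat{d}_1^2 + \hat{d}_2^2 \\ \vdots \\ x_{n-1}^2 - x_n^2 + y_{n-1}^2 - y_n^2 - \hat{d}_{n-1}^2 + \hat{d}_n^2 \end{bmatrix}]]></tex-math>
        </disp-formula>
        <p>The least-squares solution of (5), which minimizes the mean square error of the residuals, is</p>
        <disp-formula>
          <label>(8)</label>
          <tex-math><![CDATA[\hat{X} = \left(A^{\mathsf{T}} A\right)^{-1} A^{\mathsf{T}} Z]]></tex-math>
        </disp-formula>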
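        <p>Equation (8) can be sketched in the same way, using all four APs and noisy distances. The stacking of A and Z from consecutive AP pairs follows (3). The AP layout, user position, and distance errors are made-up values. Solving the normal equations mirrors (8) literally; in practice, <monospace>np.linalg.lstsq</monospace> is the numerically safer call.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def least_squares_position(aps, d_hat):
    """Estimate (x, y) from n >= 3 APs via X = (A^T A)^-1 A^T Z, as in (8)."""
    aps = np.asarray(aps, dtype=float)
    d_hat = np.asarray(d_hat, dtype=float)
    x, y = aps[:, 0], aps[:, 1]
    # One row per consecutive AP pair (i, i+1), as in (3)
    A = np.column_stack([2 * (x[:-1] - x[1:]), 2 * (y[:-1] - y[1:])])
    Z = x[:-1]**2 - x[1:]**2 + y[:-1]**2 - y[1:]**2 - d_hat[:-1]**2 + d_hat[1:]**2
    return np.linalg.solve(A.T @ A, A.T @ Z)


aps = [(0.0, 0.0), (9.5, 0.0), (0.0, 9.25), (9.5, 9.25)]  # hypothetical corner APs
true_m = np.array([3.0, 4.0])
d = np.linalg.norm(np.asarray(aps) - true_m, axis=1)
d_noisy = d + np.array([0.10, -0.10, 0.05, -0.05])        # made-up ranging errors (m)
print(least_squares_position(aps, d))        # exact distances: recovers (3, 4)
print(least_squares_position(aps, d_noisy))  # noisy distances: close to (3, 4)
```
]]></preformat>
        <p>With these ranging errors of 5 to 10 cm, the estimate moves by roughly 10 cm. Adding more APs adds rows to A, which further averages out independent distance errors.</p>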
      </sec>
      <sec id="sec-3-9">
        <p>Federated learning (FL) is a distributed learning framework. As described in [31], FL requires end users' devices, which have low computation power, to send their locally pretrained machine learning models to a sink. The sink combines these models into a global model to perform ML tasks.</p>
      </sec>
      <sec id="sec-3-10">
        <p>The models received at the sink are affected by noise, and the sink needs to mitigate this noise to use the local models effectively. Similarly, in our data-centric AI approach, MMSE is used to suppress the noise in the received measurements used for fingerprinting.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3.2.1. Data-centric AI with MMSE</title>
      <sec id="sec-4-1">
        <p>Because training datasets strongly affect the performance of ML, the work in [32] explores the concept of data-centric explanations for ML systems, which describe the training data to the end user. Its results show that data-centric explanations can influence how users judge the trustworthiness of a system and can help users assess fairness [32]. A data-centric approach to AI provides a systematic way to improve data, build consensus on the data, and clean up inconsistent data. This is usually overlooked, and data collection is often treated as a one-time task. The data-centric approach is more rewarding and calls for a move towards data centrism. To make MLOps systematic, two views can be distinguished. The first is a model-centric view: collect whatever data is available, develop a model good enough to deal with the noise in the data, and hold the data fixed while iteratively improving the model. The second is a data-centric view, in which the consistency of the data is paramount: tools are used to improve data quality so that multiple models can perform well, and the code is held fixed while the data is iteratively improved. The most important task of MLOps is to make high-quality data available throughout all stages of the ML project lifecycle, for example prediction serving [33]. In wireless signal processing applications, where RSSI values are usually noisy, a potentially more fruitful approach is to use MMSE as a data-centric AI technique. It focuses on improving the data so that simpler wireless localization models perform better. The idea is to enhance signal data by removing noise. This idea can be extended to transforming the signals in a wireless network so that key features become more prominent and easier to use. With a data-centric view, there is therefore significant room for improvement in problems involving noise.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3.2.2. Random Forest MMSE</title>
      <sec id="sec-5-1">
        <p>RF builds several DTs on various subsets of the given dataset and averages their outputs to predict the location. This can improve accuracy compared with other ML algorithms such as SVM and KNN. During training, a set of labeled training points is used to optimize the parameters of each tree; during testing, each unlabeled test input is pushed through every component tree. At each internal node, a test is applied to decide which daughter node the data point is sent to, until a leaf provides a prediction. We extend the concept by adding a signal processing functionality as an edge cloud feature that implements dynamic cooperation clustering. To do so, the MMSE algorithm is applied at each WAP to enhance the quality of the bootstrapped data, and the enhanced bootstrap is shared with the neighboring WAPs on demand. This MMSE processing is combined with the random forest.</p>
      </sec>
      <sec id="sec-5-2">
        <p>Proposed RF (MMSE) algorithm for dynamic cooperation clustering:</p>
        <p>1. For b = 1 to B:</p>
      </sec>
      <sec id="sec-5-3">
        <p>(a) Draw N sample points from the data collected from the MUEs and the neighboring WAPs to form a bootstrap at the designated WAP.</p>
      </sec>
      <sec id="sec-5-4">
        <p>(b) Apply MMSE to the data collected from the MUEs to reduce the noise.</p>
        <sec id="sec-5-4-1">
          <p>(c) Grow a random forest tree T<sub>b</sub> on the bootstrapped data by recursively repeating the following steps for each terminal node of the tree, until the minimum node size n<sub>min</sub> is reached:</p>
        </sec>
      </sec>
      <sec id="sec-5-5">
        <p>(i) Select m variables at random from the p variables.</p>
      </sec>
      <sec id="sec-5-6">
        <p>(ii) Pick the best variable/split-point among the m.</p>
        <p>(iii) Split the node into two daughter nodes.</p>
        <p>2. Output the ensemble of trees {T<sub>b</sub>}<sub>1</sub><sup>B</sup>.</p>
        <p>The prediction of a new location from the input data x is given by the regression</p>
        <disp-formula>
          <tex-math><![CDATA[\hat{f}_{rf}^{B}(x) = \frac{1}{B}\sum_{b=1}^{B} T_b(x)]]></tex-math>
        </disp-formula>
        <sec id="sec-5-6-1">
          <p>The classification is given by the majority vote as follows. Let Ĉ<sub>b</sub>(x) be the class prediction of the b-th random forest tree; then</p>
          <disp-formula>
            <tex-math><![CDATA[\hat{C}_{rf}^{B}(x) = \text{majority vote}\,\left\{\hat{C}_b(x)\right\}_{1}^{B}]]></tex-math>
          </disp-formula>
        </sec>
      </sec>
      <sec id="sec-5-7">
        <p>With the proposed RF (MMSE) algorithm, each WAP applies the RF locally, using its own data and the data received from the neighboring WAPs to construct the bootstrap. The contribution of this scheme is the sharing of data between WAPs, which enables dynamic cooperation clustering. The data shared between WAPs has already been processed with MMSE to reduce the noise. This sharing also makes the size of the bootstrap variable at each WAP, and the cluster of WAPs exchanging data is of variable size too.</p>
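        <p>The per-WAP procedure above can be sketched with a small NumPy random forest. This is not the authors' implementation, and several choices are assumptions. The tree learner is a minimal from-scratch regression tree whose leaves average the (x, y) positions directly. The <monospace>denoise</monospace> argument is a hypothetical hook standing in for the MMSE step (identity by default), and it is applied once to the local data before pooling; applying it per bootstrap, as in step (b), gives the same result for a pointwise filter. Shared neighbour data is assumed to be already MMSE-processed. The names, B, m, the minimum node size, and the synthetic features are illustrative.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def grow_tree(X, Y, rng, m, min_size):
    """Regression tree: split on the best of m random variables until nodes hold <= min_size points."""
    if len(X) <= min_size:
        return ("leaf", Y.mean(axis=0))
    best = None
    for j in rng.choice(X.shape[1], size=m, replace=False):
        for thr in np.unique(X[:, j])[:-1]:  # keeps both daughter nodes non-empty
            left = X[:, j] <= thr
            sse = (((Y[left] - Y[left].mean(axis=0)) ** 2).sum()
                   + ((Y[~left] - Y[~left].mean(axis=0)) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, thr, left)
    if best is None:  # the m chosen variables are constant in this node
        return ("leaf", Y.mean(axis=0))
    _, j, thr, left = best
    return ("node", j, thr,
            grow_tree(X[left], Y[left], rng, m, min_size),
            grow_tree(X[~left], Y[~left], rng, m, min_size))


def predict_tree(node, x):
    while node[0] == "node":
        _, j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node[1]


def fit_wap_forest(local_X, local_Y, shared, B=25, m=2, min_size=3,
                   denoise=lambda X: X, seed=0):
    """Forest for one WAP: local data (denoised here) pooled with neighbours' shared data."""
    X = np.vstack([denoise(local_X)] + [sx for sx, _ in shared])
    Y = np.vstack([local_Y] + [sy for _, sy in shared])
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(B):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap of N points
        trees.append(grow_tree(X[idx], Y[idx], rng, m, min_size))
    return trees


def predict_forest(trees, x):
    """Regression output: average of the B tree predictions."""
    return np.mean([predict_tree(t, x) for t in trees], axis=0)


# Made-up data: 10 x 10 grid of positions, 4 synthetic features per position
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
P = np.column_stack([xs.ravel(), ys.ravel()])
F = np.column_stack([P[:, 0], P[:, 1], P[:, 0] + P[:, 1], P[:, 0] - P[:, 1]])
trees = fit_wap_forest(F[:50], P[:50], shared=[(F[50:], P[50:])], B=15, seed=1)
print(predict_forest(trees, F[0]))  # should land near P[0] = (0, 0)
```
]]></preformat>
        <p>Passing an empty <monospace>shared</monospace> list gives a WAP that works alone. Each neighbour that shares data enlarges the pool the bootstrap is drawn from, which is how the bootstrap size varies from one WAP to another.</p>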
      </sec>
    </sec>
    <sec id="sec-6">
      <title>3.2.3. XGBoost MMSE</title>
      <p>XGBoost is a gradient boosting software library. The X and Y data are split into a training set and a testing set. The training set is used to fit the XGB model, and the testing set is used to make predictions, from which the performance of the model is evaluated. For this, we use the <monospace>train_test_split</monospace> function from the scikit-learn library and specify a seed for the random number generator, so that the same split is obtained each time. The format of the training positions also needs to be adapted for the fit function to work. Finally, to improve the location accuracy affected by changes in the environment, we propose XGB (MMSE). The method first uses the XGB algorithm to establish an indoor positioning model and, when the environment changes, combines it with the MMSE method to improve the initial positioning.</p>
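      <p>The seeded split and the boosting step can be illustrated without external libraries. This sketch is not XGBoost. It implements plain least-squares gradient boosting with decision stumps, without XGBoost's regularized objective, second-order statistics, or column subsampling. It fits one model per coordinate, which is only one possible way to handle two-dimensional positions. In scikit-learn, the seeded split corresponds to <monospace>train_test_split</monospace> with its <monospace>test_size</monospace> and <monospace>random_state</monospace> parameters; here it is reimplemented with NumPy so that the example is self-contained. Function names and data are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def seeded_split(n, test_size=0.2, seed=42):
    """Reproducible shuffle split: the same seed always yields the same indices."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_test = int(round(n * test_size))
    return perm[n_test:], perm[:n_test]  # train indices, test indices


def fit_stump(X, r):
    """Best single split (variable j, threshold thr) minimising squared error on residuals r."""
    best = (np.inf, 0, np.inf, r.mean(), r.mean())  # fallback: constant prediction
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= thr
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, thr, lv, rv)
    return best[1:]


def fit_boosting(X, y, n_rounds=100, lr=0.1):
    """Least-squares gradient boosting: each stump fits the current residuals y - F(X)."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        j, thr, lv, rv = fit_stump(X, y - pred)
        pred += lr * np.where(X[:, j] <= thr, lv, rv)  # shrunken step towards the residuals
        stumps.append((j, thr, lv, rv))
    return base, lr, stumps


def predict_boosting(model, X):
    base, lr, stumps = model
    out = np.full(len(X), base)
    for j, thr, lv, rv in stumps:
        out += lr * np.where(X[:, j] <= thr, lv, rv)
    return out


# Made-up data: 60 points with 4 RSSI-like features and one coordinate to predict
rng = np.random.default_rng(0)
X = rng.uniform(-90.0, -30.0, size=(60, 4))
y = 0.1 * X[:, 0] + 0.05 * X[:, 1]
train, test = seeded_split(len(X), test_size=0.2, seed=42)
model = fit_boosting(X[train], y[train])
print(np.mean((predict_boosting(model, X[test]) - y[test]) ** 2))
```
]]></preformat>
      <p>For a full position, the same routine would be fitted twice, once for the x coordinate and once for the y coordinate. The learning rate trades the number of rounds against the risk of overfitting the training points.</p>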
    </sec>
    <sec id="sec-7">
      <title>4. Evaluation of performance</title>
      <p>The performance of our system is evaluated in terms of localization accuracy. In MLOps, the performance evaluation capability lets us assess the effectiveness of our model interactively during experimentation. For this, we need to visualize and compare the performance of different models, compute predefined or custom evaluation metrics for our model on different slices of the data, and track the predictive performance of trained models across continuous-training executions. This can also support the interpretation of model behavior using various explainable AI techniques. To evaluate performance, the different localization algorithms are tested in simulation and compared, as shown in Table 1. In all cases, the same training data was used to build the machine learning model. The MSE is used to measure the accuracy of the localization algorithms.</p>
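      <p>The MSE is defined as</p>
      <disp-formula>
        <label>(9)</label>
        <tex-math><![CDATA[\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N}\left(Y_n - \hat{Y}_n\right)^2]]></tex-math>
      </disp-formula>
      <p>where N is the number of test points, and Y<sub>n</sub> and Ŷ<sub>n</sub> are the actual and estimated coordinates of the n-th reference point.</p>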
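      <p>Equation (9) can be computed as below. We assume that each Y<sub>n</sub> is a two-dimensional position, so the squared error sums the x and y components. The square root of the MSE is also printed because it is expressed in metres. Names and values are illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def mse(Y, Y_hat):
    """Eq. (9): mean over the N test points of the squared position error."""
    Y = np.asarray(Y, dtype=float)
    Y_hat = np.asarray(Y_hat, dtype=float)
    return float(np.mean(np.sum((Y - Y_hat) ** 2, axis=1)))


actual = [[0.0, 0.0], [3.0, 4.0]]
estimated = [[0.0, 0.0], [0.0, 0.0]]
err = mse(actual, estimated)
print(err, np.sqrt(err))  # squared errors 0 and 25, so MSE = 12.5 and RMSE is about 3.54 m
```
]]></preformat>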
    </sec>
    <sec id="sec-8">
      <title>4.1.1. Simulation description</title>
      <sec id="sec-8-1">
        <p>For the simulation, we used all the test points of each scenario in order to sweep the whole space: 630 test points for 10 % (Figure 3), 467 for 33 % (Figure 4), 233 for 66 % (Figure 5), and 140 for 80 % (Figure 6).</p>
      </sec>
      <sec id="sec-8-2">
        <p>For testing, other selections of the test points are possible for each percentage: 9 possibilities at 10 %, 3 at 33 %, 2 at 66 %, and 2 at 80 %.</p>
      </sec>
      <sec id="sec-8-3">
        <p>These experimental results show that, at 10 %, accuracy improves by 66 % from RF (m) to RF (MMSE) and by 48 % from XGB (m) to XGB (MMSE). At 33 %, the improvement is 79 % from RF (m) to RF (MMSE) and 80 % from XGB (m) to XGB (MMSE).</p>
      </sec>
      <sec id="sec-8-4">
        <p>At 66 %, the improvement is 22 % from RF (m) to RF (MMSE) and 28 % from XGB (m) to XGB (MMSE). At 80 %, it is 27 % from RF (m) to RF (MMSE) and 29 % from XGB (m) to XGB (MMSE).</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>4.1.2. Discussion of the experimental results</title>
      <sec id="sec-9-1">
        <p>Analysis of our experimental data revealed that most location errors were caused by giving too much weight to low RSSI values, that is, to weak receptions. Such values fluctuate, interior obstacles can further amplify these fluctuations, and as a result the coordinates of a distant point can distort the estimate.</p>
      </sec>
      <sec id="sec-9-2">
        <p>We compared the accuracy of the XGB and RF algorithms combined with MMSE against the state of the art. The experiment was carried out in a real environment following a regular and reproducible scenario, and several test scenarios were run with different training and testing sets drawn from a regular distribution. At 10 %, the accuracies of RF (m), XGB (m), RF (MMSE), and XGB (MMSE) are 2.26 m, 2.36 m, 1.60 m, and 1.88 m, respectively.</p>
      </sec>
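      <sec>
        <p>One simple way to keep weak, fluctuating RSSI values from dominating a fingerprint is to replace every reading below a floor with a fixed placeholder before training. The Python sketch below illustrates that idea only; it is not the MMSE procedure used in this work, and the -90 dBm floor, the -100 dBm placeholder, and the function name are assumed values chosen for illustration.</p>
        <preformat>
```python
RSSI_FLOOR_DBM = -90  # assumed cut-off: weaker readings are treated as unreliable
MISSING_DBM = -100    # assumed placeholder for an unreliable or unheard access point


def clip_weak_rssi(fingerprint, floor=RSSI_FLOOR_DBM, missing=MISSING_DBM):
    """Return a copy of `fingerprint` (one RSSI value in dBm per access point,
    or None when the access point was not heard) in which every reading weaker
    than `floor` is replaced by `missing`."""
    return [
        rssi if rssi is not None and rssi >= floor else missing
        for rssi in fingerprint
    ]


if __name__ == "__main__":
    print(clip_weak_rssi([-45, -92, None, -90]))
```
        </preformat>
        <p>Because all weak readings collapse to the same value, they no longer vary from one reference point to another and therefore cannot, on their own, pull the estimate toward a distant point.</p>
      </sec>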
      <sec id="sec-9-3">
        <p>At 33 %, with 233 points for training and 467 for testing, the accuracies of RF (m), XGB (m), RF (MMSE), and XGB (MMSE) are 2.01 m, 2.17 m, 1.22 m, and 1.37 m, respectively. At 66 %, with 467 points for training and 233 for testing, they are 1.25 m, 1.30 m, 1.03 m, and 1.08 m. At 80 %, where the data are divided into 560 points for training and 140 for testing, they are 1 m, 1.11 m, 0.73 m, and 0.82 m. These results show that RF (MMSE) and XGB (MMSE) achieve lower errors than RF (m) and XGB (m), which confirms the interest of ML. However, which algorithm is the most efficient depends on the size of the training set relative to the testing set: the training set needs to reach 70 % of the data, which is where we obtain the best results. Algorithms based on RF or XGB do not perform identically, and their accuracy differs markedly. The results obtained here seem more reliable than those of our initial tests, which were not reproducible and carried a significant bias of 2 % compared with the previous paper.</p>
      </sec>
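      <sec>
        <p>The accuracy figures above are distances in metres. Assuming they denote the mean Euclidean distance between the estimated and true positions of the test points (this section does not restate the definition), they could be computed with a helper such as the hypothetical one below.</p>
        <preformat>
```python
import math


def mean_localization_error(predicted, actual):
    """Mean Euclidean distance (m) between estimated and true (x, y) positions."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("expected two non-empty sequences of equal length")
    # math.dist (Python 3.8+) returns the Euclidean distance between two points
    total = sum(math.dist(p, a) for p, a in zip(predicted, actual))
    return total / len(predicted)


if __name__ == "__main__":
    estimated = [(0.0, 0.0), (3.0, 4.0)]
    true_positions = [(0.0, 0.0), (0.0, 0.0)]
    print(mean_localization_error(estimated, true_positions))  # 2.5
```
        </preformat>
        <p>Under this definition a lower value means better accuracy, which is how the comparisons between RF (m), XGB (m), RF (MMSE), and XGB (MMSE) above should be read.</p>
      </sec>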
    </sec>
    <sec id="sec-10">
      <title>5. Conclusion</title>
      <p>In this work, we implemented, evaluated, and analyzed machine learning algorithms, namely Random Forest and Extreme Gradient Boosting, in an indoor environment. These algorithms are combined with MMSE, a data-centric approach to AI, to reduce noise in the data and improve accuracy. Using an app called Wi-Fi Fingerprint installed on a smartphone, we collected 700 data points. Various regular and reproducible tests were then carried out. These regular tests are useful for evaluating the ML algorithms under more realistic and reproducible conditions. In our indoor experiment, RF and XGB combined with MMSE give the best results at 80 %, that is, with 560 training points and 140 test points, reaching an accuracy of 0.72 m and 0.80 m, respectively. The experimental results show that the proposed RF (MMSE) and XGB (MMSE) algorithms still achieve good positioning performance under environmental changes compared with the other algorithms, which makes them good candidates for indoor localization.</p>
    </sec>
    <sec id="sec-11">
      <title>6. Acknowledgements</title>
      <sec id="sec-11-1">
        <p>This work is the result of a research project funded by the International Development Research Centre (IDRC) and the Swedish International Development Cooperation Agency (SIDA).</p>
      </sec>
      <sec id="sec-11-2">
        <p>The project was supported through the Artificial Intelligence for Development (AI4D) Africa Scholarship Fund.</p>
      </sec>
      <sec id="sec-11-3">
        <p>The fund manager is the Africa Center for Technology Studies (ACTS).</p>
      </sec>
      <sec id="sec-11-4">
        <p>This work was also supported by the French government’s “Eiffel excellence scholarship” program [grant number N° P769615J-2021].</p>
      </sec>
    </sec>
    <sec id="sec-12">
      <title>7. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Paul</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Wan</surname>
          </string-name>
          , “
          <article-title>Wi-Fi Based Indoor Localization and Tracking Using Sigma-Point Kalman Filtering Methods</article-title>
          ,”
          <source>Position, Location and Navigation Symposium, 2008 IEEE/ION</source>
          , pp.
          <fpage>646</fpage>
          -
          <lpage>659</lpage>
          , United States of America, 5-8 May
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          , “
          <article-title>TOA localization for multipath and NLOS environment with virtual stations</article-title>
          ,”
          <source>EURASIP Journal on Wireless Communications and Networking</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W.</given-names>
            <surname>Gerok</surname>
          </string-name>
          , J. Peissig, “
          <article-title>TDOA assisted RSSD based localization using UWB and directional antennas</article-title>
          ,” Leibniz Universität Hannover, Thomas Kaiser, Universität Duisburg-Essen, Germany,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kokkinis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kanaris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Liotta</surname>
          </string-name>
          , and S. Stavrou, “
          <article-title>RSS Indoor Localization Based on a Single Access Point</article-title>
          ,”
          <source>Sensors</source>
          <year>2019</year>
          ,
          <volume>19</volume>
          , 3711. https://doi.org/10.3390/s19173711.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.U.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Arablouei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.D.</given-names>
            <surname>Hoog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kusy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jurdak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          , “
          <article-title>Estimating Angle-of-Arrival and Time-of-Flight for Multipath Components Using WiFi Channel State Information</article-title>
          ,”
          <source>Sensors</source>
          <year>2018</year>
          ,
          <volume>18</volume>
          , 1753. https://doi.org/10.3390/s18061753.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.Y.</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.C. S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nie</surname>
          </string-name>
          , K. Liu,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Xue</surname>
          </string-name>
          , “
          <article-title>Data Rate Fingerprinting: A WLAN-Based Indoor Positioning Technique for Passive Localization</article-title>
          ,”
          <source>IEEE Sensors Journal</source>
          , Aug.
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kokkinis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kanaris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Liotta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Stavrou</surname>
          </string-name>
          , “
          <article-title>RSS Indoor Localization Based on a Single Access Point</article-title>
          ,” Department of Electrical Engineering, Eindhoven University of Technology, 5600 Eindhoven, The Netherlands,
          <source>Sensors</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhu</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          , “
          <article-title>Wireless Localization Based on RSSI Fingerprint Feature Vector</article-title>
          ,” College of Computer and Information Engineering, Xiamen University of Technology, China,
          <source>International Journal of Distributed Sensor Networks</source>
          , Hindawi Publishing Corporation, vol. 2015, Article ID 528747, 7 pages,
          <year>2015</year>
          . http://dx.doi.org/10.1155/2015/528747.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Padmanabhan</surname>
          </string-name>
          , “
          <article-title>RADAR: An in-building RF-based user location and tracking system</article-title>
          ,”
          <source>Proc. IEEE INFOCOM</source>
          , Israel,
          <year>2000</year>
          , pp.
          <fpage>775</fpage>
          -
          <lpage>784</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Youssef</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Agrawala</surname>
          </string-name>
          , “
          <article-title>The Horus WLAN location determination system</article-title>
          ,”
          <source>Proceedings of the 3rd International Conference on Mobile Systems, Applications, and Services</source>
          , Seattle, Washington, USA, June
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Qiu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , “
          <article-title>A New Algorithm for Indoor RSSI Radio Map Reconstruction</article-title>
          ,” Shenzhen Key Laboratory of Spatial Smart Sensing and Services, Shenzhen University; School of Geodesy and Geomatics and Collaborative Innovation Center for Geospatial Technology, Wuhan University; School of Environmental Science and Spatial Informatics, China University of Mining and Technology,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Peng</surname>
          </string-name>
          , “
          <article-title>Robust Localization Algorithm Based on the RSSI Ranging Scope</article-title>
          ,” School of Electronic Information Engineering, Suzhou Vocational University, China,
          <source>International Journal of Distributed Sensor Networks</source>
          , Jan.
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>O.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Palaskar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Palkar</surname>
          </string-name>
          , and M. Tawari, “
          <article-title>Wi-Fi Indoor Positioning System Based on RSSI Measurements from Wi-Fi Access Points - A Tri-lateration Approach</article-title>
          ,”
          <source>International Journal of Scientific &amp; Engineering Research</source>
          , vol.
          <volume>5</volume>
          , no. 4, Apr.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y. Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          , “
          <article-title>Wi-Fi indoor positioning in a smart exhibition hall based on received signal strength indication</article-title>
          ,”
          <source>EURASIP Journal on Wireless Communications and Networking</source>
          ,
          <year>2019</year>
          :275. https://doi.org/10.1186/s13638-019-1601-3.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Khalajmehrabadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gatsis</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Akopian</surname>
          </string-name>
          , “
          <article-title>Modern WLAN Fingerprinting Indoor Positioning Methods and Deployment Challenges</article-title>
          ,”
          <source>IEEE Communications Surveys &amp; Tutorials</source>
          , Oct.
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.S.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jang</surname>
          </string-name>
          , “
          <article-title>An Accurate Fingerprinting based Indoor Positioning Algorithm</article-title>
          ,” Department of Computer Science, Sangmyung University, Seoul, South Korea, International
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>