CSUIAPDet: Indoor Wi-Fi Dataset for Access Point Location Determination based on Low-Cost Robot⋆

Di Tian1,2, Dangjun Zhao1,2,∗

1 School of Automation, Central South University, South Lushan Road 932, 410083 Changsha, China
2 Hunan Provincial Key Laboratory of Optic-Electronic Intelligent Measurement and Control, South Lushan Road 932, 410083 Changsha, China

Abstract
The locations of indoor Access Points (APs) play a critical role in indoor Wi-Fi localization. Manual site surveys for AP location determination are relatively accurate but labor-intensive and time-consuming, especially in large-scale scenarios. Hence, state-of-the-art methods prefer to estimate AP locations by inversion from the received signal strength (RSS) measured at Reference Points (RPs). However, due to the absence of common datasets, comparing these methods on an equal footing is challenging. Therefore, this paper presents, for the first time, a dataset that covers multiple scenarios, from a single room to an entire wide-range floor. The dataset was constructed automatically by a low-cost small indoor mobile robot at Central South University, Changsha, China. It contains diverse elements: the occupancy grid map, the real global locations of the pre-deployed APs in each scenario, and the measurements at each reference point. As a result, it is convenient for scholars to propose new methods and carry out comparisons. In addition, dataset baselines for AP location determination under different methods are demonstrated. The dataset will be available at https://github.com/tdcsu/CSUIndoorAPDet.

Keywords
AP location determination, Wi-Fi indoor localization, Dataset

1. Introduction
Localization is of prime importance in many application areas, such as transportation[1], the measurement industry[2], factory production[3], and commercial services promotion[4].
The developed GPS (Global Positioning System)[5] provides sufficient accuracy for outdoor localization tasks. However, its signal is usually degraded by obstruction from buildings, leaving accurate indoor localization an open problem[6]. Hence, many indoor localization methods have been proposed. These methods are generally either sensor-based or WLAN-based [7]. The former use data from extra sensors such as cameras or LiDAR (Light Detection and Ranging) to compute location results, which is inconvenient for daily usage. The latter, also called wireless localization, take advantage of the widely pre-deployed WLAN to construct an indoor localization system, which greatly reduces the investment in external devices, especially in large-scale indoor scenarios.

There are two typical frameworks in indoor wireless localization: range-based[8] and fingerprint-based[9]. The first framework mainly localizes users through a trilateration algorithm, which relies heavily on the locations of the APs and the estimated distances from the APs to the signal-receiving points. The second framework estimates users' locations by matching the received signal strengths against a database of RSS values coupled with their corresponding locations (fingerprints). In this case, prior-known AP locations help to narrow the search to a sub-range of the database, which expedites the solution process[10]. Therefore, the location determination of APs is critical to the effectiveness of an indoor wireless localization system.

Proceedings of the Work-in-Progress Papers at the 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN-WiP 2024)
∗ Corresponding author.
tiandi_csu@csu.edu.cn (D. Tian); zhao_dj@csu.edu.cn (D. Zhao)
0000-0001-6193-6994 (D. Tian); 0000-0001-9286-1999 (D. Zhao)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Existing AP location determination methods fall into model-free methods[11] and model-based methods [12]. The model-free methods are derived from the centroid method. The model-based methods are optimization problems constructed on the Log-distance Path Loss (LDPL) model[13]. These methods are all data-driven and sensitive to the RSS measurements and the corresponding location values. Unfortunately, the data used to test these methods are not open-access, which prevents researchers from making objective comparisons of existing methods and hinders the development of the community. Besides, the manual data collection in these works is time-consuming and labor-intensive. Therefore, to promote the development of the research community, we propose CSUIndoorAPDet: the first open-source wireless dataset for AP location determination.

The main contribution of this paper is the first comprehensive dataset for indoor AP location determination. The characteristics of this dataset and our work are as follows:
• First, our dataset is constructed almost automatically by robots based on SLAM and navigation rather than through manual site surveys. Hence, the process is labor-saving and convenient.
• Second, the dataset covers a total of 1060 m² of space and multiple scenarios, including a single room and an entire floor. Therefore, the dataset can be used to test the applicability of an algorithm.
• Third, a usage example of our dataset for AP location determination is provided. The attached baseline shows clear comparisons among the existing methods, which helps scholars make improvements in this field.
• Moreover, the information elements of our dataset are categorized by their attributes.
These categories can be used for diverse problems, including not only AP location determination but also indoor layout generation, AP selection, and WLAN RSS-based localization.

The rest of this paper is organized as follows: Section 2 reviews the related work. Section 3 describes the collection methodology of our dataset. Section 4 details the components of the dataset. Section 5 presents a usage example of our dataset for AP location determination, which provides a baseline of existing methods. Section 6 concludes this paper.

2. Related Work
AP location determination methods take RSS measurements and the corresponding location values as input to estimate the locations of the APs of interest. The existing methods can be divided into model-free and model-based methods.

The model-free methods mainly estimate AP locations with the centroid method. The pure centroid method, proposed by Bulusu et al.[14], uses only the location values of the detected measurements to predict the location of an AP. It is inaccurate because the unevenly distributed RSS levels are not taken into account. To address this, Blumenthal et al. [15] merged the RSS levels into the centroid method to develop the weighted centroid method, which performs better than the pure centroid method. Nevertheless, these methods assume that the measurements are evenly distributed around the AP location, which cannot be relied on without prior knowledge. Moreover, these methods do not take the transmission characteristics of indoor wireless signals into consideration.

The model-based methods use the log-distance path loss (LDPL) model to estimate the locations of APs. The LDPL model describes the propagation tendency in 3D space. It is an empirical model that relates the RSS level at each RP to the distance between that RP and the AP.
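
To make the LDPL relation concrete, the following is a minimal Python sketch of its commonly used form, RSS(d) = RSS(d0) - 10 n log10(d / d0). The function names and the default values (reference RSS of -40 dBm at d0 = 1 m, path loss exponent n = 3) are illustrative assumptions and are not taken from the paper or the dataset.

```python
import math


def ldpl_rss(distance_m, rss_d0=-40.0, path_loss_exp=3.0, d0=1.0):
    """Predicted RSS (dBm) at distance_m under the log-distance path loss model.

    rss_d0, path_loss_exp and d0 are assumed example values, not dataset values.
    """
    return rss_d0 - 10.0 * path_loss_exp * math.log10(distance_m / d0)


def ldpl_distance(rss_dbm, rss_d0=-40.0, path_loss_exp=3.0, d0=1.0):
    """Invert the model: distance (m) at which the model predicts rss_dbm."""
    return d0 * 10.0 ** ((rss_d0 - rss_dbm) / (10.0 * path_loss_exp))


if __name__ == "__main__":
    for d in (1.0, 5.0, 10.0):
        print(f"d = {d:5.1f} m -> model RSS = {ldpl_rss(d):7.2f} dBm")
```

In a model-based estimator, the unknowns would typically be the AP coordinates together with the reference RSS and the path loss exponent, fitted so that the predicted RSS matches the measurements at the RPs.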
Therefore, the AP location determination problem can be transformed into a nonlinear least-squares optimization with LDPL as the observation model. This approach was proposed by J. Koo et al. [12] but was tested only in simulation. Y. Zhuang et al. [16] extended this method and provided a real field experiment.

However, the existing AP location determination methods have been demonstrated entirely on private datasets. These datasets were constructed from manual site surveys, which are labor-intensive and time-consuming. Consequently, it is inconvenient for scholars to make comprehensive comparisons between their methods and the existing ones, which hinders the development of indoor wireless localization technology. Hence, to promote further development in this field, we provide the CSUIndoorAPDet dataset.

3. Collection Methodology
This section describes the methodology of data collection using an automatic wheeled robot. The components of the robot are detailed in Section 3.1. The data acquisition workflow is then introduced in Section 3.2.

3.1. System and Equipment
Figure 1: (a) The setup of our automatic robot and (b) the workflow of data collection.

The automatic robot is shown in Fig. 1 (a). Omni-wheeled bases are chosen for their flexibility and suitability in indoor environments. An STM32F103RCT6 serves as the motor control unit (MCU), and the inertial measurement unit is an MPU6050. It is coupled with the SIMINICS LiDAR to achieve mapping, localization, and navigation on the NVIDIA Jetson TX2 development board. The Robot Operating System (ROS) distribution we use is ROS Melodic on Ubuntu 18.04. In addition, a Wi-Fi module is installed on the TX2 to sniff Wi-Fi signals.

3.2. Workflow of Data Collection
The data collection workflow is illustrated in Fig. 1 (b). There are three sequential modules: indoor spatial map construction, goal labeling, and navigation with recording. The indoor occupancy grid map is constructed in the first module.
The navigation goals are labeled on the indoor map in the second module. Based on the map and the navigation goals, the robot records RSS and location values automatically and sequentially in the third module. The dataset is finally compiled from these measurements.

3.2.1. Indoor Spatial Map Construction
We chose Cartographer SLAM[17] for indoor map construction based on 2D LiDAR. Constructing maps of indoor spaces manually is accurate, but it can be time-consuming and labor-intensive, especially in large-scale indoor scenarios. SLAM can also generate an accurate representation of the indoor layout, and it runs in nearly real time even in large indoor areas. In addition, two-dimensional location information is sufficient to meet most development needs of indoor positioning systems. Therefore, 2D LiDAR SLAM is suitable. The constructed maps are occupancy grids and are stored as PGM files in our dataset.

3.2.2. Goal Labeling
Figure 2: The illustration of goal labeling based on our rosnode.

Navigation goals are labeled through a human-computer interaction strategy we developed. The whole goal-labeling operation is shown in Fig. 2. First, the rospkg map_server takes the PGM maps as input and publishes them on the rostopic /map. The map can then be visualized in rviz. When the operator clicks on the green point in the map with the mouse, the rosnode publish_point publishes this action on the rostopic /clicked_point. After that, the rosnode goal_labeling translates the messages on /clicked_point into markers. The location values of these markers are finally saved sequentially as goals in a matrix file. For more information about ROS, please refer to the ROS wiki: http://wiki.ros.org.

3.2.3. Navigation with Recording
Figure 3: The left: the whole process and data stream of the navigation and recording module.
The right: the site experiment of navigation and recording in scene 1; the red boxes indicate the navigation map with goals and the robot, respectively.

This module records Wi-Fi RSS levels in the indoor environment based on the navigation strategy. The navigation strategy is built on localization and path planning algorithms. The localization method applied is Adaptive Monte Carlo Localization (AMCL). It takes the map and the real-time LiDAR scan as input to compute the robot's location on the map. Path planning combines the global planning algorithm Dijkstra with the local planning algorithm Timed Elastic Band (TEB). The global planning algorithm uses the obstacle information in the map to generate a viable global path from the current location to the desired destination. The local planner adjusts the global path based on the real-time LiDAR scan. With these methods, the robot can travel between neighboring goals. Therefore, every measurement, including the RSS level and the corresponding location values, can be recorded by our Wi-Fi sniffing rosnode. The whole process and data stream of navigation and recording is depicted in Fig. 3.

4. Dataset Presentation
In this section, the elements of the dataset in each scenario are presented in detail. The element tree of this dataset is depicted in Fig. 4. The abbreviations and meanings for each member are as follows:

Figure 4: The element tree of this dataset.

• OGM: 2D Occupancy Grid Map of the corresponding experiment site.
• PAP: information of the pre-deployed APs.
• EAP: information of the existing APs.
• PRP: reference points of the pre-deployed APs.
• ERP: reference points of the existing APs.

4.1. Scenarios and OGM
As presented in Fig. 5, the dataset was captured in the Science Education Building and the Shenghua Building on the Main Campus of Central South University (CSU).
Three typical indoor scenarios are included: a single room (scene 1), an open corridor (scene 2), and a whole floor with rooms (scene 3). Their corresponding OGMs are depicted in sub-figures (c), (d), and (e), respectively.

4.2. Information of APs
The PAPs we deployed are TP-Link TL-WDA6332RE units, shown in Fig. 5. The number of PAPs in each scene (shown in Table 1) varies with the layout of the actual scene. The ground-truth locations of the PAPs in every scene were recorded with the laser ranger shown in Fig. 5. The ground truth can be used to quantitatively evaluate the accuracy of AP location estimation methods. The 2D coordinates are listed in Table 2, and the spatial distributions are marked as red dots in sub-figures (c), (d), and (e) of Fig. 5, respectively.

Figure 5: (a), (b) the illustrations of three scenes in the Google Map. (c), (d) and (e) the OGMs of these scenes; red dots in each OGM indicate the locations and the numbers of the PAPs, whose locations are obtained through a laser ranger and listed in Table 2. The unit of OGM is meter.

Table 1
The Number of PAPs and EAPs in Each Scene

Scene ID      Scene 1   Scene 2   Scene 3
PAP Number    6         5         4
EAP Number    122       528       387

Table 2
PAP Location Ground Truth in Each Scene (unit: meter)

Scene ID   Scene 1             Scene 2             Scene 3
AP 1       [-0.685, 0.800]     [1.333, 1.389]      [0.979, 7.500]
AP 2       [10.750, -2.320]    [-9.647, 11.620]    [0.979, 1.800]
AP 3       [8.558, -6.083]     [13.958, 22.621]    [-24.070, -12.730]
AP 4       [4.348, -5.953]     [-3.550, 33.171]    [-31.970, -3.580]
AP 5       [1.485, -5.843]     [2.509, 45.070]     -
AP 6       [-0.785, -3.900]    -                   -

Apart from the pre-deployed APs, the existing APs account for most of the APs detectable in the indoor environment. Unfortunately, determining their locations is extremely difficult, both because of their large number and because access to them is subject to personal authorization. Therefore, the ground truth of their locations is not included in the dataset.
The included information consists of two main parts: (1) their number in each scene, which is shown in Table 1; (2) the corresponding RSS at each RP, which is presented in Section 4.3.

4.3. Information of Reference Points
The reference points consist of a series of measurements. The numbers of measurements in the three scenes are 2489, 4230, and 4495, respectively, as listed in Table 3. Each measurement contains an RSS level and a corresponding coordinate in the local map. The RSS level of each measurement comes from either a PAP or an EAP. Accordingly, the RPs are divided into two categories, named PRP and ERP, respectively. The coordinates are either generated by the path planning method along the trajectory or are the goals we set directly. As mentioned in Section 3.2.3, the robot records its location and the RSS levels only once at each coordinate on the path. The robot was set to record 10 times at every navigation goal. However, due to signal propagation issues such as obstruction by obstacles, the actual number of recordings may differ from the set value.

Although these RPs can all be fed as input to the aforementioned location determination methods, amplitude differences and attenuation during propagation are inevitable. The former are mainly caused by the multipath effect, and the latter by the mixed occlusion from pedestrians, building walls, and the indoor layout. To mitigate these effects, using the mean RSS level at each RP is recommended.

Table 3
The Number of Measurements in Each Scene

Scene 1   Scene 2   Scene 3
2489      4230      4495

5. Usage Example
This section presents a usage example of our dataset for AP location determination. Some existing methods are tested as baselines for further comparisons.
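
As a preprocessing step before running any baseline, the per-RP averaging recommended in Section 4.3 could look like the sketch below. It assumes that the measurements of one AP have already been loaded as (x, y, rss) tuples; this layout, the function name, and the rounding used to group repeated recordings at the same coordinate are illustrative assumptions, since the on-disk format of the dataset is not described here. The RSS values are averaged directly in dBm.

```python
from collections import defaultdict


def mean_rss_per_rp(measurements, decimals=2):
    """Average the RSS of all measurements that share the same coordinate.

    measurements: iterable of (x, y, rss_dbm) tuples for a single AP
                  (hypothetical in-memory layout, not the dataset file format).
    decimals:     coordinates are rounded to this many decimals so that the
                  repeated recordings taken at one navigation goal are grouped.
    Returns a dict mapping (x, y) to the mean RSS in dBm.
    """
    groups = defaultdict(list)
    for x, y, rss in measurements:
        groups[(round(x, decimals), round(y, decimals))].append(rss)
    return {rp: sum(values) / len(values) for rp, values in groups.items()}


if __name__ == "__main__":
    sample = [(1.0, 2.0, -50.0), (1.0, 2.0, -60.0), (3.0, 4.0, -70.0)]
    for rp, rss in mean_rss_per_rp(sample).items():
        print(rp, rss)
```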
Note that the objective of this work is not to propose a new indoor AP location determination method but to provide a common wireless dataset built with a labor-saving strategy, which can be used to test and compare AP location determination methods.

The example is presented in Scene 1. We have tested the model-free methods, including the centroid method[14] and the weighted centroid method[15], and the model-based methods, including NLLS[12] and NLLS-M[16]. Here the information of the pre-deployed APs (PAPs) is used to estimate the location of each corresponding AP. The RSS levels are first filtered by a Gaussian filter before solving the problem.

The qualitative results are illustrated in Fig. 6. In each sub-figure, the ground-truth AP locations (green points) and their estimates (red points) are drawn on the 2D map. For the model-free methods, the results in (a) and (b) indicate that although the weighted centroid method produces better results than the centroid method, the results of both methods are unsatisfactory. The reason is that, in field applications without prior knowledge, the measurements are usually not evenly distributed around the AP. The results of the model-based methods, (c) Non-Linear Least Squares (NLLS) and (d) Non-Linear Least Squares with Multi-Level Quality Control (NLLS-M), are much better than those of the model-free methods. However, the results for some APs are unacceptable. Besides, the comparison of these methods indicates that there is much room for improvement by new methods from researchers in the near future.

For a more precise description, the quantitative AP location determination errors are reported in Table 4. The metric selected here is the Euclidean distance between the ground-truth AP locations and the values estimated by the existing methods. In Table 4, as described before, the centroid method (C) and the weighted centroid method (C-W) perform worse than the other methods for most APs.
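
The two model-free baselines and the Euclidean error metric can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used for Table 4: the function names are hypothetical, and the choice of linear power (mW converted from dBm) as the centroid weight is an assumption, since the exact weighting is not specified in this paper.

```python
import math


def centroid(points):
    """Plain centroid: mean of the RP coordinates at which the AP was heard."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def weighted_centroid(points, rss_dbm):
    """Weighted centroid of the RP coordinates.

    Weights are linear power in mW, 10 ** (rss / 10); this weighting is an
    assumption made for illustration, so stronger readings pull harder.
    """
    weights = [10.0 ** (r / 10.0) for r in rss_dbm]
    total = sum(weights)
    x = sum(w * px for w, (px, _) in zip(weights, points)) / total
    y = sum(w * py for w, (_, py) in zip(weights, points)) / total
    return (x, y)


def location_error(estimate, ground_truth):
    """Euclidean distance (m) between an estimate and the ground truth."""
    return math.hypot(estimate[0] - ground_truth[0],
                      estimate[1] - ground_truth[1])


if __name__ == "__main__":
    # Toy values, not taken from the dataset.
    rps = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
    rss = [-45.0, -60.0, -70.0, -60.0]
    truth = (0.3, 0.2)
    print("centroid error:", location_error(centroid(rps), truth))
    print("weighted error:", location_error(weighted_centroid(rps, rss), truth))
```

Both estimators return a point inside the convex hull of the RPs, which is why they degrade when the measurements are not spread around the AP.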
NLLS and NLLS-M produce better results than the former two for most APs, but the solution is not stable. For example, the location error of NLLS is 0.54 m for AP 5 but 121.79 m for AP 6. The quantitative findings provide a more precise depiction of the strengths and weaknesses of the existing methods, thereby offering preliminary results for subsequent research.

Figure 6: The qualitative results of the tested baselines.

Table 4
The errors (unit: meter) of existing methods for AP location determination, measured by Euclidean distance

AP      C[14]   C-W[15]   NLLS[12]   NLLS-M[16]
AP 1    5.62    5.25      0.87       1.29
AP 2    5.94    5.51      9.72       9.73
AP 3    5.57    5.46      1.08       11.3
AP 4    4.02    3.94      1.30       0.89
AP 5    5.12    5.01      0.54       0.26
AP 6    5.93    5.39      121.79     1.24
Mean    5.37    5.09      22.50      4.12

6. Conclusion
This paper provides a new dataset, CSUIndoorAPDet, mainly for indoor AP location determination. First, the collection methodology and workflow of the dataset are detailed, including the system, equipment, and three sequential modules. After that, the elements of the dataset and their main information are presented. Next, a usage example compares the existing methods on one of the scenarios in the dataset, which provides the research community with a baseline for further comparisons. In addition, the whole RSS recording process is automatic and labor-saving, and it avoids most of the path loss caused by the human body. Therefore, researchers in related fields can use this dataset to test their new methods, including AP location determination, AP selection and fingerprint dimension reduction, fingerprint-based Wi-Fi localization, and range-based Wi-Fi localization.

References
[1] J. Santa, B. Ubeda, R. Toledo and A. F. G. Skarmeta, "Monitoring the Position Integrity in Road Transport Localization Based Services," IEEE Conference on Vehicular Technology (VTC), Montreal, QC, Canada, 2006, pp.
1-5, doi: 10.1109/VTCF.2006.575.
[2] Y. Long, Y. Gong, Z. Xiao, and Q. Liu, "Accurate Object Localization in Remote Sensing Images Based on Convolutional Neural Networks," IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 5, pp. 2486-2498, May 2017, doi: 10.1109/TGRS.2016.2645610.
[3] T. H. Chiang, Z. H. Sun, H. R. Shiu, K. C. J. Lin and Y. C. Tseng, "Magnetic Field-Based Localization in Factories Using Neural Network With Robotic Sampling," IEEE Sensors Journal, vol. 20, no. 21, pp. 13110-13118, Nov. 2020, doi: 10.1109/JSEN.2020.3003404.
[4] C. Yin, S. Ding and J. Wang, "Mobile marketing recommendation method based on user location feedback," Human-centric Computing and Information Sciences, vol. 9, no. 1, 2019, doi: 10.1186/s13673-019-0177-6.
[5] A. El-Rabbany, Introduction to GPS: The Global Positioning System. Norwood, MA, USA: Artech House, 2002.
[6] M. Zhou, Y. Li, M. J. Tahir, X. Geng, Y. Wang and W. He, "Integrated Statistical Test of Signal Distributions and Access Point Contributions for Wi-Fi Indoor Localization," IEEE Transactions on Vehicular Technology, vol. 70, no. 5, pp. 5057-5070, May 2021, doi: 10.1109/TVT.2021.3076269.
[7] N. El-Sheimy and Y. Li, "Indoor navigation: State of the art and future trends," Satellite Navigation, vol. 2, no. 1, p. 7, 2021.
[8] B. T. Fang, "Trilateration and extension to Global Positioning System navigation," Journal of Guidance, Control, and Dynamics, vol. 9, no. 6, pp. 715–717, 1986, doi: 10.2514/3.20169.
[9] N. Singh, S. Choe and R. Punmiya, "Machine Learning Based Indoor Localization Using Wi-Fi RSSI Fingerprints: An Overview," IEEE Access, vol. 9, pp. 127150-127174, 2021, doi: 10.1109/ACCESS.2021.3111083.
[10] D. Quezada-Gaibor, J. Torres-Sospedra, J. Nurmi, Y. Koucheryavy and J. Huerta, "Lightweight Wi-Fi Fingerprinting with a Novel RSS Clustering Algorithm," 2020 International Conference Indoor Positioning Indoor Navigation (IPIN), Lloret de Mar, Spain, 2021, pp.
1-8, doi: 10.1109/IPIN51156.2021.9662612.
[11] H. Nurminen, M. Dashti and R. Piché, "A Survey on Wireless Transmitter Localization Using Signal Strength Measurements," Wireless Communications and Mobile Computing, Article ID 2569645, 12 pages, 2017, doi: 10.1155/2017/2569645.
[12] J. Koo and H. Cha, "Localizing WiFi Access Points Using Signal Strength," IEEE Commun. Lett., vol. 15, no. 2, pp. 187-189, February 2011, doi: 10.1109/LCOMM.2011.121410.101379.
[13] K. N. R. S. V. Prasad and V. K. Bhargava, "RSS Localization Under Gaussian Distributed Path Loss Exponent Model," IEEE Wireless Communications Letters, vol. 10, no. 1, pp. 111-115, Jan. 2021, doi: 10.1109/LWC.2020.3021991.
[14] N. Bulusu, J. Heidemann, and D. Estrin, "GPS-less low-cost outdoor localization for very small devices," IEEE Personal Communications, vol. 7, no. 5, pp. 28–34, 2000.
[15] J. Blumenthal, R. Grossmann, F. Golatowski, and D. Timmermann, "IEEE International Symposium on Intelligent Signal Processing," Proc. of IEEE Int. Symp. Intelligent Signal Process., Alcala de Henares, Spain, October 2007.
[16] Y. Zhuang, Y. Li, H. Lan, Z. Syed and N. El-Sheimy, "Wireless Access Point Localization Using Nonlinear Least Squares and Multi-Level Quality Control," IEEE Wireless Communications Letters, vol. 4, no. 6, pp. 693-696, Dec. 2015, doi: 10.1109/LWC.2015.2483509.
[17] W. Hess, D. Kohler, H. Rapp, and D. Andor, "Real-time loop closure in 2D LIDAR SLAM," 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 2016, pp. 1271-1278, doi: 10.1109/ICRA.2016.7487258.