Analysing visitor flow using a Bluetooth positioning system

Pieter van den Ham¹, Bert Bredeweg¹ and Maartje Raijmakers²

¹ Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands
² Educational Studies, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
{p.e.vandenham@gmail.com, b.bredeweg@uva.nl, m.e.j.raijmakers@vu.nl}

This contribution proposes a Bluetooth fingerprinting-based system for analysing participant movement within a public space [2]. Several classification algorithms, including Naive Bayes, k-Nearest Neighbors and SVM, are compared to determine which is the best fit for the system. The data collected by the system provides metrics such as time spent in a location and movement patterns. We conducted two experiments in a science museum, one with and one without regular visitors present, to analyse the system's performance. Finally, several suggestions are provided on how the system may be improved.

Until Bluetooth 4.0 was introduced in 2010, the de facto standard for indoor localization was “Wi-Fi fingerprinting”, a technique that uses the received signals and their strengths to map the user to a known, pre-recorded location. Bluetooth 4.0 specifies a subsystem known as Bluetooth Low Energy, a protocol built specifically for Internet of Things applications, with improved energy efficiency and lower scan times. Similarly to Wi-Fi fingerprinting, Bluetooth fingerprinting works by observing incoming signals and matching the resulting signal vector (fingerprint) against known fingerprints recorded at reference locations. Under ideal conditions (low interference, 1 beacon per 30 m²), an accuracy of better than 2.5 meters can be achieved 90% of the time, a significant improvement over Wi-Fi fingerprinting (8.5 meters 95% of the time) [1]. Received Signal Strength Indicator (RSSI) fingerprinting seems to be the most promising technique because of its high theoretical accuracy [1].
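The fingerprint-matching idea described above can be illustrated with a minimal sketch. It assumes three beacons and three reference locations; the location names, the RSSI values and the helper nearest_fingerprint are invented for illustration and are not taken from the system described in this paper. Plain nearest-neighbour matching is used here only because it is the simplest instance of classifying a signal vector against known fingerprints.

```python
import numpy as np

# Hypothetical reference fingerprints: mean RSSI (dBm) from three beacons,
# recorded at known locations. All values are invented for illustration.
REFERENCE_FINGERPRINTS = {
    "entrance": np.array([-55.0, -80.0, -90.0]),
    "main_hall": np.array([-78.0, -58.0, -82.0]),
    "workshop": np.array([-91.0, -79.0, -54.0]),
}


def nearest_fingerprint(observed):
    """Return the reference location whose fingerprint is closest
    (Euclidean distance in RSSI space) to the observed signal vector."""
    observed = np.asarray(observed, dtype=float)
    distances = {
        location: float(np.linalg.norm(observed - fingerprint))
        for location, fingerprint in REFERENCE_FINGERPRINTS.items()
    }
    return min(distances, key=distances.get)


# A new observation that resembles the "main_hall" fingerprint.
observed_vector = [-76.0, -60.0, -85.0]
print(nearest_fingerprint(observed_vector))  # prints "main_hall"
```

A single stored vector per location is a simplification; a practical system would typically collect many labelled vectors per location and let a trained classifier absorb the signal noise.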

RSSI fingerprinting localization systems differ mainly in the classifiers they use to classify RSSI vectors. The approach we followed uses a machine learning algorithm such that classify(u) yields the location at which the RSSI feature vector u was most likely recorded, ideally alongside a probability. However, any supervised machine learning algorithm must first be trained on manually labelled RSSI vectors before it can make accurate estimations. Fingerprinting-based systems are therefore split into two distinct phases: a training phase, during which a database of labelled training data is constructed and fed to a machine learning algorithm (classifier), and a “live” phase, during which the trained classifier provides probability estimations for a given RSSI vector.

Estimote’s Proximity Beacons were used as the system’s Bluetooth beacons. The packets emitted by these beacons are received by a mobile device running Android. A central server hosts the database to which mobile devices upload the packets they receive for analysis.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

The final and most important concern is building a classifier that can accurately translate RSSI vectors into locations. Scikit-learn (a machine learning library for Python; https://www.scikit-learn.org/) allows us to build this functionality using pipelines. A pipeline, in the scikit-learn context, comprises a series of preprocessing and classification operations combined into one entity. A pipeline has to be fitted to training data before it can transform unlabelled test data into locations. A diagram of the pipeline used for RSSI classification can be found in Fig. 1.
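
The pipeline described above might be assembled along the following lines. This is a minimal sketch under several assumptions, not the authors' implementation: the training data is synthetic, the fill value of -100 dBm for beacons that were not heard is an arbitrary choice, and KNeighborsClassifier stands in for whichever classifier is actually selected. The sketch covers both phases: fitting on labelled vectors (training) and calling predict_proba on a new vector (live), with 5-fold cross-validation as one way to estimate accuracy.

```python
import numpy as np
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import FeatureUnion, Pipeline, make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic training data (invented): 3 locations, 4 beacons, RSSI in dBm.
rng = np.random.default_rng(0)
locations = ["entrance", "main_hall", "workshop"]
mean_rssi = np.array([
    [-55.0, -80.0, -90.0, -85.0],
    [-78.0, -58.0, -82.0, -75.0],
    [-91.0, -79.0, -54.0, -70.0],
])
samples_per_location = 40
X = np.vstack([
    rng.normal(loc=m, scale=4.0, size=(samples_per_location, m.size))
    for m in mean_rssi
])
y = np.repeat(locations, samples_per_location)

# Simulate beacons that were not heard during a scan.
X[rng.random(X.shape) < 0.15] = np.nan

# Branch 1: impute missing readings, then scale each feature to [0, 1].
# Branch 2: binary flags marking which readings were missing.
features = FeatureUnion([
    ("rssi", make_pipeline(
        SimpleImputer(strategy="constant", fill_value=-100.0),  # assumed value
        MinMaxScaler(),
    )),
    ("missing", MissingIndicator(features="all")),
])

pipeline = Pipeline([
    ("features", features),
    ("classifier", KNeighborsClassifier(n_neighbors=5)),  # stand-in classifier
])

# Estimate accuracy with 5-fold cross-validation.
scores = cross_val_score(pipeline, X, y, cv=5)
print("mean accuracy:", scores.mean())

# Training phase: fit on all labelled vectors.
pipeline.fit(X, y)

# "Live" phase: per-location probabilities for one RSSI vector.
probabilities = pipeline.predict_proba(X[:1])
print(dict(zip(pipeline.classes_, probabilities[0])))
```

Keeping the missing-value flags next to the imputed readings lets the classifier distinguish a beacon that was genuinely weak from one that was not heard at all, which is presumably why the pipeline combines both branches in a feature union.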

On a small dataset of 393 samples, the system achieved 95% accuracy, estimated with k-fold cross-validation (k = 5) to account for bias.

[Figure: pipeline diagram with the stages Impute missing values, Min-Max Scaling, Missing Indicator, Feature Union and Classifier]
Fig. 1. The RSSI vector classification pipeline.

The experiments confirmed that small, closely spaced sections were more difficult to classify than large, relatively isolated sections, mainly due to Bluetooth’s susceptibility to noise [3]. Furthermore, the system proved to be accurate enough for tracking and analysis purposes and will be used to analyse visitor flow in a science museum.

References

1. Faragher, R., Harle, R.: Location fingerprinting with Bluetooth Low Energy beacons. IEEE Journal on Selected Areas in Communications 33(11), 2418–2428 (Nov 2015). https://doi.org/10.1109/JSAC.2015.2430281
2. van den Ham, P., Bredeweg, B., Raijmakers, M.: Analysing visitor flow using a Bluetooth positioning system (2019), http://scriptiesonline.uba.uva.nl/scriptie/692291
3. Kouyoumdjieva, S.T., Karlsson, G.: Experimental Evaluation of Precision of a Proximity-based Indoor Positioning System. Tech. rep., https://www.arubanetworks.com