Analysis and Comparison of Publicly Available Databases for Urban Mobility Applications
Dina Bousdar Ahmed1, Estefania Munoz Diaz1
1 Institute of Communications and Navigation, German Aerospace Center (DLR), Wessling, Germany

Abstract
The challenges of multimodal applications can be addressed with machine learning or artificial intelligence methods, for which a database with large amounts of good-quality data and ground truth is crucial. Since generating and publishing such a database is a challenging endeavour, only a handful of them are available to the community. In this article, we analyze and compare three of these databases. We assess the ground truth that they provide, e.g. labels of the means of transport, and how much unlabelled data they publish. We compare the databases regarding the number of hours of data and how these hours are distributed among different means of transport and activities. Finally, we assess the data in each public database regarding crucial aspects such as the stability of the sampling frequency, the minimum sampling frequency required to observe certain means of transport or activities, and the amount of lost data. One of our main conclusions is that accurately labelling data and ensuring a stable sampling frequency are two of the biggest challenges to be addressed when generating a public database for urban mobility.

Keywords
Machine learning, artificial intelligence, data mining, smartphone, passenger, dataset, vehicle, localization, detection, means of transport, transport mode.

1. Introduction
Digital technology is becoming a lever to integrate, aggregate and facilitate multimodality in urban mobility. In this article, we follow Klinger’s definition of multimodality as “the situational combination of transport modes” [1].
Multimodality is a means to achieve cleaner, less polluted cities, to improve the efficiency of passenger flows in cities and to foster economic wealth. To enable urban multimodality, it is necessary to address key issues such as how to efficiently plan a passenger’s trip using the different means of transport available locally [2]. Such planning should also take into account the passenger’s preferences, e.g. using a bike-sharing system instead of a bus, or even the passenger’s physical limitations. Some of the challenges of urban mobility applications can be addressed with machine learning or artificial intelligence technologies [3, 4, 5]. In this article, we define machine learning as a set of methods that automatically detect patterns in data and then use the uncovered patterns to predict future data or to perform other kinds of decision-making under uncertainty [6]. In contrast, we define artificial intelligence as the area of computer science that aims at creating intelligent machines that work and react like humans. To enable machine learning or artificial intelligence in urban mobility applications, the first requirement is the availability of large amounts of data from which to learn the latent patterns behind the various aspects of urban mobility [7].

Figure 1: Visualization of the five main elements of the urban mobility ecosystem.

IPIN 2021 WiP Proceedings, November 29 – December 2, 2021, Lloret de Mar, Spain
dina.bousdarahmed@dlr.de (D. Bousdar Ahmed); estefania.munoz@dlr.de (E. Munoz Diaz)
ORCID 0000-0003-4681-394X (D. Bousdar Ahmed)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
Therefore, it is key to invest effort in collecting not only data, e.g. accelerometer, turn-rate or GNSS data from smartphones, but also the corresponding ground truth, e.g. labels of a passenger’s choice of means of transport during a multimodal trip. Generating public databases with large amounts of accurately labelled data is an expensive and challenging endeavour. Only a handful of research institutions and companies have the means to take it on, yet the majority of research institutions and companies would benefit from such public databases. In fact, public databases that become benchmarks for the development of urban mobility applications will enable the comparison of different systems. Public databases also foster the standardization of the evaluation of systems for urban mobility applications, among them passenger positioning [8] and the detection of the means of transport. Some databases with data that could potentially be used for urban mobility applications already exist. Unfortunately, only a handful of them are public [9, 10, 11]. Most of these databases are private [12, 13, 10], which hinders progress in the development of applications for urban mobility. To the best of our knowledge, no work analyzes and compares publicly available urban mobility databases. We believe that such an analysis and comparison is of interest because some databases target similar applications, e.g. the recognition of the transport mode [9, 10, 11]. The community will benefit from a comparison that exemplifies how one can analyze public databases, identify their strengths and weaknesses and, ultimately, make an informed decision about which database to use for the development of an urban mobility application. The goal of this article is to analyze and compare publicly available databases that contain data which could potentially be used to develop urban mobility applications.
With our work, we aim to provide an example of how databases can be assessed regarding different criteria, e.g. the sensor data in the database, the means of transport or activities considered, the type of ground truth provided or the amount of missing data within the database. More specifically, we have the following objectives:
• define urban mobility applications and the criteria to assess databases for such applications,
• select publicly available urban mobility databases and analyze them regarding the criteria listed above, and
• compare the selected public databases and highlight, through our analysis, the key factors that should be taken into account when creating a public database.

2. Overview of Urban Mobility Applications
In this section, we first present our concept of the urban mobility ecosystem and then summarize some of the key applications in urban mobility. Based on these concepts, we present the criteria that we use to select public databases for urban mobility as well as the criteria to analyze and compare these databases.

2.1. The Urban Mobility Ecosystem and Its Applications
This subsection provides an overview of urban mobility applications. First, we present our concept of the urban mobility ecosystem, see Figure 1, and its five main elements:
• Passenger
• Infrastructure
• Means of transport
• Transport operators
• Traffic and land-use regulators
The passenger is the user of the transport network and transport hubs and, thus, the central element of the urban mobility ecosystem. Smartphones, which are associated with passengers, play a key role in the urban mobility ecosystem: they are already ubiquitous in developed countries and are spreading fast in developing countries. Thanks to smartphones, digital technologies for urban mobility are developed with a passenger-centric approach. The passenger makes use of the infrastructure, which is the built environment that supports the operation of the means of transport.
The infrastructure comprises transport hubs and stops as well as tracks, catenaries, lamp posts, roads, sidewalks, bicycle lanes and the like. Means of transport encompass the different vehicles or ways to move from an origin to a destination. We classify means of transport into walking, non-motorized vehicles and motorized vehicles. Both non-motorized and motorized vehicles can be divided into shared and private vehicles. Examples of non-motorized vehicles that might be shared are scooters and bicycles. The offer of shared motorized vehicles is as rich as the private variety, e.g. motorbikes, e-scooters, e-bikes and cars, among others. Among public motorized vehicles, the most common are buses, suburban trains, subways and trams. Transport operators can also be classified as public or private; they are responsible for designing and making available the means of transport, the transport network and the transport hubs to passengers, taking into account the passengers’ needs. Traffic and land-use regulators are also part of the urban mobility ecosystem: how land is used and how cities are designed help determine the urban infrastructure and which means of transport are most adequate. For instance, single-family homes on large lots increase the need for cars, whereas high-rise apartments with limited parking create the conditions for people to prioritize public means of transport like subways, buses or taxis. Traffic and land-use regulators are also responsible for regulating the operation of transport operators, thus influencing the design of the means of transport in urban environments.

2.2. Criteria to Select, Analyze and Compare Databases
This subsection introduces the criteria we have used to select, analyze and compare public urban mobility databases.
We focus on databases that were recorded with smartphones and that contain and label one or more of the following features:
• different carrying modes,
• different means of transport and
• different activities.
In addition, we consider only publicly available urban mobility databases that could be downloaded at the time of writing this article. We analyze the selected databases according to the following criteria:
• Criterion 1: Labels and positioning ground truth.
• Criterion 2: Size of the databases in number of hours of data and number of users.
• Criterion 3: Carrying mode of the smartphone, means of transport used and activities considered.
• Criterion 4: Sensors recorded and sampling frequency.

3. Selection and Analysis of Publicly Available Urban Mobility Databases
For this article, we have considered the 17 databases listed in Table 1 of [10], the 21 databases listed in Table VIII of [8] and, additionally, the databases in [14, 11, 15, 16]. Given the criteria in Section 2.2, we have selected three of these 42 databases:
• The Geolife GPS Trajectory Dataset [17, 9], which contains GPS data collected with smartphones or GNSS receivers by volunteers in Asia. The volunteers travelled using different means of transport and carried out different types of activities, e.g. shopping or sports.
• The Sussex-Huawei Locomotion Dataset [10, 18], which contains sensor data, e.g. from inertial sensors, magnetometers and GNSS, from three volunteers who carried out the tests in the United Kingdom. The trajectories contain data from different means of transport and activities such as standing still, walking or running, and the volunteers used four smartphones simultaneously to record data.
• The Transport Mode Detection (TMD) Dataset [11, 19], which contains sensor data, e.g. from inertial sensors and magnetometers, collected with a smartphone by volunteers in Italy. The trajectories span different means of transport and include data from standing still and walking.
The selected databases are not only public but also contain data recorded with smartphones and labels of at least one of the features listed above: carrying mode, means of transport or activity. The remaining databases have not been selected because they are either not public or cannot be downloaded, do not contain smartphone data, or do not contain data or labels of different activities and means of transport. In the following sections, we analyze each of the selected databases according to the criteria described above.

3.1. Criterion 1: Labels and Positioning Ground Truth
The ground truth included in a database determines the type of application for which the database is suitable. In this article, we are interested in applications that detect the means of transport or the activity of the passenger, as well as in positioning applications. Therefore, we focus on the following types of ground truth: labels of the carrying mode, of the means of transport and of the activity, and positioning ground truth. Table 1 summarizes them.

Table 1
Details regarding criterion 1: labels and positioning ground truth. We specify the type of label and positioning ground truth provided by each publicly available database.

| Title | Carrying mode label | Means of transport label | Activity label | Positioning ground truth |
|---|---|---|---|---|
| Geolife GPS Trajectory Dataset | - | yes | - | - |
| Sussex-Huawei Locomotion Dataset | yes | yes | yes | yes (GPS logs) |
| TMD Dataset | - | yes | yes | - |

While all considered databases include labels of the means of transport used, only the Sussex-Huawei Locomotion Dataset and the TMD Dataset include activity labels. Therefore, all databases enable the development of algorithms that detect the means of transport, whereas only the Sussex-Huawei Locomotion Dataset and the TMD Dataset enable the development of activity recognition methods.
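The selection logic behind Table 1 can be made explicit with a small lookup structure. The following Python sketch is only illustrative: the dictionary keys and the helper `suitable_databases` are hypothetical names, and the entries simply transcribe Table 1. As discussed below, the positioning entry of the Sussex-Huawei Locomotion Dataset corresponds to GPS logs, which serve only as a coarse reference.

```python
# Hypothetical encoding of Table 1: which ground truth each database provides.
# Key names are illustrative; True means the table lists "yes".
GROUND_TRUTH = {
    "Geolife GPS Trajectory Dataset": {
        "carrying_mode": False, "means_of_transport": True,
        "activity": False, "position": False,
    },
    "Sussex-Huawei Locomotion Dataset": {
        # "position" refers to GPS logs, not an accurate positioning reference.
        "carrying_mode": True, "means_of_transport": True,
        "activity": True, "position": True,
    },
    "TMD Dataset": {
        "carrying_mode": False, "means_of_transport": True,
        "activity": True, "position": False,
    },
}


def suitable_databases(required):
    """Return, sorted by name, the databases providing every ground-truth type in `required`."""
    return sorted(
        name for name, available in GROUND_TRUTH.items()
        if all(available[kind] for kind in required)
    )
```

For instance, `suitable_databases({"activity"})` returns the Sussex-Huawei Locomotion Dataset and the TMD Dataset, in line with the discussion of Table 1, while adding `"carrying_mode"` to the requirements leaves only the Sussex-Huawei Locomotion Dataset.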
Regarding the labels of the smartphone carrying mode, the Sussex-Huawei Locomotion Dataset is the only one of the three publicly available databases considered that includes them. Therefore, this database enables the development of algorithms that detect the carrying mode, or of algorithms that consider how the different smartphone carrying modes affect other applications, e.g. the classification of the means of transport or multimodal positioning, i.e. positioning in different means of transport.

Figure 2: Percentages of labels of each means of transport and activity.

The Sussex-Huawei Locomotion Dataset and the Geolife GPS Trajectory Dataset include positioning information based on global navigation satellite system (GNSS) measurements. However, GNSS measurements cannot be considered ground truth in urban environments due to multipath or the absence of satellite visibility in urban canyons, indoors or underground. Thus, these measurements cannot be used as a reference to evaluate the performance of smartphone-based positioning systems. Nonetheless, GNSS measurements can be used to obtain a coarse estimate of the trajectory followed by the users. Figure 2 presents the total percentage of each label in each dataset. The first highlight is that the TMD Dataset does not provide unlabelled data. This, however, does not indicate that no unlabelled data was recorded, but rather that the authors of the database only published the data for which they had a label. The second highlight is that the Sussex-Huawei Locomotion Dataset has at least three times more unlabelled data, i.e. data with the null label in Figure 2, than data with any other single label. The Geolife GPS Trajectory Dataset has approximately 50% unlabelled data; note, however, that it is the largest of the three datasets reviewed here.
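Percentages such as those in Figure 2 can be obtained by counting samples per label. The sketch below assumes, hypothetically, that the labels of a recording are available as a Python list in which `None` marks unlabelled (null) samples; the function names are ours and do not come from any of the databases. Counting samples equals measuring time only if the sampling frequency is constant, which, as discussed under Criterion 4, is not guaranteed for every database.

```python
from collections import Counter


def label_percentages(labels):
    """Share of samples (in %) per label; None is reported as the 'null' label."""
    counts = Counter("null" if label is None else label for label in labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


def null_to_largest_label_ratio(percentages):
    """How many times more null data there is than data of the most frequent real label."""
    labelled = [p for label, p in percentages.items() if label != "null"]
    if not labelled:
        return float("inf")  # nothing is labelled at all
    return percentages.get("null", 0.0) / max(labelled)
```

A ratio of 3 or more computed over a whole dataset corresponds to the situation described above for the Sussex-Huawei Locomotion Dataset.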
That the Sussex-Huawei Locomotion Dataset and the Geolife GPS Trajectory Dataset contain so much unlabelled data is yet another example of the challenge that data labelling poses in the creation of public databases for urban mobility. A significant effort is required to appropriately label all the activities or means of transport within a dataset. Otherwise, batches of data for which no label can be provided remain unlabelled, which makes them unsuitable for supervised training. Figure 3 shows an example of the acceleration norm of the smartphone placed in the front trouser pocket in the Sussex-Huawei Locomotion Dataset. We also plot three of the labels provided in this dataset: null, for no information; still, for a passenger standing still; and walking.

Figure 3: Example of data labelling of the Sussex-Huawei Locomotion Dataset. Only three labels are shown: null, still and walking. The horizontal green lines indicate unlabelled time frames that could contain relevant information about the means of transport or activity.

Judging by the acceleration norm, the horizontal green lines in Figure 3 mark time frames during which the passenger was probably walking or travelling in some means of transport. However, these time frames are labelled as null by the authors. The lack of a label may have been caused by a mistake during the data recording, which led the authors to discard this batch of data rather than label it.

3.2. Criterion 2: Size of the Database
Table 2 presents the size of the databases according to different metrics, namely the number of hours of data, the number of users and the average number of hours per user. The latter has been introduced to ease the comparison of the size of the databases. We can see significant differences among the sizes of the databases, e.g. the number of hours of data in the Geolife GPS Trajectory Dataset is about three orders of magnitude greater than that in the TMD Dataset.
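The comparisons in this subsection reduce to simple arithmetic on the values listed in Table 2. The following sketch reproduces them; the dictionary layout and the function names are our own, and the 9 hours of recording per day is the assumption stated in the text.

```python
import math

# (hours of data per device, number of users), transcribed from Table 2.
TABLE_2 = {
    "Geolife GPS Trajectory Dataset": (48203, 178),
    "Sussex-Huawei Locomotion Dataset": (83, 3),
    "TMD Dataset": (31, 13),
}
HOURS_PER_DAY = 9  # assumed average recording time per day, as in the text


def hours_per_user(name):
    hours, users = TABLE_2[name]
    return hours / users


def orders_of_magnitude(a, b):
    """log10 of a / b: 1 means ten times larger, 2 a hundred times, etc."""
    return math.log10(a / b)


def collection_days(hours):
    """Number of recording days needed for `hours` at HOURS_PER_DAY hours per day."""
    return hours / HOURS_PER_DAY


for name in TABLE_2:
    per_user = hours_per_user(name)
    print(f"{name}: {per_user:.1f} h/user, ~{collection_days(per_user):.1f} days/user")
print(f"log10(Geolife hours / TMD hours) = {orders_of_magnitude(48203, 31):.2f}")
```

The ratio of total hours between the Geolife GPS Trajectory Dataset and the TMD Dataset is about 1,550, i.e. a base-10 logarithm of about 3.19. For the Sussex-Huawei Locomotion Dataset, 83 / 3 yields 27.7 hours per user rather than the 27.8 listed in Table 2; the difference is presumably due to the rounding of the hour total in the table.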
A similar difference exists between the number of users in the Geolife GPS Trajectory Dataset and in the Sussex-Huawei Locomotion Dataset. The Sussex-Huawei Locomotion Dataset follows a strategy of few users, i.e. three, and many hours per user, i.e. 27.8 hours per user, whereas the TMD Dataset follows a strategy of more users, i.e. 13, and few hours per user, i.e. 2.4 hours per user. Considering the number of hours per user, the Geolife GPS Trajectory Dataset collected on average 270.8 hours per user, whereas the Sussex-Huawei Locomotion Dataset and the TMD Dataset collected one and two orders of magnitude less data per user, respectively. In fact, if we consider an average of 9 hours of data collection per day, the Geolife GPS Trajectory Dataset collected data for approximately one month for each user. In contrast, the Sussex-Huawei Locomotion Dataset collected data for about 3 days per user and the TMD Dataset for less than a day.

Table 2
Size of the databases in terms of hours of data, number of users and average hours per user. The values are given per device, not summed over all the devices used during the tests.

| Database | Hours of data | No. of users | Hours/user |
|---|---|---|---|
| Geolife GPS Trajectory Dataset | 48203 | 178 | 270.8 |
| Sussex-Huawei Locomotion Dataset | 83 | 3 | 27.8 |
| TMD Dataset | 31 | 13 | 2.4 |

An important aspect to highlight is the inconsistency in the number of hours published by the Sussex-Huawei Locomotion Dataset. The authors claim that they publish approximately 700 hours of data, i.e. taking into account all the data from the four smartphones that each user carries during each day of measurements. However, according to our calculations, the downloadable version of the Sussex-Huawei Locomotion Dataset contains approximately 332 hours of data, i.e. 83 hours of data per smartphone times the 4 smartphones carried by each user. Therefore, we believe there is an inconsistency between the amount of data that the authors claim to have published and the amount of data actually published.

3.3. Criterion 3: Carrying Mode, Means of Transport and Activities
Table 3 details the carrying modes of the smartphone, the means of transport and the passenger activities considered in each public database. In the Sussex-Huawei Locomotion Dataset, the users simultaneously carry four smartphones in four carrying modes, namely bag, chest, hand and hip. In the Geolife GPS Trajectory Dataset and the TMD Dataset, the users carry only one smartphone and the carrying mode is not specified.

Table 3
Details regarding criterion 3: the carrying mode, the means of transport used and the activities done.

| Title | Device | Carrying mode | Means of transport | Activity |
|---|---|---|---|---|
| Geolife GPS Trajectory Dataset | GNSS logger or smartphone | - | Bike, bus, car, taxi, train, airplane, others | Walking, sports, shopping |
| Sussex-Huawei Locomotion Dataset | Smartphone | Bag, chest, hand, hip | Bike, bus, car, subway, train | Still, walking, running |
| TMD Dataset | Smartphone | - | Bus, car, train | Still, walking |

Regarding the means of transport, all databases consider the bus, the car (or taxi) and the train. The Geolife GPS Trajectory Dataset also considers bicycles and airplanes, and the Sussex-Huawei Locomotion Dataset considers bicycles and other means of transport such as the subway. None of the selected databases includes scooters or trams. Given that the focus is urban mobility, it seems natural that the two main activities considered are standing still, e.g. while waiting for a bus or train to arrive, and walking. This is the case for both the Sussex-Huawei Locomotion Dataset and the TMD Dataset. The Geolife GPS Trajectory Dataset considers walking, but not standing still, as well as other interesting activities like shopping or sports; however, it does not provide labels of the passenger activity, as Table 1 indicates.

Figure 4: Percentages of data normalized to the total amount of data of the means of transport and activities chosen, i.e. still, walking, car, bus and train.
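Comparing label distributions across databases, as done for Figure 4, first requires mapping each database’s labels onto a shared set of classes, e.g. merging car and taxi as in the text above, and then normalizing only over those classes. The sketch below is a hypothetical illustration: the label strings in `LABEL_MAP` are placeholders rather than the exact strings used in the published files, and unlabelled samples are represented as `None`.

```python
from collections import Counter

# Classes shared by the Sussex-Huawei Locomotion Dataset and the TMD Dataset.
COMMON_CLASSES = ("still", "walking", "car", "bus", "train")

# Hypothetical mapping from raw label strings to common classes; labels that
# are not listed (e.g. "bike", "subway") fall outside the comparison.
LABEL_MAP = {
    "still": "still", "walking": "walking",
    "car": "car", "taxi": "car",
    "bus": "bus", "train": "train",
}


def normalized_shares(labels):
    """Percentage of each common class, normalized to the common classes only.

    Unlabelled samples (None) and labels outside LABEL_MAP are discarded
    before normalizing, so the returned values sum to 100 when any remain.
    """
    mapped = (LABEL_MAP.get(label) for label in labels if label is not None)
    counts = Counter(c for c in mapped if c is not None)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in COMMON_CLASSES}
    return {c: 100.0 * counts[c] / total for c in COMMON_CLASSES}
```

Applying the same function to both databases yields directly comparable percentages, because both are expressed relative to the same five classes.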
Figure 4 shows the percentage of time of each means of transport and activity common to both the Sussex-Huawei Locomotion Dataset and the TMD Dataset. For a fair comparison, we have normalized the percentages of the Sussex-Huawei Locomotion Dataset and the TMD Dataset to the total duration of the common means of transport, namely car, bus and train, and activities, namely still and walking. In addition, we have considered only the labelled data. According to Figure 4, both databases are approximately balanced regarding the duration of each means of transport. The largest difference concerns the bus label: the Sussex-Huawei Locomotion Dataset contains approximately twice as much bus-labelled data, in relative terms, as the TMD Dataset.

3.4. Criterion 4: Sensors and Sampling Frequency
Table 4 lists the sensors recorded in each database, which are a key aspect in determining which applications can be implemented with each database. The Geolife GPS Trajectory Dataset includes only GPS data collected with either a smartphone or a dedicated GPS receiver. In contrast, the Sussex-Huawei Locomotion Dataset includes the data from all available smartphone sensors. The TMD Dataset also includes all the sensors in a smartphone, with the exception of the pressure, temperature, altitude and GNSS sensors.

Figure 5: Normalized frequency distribution of the norm of the acceleration for (a) still, (b) walking, (c) car, (d) bus and (e) train.

Table 4
Features regarding the sensor data included in each public database.

| Title | Sampling frequency (Fs) | Sensor data |
|---|---|---|
| Geolife GPS Trajectory Dataset | - | GPS |
| Sussex-Huawei Locomotion Dataset | 100 Hz | Inertial, magnetic, orientation, gravity, linear acceleration, pressure, altitude, temperature, GNSS |
| TMD Dataset | 20 Hz | Inertial, magnetic, orientation, linear acceleration, pressure |

One of the main features when recording data is the sampling frequency, since it determines which dynamics are observable in the data.
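The link between sampling frequency and observable dynamics is the Nyquist criterion: a uniformly sampled signal can only represent frequencies below half the sampling frequency, i.e. 50 Hz at 100 Hz and 10 Hz at 20 Hz. The sketch below checks this and estimates the dominant frequency of an acceleration-norm window with a naive discrete Fourier transform. It is a minimal, dependency-free illustration that assumes uniform sampling; it is not the procedure used to produce Figure 5, and the function names are ours.

```python
import math


def nyquist(fs_hz):
    """Highest frequency (Hz) representable at a uniform sampling rate fs_hz."""
    return fs_hz / 2.0


def is_observable(freq_hz, fs_hz):
    return freq_hz < nyquist(fs_hz)


def dominant_frequency(signal, fs_hz):
    """Frequency (Hz) of the strongest non-DC bin of a naive DFT.

    The mean is removed first so that the constant part of the acceleration
    norm (gravity) does not dominate. O(n^2): only meant for short windows.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fs_hz / n


# Synthetic 2 s window: gravity plus a 1.5 Hz "walking" oscillation at 100 Hz.
fs = 100
walk = [9.81 + 2.0 * math.sin(2 * math.pi * 1.5 * t / fs) for t in range(2 * fs)]
print(dominant_frequency(walk, fs))                    # 1.5
print(is_observable(45, 100), is_observable(45, 20))   # True False
```

The same check explains why bus vibrations around 45 Hz cannot be identified at 20 Hz: for an ideal sampler without anti-aliasing filtering, a 45 Hz tone sampled at 20 Hz folds to 5 Hz and becomes indistinguishable from a genuine 5 Hz component.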
The two databases that contain inertial data, i.e. the Sussex-Huawei Locomotion Dataset and the TMD Dataset, have a fundamental difference: the sampling frequency of the Sussex-Huawei Locomotion Dataset is five times greater than that of the TMD Dataset. This makes the TMD Dataset unsuitable for a frequency analysis of the accelerometer, gyroscope or magnetometer data in the means of transport that entail motion. In Figure 5, we show the normalized frequency distribution of the norm of the acceleration for the five motion modes in the Sussex-Huawei Locomotion Dataset. The most variable distribution is that of walking, where we observe the main frequency component around 1-1.5 Hz, which corresponds approximately to a step frequency of one to one and a half steps per second. The remaining motion modes, including still, have a similar frequency profile. This indicates that, while the user is inside a moving vehicle, there is no recognisable pattern in how the vehicle accelerates, with the exception of the bus. In the case of the bus, Figure 5 shows two frequency peaks around 45 Hz, which can only be resolved with a sampling frequency above 90 Hz. The frequency distribution of the norm of the gyroscope shows results similar to those of Figure 5.

Figure 6 shows an example of the sampling time of two trajectories. The Sussex-Huawei Locomotion Dataset presents a stable sampling time of 10 ms, i.e. a sampling frequency of 100 Hz. In contrast, the TMD Dataset presents an unstable sampling time, which does not correspond to the 20 Hz claimed by the authors. This variability in the sampling time hinders the use of the TMD Dataset for applications that require a stable sampling time. An alternative is to downsample the TMD Dataset to a uniform rate of, e.g., 10 Hz, at the cost of making even more dynamics unobservable; some dynamics of the means of transport, such as the bus peaks around 45 Hz in Figure 5, are already unobservable at 20 Hz.

Figure 6: Sampling time of the first 500 samples of two example trajectories of the Sussex-Huawei Locomotion Dataset and the TMD Dataset. The Sussex-Huawei Locomotion Dataset presents a constant sampling time of 10 ms.
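The stability check of Figure 6 and the downsampling to 10 Hz mentioned above can be sketched as follows. The article does not state how the TMD Dataset was downsampled; here we assume, for illustration only, that irregular samples are averaged within uniform 100 ms bins and that bins without any sample become NaN, i.e. missing data of the kind counted in Figure 7. Timestamps are assumed to be sorted and expressed in seconds, and all function names are hypothetical.

```python
import math
import statistics


def interval_stats(timestamps_s):
    """Mean and population standard deviation (s) of consecutive sampling intervals."""
    dt = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    return statistics.mean(dt), statistics.pstdev(dt)


def resample_uniform(timestamps_s, values, fs_hz):
    """Average irregular samples into uniform bins of 1/fs_hz seconds.

    Bins that receive no sample are set to NaN and count as missing data.
    """
    period = 1.0 / fs_hz
    t0 = timestamps_s[0]
    n_bins = int((timestamps_s[-1] - t0) / period) + 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in zip(timestamps_s, values):
        i = min(int((t - t0) / period), n_bins - 1)  # bin index of this sample
        sums[i] += v
        counts[i] += 1
    return [s / c if c else math.nan for s, c in zip(sums, counts)]


def nan_percentage(samples):
    """Share of missing (NaN) entries, in %."""
    return 100.0 * sum(math.isnan(s) for s in samples) / len(samples)
```

An effective sampling frequency can then be estimated as one over the mean interval, and a standard deviation that is large relative to the mean flags an unstable stream such as the TMD example in Figure 6. Other aggregation rules, e.g. keeping the last sample or interpolating, would change how many values end up as NaN.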
It is crucial to point out that not all the data within the databases is usable. That is, there is missing data, usually due to an irregular sampling frequency of the smartphones. As Figure 7 shows, the TMD Dataset contains a large percentage of missing data in each sensor axis compared with the Sussex-Huawei Locomotion Dataset. In fact, to generate this plot, we downsampled the data in the TMD Dataset to 10 Hz, since the original sampling frequency of 20 Hz was not consistent.

Figure 7: Percentages of NaN values in the data of the Sussex-Huawei Locomotion Dataset and the TMD Dataset.

4. Conclusion
In this article, we have defined our concept of urban mobility and identified its main elements and key applications. We have reviewed three public databases according to four main criteria: the ground truth of each database, the size of the database, the carrying modes, means of transport and activities considered in each database, and the sensors recorded. One of the advantages of our work is that the analysis is transferable to other types of databases. This article allows us to draw four main conclusions. First, it is crucial to develop a robust concept for data collection. Doing so avoids basic flaws, such as failing to record labels, and ensures that the data collected is consistent regarding sampling frequency. In our work, we have observed that one of the databases did not publish some of its labels and that another one claims to have published more data than it actually has. Second, the ground truth remains one of the biggest challenges in public databases. The community needs to dedicate more effort to developing systematic concepts to collect ground truth, thus increasing the usability of the databases. Third, authors should make sure to clean the data before making it public. This effort would facilitate the use of the database by the community.
Finally, the successful development of machine learning and artificial intelligence methods for urban mobility is tightly coupled to the availability of public databases. At this point in time, we believe that urban mobility has barely made its way into big data.

References
[1] T. Klinger, Moving from monomodality to multimodality? Changes in mode choice of new residents, Transportation Research Part A: Policy and Practice 104 (2017) 221–237. doi:10.1016/j.tra.2017.01.008.
[2] European Commission, A concept for sustainable urban mobility plans, https://eur-lex.europa.eu/resource.html?uri=cellar:82155e82-67ca-11e3-a7e4-01aa75ed71a1.0011.02/DOC_4&format=PDF, 2013. Accessed on 28th May 2021.
[3] R. Abduljabbar, H. Dia, S. Liyanage, S. A. Bagloee, Applications of artificial intelligence in transport: An overview, Sustainability 11 (2019) 189. doi:10.3390/su11010189.
[4] S. Richoz, L. Wang, P. Birch, D. Roggen, Transportation mode recognition fusing wearable motion, sound and vision sensors, IEEE Sensors Journal (2020) 1–1. doi:10.1109/jsen.2020.2987306.
[5] Y. Qin, H. Luo, F. Zhao, C. Wang, J. Wang, Y. Zhang, Toward transportation mode recognition using deep convolutional and long short-term memory recurrent neural networks, IEEE Access 7 (2019) 142353–142367. doi:10.1109/access.2019.2944686.
[6] K. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, Cambridge, MA, 2012.
[7] I. H. Witten, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, Burlington, MA, 2011.
[8] D. Bousdar Ahmed, L. E. Diez, E. Munoz Diaz, J. J. Garcia Dominguez, A survey on test and evaluation methodologies of pedestrian localization systems, IEEE Sensors Journal 20 (2020) 479–491. doi:10.1109/jsen.2019.2939592.
[9] Y. Zheng, X. Xie, W.-Y. Ma, GeoLife: A collaborative social networking service among user, location and trajectory, IEEE Data(base) Engineering Bulletin (2010). URL: https://www.microsoft.com/en-us/research/publication/geolife-a-collaborative-social-networking-service-among-user-location-and-trajectory/.
[10] H. Gjoreski, M. Ciliberto, L. Wang, F. J. O. Morales, S. Mekki, S. Valentin, D. Roggen, The University of Sussex-Huawei locomotion and transportation dataset for multimodal analytics with mobile devices, IEEE Access 6 (2018) 42592–42604. doi:10.1109/access.2018.2858933.
[11] C. Carpineti, V. Lomonaco, L. Bedogni, M. D. Felice, L. Bononi, Custom dual transportation mode detection by smartphone devices exploiting sensor diversity, in: 2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), IEEE, 2018. doi:10.1109/percomw.2018.8480119.
[12] M.-C. Yu, T. Yu, S.-C. Wang, C.-J. Lin, E. Y. Chang, Big data small footprint: The design of a low-power classifier for detecting transportation modes, PVLDB 7 (2014) 1429–1440.
[13] S. Hemminki, P. Nurmi, S. Tarkoma, Accelerometer-based transportation mode detection on smartphones, in: Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems (SenSys '13), ACM Press, 2013. doi:10.1145/2517351.2517367.
[14] A. Bayev, I. Chistyakov, A. Derevyankin, I. Gartseev, A. Nikulin, M. Pikhletsky, RuDaCoP: The dataset for smartphone-based intellectual pedestrian navigation, in: 2019 International Conference on Indoor Positioning and Indoor Navigation (IPIN), IEEE, 2019. doi:10.1109/ipin.2019.8911823.
[15] J. Scott, R. Gass, J. Crowcroft, P. Hui, C. Diot, A. Chaintreau, CRAWDAD dataset cambridge/haggle (v. 2009-05-29), downloaded from https://crawdad.org/cambridge/haggle/20090529, 2009. doi:10.15783/C70011.
[16] N. Akki, New York City bike share dataset, https://www.kaggle.com/akkithetechie/new-york-city-bike-share-dataset, 2017.
[17] Y. Zheng, H. Fu, X. Xie, W.-Y. Ma, Q. Li, Geolife GPS trajectory dataset - User Guide, Geolife GPS trajectories 1.1 ed., 2011. URL: https://www.microsoft.com/en-us/research/publication/geolife-gps-trajectory-dataset-user-guide/.
[18] L. Wang, H. Gjoreski, M. Ciliberto, S. Mekki, S. Valentin, D. Roggen, Enabling reproducible research in sensor-based transportation mode recognition with the Sussex-Huawei dataset, IEEE Access 7 (2019) 10870–10891. doi:10.1109/access.2019.2890793.
[19] C. Carpineti, V. Lomonaco, L. Bedogni, M. D. Felice, L. Bononi, TMD dataset, http://cs.unibo.it/projects/us-tm2017/download.html, 2017.