On feature selection and evaluation of transportation mode prediction strategies

Mohammad Etemad (Institute for Big Data Analytics, Dalhousie University, Halifax, NS, Canada, etemad@dal.ca)
Amílcar Soares (Institute for Big Data Analytics, Dalhousie University, Halifax, NS, Canada, amilcar.soares@dal.ca)
Stan Matwin* (Institute for Big Data Analytics, Dalhousie University, Halifax, NS, Canada, stan@cs.dal.ca)
Luis Torgo (Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada, ltorgo@dal.ca)

* Also with the Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland.

© 2019 Copyright held by the owner/author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2019 Joint Conference (March 26, 2019) on CEUR-WS.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

ABSTRACT
Transportation mode prediction is a fundamental task for decision making in smart cities and traffic management systems. Traffic policies based on trajectory mining can save money and time for authorities and the public, and may reduce fuel consumption and commute time. Since the number of features that may be used to predict a user's transportation mode can be substantial, finding the subset of features that maximizes a performance measure is worth investigating. In this work, we explore a wrapper method and an information-theoretical method to find the best subset of trajectory features for a transportation mode dataset. Our results were compared with two related papers that applied deep learning methods, and showed that our approach achieves better performance. Furthermore, two types of cross-validation were investigated, and the performance results show that random cross-validation may provide overestimated results.

KEYWORDS
Trajectory mining, feature selection, cross-validation

1 INTRODUCTION
Trajectory mining is a very active topic, since positioning devices are now used to track people, vehicles, vessels, natural phenomena, and animals. Its applications include, but are not limited to, transportation mode detection [3, 6, 7, 31, 33], fishing detection [4], tourism [8], vessel monitoring [5], and animal behaviour analysis [9]. A number of topics in this field still need further investigation, such as high-performance trajectory classification methods [3, 6, 20, 31, 33], accurate trajectory segmentation methods [28, 30, 34], trajectory similarity and clustering [10, 17], dealing with trajectory uncertainty [15], active learning [29], and semantic trajectories [2, 22, 24]. These topics are highly correlated, and solving one of them requires, to some extent, exploring more than one.

As one of the trajectory mining applications, transportation mode prediction is a fundamental task for decision making in smart cities and traffic management systems. Traffic policies designed based on trajectory mining can save money and time for authorities and the public, may reduce fuel consumption and commute time, and may provide more pleasant moments for residents and tourists. Since a trajectory is a collection of geolocations captured through time, extracting features that describe the behavior of a trajectory is of prime importance. The number of features that can be generated for trajectory data is significant; however, some of these features are more important than others for the transportation mode prediction task. Selecting the best subset of features not only saves processing time but may also increase the performance of the learning algorithm. The feature selection problem and the trajectory classification task are the focus of this work.

The contributions of this paper are listed below.
• We investigated several classifiers using their default parameter values and selected the one with the best performance.
• Using two distinct feature selection approaches, we investigated the best subset of features for transportation mode prediction.
• After finding the best subset of features, we compared our results with the works of [3] and [6]. The results showed that our approach performed better than the others from the literature.
• Finally, we investigated the differences between the two cross-validation methods used in the literature on transportation mode prediction. The results show that random cross-validation may suggest overestimated results in comparison to user-oriented cross-validation.

The rest of this work is structured as follows. Related works are reviewed in Section 2. Basic concepts and definitions are provided in Section 3, and the proposed framework is presented in Section 4. We report our experimental results in Section 5. Finally, conclusions and future work are presented in Section 6.

2 RELATED WORKS
Feature engineering is an essential part of building a learning algorithm. Some algorithms extract features artificially using representation learning methods, while other studies select a subset of handcrafted features. Both approaches have advantages, such as faster learning, less storage space, improved learning performance, and more general models [18]. The two approaches differ from two perspectives. First, artificial feature extraction generates a new set of features by learning, while feature selection chooses a subset of existing handcrafted ones. Second, selecting handcrafted features produces more readable and interpretable models than artificially extracted features [18].
This work focuses on the handcrafted feature selection task.

Feature selection methods can be categorized into three general groups: filter methods, wrapper methods, and embedded methods [12]. Filter methods are independent of the learning algorithm: they select features based on the nature of the data, regardless of the learning algorithm [18]. Wrapper methods, on the other hand, rely on some kind of search, such as sequential, best-first, or branch and bound, to find the subset that yields the highest score for a selected learning algorithm [18]. Embedded methods combine filter and wrapper ideas [18]. Feature selection methods can also be grouped by the type of data they assume. Methods that rely on the i.i.d. (independent and identically distributed) assumption are conventional feature selection methods [18], such as Laplacian methods [14] and spectral feature selection methods [32]; they are not designed to handle heterogeneous or auto-correlated data. Some feature selection methods have been introduced to handle heterogeneous and stream data, most of them working on graph structures, such as [11].

Conventional feature selection methods fall into four groups: similarity-based methods, like Laplacian methods [14]; information-theoretical methods [26]; sparse learning methods, such as [19]; and statistical methods, like Chi2 [21]. Similarity-based approaches are independent of the learning algorithm, and most of them cannot handle feature redundancy or correlation between features [21]. Likewise, statistical methods like chi-square cannot handle feature redundancy, require discretization strategies, and are not effective in high-dimensional spaces [21]. Since our data is not sparse, and sparse learning methods need to overcome the complexity of their optimization procedures, they were not candidates for our experiments. Information-theoretical methods, on the other hand, can handle both feature relevance and redundancy [21], and the selected features generalize across learning tasks. However, information gain, which is at the core of information-theoretical methods, assumes that samples are independently and identically distributed. Finally, the wrapper method only sees the score of the learning algorithm and tries to maximize it.

The most common evaluation metric reported in the related works is the accuracy of the models. Therefore, we use the accuracy metric to compare our work with others from the literature. Since the data was imbalanced, we also report the f-score, which gives equal importance to precision and recall. Although most related works apply the accuracy metric, it is calculated using different methods, including random cross-validation, cross-validation dividing users, cross-validation mixing users, and a simple division into training and test sets without cross-validation. The latter is a weak method used only in [35]. Random (conventional) cross-validation was applied in [31], [20], and [3]. [33] mixed the training and test sets by user, so that 70% of a user's trajectories go to the training set and the rest to the test set. Only [6] performed cross-validation by dividing users between the training and test sets. Because trajectory data has spatial and temporal dimensions, and users can be placed in the same semantic hierarchical structure (e.g., students, workers, visitors, and teachers), a conventional cross-validation method could provide overestimated results, as studied in [27].

3 NOTATIONS AND DEFINITIONS

Definition 3.1. A trajectory point li ∈ L is a tuple li = (xi, yi, ti), where xi is the longitude, varying from 0° to ±180°; yi is the latitude, varying from 0° to ±90°; ti (ti < ti+1) is the capturing time of the moving object; and L is the set of all trajectory points.

A trajectory point can be assigned features that describe different attributes of the moving object at a specific time-stamp and location. The time-stamp and the location are two dimensions that make trajectory points spatio-temporal data with two important properties: (i) auto-correlation and (ii) heterogeneity [1]. These properties make conventional cross-validation less suitable [27].

Definition 3.2. A raw trajectory, or simply a trajectory, τ is a sequence of trajectory points captured through time, where τ = (li, li+1, ..., ln), li ∈ L, i ≤ n.

Definition 3.3. A sub-trajectory is one of the consecutive subsequences of a raw trajectory, generated by splitting the raw trajectory into two or more sub-trajectories.

For example, if we have one split point k and τ1 is a raw trajectory, then s1 = (li, li+1, ..., lk) and s2 = (lk+1, lk+2, ..., ln) are two sub-trajectories generated from τ1.

Definition 3.4. The process of generating sub-trajectories from a raw trajectory is called segmentation.

We used a daily segmentation of raw trajectories and then segmented the data using the transportation mode annotations to partition the data. This approach is also used in [6] and [3].

Definition 3.5. A point feature is a measured value F^p assigned to each trajectory point of a sub-trajectory s:

F^p = (f_i, f_{i+1}, ..., f_n)    (1)

Notation 1 shows the feature F^p for a sub-trajectory s. For example, speed can be a point feature, since we can calculate the speed of a moving object at each trajectory point. Since we need two trajectory points to calculate speed, we assume the speed of the first trajectory point is equal to the speed of the second trajectory point.

Definition 3.6. A trajectory feature is a measured value F_t assigned to a sub-trajectory s:

F_t = (Σ f_k) / n    (2)

Equation 2 shows the trajectory feature F_t (here illustrated with the mean) for a sub-trajectory s. For example, the speed mean can be a trajectory feature, since we can calculate the mean speed of a moving object over a sub-trajectory. F_t^p denotes all trajectory features generated from a point feature p; for example, F_t^speed represents all the trajectory features derived from the speed point feature. Moreover, F_mean^speed denotes the mean of the trajectory features derived from the speed point feature.

4 THE FRAMEWORK
In this section, the sequence of the eight steps of the framework is explained (Figure 1). The first step groups the trajectory points by trajectory id to create daily sub-trajectories (segmentation). Sub-trajectories with fewer than ten trajectory points were discarded to avoid generating low-quality trajectories.

Figure 1: The steps of the applied framework to predict transportation modes.

Point features, including speed, acceleration, bearing, jerk, bearing rate, and the rate of the bearing rate, were generated in step two. The features speed, acceleration, and bearing were first introduced in [34], and jerk was proposed in [3]. The very first point feature that we generated was duration, the time difference between two trajectory points. This feature gives us essential information, including some of the segmentation position points and signal-loss points, and is useful in calculating point features such as speed and acceleration.
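A minimal vectorized sketch of these point-feature computations follows (illustrative code, not the authors' TrajLib implementation; the haversine distance is computed as in the framework's second step, and the 6371 km mean Earth radius is an assumption):

```python
import numpy as np

def haversine_m(lat_deg, lon_deg):
    """Great-circle distance (meters) between consecutive trajectory points."""
    R = 6371000.0  # assumed mean Earth radius in meters
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat, dlon = np.diff(lat), np.diff(lon)
    a = np.sin(dlat / 2) ** 2 + np.cos(lat[:-1]) * np.cos(lat[1:]) * np.sin(dlon / 2) ** 2
    return 2.0 * R * np.arcsin(np.sqrt(a))

def speed_mps(lat_deg, lon_deg, t_sec):
    """Speed point feature: distance over duration for consecutive fixes."""
    distance = haversine_m(lat_deg, lon_deg)            # meters
    duration = np.diff(np.asarray(t_sec, dtype=float))  # seconds
    return distance / duration
```

Since one value fewer is produced than there are points, the speed of the first trajectory point of a sub-trajectory can then be copied from the second point, as described above.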
The distance was calculated using the haversine formula. Having duration and distance as two point features, we calculate speed, acceleration, and jerk using Equations 3, 4, and 5, respectively:

S_i = Distance_i / Duration_i    (3)

A_{i+1} = (S_{i+1} - S_i) / ∆t    (4)

J_{i+1} = (A_{i+1} - A_i) / ∆t    (5)

A function to calculate the bearing B between two consecutive points was also implemented and is detailed in Equation 6, where (ϕ_i, λ_i) is the start point and (ϕ_{i+1}, λ_{i+1}) is the end point:

B_{i+1} = atan2( sin(λ_{i+1} - λ_i) cos ϕ_{i+1} , cos ϕ_i sin ϕ_{i+1} - sin ϕ_i cos ϕ_{i+1} cos(λ_{i+1} - λ_i) )    (6)

Two new point features, named bearing rate and rate of the bearing rate, were introduced in [7]. Applying Equation 7, we computed the bearing rate, where B_i and B_{i+1} are the bearing values at points i and i+1 and ∆t is the time difference:

B_rate(i+1) = (B_{i+1} - B_i) / ∆t    (7)

The rate of the bearing rate is computed using Equation 8:

B_rrate(i+1) = (B_rate(i+1) - B_rate(i)) / ∆t    (8)

Since extensive calculations are performed over trajectory points, an efficient way to evaluate these equations for each trajectory was necessary. Therefore, the code was written in a vectorized manner in Python, which is faster than other publicly available Python implementations of the bearing calculation. Further performance gains would be possible in languages like C/C++.

After calculating the point features for each trajectory, the trajectory features were extracted in step three. Trajectory features were divided into two types: global trajectory features and local trajectory features. Global features, like the minimum, maximum, mean, median, and standard deviation, summarize information about the whole trajectory, while local trajectory features, the percentiles (10, 25, 50, 75, and 90), describe behavior related to part of a trajectory. The local trajectory features extracted in this work were the percentiles of every point feature, and five different global trajectory features were used in the models tested in this work. In summary, we computed 70 trajectory features (10 statistical measures, five global and five local, calculated for 7 point features) for each sample trajectory.

In step 4, two feature selection approaches were performed: wrapper search and information-theoretical feature importance. According to the best accuracy results on the development set, a subset of the top 19 features was selected in step 5. The code implementation of all these steps is available at https://github.com/metemaad/TrajLib.

In step 6, the framework optionally deals with noise in the data; we ran the experiments both with and without this step. Finally, we normalized the features (step 7) using the Min-Max normalization method to avoid saturation, since this method preserves the relationships between values while transforming the features to the same range, improving the quality of the classification process [13]. Another possible method is Z normalization; however, finding the best normalization method was out of the scope of this work.

5 EXPERIMENTS
In this section, we detail the four experiments performed in this work. We used the GeoLife dataset [34]. This dataset has 5,504,363 GPS records collected by 69 users and is labeled with eleven transportation modes: taxi (4.41%); car (9.40%); train (10.19%); subway (5.68%); walk (29.35%); airplane (0.16%); boat (0.06%); bike (17.34%); run (0.03%); motorcycle (0.006%); and bus (23.33%).

Two primary sources of uncertainty in the GeoLife dataset are device error and human error. Device inaccuracy can be categorized into two major groups, systematic errors and random errors [16]. Systematic errors occur when the recording device cannot find enough satellites to provide precise data; random errors can happen because of atmospheric and ionospheric effects. Furthermore, the data annotation was performed after each tracking session, as explained in the GeoLife dataset documentation [34]. As humans, we are all subject to failures in providing precise information; it is possible that some users forgot to annotate the trajectory when they switched from one transportation mode to another. For example, changes in the speed pattern might be a trace of such human error.

Moreover, we divided the data into two folds, a development set and a validation set, in such a way that each user can be in either the development set or the validation set; there is no overlap in terms of users. This division is applied for user-oriented cross-validation. We divided the validation fold into five folds to perform cross-validation and used it to compare our results with related work.

The best classifier using default input parameters was found in our first experiment (Section 5.1); check the scikit-learn documentation (https://scikit-learn.org/stable/supervised_learning.html#supervised-learning) for the classifiers' default parameter values. Tuning the classifier parameters might lead to a better classifier, but a grid search is expensive and does not change the framework. In our second experiment (Section 5.2), the wrapper and information-theoretical methods are used to search for the best subset of our 70 features for the transportation mode prediction task. The third experiment (Section 5.3) is a comparison between [6], [3], and our implementation. In the last experiment (Section 5.4), the type of cross-validation was investigated.

To avoid using non-parametric statistical tests, we repeated the experiments with different seeds and collected more than 30 samples for the statistical tests. According to the central limit theorem, we can assume these samples follow a normal distribution; therefore, t-test results are reported.

5.1 Classifier selection
In this experiment, we investigated which of six classifiers is the best. The experiment settings use conventional cross-validation and the transportation mode prediction task as described in [3]. XGBoost, SVM, decision tree, random forest, neural network, and AdaBoost are the six classifiers applied in the reviewed literature [7, 31, 33, 35]; our implementation is available at https://github.com/metemaad/trajpred. The dataset was filtered based on the labels applied in [3] (e.g., walking, train, bus, bike, driving), and no noise removal method was applied. The classifiers mentioned above were trained, and the accuracy metric was calculated using random cross-validation, similar to [20], [31], and [3]. This experiment was repeated for eight randomly selected seeds (8, 65, 44, 7, 99, 654, 127, 653) to generate more than 30 result samples, which makes it safe to assume a normal distribution of the results based on the central limit theorem.

The results of cross-validation accuracy, presented in Figure 2, show that the random forest performs better than the other models (µ_accuracy = 0.8189, σ = 0.10%) on the development set. The results of cross-validation f-score, presented in Figure 3, likewise show that the random forest performs better than the other models (µ_f1 = 0.8179, σ = 0.12%) on the development set. The second-best model was XGBoost (µ_accuracy = 0.8245, σ = 0.11%): a paired t-test indicated that the random forest results were not statistically significantly higher than the XGBoost results, but since XGBoost has a higher variance than random forest, we decided to rank random forest first. On the other hand, paired t-tests indicated that the random forest results were statistically significantly higher than those of the SVM, decision tree, neural network, and AdaBoost classifiers.

Figure 2: Among the trained classifiers, random forest achieved the highest mean accuracy.

Figure 3: Among the trained classifiers, random forest achieved the highest mean F-score.

5.2 Feature selection using wrapper and information-theoretical methods
The second experiment aims to select the best features for the transportation mode prediction task on the GeoLife dataset. We selected one method from the filter category, an information-theoretical method, to observe the effect of data heterogeneity on feature selection, and another method from the wrapper category, the full-search wrapper method. Filter methods suffer from the i.i.d. assumption, while wrapper methods do not; therefore, comparing these two methods shows the importance of taking the heterogeneity of trajectory features into account.

We selected the wrapper feature selection method because it can be used with any classifier. Using this approach, we first defined an empty set of selected features. Then, we searched all the trajectory features one by one to find the best feature to append to the selected feature set, using the maximum accuracy score as the selection criterion. We then removed the chosen feature from the feature set and repeated the search over the union of the selected features and each next candidate feature. We used the labels applied in [6] and the same cross-validation technique. The results are shown in Figure 4 and suggest that the top 19 features yield the highest accuracy. Therefore, we selected this subset as the best subset for classification purposes using the random forest algorithm.

Figure 4: Accuracy of the random forest classifier for incrementally appended features ranked by random forest feature importance.
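The greedy wrapper search described above can be sketched as follows (the additive scoring function and feature names are hypothetical stand-ins for illustration; in the experiment, the score is the cross-validated accuracy of the classifier on the candidate subset):

```python
def forward_select(features, score, k):
    """Greedy wrapper search: repeatedly append the feature that maximizes score(subset)."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical per-feature gains standing in for cross-validated accuracy.
gain = {"speed_p90": 0.5, "speed_mean": 0.4, "bearing_rate": 0.2, "jerk_max": 0.1}
best_subset = forward_select(gain, lambda s: sum(gain[f] for f in s), k=2)
```

With a real classifier in the loop, `score` would retrain and cross-validate the model for every candidate subset, which is what makes the full wrapper search expensive.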
Information-theoretical feature selection is one of the methods widely used to select essential features, and random forest is a classifier with embedded feature selection based on information-theoretical metrics. We calculated the feature importance using random forest; then, each feature was appended to the selected feature set in importance order, and the accuracy score of the random forest classifier was calculated. User-oriented cross-validation was used here, and the target labels are similar to [6]. Figure 5 shows the results of cross-validation for appending features with respect to the importance rank suggested by the random forest. We chose the wrapper approach results, since that approach produces a statistically significantly higher accuracy score.

Figure 5: Accuracy of the random forest classifier for incrementally appending the best features.

5.3 Comparison with the related work
In this third experiment, we filtered the transportation modes that were used by [6] for evaluation. We divided the validation fold into training and test folds in such a way that each user can appear only in either the training fold or the test fold. The top 19 features, the best feature subset found in Section 5.2, were used in this experiment. We approximately divided 80% of the data into the training set and 20% into the test set.

We selected [6] because it is the only paper that divided the dataset in a way that isolates users between the training and test sets. Moreover, that work applied handcrafted features and interpretable classifiers, while [3] did not isolate users and used representation learning features. These two works are therefore at the two ends of the spectrum, and comparing our results with theirs may provide insights for validating our results.

We assume that the Bayes error is the minimum possible error and that human error is close to the Bayes error [23]. Avoidable bias is defined as the difference between the training error and the human error, and achieving performance near human performance is the primary objective in each task. Recent advancements in deep learning have led to performance levels even above human performance on some tasks, by using large samples and scrutinizing the data to finely clean it. However, "we cannot do better than bayes error unless we are overfitting" [23]. The noise in GPS data and the human annotation error discussed above suggest that the avoidable bias is more than five percent. This reasoning was our basis for excluding papers that reported more than 95% accuracy.

Thus, we compared our per-segment accuracy results, repeated for 8 different seeds, against the mean accuracy of [6], 67.9%. A one-sample t-test indicated that our accuracy results (70.97%) are statistically significantly higher than the results of [6] (67.9%), p = 0.0182.

The label set of [3] is walking, train, bus, bike, taxi, subway, and car, where taxi and car are merged into a driving class, and subway and train are merged into a train class. We filtered the GeoLife data to obtain the same subsets as reported in [3]. Then, we randomly selected 80% of the data for training and the rest as the test set, applied five-fold cross-validation, and repeated this for 8 different seeds. The best subset of features was the same as in the previous experiment (Section 5.2). Running the random forest classifier with 50 estimators, using the scikit-learn implementation [25], resulted in a mean accuracy of 87.16% for the five-fold cross-validation. A one-sample t-test indicated that our accuracy results (87.16%) are statistically significantly higher than the results of [3] (84.8%), p = 2.27e-12.

We avoided using the noise removal method in this experiment because, in practice, one does not have access to the labels of the test dataset, and using this method would only increase our accuracy unrealistically.

5.4 Effects of types of cross-validation
To visualize the effect of the type of cross-validation on the transportation mode prediction task, we set up a controlled experiment. We used the same classifiers and the same features to calculate the cross-validation accuracy on the whole dataset. Only the type of cross-validation differs in this experiment: one is random cross-validation, and the other is user-oriented cross-validation. Figure 6 shows that there is a considerable difference between the cross-validation accuracy results of user-oriented cross-validation and random cross-validation. Furthermore, Figure 7 shows a similarly considerable difference between the cross-validation f-score results.

Figure 6: The accuracy cross-validation results for user-oriented cross-validation and random cross-validation.

Figure 7: The F-score cross-validation results for user-oriented cross-validation and random cross-validation.

These results indicate that random cross-validation provides overestimated accuracy and f-score results. Since the correlation between user-oriented cross-validation results is lower than between random cross-validation results, proposing a specific cross-validation method for evaluating transportation mode prediction is a topic that needs attention.
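The user-oriented split used in this experiment can be sketched by assigning whole users to folds, so that no user contributes segments to both sides of any split (a hypothetical helper, not the authors' code; scikit-learn's GroupKFold offers similar behavior):

```python
import random
from collections import defaultdict

def user_oriented_folds(user_ids, n_folds=5, seed=0):
    """Partition sample indices into folds so each user lands in exactly one fold."""
    users = sorted(set(user_ids))
    rng = random.Random(seed)
    rng.shuffle(users)
    fold_of_user = {u: i % n_folds for i, u in enumerate(users)}
    folds = defaultdict(list)
    for idx, u in enumerate(user_ids):
        folds[fold_of_user[u]].append(idx)
    return [folds[i] for i in range(n_folds)]
```

Random cross-validation, by contrast, shuffles individual segments into folds, so segments of the same user can appear on both sides of a split, which is one plausible source of the overestimation observed here.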
6 CONCLUSIONS
In this work, we reviewed some recent transportation mode prediction methods and feature selection methods. We proposed a framework for transportation mode prediction, and four experiments were conducted to cover different aspects of the task.

First, the performance of six recently used classifiers for transportation mode prediction was evaluated. The results showed that the random forest classifier performs best among all the evaluated classifiers. The SVM was the worst classifier, and the accuracy of XGBoost was competitive with that of the random forest classifier.

In the second experiment, the effect of the features was evaluated using two different approaches, the wrapper method and the information-theoretical method. The wrapper method shows that we can achieve the highest accuracy using the top 19 features. Both approaches suggest that F_p90^speed (the 90th percentile of the speed, as defined in Section 3) is the most essential feature among all 70 introduced features. This feature is robust to noise, since outlier values do not contribute to the calculation of the 90th percentile.

In the third experiment, the best model was compared with the results reported in [6] and [3]. The results show that our suggested model achieved a higher accuracy. Our features are readable and interpretable in comparison to [6], and our model has a lower computational cost.

Finally, we investigated the effects of user-oriented cross-validation and random cross-validation in the last experiment. The results showed that random cross-validation provides overestimated results in terms of the analyzed performance measures.

We intend to extend this work in several directions. The spatiotemporal characteristics of trajectory data (e.g., autocorrelation and heterogeneity) are not taken into account in most of the works from the literature. We plan to fine-tune the classification models with grid search and automatic methods (e.g., genetic algorithms, racing algorithms, and meta-learning). We also intend to investigate in depth the effects of cross-validation and other strategies, like holdout, on trajectory data. Finally, space and time dependencies can also be explored to tailor features for transportation mode prediction.

ACKNOWLEDGMENTS
The authors would like to thank NSERC (Natural Sciences and Engineering Research Council of Canada) for financial support.

REFERENCES
[1] Gowtham Atluri, Anuj Karpatne, and Vipin Kumar. 2017. Spatio-Temporal Data Mining: A Survey of Problems and Methods. arXiv:1711.04710 (2017).
[2] Vania Bogorny, Chiara Renso, Artur Ribeiro de Aquino, Fernando de Lucca Siqueira, and Luis Otavio Alvares. 2014. CONSTAnT - a conceptual data model for semantic trajectories of moving objects. Transactions in GIS 18, 1 (2014), 66-88.
[3] Sina Dabiri and Kevin Heaslip. 2018. Inferring transportation modes from GPS trajectories using a convolutional neural network. Transportation Research Part C: Emerging Technologies 86 (2018), 360-371.
[4] Erico N de Souza, Kristina Boerder, Stan Matwin, and Boris Worm. 2016. Improving fishing pattern detection from satellite AIS using data mining and machine learning. PloS one 11, 7 (2016), e0158248.
[5] Renata Dividino, Amilcar Soares, Stan Matwin, Anthony W Isenor, Sean Webb, and Matthew Brousseau. 2018. Semantic Integration of Real-Time Heterogeneous Data Streams for Ocean-related Decision Making. In Big Data and Artificial Intelligence for Military Decision Making. STO. https://doi.org/10.14339/STO-MP-IST-160-S1-3-PDF
[6] Yuki Endo, Hiroyuki Toda, Kyosuke Nishida, and Akihisa Kawanobe. 2016. Deep feature extraction from trajectories for transportation mode estimation. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 54-66.
[7] Mohammad Etemad, Amílcar Soares Júnior, and Stan Matwin. 2018. Predicting Transportation Modes of GPS Trajectories using Feature Engineering and Noise Removal. In Advances in AI: 31st Canadian Conference on AI, Toronto, ON, Canada. Springer, 259-264.
[8] Shanshan Feng, Gao Cong, Bo An, and Yeow Meng Chee. 2017. POI2Vec: Geographical Latent Representation for Predicting Future Visitors. In AAAI.
[15] Sungsoon Hwang, Cynthia VanDeMark, Navdeep Dhatt, Sai V Yalla, and Ryan T Crews. 2018. Segmenting human trajectory data by movement states while addressing signal loss and signal noise. International Journal of Geographical Information Science (2018), 1-22.
[16] Jungwook Jun, Randall Guensler, and Jennifer Ogle. 2006. Smoothing methods to minimize impact of global positioning system random error on travel distance, speed, and acceleration profile estimates. Transportation Research Record 1972 (2006), 141-150.
[17] Hye-Young Kang, Joon-Seok Kim, and Ki-Joune Li. 2009. Similarity measures for trajectory of moving objects in cellular space. In SIGAPP09. 1325-1330.
[18] Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P Trevino, Jiliang Tang, and Huan Liu. 2017. Feature selection: A data perspective. CSUR 50, 6 (2017), 94.
[19] Zechao Li, Yi Yang, Jing Liu, Xiaofang Zhou, Hanqing Lu, et al. 2012. Unsupervised feature selection using nonnegative spectral analysis. In AAAI, Vol. 2.
[20] Hongbin Liu and Ickjai Lee. 2017. End-to-end trajectory transportation mode classification using Bi-LSTM recurrent neural network. In Intelligent Systems and Knowledge Engineering (ISKE), 2017 12th International Conference on. IEEE, 1-5.
[21] Huan Liu and Rudy Setiono. 1995. Chi2: Feature selection and discretization of numeric attributes. In Tools with Artificial Intelligence, 1995, Proceedings, Seventh International Conference on. IEEE, 388-391.
[22] B. N. Moreno, A. Soares Júnior, V. C. Times, P. Tedesco, and Stan Matwin. 2014. Weka-SAT: A Hierarchical Context-Based Inference Engine to Enrich Trajectories with Semantics. In Advances in Artificial Intelligence. Springer International Publishing, Cham, 333-338. https://doi.org/10.1007/978-3-319-06483-3_34
[23] Andrew Ng. 2016. Nuts and bolts of building AI applications using Deep Learning. NIPS.
[24] Christine Parent, Stefano Spaccapietra, Chiara Renso, Gennady Andrienko, Natalia Andrienko, Vania Bogorny, Maria Luisa Damiani, Aris Gkoulalas-Divanis, Jose Macedo, Nikos Pelekis, Yannis Theodoridis, and Zhixian Yan. 2013. Semantic Trajectories Modeling and Analysis. ACM Comput. Surv. 45, 4, Article 42 (Aug. 2013), 32 pages.
[25] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. JMLR 12 (2011), 2825-2830.
[26] Hanchuan Peng, Fuhui Long, and Chris Ding. 2005. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 8 (2005), 1226-1238.
[27] David R Roberts, Volker Bahn, Simone Ciuti, Mark S Boyce, Jane Elith, Gurutzeta Guillera-Arroita, Severin Hauenstein, José J Lahoz-Monfort, Boris Schröder, Wilfried Thuiller, et al. 2017. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40, 8 (2017), 913-929.
[28] A. Soares Júnior, B. N. Moreno, V. C. Times, S. Matwin, and L. A. F. Cabral. 2015. GRASP-UTS: an algorithm for unsupervised trajectory segmentation. International Journal of Geographical Information Science 29, 1 (2015), 46-68.
[29] A. Soares Júnior, C. Renso, and S. Matwin. 2017. ANALYTiC: An Active Learning System for Trajectory Classification. IEEE Computer Graphics and Applications 37, 5 (2017), 28-39. https://doi.org/10.1109/MCG.2017.3621221
[30] A. Soares Júnior, V. Cesario Times, C. Renso, S. Matwin, and L. A. F. Cabral. 2018. A Semi-Supervised Approach for the Semantic Segmentation of Trajectories. In 2018 19th IEEE International Conference on Mobile Data Management (MDM). 145-154.
[31] Xiao. 2017. Identifying Different Transportation Modes from Trajectory Data Using Tree-Based Ensemble Classifiers. ISPRS 6, 2 (2017), 57.
[32] Zheng Zhao and Huan Liu. 2007. Spectral feature selection for supervised and unsupervised learning. In Proceedings of the 24th International Conference on Machine Learning. ACM, 1151-1157.
[33] Yu Zheng, Yukun Chen, Quannan Li, Xing Xie, and Wei-Ying Ma. 2010. Understanding transportation modes based on GPS data for web applications. TWEB 4, 1 (2010), 1.
[34] Yu Zheng, Quannan Li, Yukun Chen, Xing Xie, and Wei-Ying Ma. 2008. Understanding mobility based on GPS data. In UbiComp 10th. ACM, 312-321.
[35] Qiuhui Zhu, Min Zhu, Mingzhao Li, Min Fu, Zhibiao Huang, Qihong Gan, and Zhenghao Zhou. 2018.
Transportation modes behaviour analysis based [9] Sabrina Fossette, Victoria J Hobson, Charlotte Girard, Beatriz Calmettes, on raw GPS dataset. International Journal of Embedded Systems 10, 2 (2018), Philippe Gaspar, Jean-Yves Georges, and Graeme C Hays. 2010. Spatio- 126–136. temporal foraging patterns of a giant zooplanktivore, the leatherback turtle. Journal of Marine systems 81, 3 (2010), 225–234. [10] Andre Salvaro Furtado, Laercio Lima Pilla, and Vania Bogorny. 2018. A branch and bound strategy for Fast Trajectory Similarity Measuring. Data Knowledge Engineering 115 (2018), 16 – 31. https://doi.org/10.1016/j.datak.2018.01.003 [11] Quanquan Gu and Jiawei Han. 2011. Towards feature selection in network. In Proceedings of the 20th ACM ICIKM. ACM, 1175–1184. [12] Isabelle Guyon and André Elisseeff. 2003. An introduction to variable and feature selection. Journal of ML research 3, Mar (2003), 1157–1182. [13] Jiawei Han, Jian Pei, and Micheline Kamber. 2011. Data mining: concepts and techniques. Elsevier. [14] X He, D Cai, and P Niyogi. 2005. Laplacian Score for Feature Selection, Advances in Nerual Information Processing Systems. (2005).