Analysis of delay patterns and correlations in railway traffic data

Roland Krisztián Szabó, Tomáš Horváth, and Ádám Tarcsi
Eötvös Loránd University, Faculty of Informatics, Budapest, Pázmány Péter stny. 1/C., 1117
rolandszabo@inf.elte.hu, tomas.horvath@inf.elte.hu, ade@inf.elte.hu

Abstract: Traffic itself can be a huge challenge for most commuters, regardless of their chosen means of transportation. For example, it is inevitable to experience delays and congestion during rush hours. All commute methods have their own specific characteristics when it comes to delays: cars and buses suffer from traffic jams, and similar principles apply to railways as well. However, the causes of railway delays are not that straightforward, and they need further investigation. According to our personal experience, most passengers are not aware of the reasons behind train delays, even though they encounter them multiple times a day. In this paper we present possible answers based on the data collected from the publicly available APIs of Hungarian State Railways over the past 1.5 years.

A snapshot of the map contains the following information about each of the trains that were present at the time the snapshot was taken (Table 1).

Table 1: Details of a train entry in a snapshot

  Field name    Example
  Date          "2019.10.29 20:09:38"
  Elvira ID     "5614115_191029"
  Operator      "MAV"
  Line          "40"
  Train number  "55808"
  Relation      "Budapest-Keleti - Pécs"
  Latitude      46.26418
  Longitude     18.10566
  Delay         5

1 Datasets

The idea of the delay analysis and prediction originates from paper [1], where a simpler version of this concept was used as a module in a smart alarm clock application. During the development of the application multiple data sources were investigated, some of which turned out to be unusable.
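To make the data concrete: a single train entry of the kind listed in Table 1 can be parsed from a stored snapshot roughly as follows. The JSON field names below are illustrative assumptions of this sketch, not an official schema; only the values come from Table 1.

```python
import json

# One train entry as a snapshot script might store it; the field names
# mirror Table 1 but are hypothetical, not an official MAV schema.
raw = '''{
    "date": "2019.10.29 20:09:38",
    "elvira_id": "5614115_191029",
    "operator": "MAV",
    "line": "40",
    "train_number": "55808",
    "relation": "Budapest-Keleti - Pécs",
    "latitude": 46.26418,
    "longitude": 18.10566,
    "delay": 5
}'''

entry = json.loads(raw)

def is_delayed(entry, threshold=5):
    # A train counts as officially delayed from 5 minutes onward
    # (the paper's Definition 3).
    return entry["delay"] >= threshold

print(entry["train_number"], is_delayed(entry))
```

About 130 million such records were accumulated over the collection period, so even this flat structure adds up to a sizeable dataset.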
In this section the details of the selected data sources will be discussed.

1.1 Traffic

We found that the most reliable publicly available data source for traffic is the official map of Hungarian State Railways [2], where all trains can be tracked in real time. We created a small automated script that runs on a virtual private server, takes a snapshot of the map approximately every minute, and stores the result in a JSON file.

Traffic data has been collected since January 2019, which means there is roughly 1 year and 6 months of available information (approx. 130 million records). Due to the COVID-19 outbreak, a data freeze was applied at the end of March 2020 because of the extraordinary circumstances that affect transportation all over the world. Hungarian State Railways canceled many trains, and only 30% of a train's capacity may be used in order to prevent the spread of the infectious disease. This new situation significantly alters the operation of the railway system, which would have introduced a lot of noise into the existing dataset, and it might not be relevant in a few months at all. Should the pandemic be over, its effects could be analyzed later on, but currently this is out of scope of this paper.

1.2 Weather

In addition to the traffic data, we also collected the corresponding weather data for every train, because we suspect that weather has an influence on the delays as well. It was not easy to find a free provider capable of handling the necessary amount of requests, but after many trials we decided to use OpenWeatherMap [3]. Its free tier gives access to 60 location-based weather requests per minute, which is still not enough for every individual train, but is sufficient to place virtual weather stations all over Hungary with a resolution of approximately 35.5 km.

Definition 1. Virtual weather station. A virtual weather station is a GPS position which can be queried for up-to-date local weather information.

Calculating the coordinates of the virtual weather stations The first task is to distribute the available 60 slots uniformly such that every train can be assigned to the closest virtual weather station. Finding an exact solution to the problem would have been infeasible, therefore we decided to develop an approximation algorithm, for which we used the GeoNames geographical database [4], which contains POIs in Hungary and is available for download free of charge under the Creative Commons Attribution 4.0 license.

The algorithm (Algorithm 1) uses a k-d tree, a space-partitioning data structure that allows fast nearest neighbor searches [5]. The k-d tree is used to place 60 virtual weather stations on the map as follows: in each step the most populated POI is selected, and then its neighboring POIs are eliminated within the given radius. This results in an approximately uniform placement of virtual weather stations, located at densely populated areas where accurate weather information benefits more people.

Algorithm 1 Approximation algorithm for virtual weather station placement
Funct getVirtualWeatherStationPositions(pois, radius)
 1: pois ← pois.sort("population", "desc")
 2: kdt ← kdTree<POI>("haversine")
 3: for poi ∈ pois do
 4:     nn ← kdt.searchRadius(poi, radius)
 5:     if nn = ∅ then
 6:         kdt.add(poi)
 7:     end if
 8: end for
 9: return kdt

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Analysis

2.1 Reconstruction of the railway network

In the traffic dataset there are millions of recorded GPS coordinates, and the majority of them can be safely discarded after the necessary information has been extracted. For this task we used the representative point extraction and updating algorithm by Zhongyi Ni et al. [6], which is able to calculate the most significant points along a route and is also capable of refining these points as new data becomes available, due to its online nature. The task is to determine the least amount of points (called the representative points) that can accurately represent such a route.

Figure 1: Reconstructed railway network of Hungary

The above-mentioned algorithm can determine the points describing a route (Figure 1) with an arbitrary resolution. However, the recorded GPS trajectories are noisy and may contain significantly misplaced outliers. The representative point extraction algorithm is able to properly handle noise, but it also creates new representative points in case of outliers; therefore a support-based post-processing step is needed, which removes representative points that are encountered rarely.

2.2 Conflicting trains

As a dimensionality reduction method it is beneficial to obtain the set of trains that might affect the delay of another train. It can also be used to model delay-chain propagation.

Definition 2. Conflicting trains. A set of trains is said to be in conflict when their routes have common representative points.

The algorithm (Algorithm 2) determines the set of representative points which are within a given distance radius of a specific representative point rp, and returns the set of trains that travel through those points. It does not take the temporal dimension into consideration, because we only use the intersection of the conflicting trains with the currently traveling trains.

Algorithm 2 Algorithm for determining conflicting trains
Funct getConflictingTrains(allRps, rp, radius)
 1: kdt ← kdTree<RP>("haversine", allRps)
 2: nn ← kdt.searchRadius(rp, radius)
 3: return ∪_{n ∈ nn} {n.trainId}

2.3 Association rules

We realized that the grouping of trains can be considered a frequent itemset mining problem, therefore we used the Apriori algorithm [7] for itemset mining and association rule learning.

Definition 3. Delayed train. A train is officially considered to be delayed when its delay is greater than or equal to 5 minutes.

The algorithm requires transactions, which can be constructed based on the snapshots of the map. For each snapshot a transaction is made based on the set of conflicting delayed trains (Definition 2) in the given snapshot.

Association rules were generated for departure delays (Table 2), for which the snapshots taken upon the scheduled departure are used. The meaning of a rule is that if the set of antecedent trains is delayed, then the consequent train is likely to depart late with the given metrics.

Table 2: A subset of association rules generated for train 2749

  Antecedents  Consequent  Supp.  Conf.
  2879         2749        0.63   0.97
  7049         2749        0.61   0.97
  2669,2879    2749        0.60   0.97
  2669,6099    2749        0.61   0.90
  2669         2749        0.67   0.90
  6099         2749        0.63   0.87
  2649         2749        0.61   0.86
  7009         2749        0.61   0.84

Table 3: A subset of sequential rules generated for train 2749

  Antecedents          Consequent  Supp.  Conf.
  2879,2739            2749        0.21   0.81
  2879,7039,2739       2749        0.19   0.84
  2879,2859,2739       2749        0.17   0.82
  2879,7039,2859,2739  2749        0.16   0.84
  2879,700,7039        2749        0.15   0.80
  2879,700,2859        2749        0.15   0.83
  2879,6299,2739       2749        0.14   0.87
  2879,7039,6299,2739  2749        0.14   0.89
  2879,700,2739        2749        0.14   0.83
  700,7039,2859,2879   2749        0.13   0.85
  2879,2740,2739       2749        0.12   0.93
  2879,2740,2859       2749        0.12   0.84

Low support values are due to the fact that train 2749 is only included in a transaction when it is delayed.

Definition 4. Average delay. The average of the trains' average delays along their route on a given day.

Definition 5. Average minimum (maximum) delay.
The average of the trains' minimum (maximum) delays along their route on a given day.

The frequent itemset containing train 2749 has a support value of 0.36, which means the train departs late roughly 36% of the time.

2.4 Sequential rules

Besides traditional association rule mining, we can also consider consecutive snapshots of a specific train on a given day, which takes the temporal information into consideration as well (Table 3). Sequential pattern mining is almost the same as association rule mining, but instead of working directly with a transaction we consider consecutive transactions recorded in time.

Sequential rules can also be mined for the departure delay, but they are more meaningful if we mine them along the entire route of the train. The rule A ⟹ B means that when the trains in A are delayed, train B will also become delayed in the future. In the case of association rules we talked about trains that are usually delayed together, but now we have an additional temporal dimension.

In order to test this method we used the SPMF open-source data mining library [8] with the RuleGrowth algorithm [9]. The support values are much higher in this case, because rules are mined along the entire route of the train. The support value of the frequent itemset containing train 2749 is 0.71, which means that even though the train departed late only 36% of the time, it got delayed 71% of the time during its trip.

Month It turned out that summer is the only season which has a peak in the average delays, followed by autumn (Figure 2). This effect might be caused by maintenance works, but there is no available historical maintenance data to confirm this theory. It is likely not caused by the number of passengers, since there is no school in Hungary during the summer, which significantly reduces the number of passengers in the rush hours. Higher temperatures also seem to have an effect on the delays.

Figure 2: Average delays grouped by month
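The support and confidence columns of Tables 2 and 3 follow the standard rule-mining definitions, so they can be recomputed from any set of transactions. A minimal sketch, using toy transactions rather than the real dataset:

```python
def rule_metrics(transactions, antecedents, consequent):
    """Support and confidence of the rule antecedents => consequent.

    Each transaction is the set of conflicting delayed train numbers
    observed in one snapshot."""
    ante = set(antecedents)
    n_ante = sum(1 for t in transactions if ante <= t)
    n_both = sum(1 for t in transactions if ante <= t and consequent in t)
    support = n_both / len(transactions)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

# Toy transactions, illustrative only:
txs = [{"2879", "2749"}, {"2879", "2749"}, {"2879"}, {"2669"}]
supp, conf = rule_metrics(txs, {"2879"}, "2749")
# supp == 0.5 (2 of 4 transactions), conf == 2/3
```

Apriori-style miners such as the one used above wrap this same counting inside a minimum-support pruning loop so that only frequent rules are ever scored.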
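Aggregations such as the monthly chart (Figure 2) reduce to a group-by-and-average over daily delay records; a sketch with made-up numbers:

```python
from collections import defaultdict

def average_delay_by_month(records):
    """Group (month, average daily delay) records and average per month,
    the aggregation behind a chart like Figure 2."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, delay in records:
        sums[month] += delay
        counts[month] += 1
    return {month: sums[month] / counts[month] for month in sums}

# Made-up (month, delay in minutes) records for illustration:
records = [(7, 12.0), (7, 8.0), (1, 2.0), (1, 4.0), (10, 6.0)]
print(average_delay_by_month(records))  # {7: 10.0, 1: 3.0, 10: 6.0}
```

The same reduction, keyed by weekday, hour, or temperature bucket instead of month, yields the remaining charts of the next section.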
2.5 Other factors

In addition to the delay propagation there might be other factors that contribute to the delay of trains, like weather and temporality. In this section some of these factors will be analyzed with possible explanations and conclusions.

Day of the week By looking at the average of all trains we can claim that Monday and Friday have the largest delays on average, while weekends have somewhat lower average delays (Figure 3). It would be nice to have a dataset related to the number of passengers, because the larger amount of passengers may cause delay peaks at the beginning and at the end of the workweek. The number of passengers may also have a correlation with the lower delays during the weekend, but we suspect that it is more likely caused by the sparser schedule, which effectively reduces delay propagation.

Figure 3: Average delays grouped by day of the week

Holidays Holidays do not seem to have a significant effect on the average delays (Figure 4). The peaks were mostly predictable according to the previous findings: Pentecost Monday and Saint Stephen's Day have slightly higher average delays, but they are both in the summer, which has the highest average delay among the seasons, and Good Friday is a Friday, which has an above-average delay compared to the other days of the week. As a conclusion, such events do not seem to cause extraordinary delays, because they can be planned ahead.

Figure 4: Average delays grouped by holidays in 2019

Time of the day By looking at the chart containing the delays grouped by hours, the rush hours can be clearly marked as delay peaks (Figure 5). It can be concluded that as the number of passengers and the density of the schedule increase, the average delay increases as well. According to the research, most relations follow this pattern. Two other peaks can be observed between 23:00 and 01:00; to understand them, domain knowledge is needed. The reason behind these peaks is that only a very small number of trains travel at that time in the country (sometimes even fewer than 10), and when some of them are delayed, it has a huge impact on the average.

Figure 5: Average delays grouped by hour

Temperature The chart shows that the average delays increase as the temperature tends to either -10 or +30 degrees Celsius (Figure 6). Due to the distribution of trains, the ends of the chart are noisy, but the trendline can be easily seen.

Figure 6: Average delays grouped by temperature

Weather type As far as the type of weather is concerned, precipitation usually increases the delays (Figure 7). The most troublesome types are related to snow in the winter and unexpected thunderstorms in the summer. Nearly all rain types have higher average delays than clear sky.

Figure 7: Average delays grouped by weather type

2.6 Delay heatmap

An interesting visualization method is to generate a heatmap of delay changes (Figure 8). It allows us to see where the delay accumulates during the trip, and these peaks might suggest track problems, busy stations, or other hidden issues that we are not aware of.

Figure 8: Delay heatmap of train 2749 between Monor and Budapest-Nyugati

In this figure, red means a larger average increase of delay and blue means a lower average increase of delay. The green patches represent moderate average increases of delay.

3 Departure delay prediction

Traveling in an unreliable environment on a daily basis can be nerve-wracking. The official mobile application of Hungarian State Railways has a delay forecasting mechanism, but it is quite limited in its current form. When a train is already moving, the schedule is automatically adjusted by its current delay. This estimation is not so reliable in the long term, but it can give an idea about the scale of the expected delay under the current circumstances.

Another problem is that the forecast lacks a very important indicator, as it cannot tell whether the train is going to depart late or on time; the forecast is only available after the train has already departed. The two main goals are to find a method to predict the departure delay and to improve the long-term reliability of the delay forecast mechanism already present in the application.

Departure delay prediction is a special problem, because we do not have any information yet about the train we are interested in. Whether the train is going to depart late or on time can only be predicted based on its observable environment. The input for the departure delay prediction problem is a set of snapshots taken at the scheduled departure time, for which the target value is the delay of the train on its first appearance on the day.

3.1 Association rules

The first idea is to apply the previously mined association rules (Table 2) and see if we can predict whether a train is going to be delayed upon departure.

The algorithm (Algorithm 3) of the model is very simple; it only requires a set of association rules extracted based on the input for departure delays. A train is considered to be delayed if it is a consequent in a rule for which all the antecedent trains are delayed in a given snapshot of the map. The hyper-parameters of the model are the minimum support and minimum confidence of the rules.

Algorithm 3 Algorithm for predicting the departure delay (association rules)
Funct predictDepartureDelayAr(snapshot, rules)
 1: delayedTrains ← getDelayedTrains(snapshot)
 2: for rule ∈ rules do
 3:     if rule.getAntecedents() ⊆ delayedTrains then
 4:         return True
 5:     end if
 6: end for
 7: return False

The results (Table 4) are impressive, but according to the research it turned out that there is simply not enough information for the association rule mining algorithm in its current form, which causes underfitting. Trains are categorized as either delayed or on-time, which cannot properly handle the following situation: when an antecedent train is delayed by more than a given threshold (for example 10 minutes), the consequent train can depart on time, as the two are far away from each other and a slot becomes available for the consequent train. Otherwise, the delay of the antecedent train propagates to the consequent train.

Table 4: Departure delay prediction metrics for train 2749 using the association rules on the test set

            Precision  Recall  F1-score  Support
  On time   0.85       0.85    0.85      194
  Late      0.72       0.71    0.72      105
  Accuracy                     0.80      299

3.2 Train embedding

In order to solve the underfitting problem that affects the association and sequential rules, a different approach is necessary. It is not enough to have an indicator of whether a train is delayed or not; the exact numeric values are needed instead. It is also important to have an input with a fixed length for the algorithms.

Table 5: An example train embedding

  406  472  580  609  619  709  2617  ...
  3    18   1    3    3    0    2     ...
  3    20   2    3    4    0    1     ...

Figure 9: Embeddings for train 2749 visualized using 3-dimensional PCA

The solution (Algorithm 4, Table 5) is that each conflicting train that travels when a specific train departs is considered as a unique feature with its current delay. If a previously encountered conflicting train is not present at the time, its delay becomes 0, as it likely won't affect the delay-chain. First, the algorithm is called with an empty state vector and a subset of trains from a snapshot. If the identifier of a train is not contained in the state vector, it is appended to it. For each train identifier in the state we determine whether it is present in the current input, and we append its current delay to the embedding. If a train is not found in the input, its current delay is considered to be 0.
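This embedding step (Algorithm 4) can be transcribed into Python roughly as follows; modeling the snapshot as a plain dict from train identifier to current delay is an assumption of this sketch:

```python
def embed(state, trains):
    """Sketch of Algorithm 4: `state` is the ordered list of train ids seen
    so far, `trains` maps the ids present in a snapshot to their delay."""
    for train_id in trains:
        if train_id not in state:
            state.append(train_id)
    # Trains missing from the current snapshot contribute a delay of 0.
    embedding = [trains.get(train_id, 0) for train_id in state]
    return state, embedding

state = []
state, e1 = embed(state, {"406": 3, "472": 18, "580": 1})
state, e2 = embed(state, {"472": 20, "609": 3})
# e1 == [3, 18, 1]; e2 == [0, 20, 0, 3] (406 and 580 are absent)
```

Because the state only ever grows, earlier embeddings remain valid prefixes of later ones, which is why zero-padding to a common length is safe.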
The returned state can then be re-used to embed another set of trains. Before training, the embeddings can be safely padded with zeros to have a common length.

Algorithm 4 Algorithm for creating train embeddings
Funct embed(state, trains)
 1: embedding ← []
 2: for train ∈ trains do
 3:     if train.getId() ∉ state then
 4:         state.append(train.getId())
 5:     end if
 6: end for
 7: for trainId ∈ state do
 8:     if trainId ∈ trains then
 9:         embedding.append(trains.getDelay(trainId))
10:     else
11:         embedding.append(0)
12:     end if
13: end for
14: return state, embedding

The embeddings usually have high dimensions, but they can be visualized using dimensionality reduction methods (Figure 9). In the figure, trains that depart on time are denoted by green squares and trains that depart late are marked by red squares.

3.3 Support-vector machine

A support-vector machine [10] tries to find a hyperplane in an n-dimensional space which separates, and therefore classifies, the data points. This hyperplane should have a maximum margin, meaning a maximal distance between the two classes, so that future data points can be classified more reliably.

For each train, a unique model is trained. In our example the space is 45-dimensional and the hyperplane separates the trains that depart on time from the trains that depart late. For the SVM experiment we implemented a grid-search (Table 6) and executed it on the training set with 5-fold cross-validation with the following parameters:

Table 6: Parameter grid for the SVM experiment

  Parameter name         Possible values
  Regul. parameter (C)   0.001, 0.01, 0.1, 1, 10
  Kernel                 linear, poly, rbf, sigmoid
  Kernel coeff. (gamma)  0.001, 0.01, 0.1, 1
  Indep. term (coef0)    0.0, 0.001, 0.01, 0.1, 1, 10

The grid-search optimizes the hyper-parameters of the SVM model on the training dataset, which results in better metrics during the evaluation phase. The best parameters were C=1, coef0=10, gamma=0.01 and kernel=poly (Table 7).

Table 7: SVM prediction metrics for train 2749

            Precision  Recall  F1-score  Support
  On time   0.92       0.97    0.94      194
  Late      0.94       0.84    0.88      105
  Accuracy                     0.92      299

3.4 Random Forest Classifier

A random forest [11] is an ensemble model which fits multiple decision trees and outputs their mode. Each decision tree splits the dataset a variable number of times based on the delays of the conflicting trains and outputs whether the train is going to depart late or not.

The training methodology was similar to the SVM's: we ran a grid-search (Table 8) with 5-fold cross-validation on the training set with the following parameters:

Table 8: Parameter grid for the Random Forest Classifier

  Parameter name     Possible values
  n_estimators       200, 600, 1200, 1800
  max_depth          10, 50, 100, unlimited
  min_samples_split  2, 5, 10
  min_samples_leaf   1, 2, 4
  bootstrap          True, False

The results (Table 9) are slightly better than the SVM's, and the trained model also helps with the explainability of the delay-chains, which is useful for preventing them in the future. The best parameters were bootstrap=True, max_depth=10, min_samples_leaf=2, min_samples_split=10 and n_estimators=200.

Table 9: Random Forest Classifier prediction metrics for train 2749

            Precision  Recall  F1-score  Support
  On time   0.97       1.00    0.98      194
  Late      1.00       0.90    0.95      105
  Accuracy                     0.97      299

4 Generic delay prediction

Based on the research experiences with departure delays, it is time to solve the prediction problem in general with deep neural networks. As mentioned before, the delay estimation in the official mobile application is not so reliable in the long term, therefore it would be beneficial to find a method for predicting the real schedule. The reason behind choosing deep neural networks is that state-of-the-art multivariate time series forecasting methods tend to use these technologies. [12]

The general delay prediction task will be formulated as a regression problem instead of classification, because it is more informative for the end-user and there is significantly more data available when we consider snapshots after departure as well.

4.1 Input

For each day the snapshots containing a given train t are collected in ascending order by their timestamp (Table 10). It must be noted that there can be a different number of snapshots for each day, because a delayed train obviously travels for a longer period of time. In this section a daily collection of ordered snapshots for a given train t will be referred to as the input.

Table 10: A subset of the input for train 2749 on a given day

  Date            Train  Delay  Lat    Lon    ...
  19-06-16 06:39  2749   0      47.35  19.43  ...
  19-06-16 06:40  2749   0      47.35  19.43  ...
  19-06-16 06:40  2749   0      47.35  19.42  ...
  19-06-16 06:41  2749   1      47.36  19.42  ...

Based on the data, there are two kinds of preprocessed inputs for each day: a vector of auxiliary features and a matrix of time-series features (Table 11).

The auxiliary features include an indicator of whether the given day is a weekday, an indicator of whether the given train departs during the rush hours, and the one-hot encoded representation of the month of departure.

For the time-series features a similar train embedding is used as before; the only difference is that this time the train we are interested in is also included in the embedding. The embedding has a much higher dimensionality, because conflicting trains are embedded over the entire route of the train. In order to keep the input dimension manageable, only the characteristics of the train embeddings are used.

An entry in the time-series input contains the current delay of the train, the classified weather, and the mean, standard deviation, minimum and maximum values of the delays of the conflicting trains.

Let's suppose that train t traveled during k days over the interval covered by the dataset and the number of time-series features for all of its snapshots is m.
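Under the definitions above, the two per-day inputs can be sketched as follows. The helper names are hypothetical, and the use of the population standard deviation is an assumption, since the exact variant is not specified:

```python
from statistics import mean, pstdev

def aux_features(is_weekday, rush_hour_departure, month):
    """Auxiliary vector: weekday flag, rush-hour flag, one-hot month (1-12)."""
    one_hot_month = [1 if m == month else 0 for m in range(1, 13)]
    return [int(is_weekday), int(rush_hour_departure)] + one_hot_month

def time_series_entry(own_delay, weather_class, conflicting_delays):
    """One row of the time-series matrix, in the spirit of Table 11: own
    delay, classified weather, then mean/std/min/max of conflicting delays."""
    return [own_delay, weather_class,
            mean(conflicting_delays), pstdev(conflicting_delays),
            min(conflicting_delays), max(conflicting_delays)]

aux = aux_features(True, False, 6)           # a June weekday, off-peak
row = time_series_entry(0, 3, [0, 0, 2, 13])
# aux has 14 entries; row starts [0, 3, 3.75, ...] and ends [..., 0, 13]
```

Stacking one such row per snapshot gives the per-day matrix, and collecting the matrices over the k days gives the network's time-series input.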
Thus the 3D input of the network has dimensions (k, l_i, m), where l_i is the number of snapshots on the i-th day.

Table 11: Preprocessed time-series input on a given day

  Delay  Weather  Mean  STD   Min  Max
  0      3        1.20  2.69  0    13
  0      3        1.25  2.65  0    13
  0      3        1.32  2.62  0    13
  1      3        1.35  2.60  0    13

4.2 Output

For each entry in the time-series input the corresponding output becomes the vector of true delays in the future after n minutes, where n ∈ {5, 10, 20, 30}. Let's assume that the train we are interested in is t and its current delay is determined by d_t(X_i). For each snapshot X_i let the output y_ij equal the delay of train t at snapshot X_{i+n_j} (j = 1..4). In case of out-of-bounds indices the delay of t at the last snapshot of the day is used instead.

y_i^true = [d_t(X_{i+5}), d_t(X_{i+10}), d_t(X_{i+20}), d_t(X_{i+30})]

4.3 Evaluation

The first evaluation was performed on train 2749. Out of the 236 available occurrences only 231 were used, where the maximum delay along the route was less than 30 minutes. There is simply not enough data for the outliers where the delay may occasionally exceed 250 minutes. The following metrics (Figure 10, Table 12) were calculated using 3-fold cross-validation.

Figure 10: Comparison of different 10-minute prediction models for train 2749 on a given day

Table 12: Evaluation of the generic prediction model on train 2749

                n=5   n=10  n=20  n=30
  LSTM MSE      1.03  2.01  4.23  7.07
  LSTM R2       0.97  0.95  0.91  0.84
  Official MSE  1.28  3.03  7.63  12.92
  Official R2   0.97  0.93  0.83  0.72

Official model The main goal of the generic delay prediction task is to obtain a more accurate forecast than what is currently available in the official mobile application. In order to have a meaningful comparison, we have to recreate the model of the official forecast method and calculate its loss and other metrics alongside our model. Fortunately, the official model is not too complicated: it simply substitutes the current delay for all future occurrences.
y_i^official = [d_t(X_i), d_t(X_i), d_t(X_i), d_t(X_i)]

LSTM model Our model has to support both the auxiliary and the time-series features, therefore a multi-input network is necessary. This problem is similar to the image captioning task, where an image is chosen as an auxiliary feature and the words of the generated caption are sequence-like. [13, 14] Due to the fact that there can be a varying number of snapshots per day, some sort of recurrent neural network (RNN) is needed, which can handle the temporal nature of the data as well. The output of an RNN depends not only on the current input but on the previous outputs as well. Its memory is very useful for the prediction of the delays, because it can learn complicated delay patterns.

The RNN can also have a preset initial state, where we can store the representation of the auxiliary features, and the resulting network models P(X_{i+1} | X_{0:i}, auxiliary). [15] This auxiliary condition would allow us to have a single network for all trains if we included a train identifier, but due to resource constraints this was not used during the research.

For this train the LSTM model gives better and better results as n increases, compared to the official model. On average, the proposed LSTM model outperforms the official model, but only when significant outliers are omitted from the dataset. The proposed model is not yet able to learn extreme delays reliably due to their rare nature, but the official model is able to forecast them easily by simply substituting the current delay in a linear manner. This is not a huge issue, because if a train is delayed that much, it usually skips its trip on that day entirely and passengers are informed on multiple platforms.

4.4 Conclusion

The analysis and the machine learning models presented in this paper could be useful for the betterment of railway services in Hungary, and they may also increase the satisfaction of the passengers. Hungarian State Railways also expressed their interest in the continuation of the research project in cooperation with our university.

EFOP-3.6.3-VEKOP-16-2017-00001: Talent Management in Autonomous Vehicle Control Technologies – The Project is supported by the Hungarian Government and co-financed by the European Social Fund.

References

[1] Roland Krisztián Szabó. Smart alarm clock based on traffic and weather information, 2018.
[2] MÁV Szolgáltató Központ Zrt. MÁV-START térkép, 2020. [Online; accessed 16-January-2020].
[3] OpenWeather Ltd. OpenWeatherMap, 2020. [Online; accessed 16-January-2020].
[4] GeoNames Team. GeoNames dump (Hungary), 2020. [Online; accessed 16-January-2020].
[5] Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509–517, September 1975.
[6] Zhongyi Ni, Lijun Xie, Tian Xie, Binhua Shi, and Yao Zheng. Incremental road network generation based on vehicle trajectories. ISPRS International Journal of Geo-Information, 7(10), 2018.
[7] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, VLDB '94, pages 487–499, San Francisco, CA, USA, 1994. Morgan Kaufmann Publishers Inc.
[8] Philippe Fournier-Viger. SPMF open-source data mining library, 2020. [Online; accessed 17-March-2020].
[9] Philippe Fournier-Viger, Roger Nkambou, and Vincent Shin-Mu Tseng. RuleGrowth: mining sequential rules common to several sequences by pattern-growth. In Proceedings of the 2011 ACM Symposium on Applied Computing, SAC '11, pages 956–961, New York, NY, USA, 2011. Association for Computing Machinery.
[10] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Mach. Learn., 20(3):273–297, September 1995.
[11] Tin Kam Ho. Random decision forests. In Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1), ICDAR '95, page 278, USA, 1995. IEEE Computer Society.
[12] Papers with Code. Multivariate Time Series Forecasting, 2020. [Online; accessed 08-May-2020].
[13] Andrej Karpathy and Fei-Fei Li. Deep visual-semantic alignments for generating image descriptions. CoRR, abs/1412.2306, 2014.
[14] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: a neural image caption generator. CoRR, abs/1411.4555, 2014.
[15] Philippe Rémy. Conditional RNN, 2019. [Online; accessed 17-March-2020].