=Paper=
{{Paper
|id=Vol-2718/paper06
|storemode=property
|title=Analysis of Delay Patterns and Correlations in Railway Traffic Data
|pdfUrl=https://ceur-ws.org/Vol-2718/paper06.pdf
|volume=Vol-2718
|authors=Roland Krisztián Szabó,Tomáš Horváth,Ádám Tarcsi
|dblpUrl=https://dblp.org/rec/conf/itat/SzaboHT20
}}
==Analysis of Delay Patterns and Correlations in Railway Traffic Data==
Roland Krisztián Szabó, Tomáš Horváth, and Ádám Tarcsi
Eötvös Loránd University, Faculty of Informatics, Pázmány Péter stny. 1/C., 1117 Budapest, Hungary
rolandszabo@inf.elte.hu, tomas.horvath@inf.elte.hu, ade@inf.elte.hu
Abstract: Traffic itself can be a huge challenge for most commuters regardless of the transportation method of their choice. For example, it is inevitable to experience delays and congestion during rush hours. All commute methods have their own specific characteristics when it comes to delays - cars and buses suffer from traffic jams, and similar principles apply to railways as well. However, the causes of railway delays are not that straightforward and they need further investigation. According to our personal experience, most passengers are not aware of the reasons behind train delays even though they encounter them multiple times a day. In this paper we present possible answers based on the data collected from the publicly available APIs of Hungarian State Railways over the past 1.5 years.

Table 1: Details of a train entry in a snapshot

Field name     Example
Date           "2019.10.29 20:09:38"
Elvira ID      "5614115_191029"
Operator       "MAV"
Line           "40"
Train number   "55808"
Relation       "Budapest-Keleti - Pécs"
Latitude       46.26418
Longitude      18.10566
Delay          5
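A snapshot entry of this shape could be parsed along these lines. This is a minimal sketch: the JSON key names (`"Date"`, `"ElviraID"`, etc.) are assumptions; only the fields themselves come from Table 1.

```python
from dataclasses import dataclass

@dataclass
class TrainSnapshot:
    date: str          # e.g. "2019.10.29 20:09:38"
    elvira_id: str     # schedule identifier, e.g. "5614115_191029"
    operator: str      # e.g. "MAV"
    line: str          # e.g. "40"
    train_number: str  # e.g. "55808"
    relation: str      # e.g. "Budapest-Keleti - Pécs"
    latitude: float
    longitude: float
    delay: int         # minutes

def parse_entry(raw: dict) -> TrainSnapshot:
    """Convert one raw JSON object of a map snapshot into a typed record."""
    return TrainSnapshot(
        date=raw["Date"],
        elvira_id=raw["ElviraID"],
        operator=raw["Operator"],
        line=raw["Line"],
        train_number=raw["TrainNumber"],
        relation=raw["Relation"],
        latitude=float(raw["Latitude"]),
        longitude=float(raw["Longitude"]),
        delay=int(raw["Delay"]),
    )

entry = parse_entry({
    "Date": "2019.10.29 20:09:38", "ElviraID": "5614115_191029",
    "Operator": "MAV", "Line": "40", "TrainNumber": "55808",
    "Relation": "Budapest-Keleti - Pécs",
    "Latitude": "46.26418", "Longitude": "18.10566", "Delay": "5",
})
```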
1 Datasets

The idea of the delay analysis and prediction originates from paper [1], where a simpler version of this concept was used as a module in a smart alarm clock application. During the development of the application multiple data sources were investigated, some of which turned out to be unusable. In this section the details of the selected data sources are discussed.

1.1 Traffic

We found that the most reliable publicly available data source for traffic is the official map of Hungarian State Railways [2], where all trains can be tracked in real time. We created a small automated script that runs on a virtual private server, takes a snapshot of the map approximately every minute, and stores the result in a JSON file.

Traffic data have been collected since January 2019, which means there are roughly 1 year and 6 months of available information (approx. 130 million records). Due to the COVID-19 outbreak a data freeze was applied at the end of March 2020 because of the extraordinary circumstances that affect transportation all over the world. Hungarian State Railways canceled many trains, and only 30% of a train's capacity may be used in order to prevent the spread of the infectious disease. This new situation significantly alters the operation of the railway system, which would have introduced a lot of noise into the existing dataset, and it might not be relevant in a few months at all. Should the pandemic be over, its effects could be analyzed later on, but currently this is out of the scope of this paper.

A snapshot of the map contains the following information about each of the trains that were present at the time the snapshot was taken (Table 1).

1.2 Weather

In addition to the traffic data we also collected the corresponding weather data for every train, because we suspect that weather has an influence on the delays as well. It was not easy to find a free provider capable of handling the necessary number of requests, but after many trials we decided to use OpenWeatherMap [3]. Its free tier gives access to 60 location-based weather requests per minute, which is still not enough for every individual train, but is sufficient to place virtual weather stations all over Hungary with a resolution of approximately 35.5 km.

Definition 1. Virtual weather station. A virtual weather station is a GPS position which can be queried for up-to-date local weather information.

Calculating the coordinates of the virtual weather stations. The first task is to distribute the available 60 slots uniformly such that every train can be assigned to the closest virtual weather station. Finding an exact solution to this problem would have been infeasible, therefore we decided to develop an approximation algorithm, for which we used the GeoNames geographical database [4], which contains POIs in Hungary and is available for download free of charge under the Creative Commons Attribution 4.0 license.

Figure 1: Reconstructed railway network of Hungary

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
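Once the stations are placed, assigning a train to its closest virtual weather station is a plain nearest-neighbour query. A stdlib-only sketch follows, using a linear scan instead of the k-d tree used in the paper; the station coordinates are illustrative stand-ins.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS positions in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))

def closest_station(train_pos, stations):
    """Index of the virtual weather station closest to the train."""
    lat, lon = train_pos
    return min(range(len(stations)),
               key=lambda i: haversine_km(lat, lon, *stations[i]))

# Approximate coordinates of Budapest, Pécs and Debrecen as stand-in stations.
stations = [(47.50, 19.04), (46.07, 18.23), (47.53, 21.62)]
print(closest_station((46.26418, 18.10566), stations))  # 1 (the Pécs station)
```

A linear scan is O(number of stations) per train; with only 60 stations this is cheap, which is why the k-d tree matters more for the placement step than for the lookup.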
The algorithm (Algorithm 1) uses a k-d tree, a space-partitioning data structure that allows fast nearest neighbor searches [5]. The k-d tree is used to place the 60 virtual weather stations on the map as follows: in each step the most populated POI is selected, and then its neighboring POIs within the given radius are eliminated. This results in an approximately uniform placement of virtual weather stations, located at densely populated areas where accurate weather information benefits more people.

Algorithm 1 Approximation algorithm for virtual weather station placement

Funct getVirtualWeatherStationPositions(pois, radius)
1: pois ← pois.sort("population", "desc")
2: kdt ← kdTree<POI>("haversine")
3: for poi ∈ pois do
4:     nn ← kdt.searchRadius(poi, radius)
5:     if nn = ∅ then
6:         kdt.add(poi)
7:     end if
8: end for
9: return kdt

2 Analysis

2.1 Reconstruction of the railway network

In the traffic dataset there are millions of recorded GPS coordinates, and the majority of them can be safely discarded after the necessary information has been extracted. For this task we used the representative point extraction and updating algorithm by Zhongyi Ni et al. [6], which is able to calculate the most significant points along a route and, due to its online nature, is also capable of refining these points as new data becomes available. The task is to determine the least amount of points (called the representative points) that can accurately represent such a route.

The above-mentioned algorithm can determine the points describing a route (Figure 1) with an arbitrary resolution. However, the recorded GPS trajectories are noisy and may contain significantly misplaced outliers. The representative point extraction algorithm is able to properly handle noise, but it also creates new representative points in case of outliers, therefore a support-based post-processing step is needed, which removes representative points that are encountered rarely.

2.2 Conflicting trains

As a dimensionality reduction method it is beneficial to obtain the set of trains that might affect the delay of another train. It can also be used to model delay-chain propagation.

Definition 2. Conflicting trains. A set of trains is said to be in conflict when their routes have common representative points.

The algorithm (Algorithm 2) determines the set of representative points which are within a given distance radius of a specific representative point rp, and returns the set of trains that travel through those points without taking the temporal dimension into consideration, because we only use the intersection of the conflicting trains with the currently traveling trains.

Algorithm 2 Algorithm for determining conflicting trains

Funct getConflictingTrains(allRps, rp, radius)
1: kdt ← kdTree<RP>("haversine", allRps)
2: nn ← kdt.searchRadius(rp, radius)
3: return ⋃_{n ∈ nn} {n.trainId}

2.3 Association rules

We realized that the grouping of trains can be considered as a frequent itemset mining problem, therefore we used the Apriori algorithm [7] for itemset mining and association rule learning.

Definition 3. Delayed train. A train is officially considered to be delayed when its delay is greater than or equal to 5 minutes.

The algorithm requires transactions, which can be constructed based on the snapshots of the map. For each snapshot a transaction is made based on the set of conflicting delayed trains (Definition 2) in the given snapshot.

Association rules were generated for departure delays (Table 2), for which the snapshots taken upon the scheduled departure are used. The meaning of a rule is that if the set of antecedent trains is delayed, then the consequent train is likely to depart late with the given metrics.

Low support values are due to the fact that train 2749 is only included in a transaction when it is delayed.
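The transaction construction can be sketched directly: filter a snapshot with the official delay threshold (Definition 3) and intersect it with the conflicting set. The train numbers and delays below are illustrative; in the real pipeline the conflicting set would come from Algorithm 2.

```python
DELAY_THRESHOLD = 5  # minutes; Definition 3

def delayed_trains(snapshot):
    """Train numbers whose delay reaches the official 5-minute threshold."""
    return {t for t, delay in snapshot.items() if delay >= DELAY_THRESHOLD}

def transaction(snapshot, conflicting):
    """One Apriori transaction: the conflicting trains that are delayed."""
    return delayed_trains(snapshot) & conflicting

# Snapshot maps train number -> current delay in minutes.
snapshot = {"2749": 7, "2879": 12, "2739": 2, "55808": 30}
conflicting = {"2749", "2879", "2739"}
print(transaction(snapshot, conflicting))  # 2739 is on time, 55808 not in conflict
```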
Table 2: A subset of association rules generated for train 2749

Antecedents          Consequent  Supp.  Conf.
2879,2739            2749        0.21   0.81
2879,7039,2739       2749        0.19   0.84
2879,2859,2739       2749        0.17   0.82
2879,7039,2859,2739  2749        0.16   0.84
2879,700,7039        2749        0.15   0.80
2879,700,2859        2749        0.15   0.83
2879,6299,2739       2749        0.14   0.87
2879,7039,6299,2739  2749        0.14   0.89
2879,700,2739        2749        0.14   0.83
700,7039,2859,2879   2749        0.13   0.85
2879,2740,2739       2749        0.12   0.93
2879,2740,2859       2749        0.12   0.84

Table 3: A subset of sequential rules generated for train 2749

Antecedents  Consequent  Supp.  Conf.
2879         2749        0.63   0.97
7049         2749        0.61   0.97
2669,2879    2749        0.60   0.97
2669,6099    2749        0.61   0.90
2669         2749        0.67   0.90
6099         2749        0.63   0.87
2649         2749        0.61   0.86
7009         2749        0.61   0.84

The frequent itemset containing train 2749 has a support value of 0.36, which means the train departs late roughly 36% of the time.

2.4 Sequential rules

Besides traditional association rule mining we can also consider consecutive snapshots of a specific train on a given day, which takes the temporal information into consideration as well (Table 3). Sequential pattern mining is almost the same as association rule mining, but instead of working directly with a transaction we consider consecutive transactions recorded in time.

Sequential rules can also be mined for the departure delay, but they are more meaningful if we mine them along the entire route of the train. The rule A ⇒ B means that when the trains in A are delayed, then train B will also become delayed in the future. In the case of association rules we talked about trains that are usually delayed together, but now we have an additional temporal dimension.

In order to test this method we used the SPMF open-source data mining library [8] with the RuleGrowth algorithm [9].

The support values are much higher in this case, because rules are mined along the entire route of the train. The support value of the frequent itemset containing train 2749 is 0.71, which means that even though the train departed late only 36% of the time, it got delayed 71% of the time during its trip.

2.5 Other factors

In addition to delay propagation there might be other factors that contribute to the delay of trains, like weather and temporality. In this section some of these factors are analyzed with possible explanations and conclusions.

Definition 4. Average delay. The average of the trains' average delays along their route on a given day.

Definition 5. Average minimum (maximum) delay. The average of the trains' minimum (maximum) delays along their route on a given day.

Month It turned out that summer is the only specific season which has a peak in the average delays, followed by autumn (Figure 2). This effect might be caused by maintenance works, but there is no available historical maintenance data to confirm this theory. It is likely not caused by the number of passengers, since there is no school in Hungary during the summer, which significantly reduces the number of passengers in the rush hours. Higher temperatures also seem to have an effect on the delays.

Figure 2: Average delays grouped by month

Day of the week By looking at the average of all trains we can claim that Monday and Friday have the largest delays on average, while weekends have somewhat lower average delays (Figure 3). It would be nice to have a dataset related to the number of passengers, because the larger amount of passengers may cause delay peaks at the beginning and at the end of the workweek. The number of passengers may also have a correlation with the lower delays during the weekend, but we suspect that this is likely caused by the sparser schedule, which effectively reduces delay propagation.

Figure 3: Average delays grouped by day of the week

Holidays Holidays do not seem to have a significant effect on the average delays (Figure 4). The peaks were mostly predictable according to the previous analyses - Pentecost Monday and Saint Stephen's Day have slightly higher average delays, but they are both in the summer, which has the highest average delay among the seasons, and Good Friday is a Friday, which has an above-average delay compared to the other days of the week. In conclusion, such events do not seem to cause extraordinary delays, because they can be planned ahead.

Figure 4: Average delays grouped by holidays in 2019

Time of the day By looking at the chart containing the delays grouped by hours, the rush hours can be clearly marked as delay peaks (Figure 5). It can be concluded that as the number of passengers and the density of the schedule increase, the average delay increases as well. According to the research, most relations have this pattern.

Two other peaks can be observed between 23:00 and 01:00. In order to understand them, domain knowledge is needed. The reason behind these peaks is that only a very small number of trains travel at that time in the country (sometimes even fewer than 10), and when some of them are delayed, it has a huge impact on the average.

Figure 5: Average delays grouped by hour

Temperature The chart shows that the average delays increase as the temperature tends to either -10 or +30 degrees Celsius (Figure 6). Due to the distribution of trains, the ends of the chart are noisy, but the trendline can be easily seen.

Figure 6: Average delays grouped by temperature
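The per-month, per-weekday and per-hour aggregations in this section are all the same group-by over the snapshot timestamps. A stdlib-only sketch (timestamps and delays are made up, and Definition 4's per-train averaging step is omitted for brevity):

```python
from collections import defaultdict
from datetime import datetime

def average_delay_by(records, key):
    """Average delay grouped by key(timestamp).

    records: iterable of (timestamp_str, delay_minutes) pairs.
    key: grouping function, e.g. lambda t: t.hour or lambda t: t.month.
    """
    sums = defaultdict(lambda: [0.0, 0])  # group -> [delay sum, count]
    for ts, delay in records:
        k = key(datetime.strptime(ts, "%Y.%m.%d %H:%M:%S"))
        sums[k][0] += delay
        sums[k][1] += 1
    return {k: s / n for k, (s, n) in sums.items()}

records = [("2019.10.29 20:09:38", 5), ("2019.10.29 20:45:00", 3),
           ("2019.10.29 07:12:00", 10)]
print(average_delay_by(records, lambda t: t.hour))  # {20: 4.0, 7: 10.0}
```

The same function with `lambda t: t.month` or `lambda t: t.weekday()` produces the groupings behind Figures 2, 3 and 5.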
Weather type As far as the type of weather is concerned, precipitation usually increases the delays (Figure 7). The most troublesome types are related to snow in the winter and unexpected thunderstorms in the summer. Nearly all rain types have higher average delays than clear sky.

Figure 7: Average delays grouped by weather type

2.6 Delay heatmap

An interesting visualization method is to generate a heatmap of delay changes (Figure 8). It allows us to see where the delay accumulates during the trip, and these peaks might suggest track problems, busy stations, or other hidden issues that we are not aware of.

Figure 8: Delay heatmap of train 2749 between Monor and Budapest-Nyugati

In this figure, red means a larger average increase of delay and blue means a lower average increase of delay. The green patches represent moderate average increases of delay.

3 Departure delay prediction

Traveling in an unreliable environment on a daily basis can be nerve-wracking. The official mobile application of Hungarian State Railways has a delay forecasting mechanism, but it is quite limited in its current form. When a train is already moving, the schedule is automatically adjusted by its current delay. This estimation is not so reliable in the long term, but it can give you an idea about the scale of the expected delay under the current circumstances.

Another problem is that the forecast lacks a very important indicator, as it cannot tell whether the train is going to depart late or on time. The forecast is only available after the train has already departed. The two main goals are to find a method to predict the departure delay and to improve the long-term reliability of the delay forecast mechanism already present in the application.

Departure delay prediction is a special problem, because we do not yet have any information about the train we are interested in. Whether the train is going to depart late or on time can only be predicted based on its observable environment. The input for the departure delay prediction problem is a set of snapshots taken at the scheduled departure time, for which the target value is the delay of the train on its first appearance on the day.

3.1 Association rules

The first idea is to apply the previously mined association rules (Table 2) and see if we can predict whether a train is going to be delayed or not upon departure.

The algorithm (Algorithm 3) of the model is very simple: it only requires a set of association rules extracted based on the input for departure delays. A train is considered to be delayed if it is a consequent in a rule for which all the antecedent trains are delayed in a given snapshot of the map. The hyper-parameters of the model are the minimum support and minimum confidence of the rules.

Algorithm 3 Algorithm for predicting the departure delay (association rules)

Funct predictDepartureDelayAr(snapshot, rules)
1: delayedTrains ← getDelayedTrains(snapshot)
2: for rule ∈ rules do
3:     if rule.getAntecedents() ⊆ delayedTrains then
4:         return True
5:     end if
6: end for
7: return False

The results (Table 4) are impressive, but according to the research it turned out that there is simply not enough information for the association rule mining algorithm in its current form, which causes underfitting. Trains are categorized as either delayed or on time, which cannot properly handle the following situation: when an antecedent train is delayed more than a given threshold (for example 10 minutes), the consequent train can depart on time, as they are far away from each other and a slot becomes available for the consequent train. Otherwise, the delay of the antecedent train propagates to the consequent train.
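Algorithm 3 translates almost line by line into Python, with rules modelled as (antecedents, consequent) pairs; the train numbers below are taken from Table 2 for illustration.

```python
def predict_departure_delay_ar(delayed_trains, rules):
    """True if any rule fires, i.e. all of its antecedent trains are delayed."""
    return any(set(antecedents) <= delayed_trains
               for antecedents, _consequent in rules)

# Two of the rules from Table 2 for train 2749.
rules = [({"2879", "2739"}, "2749"),
         ({"2879", "7039", "2739"}, "2749")]

print(predict_departure_delay_ar({"2879", "2739", "700"}, rules))  # True
print(predict_departure_delay_ar({"700"}, rules))                  # False
```

The minimum support and confidence hyper-parameters act earlier, when the rule set itself is filtered, so they do not appear in this prediction step.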
Table 4: Departure delay prediction metrics for train 2749 using the association rules on the test set

          Precision  Recall  F1-score  Support
On time   0.85       0.85    0.85      194
Late      0.72       0.71    0.72      105
Accuracy                     0.80      299

Table 5: An example train embedding

406  472  580  609  619  709  2617  ...
3    18   1    3    3    0    2     ...
3    20   2    3    4    0    1     ...

3.2 Train embedding

In order to solve the underfitting problem that affects the association and sequential rules, a different approach is necessary. It is not enough to have an indicator of whether a train is delayed or not; the exact numeric values are needed instead. It is also important to have an input with a fixed length for the algorithms.

The solution (Algorithm 4, Table 5) is that each conflicting train that travels when a specific train departs is considered as a unique feature with its current delay. If a previously encountered conflicting train is not present at the time, its delay becomes 0, as it likely won't affect the delay-chain. First, the algorithm is called with an empty state vector and a subset of trains from a snapshot. If the identifier of a train is not contained in the state vector, it is appended to it. For each train identifier in the state we determine whether it is present in the current input or not, and we append its current delay to the embedding. If a train is not found in the input, its current delay is considered to be 0. The returned state can then be re-used to embed another set of trains. Before training, the embeddings can be safely padded with zeros to have a common length.

Algorithm 4 Algorithm for creating train embeddings

Funct embed(state, trains)
1: embedding ← []
2: for train ∈ trains do
3:     if train.getId() ∉ state then
4:         state.append(train.getId())
5:     end if
6: end for
7: for trainId ∈ state do
8:     if trainId ∈ trains then
9:         embedding.append(trains.getDelay(trainId))
10:     else
11:         embedding.append(0)
12:     end if
13: end for
14: return state, embedding

The embeddings usually have high dimensions, but they can be visualized using dimensionality reduction methods (Figure 9). In the picture, trains that depart on time are denoted by green squares and trains that depart late are marked as red squares.

Figure 9: Embeddings for train 2749 visualized using 3-dimensional PCA

3.3 Support-vector machine

A support-vector machine [10] tries to find a hyperplane in an n-dimensional space which separates, and therefore classifies, the data points. This hyperplane should have maximum margin, which means it should have maximal distance between the two classes, so future data points can be classified more reliably.

For each train, a unique model is trained. In our example, the space is 45-dimensional and the hyperplane separates the trains that depart on time from the trains that depart late. For the SVM experiment we implemented a grid-search (Table 6) and executed it on the training set with 5-fold cross-validation with the following parameters:

Table 6: Parameter grid for the SVM experiment

Parameter name         Possible values
Regul. parameter (C)   0.001, 0.01, 0.1, 1, 10
Kernel                 linear, poly, rbf, sigmoid
Kernel coeff. (gamma)  0.001, 0.01, 0.1, 1
Indep. term (coef0)    0.0, 0.001, 0.01, 0.1, 1, 10

The grid-search optimizes the hyper-parameters of the SVM model on the training dataset, which results in better metrics during the evaluation phase. The best parameters were C=1, coef0=10, gamma=0.01 and kernel=poly (Table 7).

Table 7: SVM prediction metrics for train 2749

          Precision  Recall  F1-score  Support
On time   0.92       0.97    0.94      194
Late      0.94       0.84    0.88      105
Accuracy                     0.92      299

3.4 Random Forest Classifier

A random forest [11] is an ensemble model which fits multiple decision trees and outputs their mode. Each decision tree splits the dataset a variable number of times based on the delays of the conflicting trains and outputs whether the train is going to depart late or not.

The training methodology was similar to the SVM's: we ran a grid-search (Table 8) with 5-fold cross-validation on the training set with the following parameters:

Table 8: Parameter grid for the Random Forest Classifier

Parameter name     Possible values
n_estimators       200, 600, 1200, 1800
max_depth          10, 50, 100, unlimited
min_samples_split  2, 5, 10
min_samples_leaf   1, 2, 4
bootstrap          True, False

The results (Table 9) are slightly better than the SVM's, and the trained model also helps with the explainability of the delay-chains, which is useful for preventing them from occurring in the future. The best parameters were bootstrap=true, max_depth=10, min_samples_leaf=2, min_samples_split=10 and n_estimators=200.

Table 9: Random Forest Classifier prediction metrics for train 2749

          Precision  Recall  F1-score  Support
On time   0.97       1.00    0.98      194
Late      1.00       0.90    0.95      105
Accuracy                     0.97      299

4 Generic delay prediction

Based on the research experiences with departure delays, it is time to solve the prediction problem in general with deep neural networks. As mentioned before, the delay estimation in the official mobile application is not so reliable in the long term, therefore it would be beneficial to find a method for predicting the real schedule. The reason behind choosing deep neural networks is that state-of-the-art multivariate time series forecasting methods tend to use these technologies [12].

The general delay prediction task will be formulated as a regression problem instead of classification, because it is more informative for the end-user and there are significantly more data available when we consider snapshots after departure as well.

4.1 Input

For each day the snapshots containing a given train t are collected in ascending order by their timestamp (Table 10). It must be noted that there can be a different number of snapshots for each day, because a delayed train obviously travels for a longer period of time. In this section a daily collection of ordered snapshots for a given train t will be referred to as the input.

Table 10: A subset of the input for train 2749 on a given day

Date            Train  Delay  Lat    Lon    ...
19-06-16 06:39  2749   0      47.35  19.43  ...
19-06-16 06:40  2749   0      47.35  19.43  ...
19-06-16 06:40  2749   0      47.35  19.42  ...
19-06-16 06:41  2749   1      47.36  19.42  ...

Based on the data, there are two kinds of preprocessed inputs for each day: a vector of auxiliary features and a matrix of time-series features (Table 11).

The auxiliary features include an indicator of whether the given day is a weekday, an indicator of whether the given train departs during the rush hours, and the one-hot encoded representation of the month upon departure.

For the time-series features a similar train embedding is used as before; the only difference is that this time the train we are interested in is also included in the embedding. The embedding has a much higher dimensionality, because conflicting trains are embedded over the entire route of the train. In order to keep the input dimension manageable, only the characteristics of the train embeddings are used.

An entry in the time-series input contains the current delay of the train, the classified weather, and the mean, standard deviation, minimum and maximum values of the delays of the conflicting trains.

Table 11: Preprocessed time-series input on a given day

Delay  Weather  Mean  STD   Min  Max
0      3        1.20  2.69  0    13
0      3        1.25  2.65  0    13
0      3        1.32  2.62  0    13
1      3        1.35  2.60  0    13

Let us suppose that train t traveled during k days over the interval covered by the dataset and that the number of time-series features for all of its snapshots is m. Thus the 3D input of the network has dimensions (k, l_i, m), where l_i is the number of snapshots on the i-th day.

Figure 10: Comparison of different 10-minute prediction models for train 2749 on a given day
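An entry of that shape can be assembled from the conflicting-train delays with the stdlib statistics module. A minimal sketch: the weather code and delay values are made up, and the population standard deviation is an assumption, since the paper does not say which variant is used.

```python
from statistics import mean, pstdev

def timeseries_entry(own_delay, weather_code, conflicting_delays):
    """[Delay, Weather, Mean, STD, Min, Max] for one snapshot (cf. Table 11)."""
    d = list(conflicting_delays)
    return [own_delay, weather_code, mean(d), pstdev(d), min(d), max(d)]

print(timeseries_entry(0, 3, [0, 0, 1, 13]))  # [0, 3, 3.5, 5.5, 0, 13]
```

Summarising the conflicting-train embedding by these five statistics is what keeps the time-series input width fixed regardless of how many conflicting trains a route has.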
4.2 Output

For each entry in the time-series input, the corresponding output becomes the vector of true delays in the future after n minutes, where n ∈ {5, 10, 20, 30}. Let us assume that the train we are interested in is t and that its current delay is determined by d_t(X_i). For each snapshot X_i, let the output y_ij equal the delay of train t at snapshot X_{i+n_j} (j = 1..4). In case of out-of-bounds indices, the delay of t at the last snapshot of the day is used instead.

y_i^true = [d_t(X_{i+5}), d_t(X_{i+10}), d_t(X_{i+20}), d_t(X_{i+30})]

Table 12: Evaluation of the generic prediction model on train 2749

              n = 5  n = 10  n = 20  n = 30
LSTM MSE      1.03   2.01    4.23    7.07
LSTM R2       0.97   0.95    0.91    0.84
Official MSE  1.28   3.03    7.63    12.92
Official R2   0.97   0.93    0.83    0.72
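Assuming snapshots are taken one minute apart, so that "n minutes ahead" means n snapshots ahead, the target vector above and the official baseline recreated below reduce to index lookups; the delay sequence here is a toy example.

```python
def targets(delays, i, horizons=(5, 10, 20, 30)):
    """y_true for snapshot i: the train's delay n snapshots ahead,
    clamped to the last snapshot of the day for out-of-bounds indices."""
    return [delays[min(i + n, len(delays) - 1)] for n in horizons]

def official(delays, i, horizons=(5, 10, 20, 30)):
    """Official baseline: substitute the current delay for every horizon."""
    return [delays[i]] * len(horizons)

delays = list(range(40))  # toy delay sequence, one value per snapshot
print(targets(delays, 3))   # [8, 13, 23, 33]
print(official(delays, 3))  # [3, 3, 3, 3]
```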
Official model The main goal of the generic delay prediction task is to obtain a more accurate forecast than the one currently available in the official mobile application. In order to have a meaningful comparison, we have to recreate the model of the official forecast method and calculate its loss and other metrics alongside our model. Fortunately, the official model is not too complicated: it simply substitutes the current delay for all future occurrences.

y_i^official = [d_t(X_i), d_t(X_i), d_t(X_i), d_t(X_i)]

LSTM model Our model has to support both the auxiliary and time-series features, therefore a multi-input network is necessary. This problem is similar to the image captioning task, where an image is chosen as an auxiliary feature and the words of the generated caption are sequence-like [13, 14]. Due to the fact that there can be a varying number of snapshots per day, some sort of recurrent neural network (RNN) is needed, which can handle the temporal nature of the data as well. The output of an RNN depends not only on the current input but on the previous outputs as well. Its memory is very useful for the prediction of the delays, because it can learn complicated delay patterns.

The RNN can also have a preset initial state, where we can store the representation of the auxiliary features, and the resulting network models P(X_{i+1} | X_{0:i}, auxiliary) [15]. This auxiliary condition would allow us to have a single network for all trains if we included a train identifier, but due to resource constraints this was not used during the research.

4.3 Evaluation

The first evaluation was performed on train 2749. Out of the 236 available occurrences only 231 were used, where the maximum delay along the route was less than 30 minutes. There is simply not enough data for the outliers, where the delay may occasionally exceed 250 minutes.

The following metrics (Figure 10, Table 12) were calculated using 3-fold cross-validation.

For this train the LSTM model gives better and better results compared to the official model as n increases.

On average, the proposed LSTM model outperforms the official model, but only when significant outliers are omitted from the dataset. The proposed model is not yet able to learn extreme delays reliably due to their rare nature, but the official model can forecast them easily by simply substituting the current delay in a linear manner. This is not a huge issue, because if a train is delayed that much, it usually skips its trip on that day entirely and passengers are informed on multiple platforms.

4.4 Conclusion

The analysis and the machine learning models presented in this paper could be useful for the betterment of railway services in Hungary, and they may also increase the satisfaction of the passengers. Hungarian State Railways has also expressed interest in the continuation of the research project in cooperation with our university.
EFOP-3.6.3-VEKOP-16-2017-00001: Talent Management in Autonomous Vehicle Control Technologies – The Project is supported by the Hungarian Government and co-financed by the European Social Fund.

References

[1] Roland Krisztián Szabó. Smart alarm clock based on traffic and weather information, 2018.
[2] MÁV Szolgáltató Központ Zrt. MÁV-START térkép, 2020. [Online; accessed 16-January-2020].
[3] Openweather Ltd. OpenWeatherMap, 2020. [Online; accessed 16-January-2020].
[4] GeoNames Team. GeoNames dump (Hungary), 2020. [Online; accessed 16-January-2020].
[5] Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509–517, September 1975.
[6] Zhongyi Ni, Lijun Xie, Tian Xie, Binhua Shi, and Yao Zheng. Incremental road network generation based on vehicle trajectories. ISPRS International Journal of Geo-Information, 7(10), 2018.
[7] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, VLDB '94, pages 487–499, San Francisco, CA, USA, 1994. Morgan Kaufmann Publishers Inc.
[8] Philippe Fournier-Viger. SPMF open-source data mining library, 2020. [Online; accessed 17-March-2020].
[9] Philippe Fournier-Viger, Roger Nkambou, and Vincent Shin-Mu Tseng. RuleGrowth: Mining sequential rules common to several sequences by pattern-growth. In Proceedings of the 2011 ACM Symposium on Applied Computing, SAC '11, pages 956–961, New York, NY, USA, 2011. Association for Computing Machinery.
[10] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Mach. Learn., 20(3):273–297, September 1995.
[11] Tin Kam Ho. Random decision forests. In Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1), ICDAR '95, page 278, USA, 1995. IEEE Computer Society.
[12] Papers with Code. Multivariate Time Series Forecasting, 2020. [Online; accessed 08-May-2020].
[13] Andrej Karpathy and Fei-Fei Li. Deep visual-semantic alignments for generating image descriptions. CoRR, abs/1412.2306, 2014.
[14] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. CoRR, abs/1411.4555, 2014.
[15] Philippe Rémy. Conditional RNN, 2019. [Online; accessed 17-March-2020].