<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Proceedings of the 2nd International Workshop on Mining Urban Data</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Car-traffic forecasting: A representation learning approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ali Ziat</string-name>
<xref ref-type="aff" rid="aff2">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriella Contardo</string-name>
<xref ref-type="aff" rid="aff2">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolas Baskiotis</string-name>
<xref ref-type="aff" rid="aff2">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ludovic Denoyer</string-name>
<xref ref-type="aff" rid="aff2">1</xref>
        </contrib>
<aff id="aff2">
          <label>1</label>
          <institution>Sorbonne Universités</institution>
          ,
          <addr-line>UPMC Univ Paris 06, UMR 7606, LIP6, F-75005, Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
<year>2015</year>
      </pub-date>
      <abstract>
<p>We address the problem of learning over multiple inter-dependent temporal sequences where dependencies are modeled by a graph. We propose a model that is able to simultaneously fill in missing values and predict future ones. This approach is based on representation learning techniques, where temporal data are represented in a latent vector space. Information completion (missing values) and prediction are then performed on this latent representation. In particular, the model allows us to perform both tasks using a unique formalism, whereas they are most often addressed separately, using different methods. The model has been tested on a concrete application: car-traffic forecasting, where each time series characterizes a particular road and where the graph structure corresponds to the road map of the city.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Traffic data has particular characteristics that cannot be fully handled by classical sequential and temporal models: it contains multiple missing values, and one has to simultaneously consider multiple sources that can be related, by spatial proximity for example.</p>
      <p>We propose a novel method that aims at integrating these aspects in a single model.</p>
<p>The proposed approach is based on representation learning techniques that project the observations into a continuous latent space, each sequence being modeled at each time step by a point in this space. It has many advantages w.r.t. existing techniques: it is able to simultaneously learn how to fill in missing values and how to predict the future of the observed temporal data, avoiding the use of two different models, and it naturally allows one to deal with information sources that are organized in a graph structure.</p>
<p>Moreover, the model is based on continuous optimization schemes, allowing fast optimization over large-scale datasets.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Context and Model</title>
      <sec id="sec-2-1">
        <title>2.1. Notations and Tasks</title>
<p>Let us consider a set of n temporal sequences x1, .., xn such that xi(t) ∈ X is the value of the i-th sequence at time t, the i-th sequence being defined as xi = (xi(1), .., xi(T)). In the case where X is Rm, this context corresponds to multiple multivariate time series. The sequences contain missing values, so we also define a mask mi(t) such that mi(t) = 1 if the value xi(t) is observed - and thus available for training the system - and mi(t) = 0 if xi(t) is missing - and thus has to be predicted by the model. In addition, we consider that there exists a set of relations between the sequences corresponding to external information, such as spatial proximity. The sequences are thus organized in a graph G = {ei,j} such that ei,j = 1 means that xi and xj are related, and ei,j = 0 otherwise.</p>
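        <p>To make the notation concrete, the following sketch (not from the paper; array names and sizes are purely illustrative) shows one way such masked, graph-structured series could be stored:</p>
        <preformat>
import numpy as np

# Hypothetical layout for n sequences of length T with values in R^m.
n, T, m = 4, 100, 1
x = np.zeros((n, T, m))    # x[i, t] holds x_i(t); content is arbitrary where missing
mask = np.ones((n, T))     # mask[i, t] = m_i(t): 1 if x_i(t) is observed, else 0
e = np.zeros((n, n))       # e[i, j] = e_{i,j}: 1 if sequences i and j are related

# Example: mark x_2(10) as missing, and relate roads 0 and 1.
mask[2, 10] = 0
e[0, 1] = e[1, 0] = 1
        </preformat>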
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Model</title>
        <p>The RepresentAtIoN-baSed TempOral Relational Model (RAINSTORM) is a loss-based model, described through a continuous, differentiable loss function that is optimized using classical optimization techniques. Let us define L(θ, γ, z) as the loss function to minimize, where z is the set of all the vectors zi(t) for i ∈ [1..n] and t ∈ [1..T], T being the size of the observed time window, i.e. the history of the time series. We define L as:</p>
        <p>L(θ, γ, z) = (1/O) Σi=1..n Σt=1..T mi(t) Δ(fθ(zi(t)), xi(t)) (term 1)
+ λdyn Σi=1..n Σt=1..T−1 ||hγ(zi(t)) − zi(t+1)||² (term 2)
+ λstruct Σi,j∈[1..n]² Σt=1..T ei,j ||zi(t) − zj(t)||² (term 3) (1)</p>
        <p>where O is the number of observed values, i.e. values such that mi(t) = 1.</p>
        <p>This loss function contains three terms, each one associated with one of the constraints presented previously:
• Term 1 aims at simultaneously learning z and a function fθ - called the decoding function - such that, from zi(t), fθ can be used to predict the value xi(t). The function fθ : RN → X maps latent vectors to observations. Δ measures the error made by predicting fθ(zi(t)) instead of xi(t), mi(t) playing the role of a mask that restricts the computation of this term to the observed values.
• Term 2 aims at finding values zi(.) and a dynamic model hγ such that, when applied to zi(t), hγ allows us to predict the representation of the next state of time series i, i.e. zi(t+1). hγ : RN → RN is the dynamic function, which models the dynamicity of each series directly in the latent space. The parameters γ will be learned to minimize the mean square error between the prediction hγ(zi(t)) and zi(t+1).
• At last, term 3 corresponds to a structural regularity over the graph structure that encourages the model to learn closer representations for time series that are related. This forces the model to learn representations that reflect the structure of the considered graph.
λdyn and λstruct are manually defined coefficients that weight the importance of the different terms in the loss function.</p>
        <p>The learning problem aims at minimizing the loss function L(θ, γ, z) simultaneously over θ, γ and z. By restricting fθ and hγ to be continuous, differentiable functions, we can use gradient-descent-based optimization approaches.</p>
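        <p>As a minimal sketch of this optimization, the loss can be expressed with any automatic-differentiation toolkit. The example below uses PyTorch, takes Δ to be the squared error, and assumes linear parameterizations of fθ and hγ; these specific choices are illustrative and not prescribed by the model.</p>
        <preformat>
import torch

n, T, m, N = 4, 100, 1, 10        # illustrative sizes; N is the latent dimension
lam_dyn, lam_struct = 1.0, 1.0    # manually defined coefficients

x = torch.randn(n, T, m)          # observed series (placeholder data)
mask = torch.ones(n, T)           # m_i(t)
e = torch.zeros(n, n)             # graph structure e_{i,j}

z = torch.randn(n, T, N, requires_grad=True)  # one latent vector per (i, t)
f = torch.nn.Linear(N, m)         # decoding function f_theta (assumed linear)
h = torch.nn.Linear(N, N)         # dynamic function h_gamma (assumed linear)

def rainstorm_loss():
    O = mask.sum()                                             # observed count
    term1 = (mask.unsqueeze(-1) * (f(z) - x) ** 2).sum() / O   # decoding error
    term2 = ((h(z[:, :-1]) - z[:, 1:]) ** 2).sum()             # latent dynamics
    diff = z.unsqueeze(0) - z.unsqueeze(1)                     # z_i(t) - z_j(t)
    term3 = (e[:, :, None, None] * diff ** 2).sum()            # structural term
    return term1 + lam_dyn * term2 + lam_struct * term3

# Joint gradient descent over z, theta and gamma.
opt = torch.optim.SGD([z, *f.parameters(), *h.parameters()], lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = rainstorm_loss()
    loss.backward()
    opt.step()
        </preformat>
        <p>Under this scheme, a missing value would be read off as fθ(zi(t)), and a forecast one step beyond the history as fθ(hγ(zi(T))), since hγ advances the latent state and fθ decodes it.</p>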
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Traffic Forecasting and Experiments</title>
      <p>Experiments have been performed on two datasets, from the cities of Beijing and Warsaw. The datasets are provided by (Zheng, 2011) and (ICDM) respectively, and are not described here for the sake of space.</p>
      <sec id="sec-3-1">
        <title>3.1. Models</title>
        <p>We propose to compare the RAINSTORM approach to the following baseline models, some used for data completion and others for prediction. For the completion problem we consider the following models:</p>
        <p>MF: This corresponds to the classical matrix factorization framework for data completion.</p>
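        <p>For reference, a minimal masked matrix-factorization sketch for completion follows; it is illustrative only, as the exact MF variant used is not specified here.</p>
        <preformat>
import numpy as np

def mf_complete(X, mask, k=10, lr=0.01, steps=500):
    """Complete an (n_roads x T) matrix X by a rank-k factorization U @ V.

    Only entries with mask == 1 contribute to the squared-error loss.
    """
    n, T = X.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n, k))
    V = rng.normal(scale=0.1, size=(k, T))
    for _ in range(steps):
        err = mask * (U @ V - X)   # residual on observed entries only
        U -= lr * (err @ V.T)      # gradient step on U
        V -= lr * (U.T @ err)      # gradient step on V
    return U @ V                   # completed matrix
        </preformat>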
        <p>MF-Geo: This method, matrix factorization with geographic context, is the one named TSE (traffic speed estimation) in (Shang et al., 2014).</p>
        <p>For the prediction problem we consider the following models:</p>
        <p>NeuralNetwork: This is the classical model used in traffic forecasting, based on a neural network architecture, described for instance in (Dougherty &amp; Cobbett, 1997).</p>
        <p>SAE: This is the method described in (Lv et al., 2014).</p>
        <p>We also compare RAINSTORM with a model based on a heuristic able to perform both completion and prediction, which we call RoadMean and which can be described as follows: this model predicts and fills in missing values with the mean of the observed values of the sequence (see the sketch below).</p>
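        <p>A sketch of the RoadMean heuristic, under the masked-array layout assumed earlier:</p>
        <preformat>
import numpy as np

def road_mean_fill(x, mask):
    """Fill missing entries of each sequence with its observed mean.

    x: (n, T, m) values; mask: (n, T), 1 where observed. The same
    per-road mean also serves as the prediction for future steps.
    """
    filled = x.copy()
    for i in range(x.shape[0]):
        obs = mask[i] == 1
        road_mean = x[i, obs].mean(axis=0)   # mean over observed time steps
        filled[i, ~obs] = road_mean          # completion of missing values
    return filled
        </preformat>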
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experiments and Results</title>
        <p>We compare our model with the baseline approaches on the two tasks of completion and prediction. Results are reported in Table 1, which gives the root mean square error (RMSE) of the described baseline models and of the RAINSTORM model for different sizes N of the latent space.</p>
        <table-wrap id="tbl1">
          <label>Table 1</label>
          <caption>
            <p>RMSE of the described baseline models and of the RAINSTORM model (for different sizes N of the latent space) on the prediction task (top) and the completion task (bottom).</p>
          </caption>
          <table>
            <thead>
              <tr><th>Task</th><th>Model</th><th>N</th><th>Beijing Volume</th><th>Warsaw Volume</th><th>Warsaw Speed</th></tr>
            </thead>
            <tbody>
              <tr><td rowspan="7">Prediction</td><td>RoadMean</td><td>-</td><td>5.51</td><td>5.09</td><td>11.02</td></tr>
              <tr><td>NeuralNetwork</td><td>-</td><td>4.77</td><td>4.27</td><td>8.05</td></tr>
              <tr><td>SAE</td><td>-</td><td>4.75</td><td>4.27</td><td>7.85</td></tr>
              <tr><td>RAINSTORM</td><td>5</td><td>4.82</td><td>4.28</td><td>7.74</td></tr>
              <tr><td>RAINSTORM</td><td>10</td><td>4.78</td><td>4.20</td><td>7.21</td></tr>
              <tr><td>RAINSTORM</td><td>20</td><td>4.54</td><td>4.21</td><td>7.19</td></tr>
              <tr><td>RAINSTORM</td><td>50</td><td>4.66</td><td>4.20</td><td>7.60</td></tr>
              <tr><td rowspan="7">Completion</td><td>RoadMean</td><td>-</td><td>5.55</td><td>5.00</td><td>11.10</td></tr>
              <tr><td>MF</td><td>-</td><td>3.58</td><td>3.16</td><td>6.80</td></tr>
              <tr><td>MF-Geo</td><td>-</td><td>3.24</td><td>2.99</td><td>6.49</td></tr>
              <tr><td>RAINSTORM</td><td>5</td><td>2.99</td><td>3.12</td><td>6.49</td></tr>
              <tr><td>RAINSTORM</td><td>10</td><td>3.03</td><td>3.00</td><td>6.24</td></tr>
              <tr><td>RAINSTORM</td><td>20</td><td>3.22</td><td>2.94</td><td>6.23</td></tr>
              <tr><td>RAINSTORM</td><td>50</td><td>2.97</td><td>2.93</td><td>6.70</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>We have presented a new way to learn over multiple incomplete sources of temporal relational data. The RAINSTORM approach is based on representation learning techniques and aims at integrating in a latent space the observed information, the dynamicity of the sequences of data, and their relations. In comparison to baseline models that have been developed for prediction only or for completion only, our approach shows interesting performance and is able to simultaneously complete missing values and predict the future evolution of the data.</p>
    </sec>
  </body>
  <back>
<ref-list>
      <title>References</title>
      <ref id="ref1">
        <mixed-citation>Dougherty, Mark S and Cobbett, Mark R. Short-term inter-urban traffic forecasts using neural networks. International Journal of Forecasting, 1997.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>ICDM. IEEE ICDM contest: TomTom traffic prediction for intelligent GPS navigation.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>Lv, Yisheng, Duan, Yanjie, Kang, Wenwen, Li, Zhengxi, and Wang, F-Y. Traffic flow prediction with big data: A deep learning approach. 2014.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>Shang, Jingbo, Zheng, Yu, Tong, Wenzhu, Chang, Eric, and Yu, Yong. Inferring gas consumption and pollution emission of vehicles throughout a city. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>Zheng, Yu. T-Drive trajectory data sample, August 2011.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>