Car-traffic forecasting: A representation learning approach

Ali Ziat, Gabriella Contardo, Nicolas Baskiotis, Ludovic Denoyer
FIRSTNAME.LASTNAME@LIP6.FR
Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6, F-75005, Paris, France

Proceedings of the 2nd International Workshop on Mining Urban Data, Lille, France, 2015. Copyright 2015 for this paper by its authors. Copying permitted for private and academic purposes.

Abstract

We address the problem of learning over multiple inter-dependent temporal sequences where the dependencies are modeled by a graph. We propose a model that is able to simultaneously fill in missing values and predict future ones. The approach is based on representation learning techniques, where temporal data are represented in a latent vector space. Information completion (missing values) and prediction are then performed on this latent representation. In particular, the model allows us to perform both tasks within a unique formalism, whereas they are most often addressed separately using different methods. The model has been tested on a concrete application: car-traffic forecasting, where each time series characterizes a particular road and where the graph structure corresponds to the road map of the city.

1. Introduction

Traffic data has particular characteristics that cannot be fully handled by classical sequential and temporal models: it contains multiple missing values, and one has to consider simultaneously multiple sources that can be related, by spatial proximity for example.

We propose a novel method that aims at integrating these aspects in one single model. The proposed approach is based on representation learning techniques, projecting the observations into a continuous latent space, each sequence being modeled at each time-step by a point in this space. It has many advantages w.r.t. existing techniques: it is able to simultaneously learn how to fill in missing values and how to predict the future of the observed temporal data, avoiding the use of two different models, and it naturally allows one to deal with information sources that are organized along a graph structure. Moreover, the model is based on continuous optimization schemes, allowing fast optimization over large-scale datasets.

2. Context and Model

2.1. Notations and Tasks

Let us consider a set of n temporal sequences x_1, ..., x_n such that x_i^(t) ∈ X is the value of the i-th sequence at time t, with x_i = (x_i^(1), ..., x_i^(T)). In the case where X is R^m, this setting corresponds to multiple multivariate time series. The sequences contain missing values, so we also define a mask m_i^(t) such that m_i^(t) = 1 if the value x_i^(t) is observed — and thus available for training the system — and m_i^(t) = 0 if x_i^(t) is missing — and thus has to be predicted by the model. In addition, we consider that there exists a set of relations between the sequences corresponding to external information, like spatial proximity for example. The sequences are thus organized in a graph G = {e_{i,j}} such that e_{i,j} = 1 means that x_i and x_j are related, and e_{i,j} = 0 otherwise.

2.2. Model

The RepresentAtIoN-baSed TempOral Relational Model (RAINSTORM) is a loss-based model, described through a continuous differentiable loss function that is optimized using classical optimization techniques.

Let L(θ, γ, z) be the loss function to minimize, where z is the set of all vectors z_i^(t) for i ∈ [1..n] and t ∈ [1..T], T being the size of the observed time window, i.e. the history of the time series. We define L as:

L(θ, γ, z) =  (1/O) Σ_{i=1..n} Σ_{t=1..T} m_i^(t) Δ(f_θ(z_i^(t)), x_i^(t))            (term 1)
            + λ_dyn Σ_{i=1..n} Σ_{t=1..T-1} ||z_i^(t+1) − h_γ(z_i^(t))||²             (term 2)
            + λ_struct Σ_{(i,j)∈[1..n]²} Σ_{t=1..T} e_{i,j} ||z_i^(t) − z_j^(t)||²    (term 3)
                                                                                       (1)

where O is the number of observed values, i.e. values such that m_i^(t) = 1.

This loss function contains three terms, each one associated with one of the constraints presented previously:

• Term 1 aims at simultaneously learning z and a function f_θ — called the decoding function — such that, from z_i^(t), f_θ can be used to predict the value x_i^(t). The function f_θ is defined as f_θ : R^N → X. Δ measures the error of predicting f_θ(z_i^(t)) instead of x_i^(t), the mask m_i^(t) restricting the computation of this term to the observed values only.

• Term 2 aims at finding values z_i^(.) and a dynamic model h_γ such that, when applied to z_i^(t), h_γ allows us to predict the representation of the next state of time series i, i.e. z_i^(t+1). h_γ is the dynamic function, which models the dynamics of each series directly in the latent space: h_γ : R^N → R^N. The parameters γ are learned to minimize the mean square error between the prediction h_γ(z_i^(t)) and z_i^(t+1).

• At last, term 3 corresponds to a structural regularity over the graph structure that encourages the model to learn closer representations for time series that are related. This forces the model to learn representations that reflect the structure of the considered graph.

λ_dyn and λ_struct are manually defined coefficients that weight the importance of the different terms in the loss function.

The learning problem aims at minimizing the loss function L(θ, γ, z) simultaneously over θ, γ and z. By restricting f_θ and h_γ to be continuous differentiable functions, we can use gradient-descent based optimization approaches.
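To make the objective concrete, here is a minimal NumPy sketch of the three-term loss of Eq. (1). It assumes, purely for illustration, a linear decoder f_θ, a linear dynamic model h_γ, and Δ taken as the squared error; the actual parameterizations are not specified in this paper, and all names below are illustrative rather than the authors' implementation.

```python
import numpy as np

# Illustrative sketch of the RAINSTORM loss of Eq. (1).
# Assumed shapes (not from the paper's code):
#   z     : (n, T, N)  latent vector of each series at each time step
#   x     : (n, T)     scalar series (X = R here)
#   m     : (n, T)     observation mask (1 = observed, 0 = missing)
#   e     : (n, n)     adjacency matrix of the relation graph G
#   theta : (N,)       linear decoder, f_theta(z) = z . theta
#   gamma : (N, N)     linear dynamic model, h_gamma(z) = z . gamma

def rainstorm_loss(z, x, m, e, theta, gamma, lambda_dyn=1.0, lambda_struct=1.0):
    O = m.sum()  # number of observed values

    # Term 1: masked reconstruction error, with Delta = squared error.
    pred = z @ theta                              # (n, T)
    term1 = np.sum(m * (pred - x) ** 2) / O

    # Term 2: the next latent state should match h_gamma of the current one.
    drift = z[:, 1:, :] - z[:, :-1, :] @ gamma    # (n, T-1, N)
    term2 = np.sum(drift ** 2)

    # Term 3: structural regularity, ||z_i - z_j||^2 whenever e_ij = 1.
    diff = z[:, None, :, :] - z[None, :, :, :]    # (n, n, T, N)
    term3 = np.sum(e[:, :, None] * np.sum(diff ** 2, axis=-1))

    return term1 + lambda_dyn * term2 + lambda_struct * term3
```

Since the loss is differentiable in z, θ and γ, any gradient-based optimizer can then minimize it jointly over all three, as described above.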
3. Traffic Forecasting and Experiments

Experiments have been made on two datasets from the cities of Beijing and Warsaw. The datasets are provided by (Zheng, 2011) and (ICDM) and are not described here for the sake of space.

3.1. Models

We compare the RAINSTORM approach to the following baseline models, some baselines being used for data completion, and some others for prediction. For the completion problem, we consider the following models:

MF: This corresponds to the classical matrix factorization framework for data completion.

MF-Geo (MF with geographic context): This method is the one named TSE (traffic speed estimation) in (Shang et al., 2014).

For the prediction task, we consider:

NeuralNetwork: This is the classical baseline method used in traffic forecasting, based on a neural network architecture, described for instance in (Dougherty & Cobbett, 1997).

SAE: This is the method described in (Lv et al., 2014).

We also compare RAINSTORM with a heuristic model able to perform both completion and prediction, which we call RoadMean and can be described as follows: this model predicts and fills missing values with the mean of the observed values of the sequence.

3.2. Experiments and Results

We compare our model with the baseline approaches on the two tasks of completion and prediction. Results are reported in Table 1 and Table 2.

Table 1. Prediction at T+1: root mean square error (RMSE) of the described baseline models and of RAINSTORM for different sizes N of the latent space.

 N   Model           Beijing Volume   Warsaw Volume   Warsaw Speed
 -   RoadMean             5.51             5.09           11.02
 -   NeuralNetwork        4.77             4.27            8.05
 -   SAE                  4.75             4.27            7.85
 5   RAINSTORM            4.82             4.28            7.74
 10  RAINSTORM            4.78             4.20            7.21
 20  RAINSTORM            4.54             4.21            7.19
 50  RAINSTORM            4.66             4.20            7.60

Table 2. Completion with 50% missing data: root mean square error (RMSE) of the described baseline models and of RAINSTORM for different sizes N of the latent space.

 N   Model           Beijing Volume   Warsaw Volume   Warsaw Speed
 -   RoadMean             5.55             5.00           11.10
 -   MF                   3.58             3.16            6.80
 -   MF-Geo               3.24             2.99            6.49
 5   RAINSTORM            2.99             3.12            6.49
 10  RAINSTORM            3.03             3.00            6.24
 20  RAINSTORM            3.22             2.94            6.23
 50  RAINSTORM            2.97             2.93            6.70
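As a point of reference, the RoadMean heuristic of Section 3.1 and the RMSE metric reported in Tables 1 and 2 can be sketched as follows; the function names are illustrative, not from the paper.

```python
import numpy as np

def road_mean(x, m):
    """RoadMean heuristic: predict and fill every entry of each sequence
    with the mean of its observed values (mask m: 1 = observed, 0 = missing)."""
    means = (x * m).sum(axis=1) / np.maximum(m.sum(axis=1), 1)
    return np.repeat(means[:, None], x.shape[1], axis=1)

def rmse(pred, target, mask):
    """Root mean square error restricted to the entries selected by mask
    (e.g. the held-out missing entries for the completion task)."""
    err = (pred - target) ** 2
    return np.sqrt((err * mask).sum() / mask.sum())
```

For the completion task the mask passed to rmse would select the artificially removed values; for prediction it would select the T+1 targets.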
4. Conclusion

We have presented a new way to learn over multiple incomplete sources of temporal relational data. The RAINSTORM approach is based on representation learning techniques and aims at integrating in a latent space the observed information, the dynamics of the sequences, and their relations. In comparison to baseline models that have been developed for prediction only or for completion only, our approach shows interesting performance and is able to simultaneously complete missing values and predict the future evolution of the data.

References

Dougherty, Mark S and Cobbett, Mark R. Short-term inter-urban traffic forecasts using neural networks. International Journal of Forecasting, 13(1):21–31, 1997.

ICDM. IEEE ICDM contest: TomTom traffic prediction for intelligent GPS navigation.

Lv, Yisheng, Duan, Yanjie, Kang, Wenwen, Li, Zhengxi, and Wang, F-Y. Traffic flow prediction with big data: A deep learning approach. 2014.

Shang, Jingbo, Zheng, Yu, Tong, Wenzhu, Chang, Eric, and Yu, Yong. Inferring gas consumption and pollution emission of vehicles throughout a city. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.

Zheng, Yu. T-drive trajectory data sample, August 2011. URL http://research.microsoft.com/apps/pubs/default.aspx?id=152883.