Distributed Traffic Flow Prediction with Label Proportions: From in-Network towards High Performance Computation with MPI

Thomas Liebig (THOMAS.LIEBIG@TU-DORTMUND.DE), University of Dortmund, 44221 Dortmund, Germany
Marco Stolpe (MARCO.STOLPE@TU-DORTMUND.DE), University of Dortmund, 44221 Dortmund, Germany
Katharina Morik (KATHARINA.MORIK@TU-DORTMUND.DE), University of Dortmund, 44221 Dortmund, Germany

Proceedings of the 2nd International Workshop on Mining Urban Data, Lille, France, 2015. Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

Abstract

Modern traffic management should benefit from the diverse sensors, smart phones, and social networks data that offer the potential of enhanced services. In disaster scenarios, it is no longer guaranteed that a central server and reliable communication are always available. This motivates a distributed computing setting with restricted communication. Also in distributed High Performance Computing, communication costs have to be reduced to the minimum and costly broadcasts to all compute nodes should be avoided. We want to learn local models with high communication efficiency. They still require the exchange of label information in a setting of supervised learning. The transmission of all labels among the nodes can be as costly as communicating all observations. Sophisticated methods are required to trade off prediction performance against communication costs.

We hereby present an in-network algorithm based on local models that only sends label counts to neighboring nodes. The method is therefore a novel approach that transfers no data about individual observations, but just aggregated label information. We outline its MPI implementation and evaluate our approach on real-world data in a traffic monitoring scenario. Tests reveal that, in comparison to sending all labels, the algorithm is scalable.

1. Introduction

Traffic flow prediction is an important task for traffic managers. It allows performance assessment of major traffic infrastructure, like roads and junctions. Individual mobility benefits from predictions, as they provide necessary data for proactive, smart decisions on individual travel plans, e.g. predictive situation-aware trip planning by avoidance of likely traffic hazards (Niu et al., 2015; Liebig et al., 2014). Traffic flow models are based on sensor observations of current traffic gained by a mesh of (mostly static) presence sensors. While existing learning methods centralize and process measurements on a dedicated traffic management server, they have some major drawbacks, particularly in cases of disaster: the need for a reliable communication infrastructure reduces sustainability in case of natural hazards. First, the server-side collection causes high communication costs, decreasing the system's ability to process all sensor data in time. Second, the area of traffic prediction systems is limited by the political area of homogeneous regulations for sending the data through the network. Third, increasing the network's density bears the risk of re-identification of individual persons and tracking them throughout the network. Existing systems are therefore limited by communication bandwidths, processing capabilities and political regulations.

We tackle these limitations by a distributed spatio-temporal in-network learning algorithm, where sensors compute local models and efficiently communicate label counts with their topological neighbors. Our approach sends space-time aggregated values that, by design, provide k-anonymity.
Hence, our method is privacy preserving and can be applied in large-scale traffic management scenarios. Our particular focus is on the prediction of future traffic flow at junctions throughout the region of interest (e.g. a city, a state or even areas at European scale). Possible applications comprise, for instance,

• distributed car-to-car scenarios where cars or trucks communicate at junctions the number of observed vehicles on the road to estimate traffic flow and alter their individual transportation plans based on predicted traffic conditions, or,

• large-scale traffic flow prediction that processes massive local observations on a high performance computer.

Scalable in-network algorithms belong to the field of distributed data mining. Existing work mostly focuses on horizontally partitioned data. There, full observations, i.e. all features and labels, are stored on different nodes in a network. However, network states representing the current traffic flow are vertically partitioned. Here, only partial information about observations is stored on different nodes. Learning and prediction therefore require either the transmission of observations or of labels to other nodes. Previous work (Das et al., 2011; Lee et al., 2012; Stolpe et al., 2013) has focused on sending less information about observations to a central coordinator. Here, we deal with reducing the amount of labels sent to neighboring peer nodes. Communication-efficient algorithms for vertically distributed learning are not just relevant for traffic flow prediction, but for applications as diverse as intrusion detection, monitoring production processes or smart grid management. The main contributions of our work are the following:

1. We introduce a privacy-preserving approach for the distributed learning of spatio-temporal prediction models which transfers only aggregated label information, but no data about individual observations.

2. A connection is drawn between the task of learning from label proportions and reducing communication costs in distributed environments, and it is evaluated on real-world data.

3. We introduce a fast search strategy for the LLP algorithm (Stolpe & Morik, 2011) and demonstrate its prediction performance in the context of traffic flow prediction.

The next section reviews related work. Section 3 details our problem setting and introduces a novel approach for the in-network training of local models. Section 4 discusses learning from aggregated label information and its implementation using the Message Passing Interface (MPI), and analyses its communication costs and privacy aspects. Evaluations of our approach can be found in Sect. 5. We finish with conclusions and an outlook on future work.

2. Related Work

Many distributed data mining algorithms learn from horizontally partitioned data, whereas our data is vertically partitioned. In this context, privacy-preserving SVMs like (Yunhong et al., 2009) are not scalable, since they send quadratic kernel matrices to a central server. Distributed optimization algorithms (Bellet et al., 2014) exchange predictions for each observation per iteration, potentially sending more than the entire dataset. So does the co-regularized least squares regression in (Brefeld et al., 2006). Communication-efficient anomaly detection algorithms (Das et al., 2011; Stolpe et al., 2013) combine local and global models, but are 1-class algorithms reducing the data sent about observations, not labels. In (Lee et al., 2012), local support vector machine (SVM) models are trained, but all labels are sent by a central server.

Also in traffic flow prediction, most literature describes processes on central servers. There are two major ways to model traffic: using a simulation (Raney & Nagel, 2006) or applying an imputation model trained on previous sensor measurements. Models are required for the estimation of traffic flow at locations not being observed at all. Such imputation is not the focus of our study, but rather the prediction of traffic flow at sensor locations. We point the interested reader to methods of simulation (e.g. cellular automata (Raney & Nagel, 2006)) and model-based imputation (e.g. (Liebig et al., 2012)). Most learning-based traffic flow
prediction methods analyse time series; a popular model is based on the auto-regressive integrated moving average (ARIMA) (Ahmed et al., 1979). Recently, an application of a Gaussian Markov model was proposed in (Schnitzler et al., 2014), and more advanced graphical models, namely Spatio-Temporal Random Fields (STRFs), were applied to traffic modeling in (Piatkowski et al., 2013). Distributed approaches comprise an approach that applies kNN and Gaussian process regression (Chen et al., 2014), on-line distributed prediction of traffic flow in a large-scale road network (Wang et al., 2014), distributed traffic modeling in a MapReduce framework (Chen et al., 2013), MapReduce-parallel multivariate regression (Dai et al., 2014) and MPI-based (Message Passing Forum, 1994) high performance computation based on SVMs (Yang et al., 2014). A few distributed approaches combine sketches of neighbouring sensors to get probabilistic estimates of the number of vehicles co-occurring at different locations. Instead of counting and re-identifying individual vehicles, we use aggregated quantities.

The task of learning from aggregated label information was first introduced in (Kück & de Freitas, 2005). Theoretical bounds have only recently been proven in (Yu et al., 2014). (Musicant et al., 2007) propose variants of existing algorithms. The SVM optimization problem has been adapted to the setting (Rüping, 2010; Yu et al., 2013). Mean Map (Quadrianto et al., 2009) estimates the mean operator by solving a system of linear equations, while (Patrini et al., 2014) extend it with a manifold regularization, outperforming both SVMs and Mean Map on standard datasets. A modified kernel k-Means algorithm (Chen et al., 2009) minimizes the distance to the given label proportions by matrix factorization.
Recent work learns Bayesian network (Hernández-González et al., 2013) and generative (Fan et al., 2014) classifiers. The LLP algorithm proposed in (Stolpe & Morik, 2011) first determines clusters and then tries to label them. LLP has only linear running time, while its prediction performance competes with the approaches in (Quadrianto et al., 2009; Rüping, 2010) and (Chen et al., 2009).

3. Distributed Learning of Spatio-Temporal Local Models

Given are $m$ distributed sensor nodes $P_1, \ldots, P_m$. Each sensor node $P_i$ delivers an infinite series of real-valued measurements $\ldots, v_{t-1}^{(i)}, v_t^{(i)}, v_{t+1}^{(i)}, \ldots$ for different time points $\ldots, t-1, t, t+1, \ldots$. Time spans between two measurements are equidistant, given a constant sample rate. Let $t$ denote the current time of measurement, while $t-a$ and $t+a$ are time points $a$ steps in the past and future. Each sensor node also has a spatial location.

Many traffic flow management tasks require the prediction of traffic flow categories, which are achieved by a discretization of raw values into distinct intervals (e.g. risk level assignment or decisions for emergency traffic signals). The task, given the current time point $t$, is therefore to predict a label $y$ from a set $Y = \{Y_1, \ldots, Y_l\}$ of distinct categories at some arbitrary node $P_i$ at future time point $t+r$, based on the current and previous (raw) sensor readings at all or a subset of the nodes $P_1, \ldots, P_m$.

We assume that for learning, measurements and labels are somehow recorded (see below) over a fixed-length time period. For the supervised training of prediction models, each node $P_i$ thus provides a sequence $V_i = \langle v_1^{(i)}, \ldots, v_n^{(i)} \rangle$ of measurements, $v_j^{(i)} \in \mathbb{R}$, and a sequence $L_i = \langle y_1^{(i)}, \ldots, y_n^{(i)} \rangle$ of labels $y_j^{(i)} \in Y$.

DISTRIBUTED LEARNING OF LOCAL MODELS

Instead of centralizing all data, we propose that each $P_i$ records and stores its own measurements and labels. For predicting future traffic flow categories at node $P_i$, we restrict learning to $P_i$ itself and $c$ topological neighboring nodes around $P_i$. For instance, to learn and predict the future type of traffic flow at some street junction, only the measurements and labels recorded at the junction itself and at the $c$ junctions closest to it are considered.

Before training, each $P_i$ preprocesses its measurements $V_i$ as follows. A window of size $p$ is slid over the series $V_i$ with step size 1, storing all thereby created windows $x_t^{(i)} = (v_{t-p+1}^{(i)}, \ldots, v_t^{(i)})$, $t = p, \ldots, n$, as rows in a dataset $D_i$. Let $N(i) = \{n_1^{(i)}, \ldots, n_c^{(i)}\}$ be the set of indices of the $c$ neighboring nodes around $P_i$. Based on the datasets $D_i, D_{n_1^{(i)}}, \ldots, D_{n_c^{(i)}}$ and the labels $L_i$, we want to learn a local function (model) $f^{(i)}$ that, given windows $x_t^{(i)}, x_t^{n_1^{(i)}}, \ldots, x_t^{n_c^{(i)}}$ of sensor readings from node $P_i$ and its neighbors, correctly predicts the label $y_{t+r}^{(i)}$ at node $P_i$ with horizon $r$.
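As an illustration, the following R sketch constructs such a windowed dataset from a measurement series; the function name make_windows and its interface are our own, not taken from the paper's implementation.

```r
# Minimal sketch of the preprocessing step: slide a window of size p with
# step 1 over a measurement series v and store the windows as rows of a
# dataset D_i. Function name and interface are ours, not the paper's.
make_windows <- function(v, p) {
  n <- length(v)
  # one row per time point t = p, ..., n, with columns v[t-p+1], ..., v[t]
  t(sapply(p:n, function(t) v[(t - p + 1):t]))
}

# example: 10 measurements and window size p = 3 yield an 8 x 3 dataset
D <- make_windows(rnorm(10), p = 3)
```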
Interpreting the windows $x_t^{(i)}, x_t^{n_1^{(i)}}, \ldots, x_t^{n_c^{(i)}}$ as features of a single observation $x$ that should be classified, the data is vertically partitioned, since each neighboring node of $P_i$ only stores partial information about $x$, i.e. a subset of its features.

An obvious choice for the training of $f^{(i)}$ at $P_i$ is to ask for the recorded measurements at each neighboring node, concatenate their columns at $P_i$ and join the labels stored at $P_i$ to the new dataset. The approach is more scalable than centralizing all data, since the number $c$ of neighbors is fixed, avoiding the bottleneck problem of limited bandwidth. However, each node still needs to transmit all measurements to each of its neighbors, consuming at least as much energy per node as sending all data to a single server.

Therefore, we propose to send only label information from node $P_i$ to its neighbors and to train models $f_0^{(i)}$ at node $P_i$ and $f_{n_1^{(i)}}^{(i)}, \ldots, f_{n_c^{(i)}}^{(i)}$ at its neighbors. As model $f^{(i)}$ at node $P_i$, we propose a majority vote over the predictions from itself and its neighboring nodes. All models are local, since they only consider measurements and labels of a fixed number of topologically close neighboring nodes around $P_i$. Moreover, the approach works fully in-network without a central coordinator, since each node only communicates with its neighboring peer nodes. As learners at each node, one may consider supervised learners like kNN, decision trees or SVMs. Considering the limited computational resources of sensor nodes, however, our evaluation in Sect. 5 is solely based on kNN.

Since the number of bits needed to encode all labels is often less than an encoding of all measurements, communication is saved by sending labels from $P_i$ instead of measurements to $P_i$. However, supervised learning still requires individual labels for all observations. The question is whether communication can be reduced even further, by sending fewer labels or aggregated label information to each neighboring node. Semi-supervised (Chapelle et al., 2006) and active learning (Balcan et al., 2010) show that training on fewer labels may achieve a similar performance as training on all labels. However, such methods do not preserve the privacy of the data, since they need individual labels of observations (see Sect. 4). Instead, we propose to send only aggregated label information, i.e. label counts, to neighboring nodes for learning.

4. Aggregation of Label Information

Before sending label information to each of its neighboring nodes, $P_i$ divides its time-related sequence $L_i$ of labels into consecutive batches $C_1^{(i)}, \ldots, C_h^{(i)}$ of a fixed size $b$ (see Fig. ??). It respects the prediction horizon $r$, such that each $C_j^{(i)}$ consists of the labels from time point $t + (j-1)b + r$ to $t + jb + r$ and aligns correctly with the time points of observations (i.e. windows of measurements) at other nodes. Let $n$ denote from here on the size of the datasets $D_i$, i.e. the number of windows stored. Then, $h = \lceil n/b \rceil$. For each batch $j$, the labels $y \in Y$ are aggregated by counting how often they occur, and stored in an $h \times l$ matrix of label counts $Q^{(i)} = (q_{jd})$, where $q_{jd} = |\{y \in C_j^{(i)} \mid y = Y_d\}|$.
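The following R sketch illustrates this aggregation step, computing the count matrix Q and also the proportion matrix Π = Q/b that the receiving node derives (introduced in the next paragraph). The function name is ours, the alignment with the prediction horizon r is omitted for brevity, and only complete batches are counted.

```r
# Sketch of the label aggregation at node P_i: the h x l count matrix Q
# and the proportion matrix Pi = Q / b derived by the receiving node.
# Horizon alignment is omitted; only complete batches are used.
aggregate_labels <- function(labels, b) {
  labels <- factor(labels)                    # levels = label set Y
  h <- floor(length(labels) / b)              # number of complete batches
  batch <- rep(seq_len(h), each = b)
  Q <- unclass(table(batch, labels[seq_len(h * b)]))  # h x l counts
  list(Q = Q, Pi = Q / b)
}

agg <- aggregate_labels(sample(c("low", "medium", "high"), 200, TRUE), b = 25)
```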
Let $P_{n(i)}$ be a neighboring node receiving label counts from $P_i$. $P_{n(i)}$ transforms $Q^{(i)}$ into a label proportion matrix $\Pi^{(i)} = (\pi_{jd})$ with $\pi_{jd} = q_{jd}/b$, i.e. the counts of labels are divided by the batch size $b$. Since every node knows $b$ and $r$, $P_{n(i)}$ can partition its own windows $x_1^{n(i)}, \ldots, x_n^{n(i)}$ of measurements into batches $B_1^{n(i)}, \ldots, B_h^{n(i)}$. Since the sender respects $r$, the time spans used for aggregating the labels align correctly with the windows of measurements stored at $P_{n(i)}$.

The learning task at node $P_{n(i)}$ now consists of learning a model $f_{n(i)}^{(i)}$, based only on its batches of (unlabeled) measurements and the label information from node $P_i$, stored in the label proportion matrix $\Pi^{(i)}$, such that the expected prediction error over individual observations is minimized. This task is also known as learning from label proportions.

Several methods have been developed to solve this task (see Sect. 2). Considering the limited computational resources of sensor nodes, the LLP algorithm (Stolpe & Morik, 2011) looked most promising for our evaluation in Sect. 5, since LLP has a linear running time and its centroid model a small memory footprint. Moreover, it can handle multi-class classification problems as they arise in traffic monitoring. However, we found that it still needed to be improved with regard to scalability and performance. The next subsection describes LLP shortly, while the subsequent one introduces a new local search method.

THE LLP ALGORITHM

LLP learns from label proportions by first clustering all observations and then assigning labels to each cluster. The task of cluster analysis consists of partitioning a set of observations into a set $\mathcal{C}$ of $k$ disjunct groups (clusters) $C_1, \ldots, C_k$, such that the similarity of observations within each cluster is maximized. LLP relies on the idea that observations having the same class also share similar features, i.e. that clusters somehow correspond to classes. LLP allows for several clusters per class and assumes that the majority of elements of a cluster belongs to the same class. Once a clustering is given, the only remaining problem is to assign correct labels to each cluster.

More formally, let $\mu : X \to \mathcal{C}$ be a mapping that assigns an arbitrary observation $x \in X$ to a cluster $C \in \mathcal{C}$. For centroids $c_1, \ldots, c_k$ found with k-Means, $\mu(x)$ would be defined as $\mu(x) = \operatorname{argmin}_{C_k \in \mathcal{C}} \|x - c_k\|^2$.

Further, let $\ell : \mathcal{C} \to Y$ be a mapping which assigns a label $\lambda \in Y$ to each cluster $C \in \mathcal{C}$. For ease of notation, let $\tilde f$ denote the model $f_{n(i)}^{(i)}$ to be learned at node $P_{n(i)}$, $\tilde B_j$ denote the batch $B_j^{n(i)}$, and $\tilde\Pi$ denote the matrix $\Pi^{(i)}$. $\tilde f$ is the composition of the mappings $\ell$ and $\mu$, i.e. $\tilde f = \ell \circ \mu$.

With prediction model $\tilde f$, the entries $\gamma_{jd}$ of a model-based proportion matrix $\Gamma_{\tilde f} = (\gamma_{jd})$ can be calculated as

$$\gamma_{jd} = \frac{1}{|\tilde B_j|} \sum_{x \in \tilde B_j} I(\tilde f(x), Y_d), \qquad I(\tilde f(x), Y_d) = \begin{cases} 1 & : \tilde f(x) = Y_d \\ 0 & : \tilde f(x) \neq Y_d \end{cases} \qquad (1)$$

The LLP algorithm now minimizes the mean squared error

$$\mathrm{MSE}(\tilde\Pi, \Gamma_{\tilde f}) = \frac{1}{hl} \sum_{j=1}^{h} \sum_{d=1}^{l} (\pi_{jd} - \gamma_{jd})^2 \qquad (2)$$

between the given label proportion matrix $\tilde\Pi$ and the model-based proportion matrix $\Gamma_{\tilde f}$ by trying different label mappings $\ell$.

A LOCAL SEARCH STRATEGY WITH MULTISTARTS

The LLP algorithm as introduced in (Stolpe & Morik, 2011) can work with different clustering algorithms and labeling strategies. LLP with an exhaustive labeling strategy, called LLPexh in the following, tries all possible labelings of the clusters. We found it too time-consuming for the evaluations done in Sect. 5. The greedy strategy proposed in (Stolpe & Morik, 2011) did not achieve sufficient accuracies for traffic prediction. Hence, a better search strategy is needed.

We propose a local search that is started multiple times with different random combinations of labels. LLP with this search strategy will be called LLPlsm in the following. The local search greedily improves on the current labeling of the clusters by trying all possible labels at each component of a labeling vector $\lambda$. Fitness measures how well the model-based label proportion matrix $\Gamma_{\tilde f}$, as calculated from the current labeling, matches the given label proportions. If the fitness improves, the search starts again from the first component of the labeling vector $\lambda$. Otherwise, it resets the label at the current position $k_{pos}$ to the label of the best (local) solution found so far. Returned is the best labeling found over all starts of the different greedy searches.

In each iteration, the greedy search runs until no further improvement is possible. Moreover, at each step of the algorithm, the fitness either improves or stays the same (which is a stopping criterion). Therefore, each search finds a local minimum. Since the number of searches is finite, the returned labeling vector is also locally minimal. In comparison to LLPexh, it cannot be guaranteed that a globally optimal solution is found. However, with regard to the prediction results presented in Sect. 5, we found that the local search performed sufficiently well, despite a much lower running time.

LLP as introduced in (Stolpe & Morik, 2011) combines the MSE with two other error measures. However, we found that the use of these additional measures decreases the accuracy in the traffic monitoring scenario. Hence, all experiments in Sect. 5 are based on the MSE only. Similarly, we abstain from the evolutionary feature weighting presented in (Stolpe & Morik, 2011), since it would heavily increase the algorithm's running time.
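The sketch below illustrates a simplified variant of this multistart local search in R, built on Eqs. (1) and (2): it sweeps greedily over the labeling vector until a full sweep yields no improvement, rather than restarting from the first component after each improvement. All names are ours; this is not the authors' reference implementation.

```r
# Simplified sketch of LLP's labeling step with multistart local search
# (LLP_lsm). Inputs: cluster assignment per observation (`assign`, values
# in 1..k), batch index per observation (`batch`, values in 1..h), the
# received proportion matrix `Pi` (h x l), the numbers of clusters k and
# labels l, and the number of random starts.
gamma_matrix <- function(lambda, assign, batch, h, l) {
  pred <- lambda[assign]  # prediction = label assigned to the cluster
  G <- matrix(0, h, l)
  for (j in seq_len(h))   # Eq. (1): per-batch proportions of predictions
    G[j, ] <- tabulate(pred[batch == j], nbins = l) / sum(batch == j)
  G
}
mse <- function(Pi, G) mean((Pi - G)^2)  # Eq. (2)

llp_lsm <- function(assign, batch, Pi, k, l, starts = 10) {
  h <- nrow(Pi); best <- NULL; best_fit <- Inf
  for (s in seq_len(starts)) {
    lambda <- sample(l, k, replace = TRUE)  # random initial labeling
    fit <- mse(Pi, gamma_matrix(lambda, assign, batch, h, l))
    improved <- TRUE
    while (improved) {  # greedy sweeps until no component improves
      improved <- FALSE
      for (pos in seq_len(k)) for (lab in seq_len(l)) {
        cand <- lambda; cand[pos] <- lab
        f <- mse(Pi, gamma_matrix(cand, assign, batch, h, l))
        if (f < fit) { lambda <- cand; fit <- f; improved <- TRUE }
      }
    }
    if (fit < best_fit) { best <- lambda; best_fit <- fit }
  }
  best  # best labeling vector over all starts
}
```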
MPI IMPLEMENTATION

We explicitly focus on an implementation with the Message Passing Interface (MPI) (Message Passing Forum, 1994), as the message passing in the top supercomputers of the top500 list (http://www.top500.org/lists/2014/11/, last accessed May 1st) is based on the MPI standard. MPI implements the single program multiple data paradigm (Darema, 2001), where every node of a distributed system executes the same program but uses different data. To coordinate this architecture, an MPI program consists of one master node and multiple slave nodes that perform the computations; the results are collected at the master. The art of MPI programming is to divide the problem into multiple tasks that are transferred to the slaves and processed there. Usually, the number of tasks exceeds the number of slave nodes, and the distribution of the tasks has to be organized by the master.

The work in (Nupairoj & Ni, 1994) analyses the performance of such MPI systems and reveals that communication is the major bottleneck in MPI programs. A succeeding publication (Piernas et al., 1997) provides empirical estimates for the computation of communication costs depending on message lengths. Based on these publications, two major conclusions can be made: (1) the shorter the messages, the lower the communication cost, and (2) broadcast messages should be avoided. Our algorithm respects both findings and therefore seems suitable for an MPI implementation.

For ease of development we decided for the Cran-R package Rmpi (Yu, 2002), which provides basic MPI functionalities in Cran-R, so that matrix operators can be applied to the data. Our program comprises the following generic steps (a code sketch is given below):

1. Load Rmpi and spawn slaves
2. Define the functions for the master
3. Define the functions necessary for the LLP algorithm at the slaves
4. Initialize the data
5. Send the required data and functions to the slaves
6. Tell the slaves to execute their function
7. Communicate with the slaves to perform the computation
8. Collect the results
9. Close the slaves and quit

For the learning from label proportions, our implementation presumes a shared network file system and initially processes the data at the master, such that the sliding windows of the measurements are stored as R objects on the file system. Every task gets a pointer to its corresponding slice of the data and the label proportions of the neighbouring nodes. The LLP algorithm is executed in every task, and the trained models (cluster centers and their labels) are again stored physically for later re-use. This also allows the deployment of the models learned in parallel on embedded devices, or the future application of the label proportion models in high performance computation settings.
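A minimal Rmpi skeleton of these nine steps might look as follows; the task function, the file layout and the number of slaves are illustrative assumptions, not the authors' actual code.

```r
# Skeleton of the nine generic steps above, using Rmpi. Task function,
# file names and slave count are illustrative assumptions.
library(Rmpi)                                   # 1. load Rmpi ...
mpi.spawn.Rslaves(nslaves = 4)                  #    ... and spawn slaves

# 2./3. functions for master and slaves; a slave-side LLP task reads its
# slice of windowed data and label proportions from the shared file system
train_llp_task <- function(task_id) {
  d <- readRDS(sprintf("task_%03d.rds", task_id))    # pointer to data slice
  # ... cluster d$windows, label clusters from d$proportions via LLP ...
  model <- list()                                    # placeholder result
  saveRDS(model, sprintf("model_%03d.rds", task_id)) # store for re-use
  task_id
}

tasks <- 1:296                                  # 4. initialize data/tasks
mpi.bcast.Robj2slave(train_llp_task)            # 5. send function to slaves
results <- mpi.applyLB(tasks, train_llp_task)   # 6./7. execute, load-balanced
# 8. results are collected in `results` at the master
mpi.close.Rslaves()                             # 9. close slaves and quit
mpi.quit()
```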
The next subsection analyses the communication cost of LLP in comparison to the kNN algorithm.

ANALYSIS OF COMMUNICATION COSTS

Each node $P_i$ transmits a matrix $Q$ to each of its neighboring nodes, consisting of counts for each label $Y_d \in Y$ and batch. Such counts may be assumed to be integers. The maximum value of each integer is $b$, which means we need to reserve at most $\lceil \log_2 b \rceil$ bits for each label. The number of batches, given $n$ observations, is $\lceil n/b \rceil$. The total number of bits $z_{AGG}$ for encoding the matrix $Q$ is therefore

$$z_{\mathrm{AGG}} = \lceil \log_2 b \rceil \, |Y| \, \left\lceil \frac{n}{b} \right\rceil. \qquad (3)$$

In comparison, the number of bits $z_{ALL}$ required to encode all labels of $n$ observations, for $|Y|$ different labels, is at most

$$z_{\mathrm{ALL}} = n \, \lceil \log_2 |Y| \rceil. \qquad (4)$$

The total costs are then either $z_{AGG}$ or $z_{ALL}$, multiplied by the number of nodes $m$. Here we assume that the label information is broadcast to each neighboring node, which is not unrealistic for sensors in topologically close regions. All payloads reported in Sect. 5 are based on this assumption.
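To make the comparison concrete, the following R snippet instantiates Eqs. (3) and (4); the chosen number of observations and label categories are illustrative placeholders, not values from the paper.

```r
# Instantiating Eqs. (3) and (4): payload in bits for aggregated label
# counts (z_AGG) versus all individual labels (z_ALL). The values of n
# and n_labels below are illustrative, not from the paper.
payload_bits <- function(n, b, n_labels) {
  z_agg <- ceiling(log2(b)) * n_labels * ceiling(n / b)  # Eq. (3)
  z_all <- n * ceiling(log2(n_labels))                   # Eq. (4)
  c(z_AGG = z_agg, z_ALL = z_all)
}

# example: 10,000 observations, 5 categories, batch sizes from Sect. 5
sapply(c(25, 50, 75, 100), function(b) payload_bits(10000, b, 5))
```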
ANALYSIS OF PRIVACY

The vulnerable data are the original sensor readings. These traffic flow measurements bear the risk of re-identification of individual vehicles. For example, in a dense sensor network with sparse observations of vehicles, their occurrence may be tracked throughout the network. As mobility often is a regular behaviour and contains patterns, this risk is even higher. In this section we show that our LLPlsm-based algorithm transforms the data such that the re-identification risk is at most $1/s$.

In our distributed setting, the adversaries of a particular sensor node are malicious sensors that could use received measurements of neighboring sensors for the deduction of individual mobility traces. The following attack model is possible: the adversary analyses differences among neighboring sensor readings and deduces individual movement. If the difference between two neighboring sensor readings is zero and both traffic flow counts are $w$, it is (depending on the network topology) likely that $w$ vehicles moved between the two sensors. In the case of three neighboring sensors $P_a$, $P_b$, $P_c$, their measurements $v_a$, $v_b$, $v_c$ can be combined as follows: if $v_a - v_b = w = v_c$, it may be deduced that on the way from $P_a$ to $P_b$, $w$ vehicles turned to $P_c$; in case $v_a - v_b = -w = -v_c$, $w$ vehicles originated from the location $P_a$.

With our new LLPlsm-based approach we process discretized traffic flow values and communicate only the counts of these value ranges. We denote the minimal (nonzero) interval width by $s$. Thus, measurements cannot be distinguished up to a granularity of $s$ vehicles, and $w$ is bounded by $s$, $w \geq s$. In turn, the risk of re-identification with the attack model described here is at most $1/s$. Our approach therefore provides s-anonymity by design. The aggregation of label information reduces the remaining risk of disclosure of neighboring labels at a malicious sensor node. The sole transmission of label counts prevents a doubtless reconstruction of the labels (Yu et al., 2014).

5. Experiments

We perform tests of the method on data of the city of Dublin. The Sydney Coordinated Adaptive Traffic System (SCATS) provides information on vehicular traffic at over 750 fixed sensor locations as spatio-temporal time series (McCann, 2014). The data we use (publicly available at http://dublinked.ie) is a snapshot from 01/01/2013 till 14/05/2013, consisting of tuples (t, u, w), where u is the location of the observation and consists of an index for the junction, the arm and the lane number at which the sensor is located. The metric w contains the aggregated vehicle count at the sensor location since the last measurement. The time stamp t denotes the recording time.

Local models are trained for each of the 296 sensor nodes and their nearest topological neighbors. As the supervised baseline learner that receives all labels, we use kNN with k = 15. For learning from aggregated label counts, we cluster the observations at each node with k-Means (k = 15, 50 different random starting points, 500 iterations at maximum) and label the clusters with LLPlsm (with 150 starts of the local greedy search) at each node, for different batch sizes b = 25, 50, 75 and 100. The accuracy of each method is assessed by a 10-fold cross validation, i.e. all models are trained and evaluated for different hold-out sets 10 times. In total, 296 × 7 × 10 = 20,720 models need to be evaluated for kNN, and 296 × 7 × 10 × 4 = 82,880 models are trained and evaluated for LLPlsm. The evaluation has been done offline, in parallel on different machines (about 36 CPU cores).

[Figure 1. Trade-off between accuracy and payload sent for kNN and LLPlsm. Left: accuracy (%) per model (kNN, LLP-25, LLP-50, LLP-75, LLP-100); right: payload in Kbytes.]

Figure 1 shows the trade-off between accuracy and payload sent for kNN and LLPlsm trained on differently sized batches of aggregated labels. Besides the average accuracy over all 10-fold cross-validations at each node, the bars in Fig. 1 (left) also depict the standard deviation of the accuracy over all nodes.

In general, LLPlsm performs slightly worse than kNN. Nevertheless, there are still many junctions for which the traffic flow is predicted quite well with LLPlsm. Some locations show bad performance with both methods; a comparison with the map reveals that these are locations of parking areas, e.g. inner-city parking houses and recreational areas where many vehicles stay for a long period of time.
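For concreteness, a per-node sketch of this pipeline in R might look as follows, reusing the llp_lsm sketch from Sect. 4 and the knn function from the class package as the baseline. The objects train_windows, test_windows, labels, test_labels, batch and Pi are assumed to be prepared as described above, and the 10-fold splitting is omitted.

```r
# Per-node sketch of the evaluation: k-Means (k = 15, 50 starts, <= 500
# iterations) plus LLP_lsm labeling (150 starts), against a kNN baseline
# (k = 15) that sees all labels. Input objects are assumed prepared.
km <- kmeans(train_windows, centers = 15, nstart = 50, iter.max = 500)
lambda <- llp_lsm(km$cluster, batch, Pi, k = 15, l = nlevels(labels),
                  starts = 150)

# predict: map each test window to its nearest centroid, then to a label
nearest <- apply(test_windows, 1, function(x)
  which.min(colSums((t(km$centers) - x)^2)))
pred_llp <- levels(labels)[lambda[nearest]]

library(class)                       # kNN baseline trained on all labels
pred_knn <- knn(train_windows, test_windows, labels, k = 15)
mean(pred_llp == as.character(test_labels))  # accuracy on the hold-out set
```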
6. Conclusions

The task of scalable traffic flow prediction involves a trade-off between the accuracy of the models and the amount of communication between the networked nodes. Especially in high performance computation and on embedded devices, communication is costly.

In this paper we presented a novel approach for local models that trades off communication costs against prediction accuracy and is suitable for in-network deployment and cluster computations.

Future work will focus on examining more sophisticated aggregation strategies for labels. We will study how to include dynamic distributed traffic flow prediction in the state-of-the-art (multi-modal) route planning methods proposed by (Bast et al., 2014).

Acknowledgements

This research has received funding from the European Union's Seventh Framework Programme under grant agreement number FP7-318225, INSIGHT. Additionally, this work has been supported by Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876, project B3. We thank Jan Czogalla for data preprocessing.

References

Ahmed, M. S. and Cook, A. R. Analysis of Freeway Traffic Time Series Data Using Box and Jenkins Techniques. 1979.

Balcan, M.-F., Hanneke, S., and Vaughan, J. W. The true sample complexity of active learning. Machine Learning, 80(2–3):111–139, 2010.

Bast, H., Delling, D., Goldberg, A., Müller-Hannemann, M., Pajor, T., Sanders, P., Wagner, D., and Werneck, R. Route planning in transportation networks. Technical Report MSR-TR-2014-4, January 2014.

Bellet, A., Liang, Y., Garakani, A. B., Balcan, M.-F., and Sha, F. Distributed Frank-Wolfe algorithm: A unified framework for communication-efficient sparse learning. CoRR, abs/1404.2644, 2014.

Brefeld, U., Gärtner, T., Scheffer, T., and Wrobel, S. Efficient co-regularised least squares regression. In Proc. of the 23rd Int. Conf. on Machine Learning (ICML), pp. 137–144, New York, NY, USA, 2006. ACM.

Chapelle, O., Schölkopf, B., and Zien, A. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.

Chen, C., Liu, Z., Lin, W.-H., Li, S., and Wang, K. Distributed modeling in a MapReduce framework for data-driven traffic flow forecasting. IEEE Transactions on Intelligent Transportation Systems, 14(1):22–33, 2013.

Chen, S., Liu, B., Qian, M., and Zhang, C. Kernel k-Means based framework for aggregate outputs classification. In Proc. of the Int. Conf. on Data Mining Workshops (ICDMW), pp. 356–361, 2009.

Chen, X.-Y., Pao, H.-K., and Lee, Y.-J. Efficient traffic speed forecasting based on massive heterogenous historical data. In 2014 IEEE International Conference on Big Data, pp. 10–17, Oct 2014. doi: 10.1109/BigData.2014.7004425.

Dai, L., Qin, W., Xu, H., Chen, T., and Qian, C. Urban traffic flow prediction: A MapReduce based parallel multivariate linear regression approach. In 2014 IEEE 17th International Conference on Intelligent Transportation Systems (ITSC), pp. 2823–2827, Oct 2014. doi: 10.1109/ITSC.2014.6958142.

Darema, F. The SPMD model: Past, present and future. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp. 1–1. Springer, 2001.

Das, K., Bhaduri, K., and Votava, P. Distributed anomaly detection using 1-class SVM for vertically partitioned data. Stat. Anal. Data Min., 4(4):393–406, 2011.

Fan, K., Zhang, H., Yan, S., Wang, L., Zhang, W., and Feng, J. Learning a generative classifier from label proportions. Neurocomputing, 139:47–55, 9 2014.

Hernández-González, J., Inza, I., and Lozano, J. A. Learning Bayesian network classifiers from label proportions. Pattern Recognition, 46(12):3425–3440, 2013.

Kück, H. and de Freitas, N. Learning to classify individuals based on group statistics. In Proc. of the 21st UAI, pp. 332–339, 2005.

Lee, S., Stolpe, M., and Morik, K. Separable approximate optimization of support vector machines for distributed sensing. In Machine Learning and Knowledge Discovery in Databases, volume 7524 of LNCS, pp. 387–402, Berlin, Heidelberg, 2012. Springer-Verlag.

Liebig, T., Xu, Z., May, M., and Wrobel, S. Pedestrian quantity estimation with trajectory patterns. In Machine Learning and Knowledge Discovery in Databases, pp. 629–643. Springer Berlin Heidelberg, 2012.

Liebig, T., Piatkowski, N., Bockermann, C., and Morik, K. Predictive trip planning - smart routing in smart cities. In Proceedings of the Workshops of the EDBT/ICDT 2014 Joint Conference, Athens, Greece, March 28, 2014, volume 1133, pp. 331–338. CEUR-WS.org, 2014.
McCann, B. A review of SCATS operation and deployment in Dublin. In Proceedings of the 19th JCT Traffic Signal Symposium & Exhibition. JCT Consulting Ltd, 2014.

Message Passing Forum. MPI: A message-passing interface standard. Technical report, Knoxville, TN, USA, 1994.

Musicant, D. R., Christensen, J. M., and Olson, J. F. Supervised learning by training on aggregate outputs. In 7th Int. Conf. on Data Mining (ICDM), pp. 252–261, 10 2007.

Niu, X., Zhu, Y., Cao, Q., Zhang, X., Xie, W., and Zheng, K. An online-traffic-prediction based route finding mechanism for smart city. International Journal of Distributed Sensor Networks, 501:970256, 2015.

Nupairoj, N. and Ni, L. M. Performance evaluation of some MPI implementations on workstation clusters. In Proceedings of the 1994 Scalable Parallel Libraries Conference, pp. 98–105. IEEE, 1994.

Patrini, G., Nock, R., Caetano, T., and Rivera, P. (Almost) no label no cry. In Advances in Neural Information Processing Systems 27, pp. 190–198. Curran Associates, Inc., 2014.

Piatkowski, N., Lee, S., and Morik, K. Spatio-temporal random fields: compressible representation and distributed estimation. Machine Learning, 93(1):115–139, 2013. ISSN 0885-6125.

Piernas, J., Flores, A., and García, J. M. Analyzing the performance of MPI in a cluster of workstations based on fast Ethernet. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp. 17–24. Springer, 1997.

Quadrianto, N., Smola, A. J., Caetano, T. S., and Le, Q. V. Estimating labels from label proportions. J. Mach. Learn. Res., 10:2349–2374, 12 2009.

Raney, B. and Nagel, K.
An improved framework for large-scale multi-agent simulations of travel behavior. In Towards Better Performing European Transportation Systems, pp. 305–347, 2006.

Rüping, S. SVM classifier estimation from group probabilities. In Proc. of the 27th Int. Conf. on Machine Learning (ICML), pp. 911–918, 2010.

Schnitzler, F., Liebig, T., Mannor, S., and Morik, K. Combining a Gauss-Markov model and Gaussian process for traffic prediction in Dublin city center. In Proceedings of the Workshops of the EDBT/ICDT 2014 Joint Conference, Athens, Greece, March 28, 2014, volume 1133, pp. 373–374. CEUR-WS.org, 2014.

Stolpe, M., Bhaduri, K., Das, K., and Morik, K. Anomaly detection in vertically partitioned data by distributed core vector machines. In European Conf. on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD), pp. 321–336. Springer, 2013.

Stolpe, M. and Morik, K. Learning from label proportions by optimizing cluster model selection. In Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part III, ECML PKDD'11, pp. 349–364, Berlin, Heidelberg, 2011. Springer-Verlag.

Wang, Y., van Schuppen, J. H., and Vrancken, J. On-line distributed prediction of traffic flow in a large-scale road network. Simulation Modelling Practice and Theory, 47:276–303, 2014. ISSN 1569-190X. doi: 10.1016/j.simpat.2014.06.011.

Yang, Z., Mei, D., Yang, Q., Zhou, H., and Li, X. Traffic flow prediction model for large-scale road network based on cloud computing. Mathematical Problems in Engineering, 2014, 2014.

Yu, F. X., Kumar, S., Jebara, T., and Chang, S.-F. On learning with label proportions. CoRR, abs/1402.5902, 2014.

Yu, F. X., Liu, D., Kumar, S., Jebara, T., and Chang, S.-F. ∝SVM for learning with label proportions. In Proc. of the 30th Int. Conf. on Machine Learning (ICML), pp. 504–512, 2013.

Yu, H. Rmpi: Parallel statistical computing in R. R News, 2(2):10–14, 2002. URL http://cran.r-project.org/doc/Rnews/Rnews_2002-2.pdf.

Yunhong, H., Liang, F., and Guoping, H. Privacy-preserving SVM classification on vertically partitioned data without secure multi-party computation. In 5th Int. Conf. on Natural Computation (ICNC), volume 1, pp. 543–546, 8 2009.