Distributed Traffic Flow Prediction with Label Proportions: From in-Network towards High Performance Computation with MPI

Thomas Liebig (THOMAS.LIEBIG@TU-DORTMUND.DE), University of Dortmund, 44221 Dortmund, Germany
Marco Stolpe (MARCO.STOLPE@TU-DORTMUND.DE), University of Dortmund, 44221 Dortmund, Germany
Katharina Morik (KATHARINA.MORIK@TU-DORTMUND.DE), University of Dortmund, 44221 Dortmund, Germany

Proceedings of the 2nd International Workshop on Mining Urban Data, Lille, France, 2015. Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

Abstract

Modern traffic management should benefit from the diverse sensors, smart phones, and social networks data that offer the potential of enhanced services. In disaster scenarios, it is no longer guaranteed that a central server and reliable communication are always available. This motivates a distributed computing setting with restricted communication. Also in distributed High Performance Computing, communication costs have to be reduced to the minimum and costly broadcasts to all compute nodes should be avoided. We want to learn local models with high communication efficiency. They still require the exchange of label information in a setting of supervised learning. The transmission of all labels among the nodes can be as costly as communicating all observations. Sophisticated methods are required to trade off prediction performance against communication costs.

We hereby present an in-network algorithm based on local models that only sends label counts to neighboring nodes. The method is therefore a novel approach that transfers no data about individual observations, but just aggregated label information. We outline its MPI implementation and evaluate our approach on real-world data in a traffic monitoring scenario. Tests reveal that, in comparison to sending all labels, the algorithm is scalable.

1. Introduction

Traffic flow prediction is an important task for traffic managers. It allows performance assessment of major traffic infrastructure, like roads and junctions. Individual mobility benefits from predictions, as they provide necessary data for proactive, smart decisions on individual travel plans, e.g. predictive situation-aware trip planning by avoidance of likely traffic hazards (Niu et al., 2015; Liebig et al., 2014). Traffic flow models are based on sensor observations of current traffic gained by a mesh of (mostly static) presence sensors. While existing learning methods centralize and process measurements on a dedicated traffic management server, they have some major drawbacks, particularly in cases of disaster: the need for a reliable communication infrastructure reduces sustainability in case of natural hazards. First, the server-side collection causes high communication costs, decreasing the system's ability to process all sensor data in time. Second, the area of traffic prediction systems is limited by the political area of homogeneous regulations for sending the data through the network. Third, increasing the network's density bears the risk of re-identification of individual persons and tracking them throughout the network. Existing systems are therefore limited by communication bandwidths, processing capabilities and political regulations.

We tackle these limitations by a distributed spatio-temporal in-network learning algorithm, where sensors compute local models and efficiently communicate label counts with their topological neighbors. Our approach sends space-time aggregated values that, by design, provide k-anonymity.
Hence, our method is privacy preserving and can be applied in large-scale traffic management scenarios. Our particular focus is on the prediction of future traffic flow at junctions throughout the region of interest (e.g. a city, a state or even areas at European scale). Possible applications comprise, for instance,

• distributed car-to-car scenarios where cars or trucks communicate at junctions the number of observed vehicles on the road to estimate traffic flow and alter their individual transportation plans based on predicted traffic conditions, or,

• large-scale traffic flow prediction that processes massive local observations on a high performance computer.

Scalable in-network algorithms belong to the field of distributed data mining. Existing work mostly focuses on horizontally partitioned data. There, full observations, i.e. all features and labels, are stored on different nodes in a network. However, network states representing the current traffic flow are vertically partitioned. Here, only partial information about observations is stored on different nodes. Learning and prediction therefore require either the transmission of observations or of labels to other nodes. Previous work (Das et al., 2011; Lee et al., 2012; Stolpe et al., 2013) has focused on sending less information about observations to a central coordinator. Here, we deal with reducing the amount of labels sent to neighboring peer nodes. Communication-efficient algorithms for vertically distributed learning are not just relevant for traffic flow prediction, but for applications as diverse as intrusion detection, monitoring production processes or smart grid management. The main contributions of our work are the following:

1. We introduce a privacy-preserving approach for the distributed learning of spatio-temporal prediction models which transfers only aggregated label information, but no data about individual observations.

2. A connection is drawn between the task of learning from label proportions and reducing communication costs in distributed environments, and it is evaluated on real-world data.

3. We introduce a fast search strategy for the LLP algorithm (Stolpe & Morik, 2011) and demonstrate its prediction performance in the context of traffic flow prediction.

The next section reviews related work. Section 3 details our problem setting and introduces a novel approach for the in-network training of local models. Section 4 discusses learning from aggregated label information and its implementation using the Message Passing Interface (MPI), and analyses its communication costs and privacy aspects. Evaluations of our approach can be found in Sect. 5. We finish with conclusions and an outlook on future work.

2. Related Work

Many distributed data mining algorithms learn from horizontally partitioned data, whereas our data is vertically partitioned. In this context, privacy-preserving SVMs like (Yunhong et al., 2009) are not scalable, since they send quadratic kernel matrices to a central server. Distributed optimization algorithms (Bellet et al., 2014) exchange predictions for each observation per iteration, potentially sending more than the entire dataset. So does the co-regularized least squares regression in (Brefeld et al., 2006). Communication-efficient anomaly detection algorithms (Das et al., 2011; Stolpe et al., 2013) combine local and global models, but are 1-class algorithms reducing the data sent about observations, not labels. In (Lee et al., 2012), local support vector machine (SVM) models are trained, but all labels are sent by a central server.

Also in traffic flow prediction, most literature describes processes on central servers. There are two major ways to model traffic: using a simulation (Raney & Nagel, 2006) or applying an imputation model trained on previous sensor measurements. Models are required for the estimation of traffic flow at locations not being observed at all. Such imputation is not the focus of our study, but rather the prediction of traffic flow at sensor locations. We point the interested reader to methods of simulation (e.g. cellular automata (Raney & Nagel, 2006)) and model-based imputation (e.g. (Liebig et al., 2012)). Most learning-based traffic flow
prediction methods analyse time series; a popular model is based on the auto-regressive integrated moving average (ARIMA) (Ahmed et al., 1979). Recently, an application of a Gaussian Markov model was proposed in (Schnitzler et al., 2014), and more advanced graphical models, namely Spatio-Temporal Random Fields (STRFs), were applied to traffic modeling in (Piatkowski et al., 2013). Distributed approaches comprise an approach that applies kNN and Gaussian process regression (Chen et al., 2014), on-line distributed prediction of traffic flow in a large-scale road network (Wang et al., 2014), distributed traffic modeling in a MapReduce framework (Chen et al., 2013), MapReduce-parallel multivariate regression (Dai et al., 2014) and MPI-based (Message Passing Forum, 1994) high performance computation based on SVMs (Yang et al., 2014). A few distributed approaches combine sketches of neighbouring sensors to get probabilistic estimates of the number of vehicles co-occurring at different locations. Instead of counting and re-identifying individual vehicles, we use aggregated quantities.

The task of learning from aggregated label information was first introduced in (Kück & de Freitas, 2005). Theoretical bounds have only recently been proven in (Yu et al., 2014). (Musicant et al., 2007) propose variants of existing algorithms. The SVM optimization problem has been adapted to the setting (Rüping, 2010; Yu et al., 2013). Mean Map (Quadrianto et al., 2009) estimates the mean operator by solving a system of linear equations, while (Patrini et al., 2014) extend it with a manifold regularization, outperforming both SVMs and Mean Map on standard datasets. A modified kernel k-Means algorithm (Chen et al., 2009) minimizes the distance to the given label proportions by matrix factorization.
Recent work learns Bayesian network (Hernández-González et al., 2013) and generative (Fan et al., 2014) classifiers. The LLP algorithm proposed in (Stolpe & Morik, 2011) first determines clusters and then tries to label them. LLP has only linear running time, while its prediction performance competes with the approaches in (Quadrianto et al., 2009; Rüping, 2010) and (Chen et al., 2009).

3. Distributed Learning of Spatio-Temporal Local Models

Given are $m$ distributed sensor nodes $P_1, \ldots, P_m$. Each sensor node $P_i$ delivers an infinite series of real-valued measurements $\ldots, v_{t-1}^{(i)}, v_t^{(i)}, v_{t+1}^{(i)}, \ldots$ for different time points $\ldots, t-1, t, t+1, \ldots$. Time spans between two measurements are equidistant, given a constant sample rate. Let $t$ denote the current time of measurement, while $t-a$ and $t+a$ are time points $a$ steps in the past and future. Each sensor node also has a spatial location.

Many traffic flow management tasks require the prediction of traffic flow categories, which are achieved by a discretization of raw values into distinct intervals (e.g. risk level assignment or decisions for emergency traffic signals). The task, given the current time point $t$, is therefore to predict a label $y$ from a set $Y = \{Y_1, \ldots, Y_l\}$ of distinct categories at some arbitrary node $P_i$ at future time point $t+r$, based on the current and previous (raw) sensor readings at all or a subset of the nodes $P_1, \ldots, P_m$.

We assume that for learning, measurements and labels are somehow recorded (see below) over a fixed-length time period. For the supervised training of prediction models, each node $P_i$ thus provides a sequence $V_i = \langle v_1^{(i)}, \ldots, v_n^{(i)} \rangle$ of measurements, $v_j^{(i)} \in \mathbb{R}$, and a sequence $L_i = \langle y_1^{(i)}, \ldots, y_n^{(i)} \rangle$ of labels $y_j^{(i)} \in Y$.

DISTRIBUTED LEARNING OF LOCAL MODELS

Instead of centralizing all data, we propose that each $P_i$ records and stores its own measurements and labels. For predicting future traffic flow categories at node $P_i$, we restrict learning to $P_i$ itself and $c$ topological neighboring nodes around $P_i$. For instance, to learn and predict the future type of traffic flow at some street junction, only the measurements and labels recorded at the junction itself and at the $c$ junctions closest to it are considered.

Before training, each $P_i$ preprocesses its measurements $V_i$ as follows. A window of size $p$ is slid over the series $V_i$ with step size 1, storing all thereby created windows $x_t^{(i)} = (v_{t-p+1}^{(i)}, \ldots, v_t^{(i)})$, $t = p, \ldots, n$, as rows in a dataset $D_i$. Let $N(i) = \{n_1^{(i)}, \ldots, n_c^{(i)}\}$ be the set of indices of the $c$ neighboring nodes around $P_i$. Based on the datasets $D_i, D_{n_1^{(i)}}, \ldots, D_{n_c^{(i)}}$ and the labels $L_i$, we want to learn a local function (model) $f^{(i)}$ that, given windows $x_t^{(i)}, x_t^{n_1^{(i)}}, \ldots, x_t^{n_c^{(i)}}$ of sensor readings from node $P_i$ and its neighbors, correctly predicts the label $y_{t+r}^{(i)}$ at node $P_i$ with horizon $r$.
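As an illustration, the following R sketch constructs such a windowed dataset from a measurement series; the function name make_windows and its interface are our own, not taken from the paper's implementation.

```r
# Minimal sketch of the preprocessing step: slide a window of size p with
# step 1 over a measurement series v and store the windows as rows of a
# dataset D_i. Function name and interface are ours, not the paper's.
make_windows <- function(v, p) {
  n <- length(v)
  # one row per time point t = p, ..., n, with columns v[t-p+1], ..., v[t]
  t(sapply(p:n, function(t) v[(t - p + 1):t]))
}

# example: 10 measurements and window size p = 3 yield an 8 x 3 dataset
D <- make_windows(rnorm(10), p = 3)
```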
Interpreting the windows $x_t^{(i)}, x_t^{n_1^{(i)}}, \ldots, x_t^{n_c^{(i)}}$ as features of a single observation $x$ that should be classified, the data is vertically partitioned, since each neighboring node of $P_i$ only stores partial information about $x$, i.e. a subset of its features.

An obvious choice for the training of $f^{(i)}$ at $P_i$ is to ask for the recorded measurements at each neighboring node, concatenate their columns at $P_i$ and join the labels stored at $P_i$ to the new dataset. The approach is more scalable than centralizing all data, since the number $c$ of neighbors is fixed, avoiding the bottleneck problem of limited bandwidth. However, each node still needs to transmit all measurements to each of its neighbors, consuming at least as much energy per node as sending all data to a single server.

Therefore, we propose to send only label information from node $P_i$ to its neighbors and to train models $f_0^{(i)}$ at node $P_i$ and $f_{n_1^{(i)}}^{(i)}, \ldots, f_{n_c^{(i)}}^{(i)}$ at its neighbors. As model $f^{(i)}$ at node $P_i$, we propose a majority vote over the predictions from itself and its neighboring nodes. All models are local, since they only consider measurements and labels of a fixed number of topologically close neighboring nodes around $P_i$. Moreover, the approach works fully in-network without a central coordinator, since each node only communicates with its neighboring peer nodes. As learners at each node, one may consider supervised learners like kNN, decision trees or SVMs. Considering the limited computational resources of sensor nodes, however, our evaluation in Sect. 5 is solely based on kNN.

Since the number of bits needed to encode all labels is often less than an encoding of all measurements, communication is saved by sending labels from $P_i$ instead of measurements to $P_i$. However, supervised learning still requires individual labels for all observations. The question is whether communication can be reduced even further, by sending fewer labels or aggregated label information to each neighboring node. Semi-supervised (Chapelle et al., 2006) and active learning (Balcan et al., 2010) show that training on fewer labels may achieve a similar performance as training on all labels. However, such methods do not preserve the privacy of the data, since they need individual labels of observations (see Sect. 4). Instead, we propose to send only aggregated label information, i.e. label counts, to neighboring nodes for learning.

4. Aggregation of Label Information

Before sending label information to each of its neighboring nodes, $P_i$ divides its time-related sequence $L_i$ of labels into consecutive batches $C_1^{(i)}, \ldots, C_h^{(i)}$ of a fixed size $b$ (see Fig. ??). It respects the prediction horizon $r$, such that each $C_j^{(i)}$ consists of the labels from time point $t + (j-1)b + r$ to $t + jb + r$ and aligns correctly with the time points of observations (i.e. windows of measurements) at other nodes. Let $n$ denote from here on the size of the datasets $D_i$, i.e. the number of windows stored. Then, $h = \lceil n/b \rceil$. For each batch $j$, the labels $y \in Y$ are aggregated by counting how often they occur, and stored in an $h \times l$ matrix of label counts $Q^{(i)} = (q_{jd})$, where $q_{jd} = |\{y \in C_j^{(i)} \mid y = Y_d\}|$.
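The following R sketch illustrates this aggregation step, computing the count matrix Q and also the proportion matrix Π = Q/b that the receiving node derives (introduced in the next paragraph). The function name is ours, the alignment with the prediction horizon r is omitted for brevity, and only complete batches are counted.

```r
# Sketch of the label aggregation at node P_i: the h x l count matrix Q
# and the proportion matrix Pi = Q / b derived by the receiving node.
# Horizon alignment is omitted; only complete batches are used.
aggregate_labels <- function(labels, b) {
  labels <- factor(labels)                    # levels = label set Y
  h <- floor(length(labels) / b)              # number of complete batches
  batch <- rep(seq_len(h), each = b)
  Q <- unclass(table(batch, labels[seq_len(h * b)]))  # h x l counts
  list(Q = Q, Pi = Q / b)
}

agg <- aggregate_labels(sample(c("low", "medium", "high"), 200, TRUE), b = 25)
```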
Let $P_{n(i)}$ be a neighboring node receiving label counts from $P_i$. $P_{n(i)}$ transforms $Q^{(i)}$ into a label proportion matrix $\Pi^{(i)} = (\pi_{jd})$ with $\pi_{jd} = q_{jd}/b$, i.e. the counts of labels are divided by the batch size $b$. Since every node knows $b$ and $r$, $P_{n(i)}$ can partition its own windows $x_1^{n(i)}, \ldots, x_n^{n(i)}$ of measurements into batches $B_1^{n(i)}, \ldots, B_h^{n(i)}$. Since the sender respects $r$, the time spans used for aggregating the labels align correctly with the windows of measurements stored at $P_{n(i)}$.

The learning task at node $P_{n(i)}$ now consists of learning a model $f_{n(i)}^{(i)}$, based only on its batches of (unlabeled) measurements and the label information from node $P_i$, stored in the label proportion matrix $\Pi^{(i)}$, such that the expected prediction error over individual observations is minimized. This task is also known as learning from label proportions.

Several methods have been developed to solve this task (see Sect. 2). Considering the limited computational resources of sensor nodes, the LLP algorithm (Stolpe & Morik, 2011) looked most promising for our evaluation in Sect. 5, since LLP has a linear running time and its centroid model a small memory footprint. Moreover, it can handle multi-class classification problems as they arise in traffic monitoring. However, we found that it still needed to be improved with regard to scalability and performance. The next subsection describes LLP shortly, while the subsequent one introduces a new local search method.

THE LLP ALGORITHM

LLP learns from label proportions by first clustering all observations and then assigning labels to each cluster. The task of cluster analysis consists of partitioning a set of observations into a set $\mathcal{C}$ of $k$ disjunct groups (clusters) $C_1, \ldots, C_k$, such that the similarity of observations within each cluster is maximized. LLP relies on the idea that observations having the same class also share similar features, i.e. that clusters somehow correspond to classes. LLP allows for several clusters per class and assumes that the majority of elements of a cluster belongs to the same class. Once a clustering is given, the only remaining problem is to assign correct labels to each cluster.

More formally, let $\mu : X \to \mathcal{C}$ be a mapping that assigns an arbitrary observation $x \in X$ to a cluster $C \in \mathcal{C}$. For centroids $c_1, \ldots, c_k$ found with k-Means, $\mu(x)$ would be defined as $\mu(x) = \operatorname{argmin}_{C_k \in \mathcal{C}} \|x - c_k\|^2$.

Further, let $\ell : \mathcal{C} \to Y$ be a mapping which assigns a label $\lambda \in Y$ to each cluster $C \in \mathcal{C}$. For ease of notation, let $\tilde f$ denote the model $f_{n(i)}^{(i)}$ to be learned at node $P_{n(i)}$, $\tilde B_j$ denote the batch $B_j^{n(i)}$, and $\tilde\Pi$ denote the matrix $\Pi^{(i)}$. $\tilde f$ is the composition of the mappings $\ell$ and $\mu$, i.e. $\tilde f = \ell \circ \mu$.

With prediction model $\tilde f$, the entries $\gamma_{jd}$ of a model-based proportion matrix $\Gamma_{\tilde f} = (\gamma_{jd})$ can be calculated as

$$\gamma_{jd} = \frac{1}{|\tilde B_j|} \sum_{x \in \tilde B_j} I(\tilde f(x), Y_d), \qquad I(\tilde f(x), Y_d) = \begin{cases} 1 & : \tilde f(x) = Y_d \\ 0 & : \tilde f(x) \neq Y_d \end{cases} \qquad (1)$$

The LLP algorithm now minimizes the mean squared error

$$\mathrm{MSE}(\tilde\Pi, \Gamma_{\tilde f}) = \frac{1}{hl} \sum_{j=1}^{h} \sum_{d=1}^{l} (\pi_{jd} - \gamma_{jd})^2 \qquad (2)$$

between the given label proportion matrix $\tilde\Pi$ and the model-based proportion matrix $\Gamma_{\tilde f}$ by trying different label mappings $\ell$.

A LOCAL SEARCH STRATEGY WITH MULTISTARTS

The LLP algorithm as introduced in (Stolpe & Morik, 2011) can work with different clustering algorithms and labeling strategies. LLP with an exhaustive labeling strategy, called LLPexh in the following, tries all possible labelings of the clusters. We found it too time-consuming for the evaluations done in Sect. 5. The greedy strategy proposed in (Stolpe & Morik, 2011) did not achieve sufficient accuracies for traffic prediction. Hence, a better search strategy is needed.

We propose a local search that is started multiple times with different random combinations of labels. LLP with this search strategy will be called LLPlsm in the following. The local search greedily improves on the current labeling of the clusters by trying all possible labels at each component of a labeling vector $\lambda$. Fitness measures how well the model-based label proportion matrix $\Gamma_{\tilde f}$, as calculated from the current labeling, matches the given label proportions. If the fitness improves, the search starts again from the first component of the labeling vector $\lambda$. Otherwise, it resets the label at the current position $k_{pos}$ to the label of the best (local) solution found so far. Returned is the best labeling found over all starts of the different greedy searches.

In each iteration, the greedy search runs until no further improvement is possible. Moreover, at each step of the algorithm, the fitness either improves or stays the same (which is a stopping criterion). Therefore, each search finds a local minimum. Since the number of searches is finite, the returned labeling vector is also locally minimal. In comparison to LLPexh, it cannot be guaranteed that a globally optimal solution is found. However, with regard to the prediction results presented in Sect. 5, we found that the local search performed sufficiently well, despite a much lower running time.

LLP as introduced in (Stolpe & Morik, 2011) combines the MSE with two other error measures. However, we found that the use of these additional measures decreases the accuracy in the traffic monitoring scenario. Hence, all experiments in Sect. 5 are based on the MSE only. Similarly, we abstain from the evolutionary feature weighting presented in (Stolpe & Morik, 2011), since it would heavily increase the algorithm's running time.
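The sketch below illustrates a simplified variant of this multistart local search in R, built on Eqs. (1) and (2): it sweeps greedily over the labeling vector until a full sweep yields no improvement, rather than restarting from the first component after each improvement. All names are ours; this is not the authors' reference implementation.

```r
# Simplified sketch of LLP's labeling step with multistart local search
# (LLP_lsm). Inputs: cluster assignment per observation (`assign`, values
# in 1..k), batch index per observation (`batch`, values in 1..h), the
# received proportion matrix `Pi` (h x l), the numbers of clusters k and
# labels l, and the number of random starts.
gamma_matrix <- function(lambda, assign, batch, h, l) {
  pred <- lambda[assign]  # prediction = label assigned to the cluster
  G <- matrix(0, h, l)
  for (j in seq_len(h))   # Eq. (1): per-batch proportions of predictions
    G[j, ] <- tabulate(pred[batch == j], nbins = l) / sum(batch == j)
  G
}
mse <- function(Pi, G) mean((Pi - G)^2)  # Eq. (2)

llp_lsm <- function(assign, batch, Pi, k, l, starts = 10) {
  h <- nrow(Pi); best <- NULL; best_fit <- Inf
  for (s in seq_len(starts)) {
    lambda <- sample(l, k, replace = TRUE)  # random initial labeling
    fit <- mse(Pi, gamma_matrix(lambda, assign, batch, h, l))
    improved <- TRUE
    while (improved) {  # greedy sweeps until no component improves
      improved <- FALSE
      for (pos in seq_len(k)) for (lab in seq_len(l)) {
        cand <- lambda; cand[pos] <- lab
        f <- mse(Pi, gamma_matrix(cand, assign, batch, h, l))
        if (f < fit) { lambda <- cand; fit <- f; improved <- TRUE }
      }
    }
    if (fit < best_fit) { best <- lambda; best_fit <- fit }
  }
  best  # best labeling vector over all starts
}
```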
MPI IMPLEMENTATION

We explicitly focus on an implementation with the Message Passing Interface (MPI) (Message Passing Forum, 1994), as the message passing in the top supercomputers of the top500 list (http://www.top500.org/lists/2014/11/, last accessed May 1st) is based on the MPI standard. MPI implements the single program multiple data paradigm (Darema, 2001), where every node of a distributed system executes the same program but uses different data. To coordinate this architecture, an MPI program consists of one master node and multiple slave nodes that perform the computations; the results are collected at the master. The art of MPI programming is to divide the problem into multiple tasks that are transferred to the slaves and processed there. Usually, the number of tasks exceeds the number of slave nodes, and the distribution of the tasks has to be organized by the master.

The work in (Nupairoj & Ni, 1994) analyses the performance of such MPI systems and reveals that communication is the major bottleneck in MPI programs. A succeeding publication (Piernas et al., 1997) provides empirical estimates for the computation of communication costs depending on message lengths. Based on these publications, two major conclusions can be made: (1) the shorter the messages, the lower the communication cost, and (2) broadcast messages should be avoided. Our algorithm respects both findings and therefore seems suitable for an MPI implementation.

For ease of development we decided for the Cran-R package Rmpi (Yu, 2002), which provides basic MPI functionalities in Cran-R, so that matrix operators can be applied to the data. Our program comprises the following generic steps (a code sketch is given below):

1. Load Rmpi and spawn slaves
2. Define the functions for the master
3. Define the functions necessary for the LLP algorithm at the slaves
4. Initialize the data
5. Send the required data and functions to the slaves
6. Tell the slaves to execute their function
7. Communicate with the slaves to perform the computation
8. Collect the results
9. Close the slaves and quit

For the learning from label proportions, our implementation presumes a shared network file system and initially processes the data at the master, such that the sliding windows of the measurements are stored as R objects on the file system. Every task gets a pointer to its corresponding slice of the data and the label proportions of the neighbouring nodes. The LLP algorithm is executed in every task, and the trained models (cluster centers and their labels) are again stored physically for later re-use. This also allows the deployment of the models learned in parallel on embedded devices, or the future application of the label proportion models in high performance computation settings.
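A minimal Rmpi skeleton of these nine steps might look as follows; the task function, the file layout and the number of slaves are illustrative assumptions, not the authors' actual code.

```r
# Skeleton of the nine generic steps above, using Rmpi. Task function,
# file names and slave count are illustrative assumptions.
library(Rmpi)                                   # 1. load Rmpi ...
mpi.spawn.Rslaves(nslaves = 4)                  #    ... and spawn slaves

# 2./3. functions for master and slaves; a slave-side LLP task reads its
# slice of windowed data and label proportions from the shared file system
train_llp_task <- function(task_id) {
  d <- readRDS(sprintf("task_%03d.rds", task_id))    # pointer to data slice
  # ... cluster d$windows, label clusters from d$proportions via LLP ...
  model <- list()                                    # placeholder result
  saveRDS(model, sprintf("model_%03d.rds", task_id)) # store for re-use
  task_id
}

tasks <- 1:296                                  # 4. initialize data/tasks
mpi.bcast.Robj2slave(train_llp_task)            # 5. send function to slaves
results <- mpi.applyLB(tasks, train_llp_task)   # 6./7. execute, load-balanced
# 8. results are collected in `results` at the master
mpi.close.Rslaves()                             # 9. close slaves and quit
mpi.quit()
```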
The next subsection analyses the communication cost of LLP in comparison to the kNN algorithm.

ANALYSIS OF COMMUNICATION COSTS

Each node $P_i$ transmits a matrix $Q$ to each of its neighboring nodes, consisting of counts for each label $Y_d \in Y$ and batch. Such counts may be assumed to be integers. The maximum value of each integer is $b$, which means we need to reserve at most $\lceil \log_2 b \rceil$ bits for each label. The number of batches, given $n$ observations, is $\lceil n/b \rceil$. The total number of bits $z_{AGG}$ for encoding the matrix $Q$ is therefore

$$z_{\mathrm{AGG}} = \lceil \log_2 b \rceil \, |Y| \, \left\lceil \frac{n}{b} \right\rceil. \qquad (3)$$

In comparison, the number of bits $z_{ALL}$ required to encode all labels of $n$ observations, for $|Y|$ different labels, is at most

$$z_{\mathrm{ALL}} = n \, \lceil \log_2 |Y| \rceil. \qquad (4)$$

The total costs are then either $z_{AGG}$ or $z_{ALL}$, multiplied by the number of nodes $m$. Here we assume that the label information is broadcast to each neighboring node, which is not unrealistic for sensors in topologically close regions. All payloads reported in Sect. 5 are based on this assumption.
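To make the comparison concrete, the following R snippet instantiates Eqs. (3) and (4); the chosen number of observations and label categories are illustrative placeholders, not values from the paper.

```r
# Instantiating Eqs. (3) and (4): payload in bits for aggregated label
# counts (z_AGG) versus all individual labels (z_ALL). The values of n
# and n_labels below are illustrative, not from the paper.
payload_bits <- function(n, b, n_labels) {
  z_agg <- ceiling(log2(b)) * n_labels * ceiling(n / b)  # Eq. (3)
  z_all <- n * ceiling(log2(n_labels))                   # Eq. (4)
  c(z_AGG = z_agg, z_ALL = z_all)
}

# example: 10,000 observations, 5 categories, batch sizes from Sect. 5
sapply(c(25, 50, 75, 100), function(b) payload_bits(10000, b, 5))
```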
ANALYSIS OF PRIVACY

The vulnerable data are the original sensor readings. These traffic flow measurements bear the risk of re-identification of individual vehicles. For example, in a dense sensor network with sparse observations of vehicles, their occurrence may be tracked throughout the network. As mobility often is a regular behaviour and contains patterns, this risk is even higher. In this section we show that our LLPlsm-based algorithm transforms the data such that the re-identification risk is at most $1/s$.

In our distributed setting, the adversaries of a particular sensor node are malicious sensors that could use received measurements of neighboring sensors for the deduction of individual mobility traces. The following attack model is possible: the adversary analyses differences among neighboring sensor readings and deduces individual movement. If the difference between two neighboring sensor readings is zero and both traffic flow counts are $w$, it is (depending on the network topology) likely that $w$ vehicles moved between the two sensors. In the case of three neighboring sensors $P_a$, $P_b$, $P_c$, their measurements $v_a$, $v_b$, $v_c$ can be combined as follows: if $v_a - v_b = w = v_c$, it may be deduced that on the way from $P_a$ to $P_b$, $w$ vehicles turned to $P_c$; in case $v_a - v_b = -w = -v_c$, $w$ vehicles originated from the location $P_a$.

With our new LLPlsm-based approach we process discretized traffic flow values and communicate only the counts of these value ranges. We denote the minimal (nonzero) interval width by $s$. Thus, measurements cannot be distinguished up to a granularity of $s$ vehicles, and $w$ is bounded by $s$, $w \geq s$. In turn, the risk of re-identification with the attack model described here is at most $1/s$. Our approach therefore provides s-anonymity by design. The aggregation of label information reduces the remaining risk of disclosure of neighboring labels at a malicious sensor node. The sole transmission of label counts prevents a doubtless reconstruction of the labels (Yu et al., 2014).

5. Experiments

We perform tests of the method on data of the city of Dublin. The Sydney Coordinated Adaptive Traffic System (SCATS) provides information on vehicular traffic at over 750 fixed sensor locations as spatio-temporal time series (McCann, 2014). The data we use (publicly available at http://dublinked.ie) is a snapshot from 01/01/2013 till 14/05/2013, consisting of tuples (t, u, w), where u is the location of the observation and consists of an index for the junction, the arm and the lane number at which the sensor is located. The metric w contains the aggregated vehicle count at the sensor location since the last measurement. The time stamp t denotes the recording time.

Local models are trained for each of the 296 sensor nodes and their nearest topological neighbors. As the supervised baseline learner that receives all labels, we use kNN with k = 15. For learning from aggregated label counts, we cluster the observations at each node with k-Means (k = 15, 50 different random starting points, 500 iterations at maximum) and label the clusters with LLPlsm (with 150 starts of the local greedy search) at each node, for different batch sizes b = 25, 50, 75 and 100. The accuracy of each method is assessed by a 10-fold cross validation, i.e. all models are trained and evaluated for different hold-out sets 10 times. In total, 296 × 7 × 10 = 20,720 models need to be evaluated for kNN, and 296 × 7 × 10 × 4 = 82,880 models are trained and evaluated for LLPlsm. The evaluation has been done offline, in parallel on different machines (about 36 CPU cores).

[Figure 1. Trade-off between accuracy and payload sent for kNN and LLPlsm. Left: accuracy (%) per model (kNN, LLP-25, LLP-50, LLP-75, LLP-100); right: payload in Kbytes.]

Figure 1 shows the trade-off between accuracy and payload sent for kNN and LLPlsm trained on differently sized batches of aggregated labels. Besides the average accuracy over all 10-fold cross-validations at each node, the bars in Fig. 1 (left) also depict the standard deviation of the accuracy over all nodes.

In general, LLPlsm performs slightly worse than kNN. Nevertheless, there are still many junctions for which the traffic flow is predicted quite well with LLPlsm. Some locations show bad performance with both methods; a comparison with the map reveals that these are locations of parking areas, e.g. inner-city parking houses and recreational areas where many vehicles stay for a long period of time.
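For concreteness, a per-node sketch of this pipeline in R might look as follows, reusing the llp_lsm sketch from Sect. 4 and the knn function from the class package as the baseline. The objects train_windows, test_windows, labels, test_labels, batch and Pi are assumed to be prepared as described above, and the 10-fold splitting is omitted.

```r
# Per-node sketch of the evaluation: k-Means (k = 15, 50 starts, <= 500
# iterations) plus LLP_lsm labeling (150 starts), against a kNN baseline
# (k = 15) that sees all labels. Input objects are assumed prepared.
km <- kmeans(train_windows, centers = 15, nstart = 50, iter.max = 500)
lambda <- llp_lsm(km$cluster, batch, Pi, k = 15, l = nlevels(labels),
                  starts = 150)

# predict: map each test window to its nearest centroid, then to a label
nearest <- apply(test_windows, 1, function(x)
  which.min(colSums((t(km$centers) - x)^2)))
pred_llp <- levels(labels)[lambda[nearest]]

library(class)                       # kNN baseline trained on all labels
pred_knn <- knn(train_windows, test_windows, labels, k = 15)
mean(pred_llp == as.character(test_labels))  # accuracy on the hold-out set
```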
6. Conclusions

The task of scalable traffic flow prediction involves a trade-off between the accuracy of the models and the amount of communication between the networked nodes. Especially in high performance computation and on embedded devices, communication is costly.

In this paper we presented a novel approach for local models that trades off communication costs against prediction accuracy and is suitable for in-network deployment and cluster computations.

Future work will focus on examining more sophisticated aggregation strategies for labels. We will study how to include dynamic distributed traffic flow prediction in the state-of-the-art (multi-modal) route planning methods proposed by (Bast et al., 2014).

Acknowledgements

This research has received funding from the European Union's Seventh Framework Programme under grant agreement number FP7-318225, INSIGHT. Additionally, this work has been supported by Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876, project B3. We thank Jan Czogalla for data preprocessing.

References

Ahmed, M. S. and Cook, A. R. Analysis of Freeway Traffic Time Series Data Using Box and Jenkins Techniques. 1979.

Balcan, M.-F., Hanneke, S., and Vaughan, J. W. The true sample complexity of active learning. Machine Learning, 80(2–3):111–139, 2010.

Bast, H., Delling, D., Goldberg, A., Müller-Hannemann, M., Pajor, T., Sanders, P., Wagner, D., and Werneck, R. Route planning in transportation networks. Technical Report MSR-TR-2014-4, January 2014.

Bellet, A., Liang, Y., Garakani, A. B., Balcan, M.-F., and Sha, F. Distributed Frank-Wolfe algorithm: A unified framework for communication-efficient sparse learning. CoRR, abs/1404.2644, 2014.

Brefeld, U., Gärtner, T., Scheffer, T., and Wrobel, S. Efficient co-regularised least squares regression. In Proc. of the 23rd Int. Conf. on Machine Learning (ICML), pp. 137–144, New York, NY, USA, 2006. ACM.

Chapelle, O., Schölkopf, B., and Zien, A. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.

Chen, C., Liu, Z., Lin, W.-H., Li, S., and Wang, K. Distributed modeling in a MapReduce framework for data-driven traffic flow forecasting. IEEE Transactions on Intelligent Transportation Systems, 14(1):22–33, 2013.

Chen, S., Liu, B., Qian, M., and Zhang, C. Kernel k-Means based framework for aggregate outputs classification. In Proc. of the Int. Conf. on Data Mining Workshops (ICDMW), pp. 356–361, 2009.

Chen, X.-Y., Pao, H.-K., and Lee, Y.-J. Efficient traffic speed forecasting based on massive heterogenous historical data. In 2014 IEEE International Conference on Big Data, pp. 10–17, Oct 2014. doi: 10.1109/BigData.2014.7004425.

Dai, L., Qin, W., Xu, H., Chen, T., and Qian, C. Urban traffic flow prediction: A MapReduce based parallel multivariate linear regression approach. In 2014 IEEE 17th International Conference on Intelligent Transportation Systems (ITSC), pp. 2823–2827, Oct 2014. doi: 10.1109/ITSC.2014.6958142.

Darema, F. The SPMD model: Past, present and future. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp. 1–1. Springer, 2001.

Das, K., Bhaduri, K., and Votava, P. Distributed anomaly detection using 1-class SVM for vertically partitioned data. Stat. Anal. Data Min., 4(4):393–406, 2011.

Fan, K., Zhang, H., Yan, S., Wang, L., Zhang, W., and Feng, J. Learning a generative classifier from label proportions. Neurocomputing, 139:47–55, 9 2014.

Hernández-González, J., Inza, I., and Lozano, J. A. Learning Bayesian network classifiers from label proportions. Pattern Recognition, 46(12):3425–3440, 2013.

Kück, H. and de Freitas, N. Learning to classify individuals based on group statistics. In Proc. of the 21st UAI, pp. 332–339, 2005.

Lee, S., Stolpe, M., and Morik, K. Separable approximate optimization of support vector machines for distributed sensing. In Machine Learning and Knowledge Discovery in Databases, volume 7524 of LNCS, pp. 387–402, Berlin, Heidelberg, 2012. Springer-Verlag.

Liebig, T., Xu, Z., May, M., and Wrobel, S. Pedestrian quantity estimation with trajectory patterns. In Machine Learning and Knowledge Discovery in Databases, pp. 629–643. Springer Berlin Heidelberg, 2012.

Liebig, T., Piatkowski, N., Bockermann, C., and Morik, K. Predictive trip planning - smart routing in smart cities. In Proceedings of the Workshops of the EDBT/ICDT 2014 Joint Conference, Athens, Greece, March 28, 2014, volume 1133, pp. 331–338. CEUR-WS.org, 2014.
McCann, B. A review of SCATS operation and deployment in Dublin. In Proceedings of the 19th JCT Traffic Signal Symposium & Exhibition. JCT Consulting Ltd, 2014.

Message Passing Forum. MPI: A message-passing interface standard. Technical report, Knoxville, TN, USA, 1994.

Musicant, D. R., Christensen, J. M., and Olson, J. F. Supervised learning by training on aggregate outputs. In 7th Int. Conf. on Data Mining (ICDM), pp. 252–261, 10 2007.

Niu, X., Zhu, Y., Cao, Q., Zhang, X., Xie, W., and Zheng, K. An online-traffic-prediction based route finding mechanism for smart city. International Journal of Distributed Sensor Networks, 501:970256, 2015.

Nupairoj, N. and Ni, L. M. Performance evaluation of some MPI implementations on workstation clusters. In Proceedings of the 1994 Scalable Parallel Libraries Conference, pp. 98–105. IEEE, 1994.

Patrini, G., Nock, R., Caetano, T., and Rivera, P. (Almost) no label no cry. In Advances in Neural Information Processing Systems 27, pp. 190–198. Curran Associates, Inc., 2014.

Piatkowski, N., Lee, S., and Morik, K. Spatio-temporal random fields: compressible representation and distributed estimation. Machine Learning, 93(1):115–139, 2013. ISSN 0885-6125.

Piernas, J., Flores, A., and García, J. M. Analyzing the performance of MPI in a cluster of workstations based on fast Ethernet. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp. 17–24. Springer, 1997.

Quadrianto, N., Smola, A. J., Caetano, T. S., and Le, Q. V. Estimating labels from label proportions. J. Mach. Learn. Res., 10:2349–2374, 12 2009.

Raney, B. and Nagel, K.
An improved framework for large-scale multi-agent simulations of travel behavior. In Towards Better Performing European Transportation Systems, pp. 305–347, 2006.

Rüping, S. SVM classifier estimation from group probabilities. In Proc. of the 27th Int. Conf. on Machine Learning (ICML), pp. 911–918, 2010.

Schnitzler, F., Liebig, T., Mannor, S., and Morik, K. Combining a Gauss-Markov model and Gaussian process for traffic prediction in Dublin city center. In Proceedings of the Workshops of the EDBT/ICDT 2014 Joint Conference, Athens, Greece, March 28, 2014, volume 1133, pp. 373–374. CEUR-WS.org, 2014.

Stolpe, M., Bhaduri, K., Das, K., and Morik, K. Anomaly detection in vertically partitioned data by distributed core vector machines. In European Conf. on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD), pp. 321–336. Springer, 2013.

Stolpe, M. and Morik, K. Learning from label proportions by optimizing cluster model selection. In Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part III, ECML PKDD'11, pp. 349–364, Berlin, Heidelberg, 2011. Springer-Verlag.

Wang, Y., van Schuppen, J. H., and Vrancken, J. On-line distributed prediction of traffic flow in a large-scale road network. Simulation Modelling Practice and Theory, 47:276–303, 2014. ISSN 1569-190X. doi: 10.1016/j.simpat.2014.06.011.

Yang, Z., Mei, D., Yang, Q., Zhou, H., and Li, X. Traffic flow prediction model for large-scale road network based on cloud computing. Mathematical Problems in Engineering, 2014, 2014.

Yu, F. X., Kumar, S., Jebara, T., and Chang, S.-F. On learning with label proportions. CoRR, abs/1402.5902, 2014.

Yu, F. X., Liu, D., Kumar, S., Jebara, T., and Chang, S.-F. ∝SVM for learning with label proportions. In Proc. of the 30th Int. Conf. on Machine Learning (ICML), pp. 504–512, 2013.

Yu, H. Rmpi: Parallel statistical computing in R. R News, 2(2):10–14, 2002. URL http://cran.r-project.org/doc/Rnews/Rnews_2002-2.pdf.

Yunhong, H., Liang, F., and Guoping, H. Privacy-preserving SVM classification on vertically partitioned data without secure multi-party computation. In 5th Int. Conf. on Natural Computation (ICNC), volume 1, pp. 543–546, 8 2009.