Cooperative Kernel-Based Forecasting in Decentralized
    Multi-Agent Systems for Urban Traffic Networks
                                                   Jelena Fiosina and Maksims Fiosins1


Abstract. The distributed and often decentralised nature of com-               probability, etc. can be evaluated by autonomous agents-vehicles in
plex stochastic traffic systems having a large amount of distributed           a distributed manner. In this paper, we present a general distributed
data can be represented well by multi-agent architecture. Traditional          regression forecasting algorithm and illustrate its efficiency in fore-
centralized data mining methods are often very expensive or not fea-           casting travel time.
sible because of transmission limitations that lead to the need of the            Numerous data processing and mining techniques were suggested
development of distributed or even decentralized data mining meth-             for forecasting travel time in a centralized and distributed manner.
ods including distributed parameter estimation and forecasting. We             Statistical methods, such as regression and time series, and artificial
consider a system, where drivers are modelled as autonomous agents.            intelligence methods, such as neural networks, are successfully im-
We assume that agents possess an intellectual module for data pro-             plemented for similar problems. However, travel time is affected by a
cessing and decision making. All such devices use a local kernel-              range of different factors. Thus, accurate prediction of travel time is
based regression model for travel time estimation and prediction. In           difficult and needs considerable amount of traffic data. Understand-
this framework, the vehicles in a traffic network collaborate to opti-         ing the factors affecting travel time is essential for improving predic-
mally fit a prediction function to each of their local measurements.           tion accuracy [13].
Rather than transmitting all data to one another or a central node in             We focus on non-parametric, computationally intensive estima-
a centralized approach, the vehicles carry out a part of the computa-          tion, i.e. Kernel-based estimation, which is a promising technique
tions locally by transmitting only limited amount of data. This lim-           for solving many statistical problems, including parameter estima-
ited transmission helps the agent that experiences difficulties with its       tion. In our paper, we suggest a general distributed kernel-based algo-
current predictions. We demonstrate the efficiency of our approach             rithm and use it for forecasting travel time using real-world data from
with case studies with the analysis of real data from the urban traffic        southern Hanover. We assume that each agent autonomously esti-
domain.                                                                        mates its kernel-based regression function, whose additive nature fits
                                                                               very well with streaming real-time data. When an agent is not able to
                                                                               estimate the travel time because of lack of data, i.e., when it has no
1 INTRODUCTION                                                                 data near the point of interest (because the kernel-based estimation
Multi-agent systems (MAS) often deal with complex applications,                uses approximations), it cooperates with other agents. An algorithm
such as sensors, traffic, or logistics networks, and they offer a suit-        for multi-agent cooperative learning based on based on transmission
able architecture for distributed problem solving. In such applica-            of the required data as a reply to the request of the agent experi-
tions, the individual and collective behaviours of the agents depend           encing difficulties was suggested. After obtaining the necessary data
on the observed data from distributed sources. In a typical distributed        from other agents, the agent can forecast travel-time autonomously.
environment, analysing distributed data is a non-trivial problem be-              The travel-time, estimated in the DDPM stage, can serve as an in-
cause of several constraints such as limited bandwidth (in wireless            put for the next stage of distributed decision making of the intelligent
networks), privacy-sensitive data, distributed computing nodes, etc.           agents [5].
The distributed data processing and mining (DDPM) field deals with                This study contributes in the following ways: It suggests (a) a de-
these challenges by analysing distributed data and offers many al-             centralized kernel-based regression forecasting approach, (b) a re-
gorithmic solutions for data analysis, processing, and mining using            gression model with a structure that facilitates travel-time forecast-
different tools in a fundamentally distributed manner that pays care-          ing; and it improves the application efficiency of the proposed ap-
ful attention to the resource constraints [3].                                 proach for the current real-world urban traffic data.
   Traditional centralized data processing and mining typically re-               The remainder of this paper is organized as follows. Section 2 de-
quires central aggregation of distributed data, which may not always           scribes the related previous work in the DDPM field for MAS, kernel
be feasible because of the limited network bandwidth, security con-            density estimation, and travel-time prediction. Section 3 describes
cerns, scalability problems, and other practical issues. DDPM car-             the current problem more formally. Section 4 presents the multivari-
ries out communication and computation by analyzing data in a dis-             ate kernel-based regression model adopted for streaming data. Sec-
tributed fashion [10]. The DDPM technology offers more efficient               tion 5 describes the suggested cooperative learning algorithm for op-
solutions in such applications.                                                timal prediction in a distributed MAS architecture. Section 6 presents
   In this study we focus on the urban traffic domain, where many              a case study and the final section contains the conclusions.
traffic characteristics such as travel time, travel speed, congestion
1 Institure of Informatics, Clausthal University of Technology, Germany
  email: {jelena.fiosina, maksims.fiosins}@gmail.com


                                                                           3
2 RELATED PREVIOUS WORK                                                         2.2 Travel Time Prediction Models
2.1 Distributed Data Mining in Multi-agent
    Systems                                                                     Continuous traffic jams indicate that the maximum capacity of a road
                                                                                network is met or even exceeded. In such a situation, the modelling
A strong motivation for implementing DDPM for MAS is given by                   and forecasting of traffic flow is one of the important techniques that
Da Silva et al. in [3], where authors argue that DDPM in MAS deals              need to be developed [1]. Nowadays, knowledge about travel time
with pro-active and autonomous agents that perceive their environ-              plays an important role in transportation and logistics, and it can be
ment, dynamically reason out actions on the basis of the environ-               applied in various fields and purposes. From travellers’ viewpoints,
ment, and interact with each other. In many complex domains, the                the knowledge about travel time helps to reduce the travel time and
knowledge of agents is a result of the outcome of empirical data                improves reliability through better selection of travel routes. In lo-
analysis in addition to the pre-existing domain knowledge. DDPM                 gistics, accurate travel time estimation could help reduce transport
of agents often involves detecting hidden patterns, constructing pre-           delivery costs and increase the service quality of commercial deliv-
dictive and clustering models, identifying outliers, etc. In MAS,               ery by delivering goods within the required time window by avoiding
this knowledge is usually collective. This collective ’intelligence’            congested sections. For traffic managers, travel time is an important
of MAS must be developed by distributed domain knowledge and                    index for traffic system operation efficiency [13].
analysis of the distributed data observed by different agents. Such                There are several studies in which a centralized approach is used to
distributed data analysis may be a non-trivial problem when the un-             predict the travel time. The approach was used in various intelligent
derlying task is not completely decomposable and computational re-              transport systems, such as in-vehicle route guidance and advanced
sources are constrained by several factors such as limited power sup-           traffic management systems. A good overview is given in [13]. To
ply, poor bandwidth connection, privacy-sensitive multi-party data.             make the approach effective, agents should cooperate with each other
   Klusch at al. [11] concludes that autonomous data mining agents,             to achieve their common goal via the so called gossiping scenarios.
as a special type of information agents, may perform various kinds              The estimation of the actual travelling time using vehicle-to-vehicle
of mining operations on behalf of their user(s) or in collaboration             communication without MAS architecture was described in [14].
with other agents. Systems of cooperative information agents for data              On the other hand, a multi-agent architecture is better suited for
mining in distributed, heterogeneous, or homogeneous, and massive               distributed traffic networks, which are complex stochastic systems.
data environments appear to be quite a natural progression for the              Further, by using centralized approaches the system cannot adapt
current systems to be realized in the near future.                              quickly to situations in real time, and it is very difficult to trans-
   A common feature of all approaches is that they aim at integrat-             mit a large amount of information over the network. In centralized
ing the knowledge that is extracted from data at different geograph-            approaches, it is difficult or simply physically impossible to store
ically distributed network sites with minimum network communica-                and process large data sets in one location. In addition, it is known
tion and maximum local computations. Local computation is carried               from practice that the most drivers rely mostly on their own expe-
out on each site, and either a central site communicates with each              rience; they use their historical data to forecast the travel time [5].
distributed site to compute the global models or a peer-to-peer archi-          Thus, decentralized multi-agent systems are are fundamentally im-
tecture is used. In the case of the peer-to-peer architecture, individual       portant for the representation of these networks [1]. We model our
nodes might communicate with a resource-rich centralized node, but              system with autonomous agents to allow vehicles to make decisions
they perform most tasks by communicating with neighbouring nodes                autonomously using not only the centrally processed available infor-
through message passing over an asynchronous network [3].                       mation, but also their historical data.
   A distributed system should have the following features for the                 Traffic information generally goes through the following three
efficient implementation of DDPM: the system consists of multiple               stages: data collection and cleansing, data fusion and integration, and
independent data sources, which communicate only through message                data distribution [12]. The system presented in [12] consists of three
passing; communication between peers is expensive; peers have re-               components, namely a Smart Traffic Agent, the Real-time Traffic In-
source constraints (e. g. battery power) and privacy concerns [3].              formation Exchange Protocol and a centralized Traffic Information
   Typically, communication involves bottlenecks. Since communi-                Centre that acts as the backend. A similar architecture is used in
cation is assumed to be carried out exclusively by message passing,             this study, but the prediction module, incorporated into Start Traf-
the primary goal of several DDPM methods, as mentioned in the liter-            fic Agent (vehicle agent), is different. In our study we do not focus
ature, is to minimize the number of messages sent. Building a mono-             on the transmission protocol describing only the information, which
lithic database in order to perform non-distributed data processing             should be sent from one node to another, without the descriptions of
and mining may be infeasible or simply impossible in many appli-                protocol packets. The centralized Traffic Information Centre in our
cations. The costs of transferring large blocks of data may be very             model is used only for storing system information.
expensive and result in very inefficient implementations [10].                     The decentralized MAS approach for urban traffic network was
   Moreover, sensors must process continuous (possibly fast) streams            considered in [2] also, where the authors forecast the traversal time
of data. The resource-constrained distributed environments of sensor            for each link of the network separately. Two types of agents were
networks and the need for a collaborative approach to solve many                used for vehicles and links, and a neural network was used as the
problems in this domain make MAS architecture an ideal candidate                prediction model.
for application development.                                                       For travel time forecasting different regression models can be ap-
   In our study we deal with homogeneous data. However, a promis-               plied. Linear multivariate regression model for decentralized urban
ing approach to agent-based parameter estimation for partially het-             traffic network was proposed in [4]. This regression model is well-
erogeneous data in sensor networks was suggested in [7]. Another                studied, is parametric and allows estimation of each variables contri-
decentralized approach for homogeneous data was suggested in [18]               bution. However is not sufficiently effective in the case of non-linear
to estimate the parameters of a wireless network by using a paramet-            systems. Alternatively non-parametric kernel-based regression mod-
ric linear model and stochastic approximations.                                 els can be applied. These models can be effectively used for any types


                                                                            4
of systems, however are relatively new and not well-studied yet. The                In this step, the agent that experiences difficulties with a forecast
efficiency of non-parametric kernel-based regression approach for                sends its requested data point to other traffic participants in the trans-
traffic flow forecasting in comparison to parametric approach was                mission radius. The other agents try to make a forecast themselves.
made in [17]. In this study we apply non-parametric kernel regres-               In the case of a successful forecast, the agents share their experience
sion for the similar traffic system as in [4] in order to increase the           by sending their observations that are nearest to the requested point.
prediction quality.                                                              After receiving the data from the other agents, the agent combines
                                                                                 the obtained results, increases its experience, and makes a forecast
                                                                                 autonomously.
2.3 Kernel Density Estimation                                                       In short, decentralized travel-time prediction consists of three
                                                                                 steps: 1) local prediction; 2) in the case of unsuccessful prediction:
Kernel density estimation is a non-parametric approach for estimat-
                                                                                 selection of agents for experience sharing and sending them the re-
ing the probability density function of a random variable. Kernel den-
                                                                                 quested data point; 3) aggregation of the answers and prediction.
sity estimation is a fundamental data-smoothing technique where in-
ferences about the population are made on the basis of a finite data
sample. A kernel is a weighting function used in non-parametric es-              4 LOCAL PARAMETER ESTIMATION
timation techniques.
    Let X1 , X2 , . . . , Xn be an iid sample drawn from some distribu-          The non-parametric approach to estimating a regression curve has
tion with an unknown density f . We attempt to estimate the shape of             four main purposes. First, it provides a versatile method for exploring
f , whose kernel density estimator is                                            a general relationship between two variables. Second, it can predict
                                                                                 observations yet to be made without reference to a fixed parametric
                                                     
                             1 X
                                   n                                             model. Third, it provides a tool for finding spurious observations by
                                             x − Xi
                  fˆh (x) =      K                        ,           (1)        studying the influence of isolated points. Fourth, it constitutes the
                            nh                  h
                                 i=1                                             flexible method of substitution or interpolating between adjacent X-
                                                                                 values for missing values [8].
where kernel K(•) is a non-negative real-valuedR ∞ integrable function              Let us consider a non-parametric regression model [9] with a de-
satisfying the following two requirements: −∞ K(u)du = 1 and                     pendent variable Y and a vector of d regressors X
K(−u) = K(u) for all values of u; h > 0 is a smoothing parame-
ter called bandwidth. The first requirement to K(•) ensures that the                                         Y = m(x) + ǫ,                             (2)
kernel density estimator is a probability density function. The second
requirement to K(•) ensures that the average of the corresponding                where ǫ is the disturbance term such that E(ǫ|X = x) = 0 and
distribution is equal to that of the sample used [8]. Different kernel           V ar(ǫ|X = x) = σ 2 (x), and m(x) = E(Y |X = x). Further,
functions are available: Uniform, Epanechnikov, Gausian, etc. They               let (Xi , Yi )n
                                                                                               i=1 be the observations sampled from the distribution of
differ in the manner in which they take into account the vicinity ob-            (X, Y ). Then the Nadaraya-Watson kernel estimator is
servations to estimate the function from the given variables.                                         Pn             x−Xi
                                                                                                                            
   A very suitable property of the kernel function is its additive na-                                  i=1
                                                                                                            K          h
                                                                                                                           Yi         pn (x)
                                                                                              mn (x) = P n            x−Xi
                                                                                                                                 =          ,         (3)
ture. This property makes the kernel function easy to use for stream-                                              K     h
                                                                                                                                      qn (x)
                                                                                                               i=1
ing and distributed data [8], [3], [7]. In [11], the distributed kernel-
based clustering algorithm was suggested on the basis of the same                where K(•) is the kernel function of Rd and h is the bandwidth. Ker-
property. In this study, kernel density is used for kernel regression to         nel functions satisfy the restrictions from (1). In our case we have
estimate the conditional expectation of a random variable.                       a multi-dimensional kernel function K(u) = K(u1 , u2 , . . . , ud ),
                                                                                 that can be easily presented with univariate kernel functions as:
                                                                                 K(u) = K(u1 ) · K(u2 ) · . . . · K(ud ). We used the Gaussian kernel
3 PROBLEM FORMULATION                                                            in our experiments.
We consider a traffic network with several vehicles, represented as                 Network packets are streaming data. Standard statistical and data
autonomous agents, which predict their travel time on the basis of               mining methods deal with a fixed dataset. There is a fixed size n for
their current observations and history. Each agent estimates locally             dataset and algorithms are chosen as a function of n. In streaming
the parameters of the same traffic network. In order to make a fore-             data there is no n: data are continually captured and must be pro-
cast, each agent constructs a regression model, which explains the               cessed as they arrive. It is important to develop algorithms that work
manner in which different explanatory variables (factors) influence              with non-stationary datasets, handle the streaming data directly, and
the travel time. A detailed overview of such factors is provided                 update their models on the fly [6].
in [13]. The following information is important for predicting the                  This requires recursive windowing. The kernel density estimator
travel time [15]: average speed before the current segment, number               has a simple recursive windowing method that allows the recursive
of stops, number of left turns, number of traffic light, average travel          estimation using the kernel density estimator:
time estimated by Traffic Management Centres (TMC). We should                                                                              
                                                                                                    pn (x)   pn−1 (x) + K x−X
                                                                                                                            h
                                                                                                                              n
                                                                                                                                  Yn
also take into account the possibility of an accident, network over-                       mn (x) =        =               x−X
                                                                                                                                   .                  (4)
load (rush hour) and the weather conditions.                                                        qn (x)    qn−1 (x) + K    h
                                                                                                                                n


   Let us consider a vehicle, whose goal is to drive through the defi-
nite road segment under specific environment conditions (day, time,
                                                                                 5 DECENTRALISED MULTI-AGENT
city district, weather, etc.). Let us suppose that it has no or little ex-
                                                                                   COOPERATIVE LEARNING ALGORITHM
perience of driving in such conditions. For accurate travel time esti-
mation, it contacts other traffic participants, which share their expe-          In this section, we describe the cooperation for sharing the predic-
rience in the requested point.                                                   tion experience among the agents in a network. While working with


                                                                             5
streaming data, one should take into account two main facts. The
nodes should coordinate their prediction experience over some pre-
vious sampling period and adapt quickly to the changes in the stream-


                                                                                                              5
                                                                                                                                                                         h=2
ing data, without waiting for the next coordination action.                                                                                                              h=8
                                                                                                                                                                         h=5
                                                                                                                                                                         h=3
   Let us first discuss the cooperation technique. We introduce the


                                                                                                              4
                                                                                   Number of communications
following definitions.
   Let L = {Lj | 1 ≤ j ≤ p} be a group of p agents. Each agent


                                                                                                              3
L ∈ L has a local dataset Dj = {(Xjc , Ycj )|c = 1. . . . , N j }, where
  j

Xjc is a d-dimensional vector. In order to underline the dependence


                                                                                                              2
of the prediction function (3) from the local dataset of agent Lj , we
denote the prediction function by m[Dj ](x).


                                                                                                              1
   Consider a case when some agent Li is not able to forecast for
some d-dimensional future data point Xinew because it does not have


                                                                                                              0
sufficient data in the neighbourhood of Xinew . In this case, it sends                                            0          20          40          60        80         100

a request to other traffic participants in its transmission radius by                                                                         Time

sending the data point Xinew to them. Each agent Lj that has received
the request tries to predict m[Dj ](Xinew ). If it is successful, it replies
to agent Li by sending its best data representatives D̂(j,i) from the                             Figure 1.            Average number of communications over time for different h
neighbourhood of the requested point Xinew . Let us define Gi ⊂ L,
a group of agents, which are able to reply to agent Li by sending the
requested data.                                                                    making their common experience equal to 2400. We assumed the
   To select the best data representatives, each agent Lj makes a                  maximal number of transmitted observations from one agent equals
ranking among its dataset Dj . It can be seen from (3) that each Ycj               2.
is taken with the weight wcj with respect to Xinew , where
                                               j
                                                    
                                       Xinew −Xc                                                                      Table 1.    System factors influencing the travel time
                               K           h
                    wcj =   Pn            Xi     j     .
                                            new −Xl
                               l=1
                                     K          h                                                        Variable       Description
                                                                                                            Y           travel time (min);
The observations with maximum weights wcj are the best candidates                                          X1           length of the route (km);
for sharing the experience.                                                                                X2           average speed in the system (km/h);
                                                                                                           X3           average number of stops in the system (units/min);
   All the data D̂(j,i) , Lj ∈ Gi received by agent Li should be ver-                                      X4           congestion level of the flow (Veh/h);
ified, and duplicated data should S be removed. We denote the new                                          X5           number of traffic lights in the route (units);
dataset of agent Li as Dnew  i
                                 = Lj ∈Gi D̂(j,i) . Then, the kernel                                       X6           number of left turns in the route (units).
function of agent Li is updated taking into account the additive na-
ture of this function:                                                                During the simulation, to predict more accurately, the agents used
            i
                             X                                                     the presented cooperative learning algorithm that supported the com-
         m[Dnew ](x) =               m[D̂(j,i) ](x) + m[Di ](x).        (5)
                                                                                   munication between agents with the objective of improving the pre-
                            Lj ∈Gi
                                                                                   diction quality. The necessary number of communications depends
 Finally, agent Li can autonomously make its forecast as                           on the value of the smoothing parameter h. The average number
   i
m[Dnew  ](Xinew ) for Xinew .                                                      of necessary communications is given in Figure 1. We can see the
                                                                                   manner in which the number of communications decreased with the
                                                                                   learning time. We varied h and obtained the relation between the
6 CASE STUDIES
                                                                                   communication numbers and h as a curve. The prediction ability of
We simulated a traffic network in the southern part of Hanover (Ger-               one of the agents is presented at Figure 2. Here, we can also see the
many). The network contains three parallel and five perpendicular                  relative prediction error, which decreases with time. The predictions
streets, creating fifteen intersections with a flow of approximately               that used communication between agents are denoted by solid tri-
5000 vehicles per hour.                                                            angles, and the number of such predictions also decreases with the
   The vehicles solve a travel time prediction problem. They receive               time.
information about the centrally estimated system variables (such as                   The goodness of fit of the system was estimated using a cross-
average speed, number of stops, congestion level, etc.) for this city              validation technique. We assume that each agent has its own training
district from TMC, combine it with their historical information, and               set, but it uses all observations of other agents as a test set, so we
make adjustments according to the information of other participants                use 20-fold cross-validation. To estimate the goodness of fit, we used
using the presented cooperative-learning algorithm. In this case, the              analysis of variance and generalized coefficient of determination R2
regression analysis is an essential part of the local time prediction              that provides an accurate measure of the effectiveness of the predic-
process. We consider the local kernel based regression model (3) and               tion of future outcomes by using the non-parametric model [16]. The
implement the cooperative learning algorithm (5). The factors are                  calculated R2 values and the corresponding number of the observa-
listed in Table 1. The selection and significance of these variables               tions that were not predicted (because cooperation during testing was
was considered in [4].                                                             not allowed) depending on h are listed in Table 2. From Figure 3 we
   We simulated 20 agents having their initial experience represented              can also see how R2 is distributed among the system agents. The
by a dataset of size 20 till each agent made 100 predictions, thus                 results suggest that we should find some trade-off between system


                                                                               6
                                                                                                                               learning algorithms were presented, using the non-parametric kernel-
                                                                                                                               based regression model. We demonstrated the efficiency of the sug-
                                                                         Self prediction                                       gested approach through simulation with real data from southern
                                                                         Prediction after communication
                              3.0


                                                                                                                               Hanover. The experimental results show the high efficiency of the
                                                                                                                               proposed approach. In future we are going to develop a combined
                              2.5
Relative forecasting error


                                                                                                                               approach that allows agent to choose between parametric and non-
                              2.0


                                                                                                                               parametric estimator for more accurate prediction.
                              1.5


                                                                                                                               ACKNOWLEDGEMENTS
                              1.0


                                                                                                                               The research leading to these results has received funding from the
                              0.5


                                                                                                                               European Union Seventh Framework Programme (FP7/2007-2013)
                              0.0


                                                                                                                               under grant agreement No. PIEF-GA-2010-274881. We thank Prof.
                                         0      20        40            60             80           100
                                                                                                                               J. P. Müller for useful discussions during this paper preparation.
                                                                 Time


                                                                                                                               REFERENCES
                             Figure 2.       Relative prediction error and communication frequency of a                         [1] A.L Bazzan, J. Wahle, and F. Kluegl, ‘Agents in traffic modelling - from
                                                          single agent over time                                                    reactive to social behaviour’, Advances in Artificial Intelligence (LNAI),
                                                                                                                                    1701, 509–523, (1999).
                                                                                                                                [2] R. Claes and T. Holvoet, ‘Ad hoc link traversal time prediction’, in Proc.
                                                                                                                                    of the 14th Int. IEEE Conf. on Intelligent Transportation Systems, (Oc-
                                                                                                                                    tober 2011).
                                                                                                                                [3] J.C. da Silva, C. Giannella, R. Bhargava, H. Kargupta, and M. Klusch,
                                                                                                                                    ‘Distributed data mining and agents’, Eng. Appl. Artif. Intell., 18(7),
                                                                                                                                    791–801, (2005).
                                    80


                                                                                                                                [4] J. Fiosina, ‘Decentralised regression model for intelligent forecasting
                                                                                                                                    in multi-agent traffic networks’, Distributed Computing and Artificial
                                    60
  Frequency


                                                                                                                                    Intelligence (Advances in Intelligent and Soft Computing), 151, 255–
                                                                                                                                    263, (2012).
                                    40


                                                                                                                                [5] M. Fiosins, J. Fiosina, J. Müller, and J. Görmer, ‘Agent-based integrated
                                                                                                                                    decision making for autonomous vehicles in urban traffic’, Advances on
                                    20


                                                                                                                                    Practical Applications of Agents and Multiagent Systems (Advances in
                                                                                                                                    Intelligent and Soft Computing), 88, 173–178, (2011).
                                    0


                                                                                                                                [6] Handbook of Computational Statistics: Concepts and Methods, eds.,
                                                0.75           0.80          0.85             0.90           0.95                   J. E. Gentle, W. Härdle, and Y. Mori, Springer, Berlin/Heidelberg, 2004.
                                                                                 2
                                                                                                                                [7] C. Guestrin, P. Bodik, R. Thibaux, M. Paskin, and S. Madden, ‘Dis-
                                                                             R                                                      tributed regression: an efficient framework for modeling sensor network
                                                                                                                                    data’, in Proc. of the 3rd Int. Sym. on Information Processing in Sensor
                                                                                                                                    Networks, Berkeley, (2004).
                                                                                                                                [8] W. Härdle, Applied Nonparametric Regression, Cambridge University
                                                                                                                                    Press, Cambridge, 2002.
                                     Figure 3. Distribution of goodness of fit measure R2 using                                 [9] W. Härdle, M. Müller, S. Sperlich, and A. Werwatz, Nonparametric and
                                            cross-validation for the whole system, h = 2                                            Semiparametric Models, Springer, Berlin/Heidelberg, 2004.
                                                                                                                               [10] Advances in Distributed and Parallel Knowledge Discovery, eds.,
                                                                                                                                    H. Kargupta and P. Chan, AAAI Press The MIT PressAcademic Press,
accuracy (presented by R2 ) and the number of necessary communi-                                                                    Melno Park and Cambridge and London, 2000.
                                                                                                                               [11] M. Klusch, S. Lodi, and G. Moro, ‘Agent-based distributed data min-
cations (presented by the percentage of not predicted observations),                                                                ing: The kdec scheme’, in Proc. of Int. Conf. on Intelligent Information
which depend on h. The point of trade-off should depend on the com-                                                                 Agents - The AgentLink Perspective, volume 2586 of LNCS, pp. 104–
munication and accuracy costs.                                                                                                      122. Springer, (2003).
                                                                                                                               [12] W. Lee, S. Tseng, and W. Shieh, ‘Collaborative real-time traffic infor-
                                                                                                                                    mation generation and sharing framework for the intelligent transporta-
      Table 2. R2 goodness of fit measure using cross-validation for the whole                                                      tion system’, Information Scienses, 180, 62–70, (2010).
                             system for different h                                                                            [13] H. Lin, R. Zito, and M.A.P. Taylor, ‘A review of travel-time predic-
                                                                                                                                    tion in transport and logistics’, in Proc. of the Eastern Asia Society for
                               Characteristic of System                              h=2       h=3        h=5       h=8             Transportation Studies, volume 5, pp. 1433 – 1448, Hamburg, (2005).
                                 System average R2                                   0.86      0.84       0.83      0.82       [14] G. Malnati, C. Barberis, and C.M. Cuva, ‘Gossip: Estimating actual
                        Average % of not predicted observations                        3        1          0.5      0.1             travelling time using vehicle to vehicle communication’, in Proc. of the
                                                                                                                                    4-th Int. Workshop on Intelligent Transportation, Hamburg, (2007).
                                                                                                                               [15] C.E. McKnight, H.S. Levinson, C. Kamga, and R.E. Paaswell, ‘Impact
   A linear regression model [4] applied to the same data gives lower                                                               of traffic congestion on bus travel time in northern new jersey’, Trans-
average goodness of fit R2 =0.77, however predictions can be calcu-                                                                 portation Research Record Journal, 1884, 27–35, (2004).
lated for all data points.                                                                                                     [16] J.S. Racine, ‘Consistent significance testing for nonparametric regres-
                                                                                                                                    sion’, Journal of Business and Economic Statistics, 15, 369379, (1997).
                                                                                                                               [17] B.L. Smith, B.M. Williams, and R.K. Oswald, ‘Comparison of para-
7 CONCLUSIONS                                                                                                                       metric and nonparametric models for traffic flow forecasting’, Trans-
                                                                                                                                    portation Research Part C, 10, 303–321, (2002).
In this study, the problem of travel-time prediction was considered.                                                           [18] S.S. Stankovic, M.S. Stankovic, and D.M. Stipanovic, ‘Decentralized
                                                                                                                                    parameter estimation by consensus based stochastic approximation’,
Multi-agent architecture with autonomous agents was implemented
                                                                                                                                    IEEE Trans. Automatic Controll, 56, (2009).
for this purpose. Distributed parameter estimation and cooperative


                                                                                                                           7