Cooperative Kernel-Based Forecasting in Decentralized Multi-Agent Systems for Urban Traffic Networks Jelena Fiosina and Maksims Fiosins1 Abstract. The distributed and often decentralised nature of com- probability, etc. can be evaluated by autonomous agents-vehicles in plex stochastic traffic systems having a large amount of distributed a distributed manner. In this paper, we present a general distributed data can be represented well by multi-agent architecture. Traditional regression forecasting algorithm and illustrate its efficiency in fore- centralized data mining methods are often very expensive or not fea- casting travel time. sible because of transmission limitations that lead to the need of the Numerous data processing and mining techniques were suggested development of distributed or even decentralized data mining meth- for forecasting travel time in a centralized and distributed manner. ods including distributed parameter estimation and forecasting. We Statistical methods, such as regression and time series, and artificial consider a system, where drivers are modelled as autonomous agents. intelligence methods, such as neural networks, are successfully im- We assume that agents possess an intellectual module for data pro- plemented for similar problems. However, travel time is affected by a cessing and decision making. All such devices use a local kernel- range of different factors. Thus, accurate prediction of travel time is based regression model for travel time estimation and prediction. In difficult and needs considerable amount of traffic data. Understand- this framework, the vehicles in a traffic network collaborate to opti- ing the factors affecting travel time is essential for improving predic- mally fit a prediction function to each of their local measurements. tion accuracy [13]. Rather than transmitting all data to one another or a central node in We focus on non-parametric, computationally intensive estima- a centralized approach, the vehicles carry out a part of the computa- tion, i.e. Kernel-based estimation, which is a promising technique tions locally by transmitting only limited amount of data. This lim- for solving many statistical problems, including parameter estima- ited transmission helps the agent that experiences difficulties with its tion. In our paper, we suggest a general distributed kernel-based algo- current predictions. We demonstrate the efficiency of our approach rithm and use it for forecasting travel time using real-world data from with case studies with the analysis of real data from the urban traffic southern Hanover. We assume that each agent autonomously esti- domain. mates its kernel-based regression function, whose additive nature fits very well with streaming real-time data. When an agent is not able to estimate the travel time because of lack of data, i.e., when it has no 1 INTRODUCTION data near the point of interest (because the kernel-based estimation Multi-agent systems (MAS) often deal with complex applications, uses approximations), it cooperates with other agents. An algorithm such as sensors, traffic, or logistics networks, and they offer a suit- for multi-agent cooperative learning based on based on transmission able architecture for distributed problem solving. In such applica- of the required data as a reply to the request of the agent experi- tions, the individual and collective behaviours of the agents depend encing difficulties was suggested. After obtaining the necessary data on the observed data from distributed sources. In a typical distributed from other agents, the agent can forecast travel-time autonomously. environment, analysing distributed data is a non-trivial problem be- The travel-time, estimated in the DDPM stage, can serve as an in- cause of several constraints such as limited bandwidth (in wireless put for the next stage of distributed decision making of the intelligent networks), privacy-sensitive data, distributed computing nodes, etc. agents [5]. The distributed data processing and mining (DDPM) field deals with This study contributes in the following ways: It suggests (a) a de- these challenges by analysing distributed data and offers many al- centralized kernel-based regression forecasting approach, (b) a re- gorithmic solutions for data analysis, processing, and mining using gression model with a structure that facilitates travel-time forecast- different tools in a fundamentally distributed manner that pays care- ing; and it improves the application efficiency of the proposed ap- ful attention to the resource constraints [3]. proach for the current real-world urban traffic data. Traditional centralized data processing and mining typically re- The remainder of this paper is organized as follows. Section 2 de- quires central aggregation of distributed data, which may not always scribes the related previous work in the DDPM field for MAS, kernel be feasible because of the limited network bandwidth, security con- density estimation, and travel-time prediction. Section 3 describes cerns, scalability problems, and other practical issues. DDPM car- the current problem more formally. Section 4 presents the multivari- ries out communication and computation by analyzing data in a dis- ate kernel-based regression model adopted for streaming data. Sec- tributed fashion [10]. The DDPM technology offers more efficient tion 5 describes the suggested cooperative learning algorithm for op- solutions in such applications. timal prediction in a distributed MAS architecture. Section 6 presents In this study we focus on the urban traffic domain, where many a case study and the final section contains the conclusions. traffic characteristics such as travel time, travel speed, congestion 1 Institure of Informatics, Clausthal University of Technology, Germany email: {jelena.fiosina, maksims.fiosins}@gmail.com 3 2 RELATED PREVIOUS WORK 2.2 Travel Time Prediction Models 2.1 Distributed Data Mining in Multi-agent Systems Continuous traffic jams indicate that the maximum capacity of a road network is met or even exceeded. In such a situation, the modelling A strong motivation for implementing DDPM for MAS is given by and forecasting of traffic flow is one of the important techniques that Da Silva et al. in [3], where authors argue that DDPM in MAS deals need to be developed [1]. Nowadays, knowledge about travel time with pro-active and autonomous agents that perceive their environ- plays an important role in transportation and logistics, and it can be ment, dynamically reason out actions on the basis of the environ- applied in various fields and purposes. From travellers’ viewpoints, ment, and interact with each other. In many complex domains, the the knowledge about travel time helps to reduce the travel time and knowledge of agents is a result of the outcome of empirical data improves reliability through better selection of travel routes. In lo- analysis in addition to the pre-existing domain knowledge. DDPM gistics, accurate travel time estimation could help reduce transport of agents often involves detecting hidden patterns, constructing pre- delivery costs and increase the service quality of commercial deliv- dictive and clustering models, identifying outliers, etc. In MAS, ery by delivering goods within the required time window by avoiding this knowledge is usually collective. This collective ’intelligence’ congested sections. For traffic managers, travel time is an important of MAS must be developed by distributed domain knowledge and index for traffic system operation efficiency [13]. analysis of the distributed data observed by different agents. Such There are several studies in which a centralized approach is used to distributed data analysis may be a non-trivial problem when the un- predict the travel time. The approach was used in various intelligent derlying task is not completely decomposable and computational re- transport systems, such as in-vehicle route guidance and advanced sources are constrained by several factors such as limited power sup- traffic management systems. A good overview is given in [13]. To ply, poor bandwidth connection, privacy-sensitive multi-party data. make the approach effective, agents should cooperate with each other Klusch at al. [11] concludes that autonomous data mining agents, to achieve their common goal via the so called gossiping scenarios. as a special type of information agents, may perform various kinds The estimation of the actual travelling time using vehicle-to-vehicle of mining operations on behalf of their user(s) or in collaboration communication without MAS architecture was described in [14]. with other agents. Systems of cooperative information agents for data On the other hand, a multi-agent architecture is better suited for mining in distributed, heterogeneous, or homogeneous, and massive distributed traffic networks, which are complex stochastic systems. data environments appear to be quite a natural progression for the Further, by using centralized approaches the system cannot adapt current systems to be realized in the near future. quickly to situations in real time, and it is very difficult to trans- A common feature of all approaches is that they aim at integrat- mit a large amount of information over the network. In centralized ing the knowledge that is extracted from data at different geograph- approaches, it is difficult or simply physically impossible to store ically distributed network sites with minimum network communica- and process large data sets in one location. In addition, it is known tion and maximum local computations. Local computation is carried from practice that the most drivers rely mostly on their own expe- out on each site, and either a central site communicates with each rience; they use their historical data to forecast the travel time [5]. distributed site to compute the global models or a peer-to-peer archi- Thus, decentralized multi-agent systems are are fundamentally im- tecture is used. In the case of the peer-to-peer architecture, individual portant for the representation of these networks [1]. We model our nodes might communicate with a resource-rich centralized node, but system with autonomous agents to allow vehicles to make decisions they perform most tasks by communicating with neighbouring nodes autonomously using not only the centrally processed available infor- through message passing over an asynchronous network [3]. mation, but also their historical data. A distributed system should have the following features for the Traffic information generally goes through the following three efficient implementation of DDPM: the system consists of multiple stages: data collection and cleansing, data fusion and integration, and independent data sources, which communicate only through message data distribution [12]. The system presented in [12] consists of three passing; communication between peers is expensive; peers have re- components, namely a Smart Traffic Agent, the Real-time Traffic In- source constraints (e. g. battery power) and privacy concerns [3]. formation Exchange Protocol and a centralized Traffic Information Typically, communication involves bottlenecks. Since communi- Centre that acts as the backend. A similar architecture is used in cation is assumed to be carried out exclusively by message passing, this study, but the prediction module, incorporated into Start Traf- the primary goal of several DDPM methods, as mentioned in the liter- fic Agent (vehicle agent), is different. In our study we do not focus ature, is to minimize the number of messages sent. Building a mono- on the transmission protocol describing only the information, which lithic database in order to perform non-distributed data processing should be sent from one node to another, without the descriptions of and mining may be infeasible or simply impossible in many appli- protocol packets. The centralized Traffic Information Centre in our cations. The costs of transferring large blocks of data may be very model is used only for storing system information. expensive and result in very inefficient implementations [10]. The decentralized MAS approach for urban traffic network was Moreover, sensors must process continuous (possibly fast) streams considered in [2] also, where the authors forecast the traversal time of data. The resource-constrained distributed environments of sensor for each link of the network separately. Two types of agents were networks and the need for a collaborative approach to solve many used for vehicles and links, and a neural network was used as the problems in this domain make MAS architecture an ideal candidate prediction model. for application development. For travel time forecasting different regression models can be ap- In our study we deal with homogeneous data. However, a promis- plied. Linear multivariate regression model for decentralized urban ing approach to agent-based parameter estimation for partially het- traffic network was proposed in [4]. This regression model is well- erogeneous data in sensor networks was suggested in [7]. Another studied, is parametric and allows estimation of each variables contri- decentralized approach for homogeneous data was suggested in [18] bution. However is not sufficiently effective in the case of non-linear to estimate the parameters of a wireless network by using a paramet- systems. Alternatively non-parametric kernel-based regression mod- ric linear model and stochastic approximations. els can be applied. These models can be effectively used for any types 4 of systems, however are relatively new and not well-studied yet. The In this step, the agent that experiences difficulties with a forecast efficiency of non-parametric kernel-based regression approach for sends its requested data point to other traffic participants in the trans- traffic flow forecasting in comparison to parametric approach was mission radius. The other agents try to make a forecast themselves. made in [17]. In this study we apply non-parametric kernel regres- In the case of a successful forecast, the agents share their experience sion for the similar traffic system as in [4] in order to increase the by sending their observations that are nearest to the requested point. prediction quality. After receiving the data from the other agents, the agent combines the obtained results, increases its experience, and makes a forecast autonomously. 2.3 Kernel Density Estimation In short, decentralized travel-time prediction consists of three steps: 1) local prediction; 2) in the case of unsuccessful prediction: Kernel density estimation is a non-parametric approach for estimat- selection of agents for experience sharing and sending them the re- ing the probability density function of a random variable. Kernel den- quested data point; 3) aggregation of the answers and prediction. sity estimation is a fundamental data-smoothing technique where in- ferences about the population are made on the basis of a finite data sample. A kernel is a weighting function used in non-parametric es- 4 LOCAL PARAMETER ESTIMATION timation techniques. Let X1 , X2 , . . . , Xn be an iid sample drawn from some distribu- The non-parametric approach to estimating a regression curve has tion with an unknown density f . We attempt to estimate the shape of four main purposes. First, it provides a versatile method for exploring f , whose kernel density estimator is a general relationship between two variables. Second, it can predict observations yet to be made without reference to a fixed parametric   1 X n model. Third, it provides a tool for finding spurious observations by x − Xi fˆh (x) = K , (1) studying the influence of isolated points. Fourth, it constitutes the nh h i=1 flexible method of substitution or interpolating between adjacent X- values for missing values [8]. where kernel K(•) is a non-negative real-valuedR ∞ integrable function Let us consider a non-parametric regression model [9] with a de- satisfying the following two requirements: −∞ K(u)du = 1 and pendent variable Y and a vector of d regressors X K(−u) = K(u) for all values of u; h > 0 is a smoothing parame- ter called bandwidth. The first requirement to K(•) ensures that the Y = m(x) + ǫ, (2) kernel density estimator is a probability density function. The second requirement to K(•) ensures that the average of the corresponding where ǫ is the disturbance term such that E(ǫ|X = x) = 0 and distribution is equal to that of the sample used [8]. Different kernel V ar(ǫ|X = x) = σ 2 (x), and m(x) = E(Y |X = x). Further, functions are available: Uniform, Epanechnikov, Gausian, etc. They let (Xi , Yi )n i=1 be the observations sampled from the distribution of differ in the manner in which they take into account the vicinity ob- (X, Y ). Then the Nadaraya-Watson kernel estimator is servations to estimate the function from the given variables. Pn x−Xi  A very suitable property of the kernel function is its additive na- i=1 K h Yi pn (x) mn (x) = P n x−Xi  = , (3) ture. This property makes the kernel function easy to use for stream- K h qn (x) i=1 ing and distributed data [8], [3], [7]. In [11], the distributed kernel- based clustering algorithm was suggested on the basis of the same where K(•) is the kernel function of Rd and h is the bandwidth. Ker- property. In this study, kernel density is used for kernel regression to nel functions satisfy the restrictions from (1). In our case we have estimate the conditional expectation of a random variable. a multi-dimensional kernel function K(u) = K(u1 , u2 , . . . , ud ), that can be easily presented with univariate kernel functions as: K(u) = K(u1 ) · K(u2 ) · . . . · K(ud ). We used the Gaussian kernel 3 PROBLEM FORMULATION in our experiments. We consider a traffic network with several vehicles, represented as Network packets are streaming data. Standard statistical and data autonomous agents, which predict their travel time on the basis of mining methods deal with a fixed dataset. There is a fixed size n for their current observations and history. Each agent estimates locally dataset and algorithms are chosen as a function of n. In streaming the parameters of the same traffic network. In order to make a fore- data there is no n: data are continually captured and must be pro- cast, each agent constructs a regression model, which explains the cessed as they arrive. It is important to develop algorithms that work manner in which different explanatory variables (factors) influence with non-stationary datasets, handle the streaming data directly, and the travel time. A detailed overview of such factors is provided update their models on the fly [6]. in [13]. The following information is important for predicting the This requires recursive windowing. The kernel density estimator travel time [15]: average speed before the current segment, number has a simple recursive windowing method that allows the recursive of stops, number of left turns, number of traffic light, average travel estimation using the kernel density estimator: time estimated by Traffic Management Centres (TMC). We should  pn (x) pn−1 (x) + K x−X h n Yn also take into account the possibility of an accident, network over- mn (x) = = x−X  . (4) load (rush hour) and the weather conditions. qn (x) qn−1 (x) + K h n Let us consider a vehicle, whose goal is to drive through the defi- nite road segment under specific environment conditions (day, time, 5 DECENTRALISED MULTI-AGENT city district, weather, etc.). Let us suppose that it has no or little ex- COOPERATIVE LEARNING ALGORITHM perience of driving in such conditions. For accurate travel time esti- mation, it contacts other traffic participants, which share their expe- In this section, we describe the cooperation for sharing the predic- rience in the requested point. tion experience among the agents in a network. While working with 5 streaming data, one should take into account two main facts. The nodes should coordinate their prediction experience over some pre- vious sampling period and adapt quickly to the changes in the stream- 5 h=2 ing data, without waiting for the next coordination action. h=8 h=5 h=3 Let us first discuss the cooperation technique. We introduce the 4 Number of communications following definitions. Let L = {Lj | 1 ≤ j ≤ p} be a group of p agents. Each agent 3 L ∈ L has a local dataset Dj = {(Xjc , Ycj )|c = 1. . . . , N j }, where j Xjc is a d-dimensional vector. In order to underline the dependence 2 of the prediction function (3) from the local dataset of agent Lj , we denote the prediction function by m[Dj ](x). 1 Consider a case when some agent Li is not able to forecast for some d-dimensional future data point Xinew because it does not have 0 sufficient data in the neighbourhood of Xinew . In this case, it sends 0 20 40 60 80 100 a request to other traffic participants in its transmission radius by Time sending the data point Xinew to them. Each agent Lj that has received the request tries to predict m[Dj ](Xinew ). If it is successful, it replies to agent Li by sending its best data representatives D̂(j,i) from the Figure 1. Average number of communications over time for different h neighbourhood of the requested point Xinew . Let us define Gi ⊂ L, a group of agents, which are able to reply to agent Li by sending the requested data. making their common experience equal to 2400. We assumed the To select the best data representatives, each agent Lj makes a maximal number of transmitted observations from one agent equals ranking among its dataset Dj . It can be seen from (3) that each Ycj 2. is taken with the weight wcj with respect to Xinew , where  j  Xinew −Xc Table 1. System factors influencing the travel time K h wcj = Pn  Xi j . new −Xl l=1 K h Variable Description Y travel time (min); The observations with maximum weights wcj are the best candidates X1 length of the route (km); for sharing the experience. X2 average speed in the system (km/h); X3 average number of stops in the system (units/min); All the data D̂(j,i) , Lj ∈ Gi received by agent Li should be ver- X4 congestion level of the flow (Veh/h); ified, and duplicated data should S be removed. We denote the new X5 number of traffic lights in the route (units); dataset of agent Li as Dnew i = Lj ∈Gi D̂(j,i) . Then, the kernel X6 number of left turns in the route (units). function of agent Li is updated taking into account the additive na- ture of this function: During the simulation, to predict more accurately, the agents used i X the presented cooperative learning algorithm that supported the com- m[Dnew ](x) = m[D̂(j,i) ](x) + m[Di ](x). (5) munication between agents with the objective of improving the pre- Lj ∈Gi diction quality. The necessary number of communications depends Finally, agent Li can autonomously make its forecast as on the value of the smoothing parameter h. The average number i m[Dnew ](Xinew ) for Xinew . of necessary communications is given in Figure 1. We can see the manner in which the number of communications decreased with the learning time. We varied h and obtained the relation between the 6 CASE STUDIES communication numbers and h as a curve. The prediction ability of We simulated a traffic network in the southern part of Hanover (Ger- one of the agents is presented at Figure 2. Here, we can also see the many). The network contains three parallel and five perpendicular relative prediction error, which decreases with time. The predictions streets, creating fifteen intersections with a flow of approximately that used communication between agents are denoted by solid tri- 5000 vehicles per hour. angles, and the number of such predictions also decreases with the The vehicles solve a travel time prediction problem. They receive time. information about the centrally estimated system variables (such as The goodness of fit of the system was estimated using a cross- average speed, number of stops, congestion level, etc.) for this city validation technique. We assume that each agent has its own training district from TMC, combine it with their historical information, and set, but it uses all observations of other agents as a test set, so we make adjustments according to the information of other participants use 20-fold cross-validation. To estimate the goodness of fit, we used using the presented cooperative-learning algorithm. In this case, the analysis of variance and generalized coefficient of determination R2 regression analysis is an essential part of the local time prediction that provides an accurate measure of the effectiveness of the predic- process. We consider the local kernel based regression model (3) and tion of future outcomes by using the non-parametric model [16]. The implement the cooperative learning algorithm (5). The factors are calculated R2 values and the corresponding number of the observa- listed in Table 1. The selection and significance of these variables tions that were not predicted (because cooperation during testing was was considered in [4]. not allowed) depending on h are listed in Table 2. From Figure 3 we We simulated 20 agents having their initial experience represented can also see how R2 is distributed among the system agents. The by a dataset of size 20 till each agent made 100 predictions, thus results suggest that we should find some trade-off between system 6 learning algorithms were presented, using the non-parametric kernel- based regression model. We demonstrated the efficiency of the sug- Self prediction gested approach through simulation with real data from southern Prediction after communication 3.0 Hanover. The experimental results show the high efficiency of the proposed approach. In future we are going to develop a combined 2.5 Relative forecasting error approach that allows agent to choose between parametric and non- 2.0 parametric estimator for more accurate prediction. 1.5 ACKNOWLEDGEMENTS 1.0 The research leading to these results has received funding from the 0.5 European Union Seventh Framework Programme (FP7/2007-2013) 0.0 under grant agreement No. PIEF-GA-2010-274881. We thank Prof. 0 20 40 60 80 100 J. P. Müller for useful discussions during this paper preparation. Time REFERENCES Figure 2. Relative prediction error and communication frequency of a [1] A.L Bazzan, J. Wahle, and F. Kluegl, ‘Agents in traffic modelling - from single agent over time reactive to social behaviour’, Advances in Artificial Intelligence (LNAI), 1701, 509–523, (1999). [2] R. Claes and T. Holvoet, ‘Ad hoc link traversal time prediction’, in Proc. of the 14th Int. IEEE Conf. on Intelligent Transportation Systems, (Oc- tober 2011). [3] J.C. da Silva, C. Giannella, R. Bhargava, H. Kargupta, and M. Klusch, ‘Distributed data mining and agents’, Eng. Appl. Artif. Intell., 18(7), 791–801, (2005). 80 [4] J. Fiosina, ‘Decentralised regression model for intelligent forecasting in multi-agent traffic networks’, Distributed Computing and Artificial 60 Frequency Intelligence (Advances in Intelligent and Soft Computing), 151, 255– 263, (2012). 40 [5] M. Fiosins, J. Fiosina, J. Müller, and J. Görmer, ‘Agent-based integrated decision making for autonomous vehicles in urban traffic’, Advances on 20 Practical Applications of Agents and Multiagent Systems (Advances in Intelligent and Soft Computing), 88, 173–178, (2011). 0 [6] Handbook of Computational Statistics: Concepts and Methods, eds., 0.75 0.80 0.85 0.90 0.95 J. E. Gentle, W. Härdle, and Y. Mori, Springer, Berlin/Heidelberg, 2004. 2 [7] C. Guestrin, P. Bodik, R. Thibaux, M. Paskin, and S. Madden, ‘Dis- R tributed regression: an efficient framework for modeling sensor network data’, in Proc. of the 3rd Int. Sym. on Information Processing in Sensor Networks, Berkeley, (2004). [8] W. Härdle, Applied Nonparametric Regression, Cambridge University Press, Cambridge, 2002. Figure 3. Distribution of goodness of fit measure R2 using [9] W. Härdle, M. Müller, S. Sperlich, and A. Werwatz, Nonparametric and cross-validation for the whole system, h = 2 Semiparametric Models, Springer, Berlin/Heidelberg, 2004. [10] Advances in Distributed and Parallel Knowledge Discovery, eds., H. Kargupta and P. Chan, AAAI Press The MIT PressAcademic Press, accuracy (presented by R2 ) and the number of necessary communi- Melno Park and Cambridge and London, 2000. [11] M. Klusch, S. Lodi, and G. Moro, ‘Agent-based distributed data min- cations (presented by the percentage of not predicted observations), ing: The kdec scheme’, in Proc. of Int. Conf. on Intelligent Information which depend on h. The point of trade-off should depend on the com- Agents - The AgentLink Perspective, volume 2586 of LNCS, pp. 104– munication and accuracy costs. 122. Springer, (2003). [12] W. Lee, S. Tseng, and W. Shieh, ‘Collaborative real-time traffic infor- mation generation and sharing framework for the intelligent transporta- Table 2. R2 goodness of fit measure using cross-validation for the whole tion system’, Information Scienses, 180, 62–70, (2010). system for different h [13] H. Lin, R. Zito, and M.A.P. Taylor, ‘A review of travel-time predic- tion in transport and logistics’, in Proc. of the Eastern Asia Society for Characteristic of System h=2 h=3 h=5 h=8 Transportation Studies, volume 5, pp. 1433 – 1448, Hamburg, (2005). System average R2 0.86 0.84 0.83 0.82 [14] G. Malnati, C. Barberis, and C.M. Cuva, ‘Gossip: Estimating actual Average % of not predicted observations 3 1 0.5 0.1 travelling time using vehicle to vehicle communication’, in Proc. of the 4-th Int. Workshop on Intelligent Transportation, Hamburg, (2007). [15] C.E. McKnight, H.S. Levinson, C. Kamga, and R.E. Paaswell, ‘Impact A linear regression model [4] applied to the same data gives lower of traffic congestion on bus travel time in northern new jersey’, Trans- average goodness of fit R2 =0.77, however predictions can be calcu- portation Research Record Journal, 1884, 27–35, (2004). lated for all data points. [16] J.S. Racine, ‘Consistent significance testing for nonparametric regres- sion’, Journal of Business and Economic Statistics, 15, 369379, (1997). [17] B.L. Smith, B.M. Williams, and R.K. Oswald, ‘Comparison of para- 7 CONCLUSIONS metric and nonparametric models for traffic flow forecasting’, Trans- portation Research Part C, 10, 303–321, (2002). In this study, the problem of travel-time prediction was considered. [18] S.S. Stankovic, M.S. Stankovic, and D.M. Stipanovic, ‘Decentralized parameter estimation by consensus based stochastic approximation’, Multi-agent architecture with autonomous agents was implemented IEEE Trans. Automatic Controll, 56, (2009). for this purpose. Distributed parameter estimation and cooperative 7