Holistic distributed stream clustering for smart grids Pedro Pereira Rodrigues1 and João Gama2 Abstract. Smart grids consist of millions of automated electronic to use ant collony optimization on smart meters data to improve the meters that will be installed in electricity distribution networks and current balancing on low-voltage distribution network. Further re- connected to servers that will manage grid supervision, billing and search could even take more advantages from smart grids if con- customer services. World sustainability regarding energy manage- sumption patterns could be extracted [14]. ment will definitely rely on such grids, so smart grids need also to The energy market is changing to meet the global challenge of be sustainable themselves. This sustainability depends on several re- power consumption awareness even at the lower household level [3]. search problems that emerge from this new setting (from power bal- New energy distribution concepts and the advent of smart grids has ance to energy markets) requiring new approaches for knowledge changed the way energy is priced, negotiated and billed. We are discovery and decision support. This paper presents a holistic dis- now in a world of hourly real-time pricing [1] which make use of tributed stream clustering view of possible solutions for those prob- smart meters to overcome the need for demand prediction preci- lems, supported by previous research in related domains. The ap- sion and, more important, demand prediction reliability [13]. Fur- proach is based on two orthogonal clustering algorithms, combined thermore, with the advent of micro-generation at household level, the for a holistic clustering of the grid. Experimental results are included market expanded into multiplicity of energy buyers and energy sell- to illustrate the benefits of each algorithm, while the proposal is dis- ers. In this setting, new techniques to efficiently auction in the market cussed in terms of application to smart grid problems. This holistic are required in order to make the smart grid smarter. Ramachandran approach could be used to help solving some of the smart grid intel- et al. (2011) developed a profit-maximizing adaptive bidding strategy ligent layer research problems, thus improving global sustainability. based on hybrid-immune-system-based particle swarm optimization. 1 INTRODUCTION 1.2 Components and features The Smart Grid (SG), regarded as the next generation power grid, Smart grids are built on different sub-systems and present special is an electric system that uses two-way digital information, cyber- features that need to be attended. The sources of energy are hetero- secure communication technologies, and computational intelligence geneous (power plants, wind, sun, sea, etc) and might be intermittent. in an integrated fashion across heterogeneous and distributed elec- A key characteristic of a SG is that it supports two-way flow of elec- tricity generation, transmission, distribution and consumption to tricity and information: a user might generate electricity and put it achieve energy efficiency. It is a loose integration of complementary back into the grid; electric vehicles may be used as mobile batter- components, subsystems, functions, and services under the pervasive ies, sending power back to the grid when demand is high, etc. This control of highly intelligent management-and-control systems [4]. backward flow is relevant, mainly in microgrids, where parts of the A key and novel characteristic of smart grids is the intelligent layer system that might be islanded due to power failures. Following [4], that analyses the data produced by these meters allowing companies the three major systems in SG are: to develop powerful new capabilities in terms of grid management, planning and customer services for energy efficiency. The develop- • Smart infrastructure system that supports advanced and heteroge- ment of the market with a growing share of load management incen- neous electricity generation, delivery and consumption. Is respon- tives and the increasing number of local generators will bring new sible for metering information and monitoring, and information difficulties to grid management and exploitation. transmission among of systems, devices and sensors. • Management systems providing advanced management and mon- itoring, grid topology and control services. The objectives are en- 1.1 Research problems ergy efficiency improvement, supply and demand balance, emis- Power and current balance is major goal of all electricity distribu- sion control, operation cost reduction, and utility maximization. tion networks, given its impact on the need to produce, buy or sell • Protection system providing grid reliability analysis, failure pro- energy. Moreover, due to the fluctuating power from renewable en- tection, security and privacy protection services. ergy sources and loads, supply-demand balancing of power system becomes problematic [17]. Several intelligent techniques have been proposed in the past that make use of the amounts of streaming data 1.3 Advantages and challenges that is available. As an example, Pasdar and Mahne (2011) proposed Some of the anticipated benefits of a SG include [4]: 1 LIAAD - INESC TEC & Faculty of Medicine of the University of Porto, Portugal, email: pprodrigues@med.up.pt • improving power reliability and quality; 2 LIAAD - INESC TEC & Faculty of Economics of the University of Porto, • optimizing facility utilization and averting construction of back- Portugal, email: jgama@fep.up.pt up (peak load) power plants; 18 • enhancing capacity and efficiency of existing electric power net- to different groups [6]. There are two different clustering problems works, hence improving resilience to disruption; in ubiquitous and streaming settings: clustering sensor streams and • enabling predictive maintenance and self-healing responses to clustering streaming sensors. The former problem searches for dense system disturbances; regions of the data space, identifying hot-spots where sensors tend to • facilitating expanded deployment of renewable energy sources; produce data, while the latter finds groups of sensors that behave sim- • accommodating distributed power sources, while automating ilarly through time [15]. We identify two different settings for clus- maintenance and operation; tering problems in smart grids. In the first setting a cluster is defined • reducing greenhouse gas emissions by enabling electric vehicles to be a set of sensors (meters, households, generators, etc.). In the and new power sources, thus reducing oil consumption by reduc- second setting, a cluster is defined to be a set of data points (demand, ing the need for inefficient generation during peak usage periods; supply, prices, etc.) generated by multiple sources. • presenting opportunities to improve grid security; • enabling transition to plug-in electric vehicles and new energy storage options; 2.1 Research on clustering electrical networks • increasing consumer choice, new products, services, and markets. Several real-world applications use machine learning methods to ex- tract knowledge from sensor networks. The case of electricity load All these jointly lead to massive research problems that might be demand analysis is a paradigmatic one that has been (and continues tackled by artificial intelligence techniques. Some challenges where to be) studied. Sensors distributed all around electrical-power distri- machine learning can play a relevant role, include: bution networks produce streams of data at high-speed. Three major • The reliability of the system supports itself on millions of meters questions rise: a) can we define consumption profiles based on simi- and other devices that require online monitoring and global asset lar sensors? b) can we find global patterns in network consumption? management [2]. and c) can we manage the uncertainty in sensor data? • Real-time simulation and contingency analysis of the entire grid To efficiently find consumption profiles, clustering techniques have to be possible. However, not all operations models currently were applied to the streams produced by each sensor, either hier- make use of real-time data [8]. archically at a central server [16] or distributed in the network [15]. • Interoperability issues that arise from the integration of distributed Although the problem is still very hard to model, given the dimen- generation and alternate energy sources [17]. sionality of the networks at stake, the incremental systems evolved • The heterogeneity and volatility of smart grids require mecha- and adapt to changes in the data, bridging the gap to future paths nisms to allow islanding [9] and self-healing [2]. of research. Regarding global network patterns, related research has • Finer granularity in management leads to strong demand response resulted in a system that distributes the clustering process into lo- requirements [7] and dynamic pricing strategies [1]. cal and central tasks, based on single sensor data discretization and centralized clustering of frequent states [5]. But data and models are both uncertain. For example, if a sensor reads 100, most of times it 2 THE DATA MINING POINT OF VIEW could be 99 or 101. This uncertainty has been tackled by reliability Present SG monitoring systems suffer from the lack of machine estimators and improved predictions using those estimates [13], but learning technologies that can adapt the behavior of monitoring sys- reliability for clustering definitions is still uncharted territory. tems on the basis of the sequence patterns arriving over time. From a data mining point of view, a smart grid is a network (eventually 2.2 Clustering as a smart grid problem solver decomposable) of distributed sources of high-speed data streams. Smart meters produce streams of data continuously in real-time. A In this work we argue that major smart grids problems previously data stream is an ordered sequence of instances that can be read only enunciated can and should be addressed as unsupervised machine once or a small number of times [6, 10], using limited computing and learning problems. storage capabilities. These sources of data are characterized by being open-ended, flowing at high-speed, and generated by non stationary Power balance Power balance is the most basic-level problem that distributions.In smart grids the dynamics of data are unknown; the smart grids need to solve before anything else. The strongest re- topology of network changes over time, the number of meters tends quirement is that energy is available in the entire network. Hence, to increase and the context where the meter acts evolves over time. clustering the data and sources together to find hot-spots can de- In smart grids, several knowledge discovery tasks are involved: tect specific points of danger in the network. prediction, cluster (profiling) analysis, event and anomaly detection, Multiple alternate sources In smart grids, supply and demand correlation analysis, etc. However, different types of devices present must be leveled across multiple alternate sources. Hence, com- different levels of resources and care should be taken in data mining bining clustering definitions for power demand and power supply methods that aim to extract knowledge from such restricted scenar- should give indications on how to better level the sources. ios. All these characteristics constitute real challenges and oppor- Contingency analysis Contingency analysis tries to produce detec- tunities for applied research in ubiquitous data mining. Generally, tion and reaction mechanisms to specific unexpected problems. the main features inherent to ubiquitous learning algorithms are that Hence, monitoring the evolution of clusters of nodes, should help the system should be capable of process data incrementally, evolving on detecting drifting sources of demand or supply. over time, while monitoring the evolution of its own learning process Islanding Islanding is a concept that is directly connected with clus- and self-diagnosis this process. However, learning algorithms differ tering, in the sense that it searches for subnetworks where de- in the extent of self-awareness they offer in this diagnosis. . mand and supply are leveled. Hence, local distributed clustering One of the most popular knowledge discovery techniques is clus- of sources and data should produce the expected definitions. tering, the process of finding groups in data such that data objects Self-healing Self-healing relates to the ability to rearrange and clustered in the same group are more alike than objects assigned adapt the network on-the-fly to meet unexpected changes. Hence, 19 ad-hoc distributed clustering of sources, independently from a Impact of the number of sensors on Kappa centralized server, should produce procedures for self-healing. 1.0 Online monitoring and asset management These features are − Averaged over values of s for each domain−cluster (d,k) pair − − − − strongly connected with incremental learning and adaptation of − ● k=3 − − − k=4 0.9 ● − − − − − ● k=5 − learned models. Hence, incremental models for sources and data − − − ● k=6 − − − ● k=2 − − − − − − − ● k=7 clustering, and their evolution, should provide basic information. 0.8 − − Kappa − − − − − − − − − − Dynamic energy pricing Energy pricing largely depends on supply − − − − − − 0.7 − − − and demand balance. Hence, clustering power demand and supply − together with buy and sell prices, should give insights on prospec- 0.6 tive energy pricing. 8 16 32 64 128 Number of sensors (d) 3 HOLISTIC DISTRIBUTED CLUSTERING Impact of the number of sensors on Kappa 1.0 Averaged over values of k for each domain−overlap (d,s) pair − The smart grid produces different types of data, on each source (node − − − − − − − − ● s=0.01 − or subnetwork), which must be taken into account: power demand, s=0.05 0.9 ● − − ● s=0.1 − power supply, energy sell price, energy buy price. As previously − − − − 0.8 Kappa stated, two clustering problems exist: clustering data and clustering − − − − − data sources. This way, each node might be assigned to a cluster on − 0.7 − − (at least) eight different clustering definitions. For all problems, there − − 0.6 − is a common requirement: each node (meter) should process locally − their own data. Only aggregated data should be shared between the 8 16 32 64 128 different nodes in the grid. Number of sensors (d) From the previous section it became clear that a holistic approach Impact of Communication Incompleteness in Agreement (σ=0.05) to clustering in smart grids is needed and should produce benefits 1.0 to energy sustainability. In this section we present such a proposal, based on two existing works on stream clustering (L2GClust and Average Proportion of Agreement, P(A) 0.9 DGClust) and their prospective integration in a multi-dimensional 0.8 clustering system. Next sections present the original clustering algo- rithms, their application to electricity demand sensor data streams, 0.7 and how they could be merged into a holistic clustering system. 0.6 0.5 3.1 L2GClust: Distributed clustering of grid nodes 0.00 0.10 0.20 0.30 0.40 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.87 0.89 0.91 0.93 0.95 0.97 0.99 1.00 Probability of Message Loss (λ) Clustering streaming data sources has been recently tackled in re- search, but usual clustering algorithms need the data streams to be Figure 1. L2GClust: sensitivity of κ̂ statistic to the number of sensors (d), fed to a central server [15]. Considering the number of sensors possi- for different number (k) and overlap (s) of clusters. Bottom plot presents the bly included in a smart grid, this requirement could be a bottleneck. impact of communication incompleteness on average proportion of A local algorithm was proposed to perform clustering of sensors on agreement for 5 clusters in a 128 sensor network. ubiquitous sensor networks, based on the moving average of each node’s data over time [15]. L2GClust has two main characteristics. On one hand, each sensor node keeps a sketch of its own data. On the One important task in electrical networks is to define profiles other hand, communication is limited to direct neighbors, so cluster- of consumers, to better predict their behavior in the near future. ing is computed at each node. The moving average of each node is L2GClust was applied to a sample of an electrical network to try approximated using memoryless fading average, while clustering is to find such profiles. From the raw data received at each sub-station, based on the furthest point algorithm applied to the centroids com- observations were aggregated on a hourly basis over more than two puted by the node’s direct neighbors. This way, each sensor acts as and a half years [14]. The log of electricity demand data from active data stream source but also as a processing node, keeping a sketch of power sensors was used to check whether consumer profiles would its own data, and a definition of the clustering structure of the entire rise. The log has hourly data from a subsample (780 sensors) of the network of data sources. entire data set (∼4000 sensors). Since no information existed on the Global evaluation of the L2GClust algorithm on synthetic data re- actual electricity distribution network, the simulator used this dataset vealed high agreement with the centralized, yet streaming, counter- as input data to a random network and monitored the resulting clus- part, being especially robust in terms of cluster separability. Also, for tering structures. Unfortunately, real data is never clean, and half of stable concepts, empirical evidence of convergence was found. On the sensors have more than 27% missing values, which naturally hin- the other hand, sensitivity analysis exposed the robusteness of the dered the analysis. Given this, and the dynamic nature of the data, local algorithm approach. Figure 1 shows that agreement levels are no convergence was possible in the clustering structures. However, robust to an increase on the number of clusters, being, however, a bit we could stress that, as more data is being fed to the system, better more sensitive with respect to network size and cluster overlapping. agreement can be achieved with the centralized approach, as exposed Nonetheless, the robusteness to network communication problems is in Figure 2. Hence, not only does the agreement tend to increase with exposed, as the proportion of agreement is harmed only for high lev- more observations, but also changes on the clustering structure are els of communication incompleteness. apparently possible to detect. L2GClust presented good characteris- 20 Evolution of Clustering Validity The Distributed Grid Clustering (DGClust) algorithm was pro- posed for clustering data points produced on wide sensor net- 1.0 P(A) works [5]. The rationale is to use: a) online discretization of each single sensor data, tracking changes of data intervals (states) instead 0.8 Kappa of raw data (to reduce communication to central server); b) frequent 0.6 Validity state monitoring at the central server, preventing processing all possi- 0.4 ble state combinations (to cut computation); and c) online clustering 0.2 of frequent states (to keep high validity and adaptivity). Each local sensor receives data from a given source, producing a univariate data 0.0 stream, which is potentially infinite. Therefore, each sensor’s data is 0 30 60 90 120 150 180 210 240 270 300 330 360 390 420 450 480 510 540 570 600 630 660 690 720 750 780 810 840 870 900 930 Days processed locally, being incrementally discretized into a univariate adaptive grid. Each new data point triggers a cell in this grid, reflect- Figure 2. L2GClust evolution of clustering agreement (probability of ing the current state of the data stream at the local site. Whenever agreement and κ̂ statistic) for a real active power sensor data log. a local site changes its state, that is, the triggered cell changes, the new state is communicated to a central site. Furthermore, the cen- Impact of the number of sensors on the normalized loss tral site keeps the global state of the entire network where each local Averaged over values of m for each domain−granularity (d,w) pair and normalized by number of sensors site’s state is the cell number of each local site’s grid. Nowadays, sen- sor networks may include thousands of sensors. This scenario yields 0.10 an exponential number of cell combinations to be monitored by the 0.08 Average Normalized Loss ● ● w=7 w=5 central site. However, it is expected that only a small number of this 0.06 w=13 combinations are frequently triggered by the whole network, so, par- ● ● w=17 ● w=15 w=21 allel to the aggregation, the central site keeps a small list of counters ● 0.04 ● w=19 ● w=11 of the most frequent global states. Finally, the current clustering defi- ● w=9 0.02 nition is defined and maintained by an adaptive partitional clustering 2 8 16 32 64 128 algorithm applied on the frequent states central points. Number of sensors (d) To evaluate the sensitivity of the system to the number of sensors, Impact of the number of sensors on communication synthetic data was used and the average result for a given value of granularity (w), averaged over all values of number of frequent states 100% Averaged over values of m for each domain−granularity (d,w) pair to monitor (m, as loss seemed to be only lightly dependent on this 90% factor) was analyzed. In figure 3 we note no clear trend, strengthen- Average Communication ● w=21 ing the evidence of robusteness to wide sensor networks. Regarding 80% ● w=19 ● w=17 w=15 communication reduction when compared with centralized cluster- ● w=13 70% ● ● w=11 w=9 ing, figure 3 also shows that the amount of communication reduction ● ● w=7 60% ● w=5 does not depend on the number of sensors. This way, the benefits of 50% reduced transmission rates are extensible to wide sensor networks. 40% 2 8 16 32 64 128 Number of sensors (d) 3.3 HDClust: Holistic Distributed Clustering The two algorithms previously exposed are designed for streaming Figure 3. DGClust: impact of the number of sensors on loss to real data, and work with reduced computational costs in terms of memory centroids (top) and communication reduction (bottom) [5]. and communications bandwidth. They present strong characteristics that could be even improved if used together. In L2GClust, each sen- tics to find clusters of sensors in wide networks such as smart grids. sor node each node has an approximation of the global clustering. In DGClust, a centralized site maintains the global cluster structure of the entire network at reduced communication costs. The main idea of 3.2 DGClust: Grid clustering of grid data streams the Holistic Distributed Clustering (HDClust) is to integrate the local Clustering data points is probably the most common unsupervised distributed approach of L2GClust, with the grid data clustering ap- learning process in knowledge discovery. In ubiquitous settings, proach of DGClust, in order to achieve the holistic clustering of data however, there aren’t many tailored solutions to try to extract knowl- and sources on sensor networks such as smart grids. Specifically, for edge in order to define dense regions of the sensor data space. Clus- each measured dimension: tering examples in sensor networks can be used to search for hot- spots where sensors tend to produce data. In this settings, grid-based • each local node (meter) keeps a sketch of its own data streams (as clustering represents a major asset as regions can be, strictly or in L2GClust) and a local discretization grid (as in DGClust); loosely, defined by both the user and the adaptive process [5]. The ap- • communication is restricted to the neighborhood (as in L2GClust); plication of clustering to grid cells enhances the abstraction of cells • at regular intervals, each local node receives from its neighbors as interval regions which are better interpreted by humans. More- the estimates of the clusters centroids (as in L2GClust) and the over, comparing intervals or grids is usually easier than comparing current data discretized grid cell (as in DGClust); exact points, as an external scale is not required: intervals have in- • each node keeps an estimate of the global clustering of nodes by trinsic scaling. The comprehension of how sensors are interacting in clustering neighbors’ centroids (as in L2GClust); the network is greatly improved by using grid-based clustering tech- • each node keeps a frequent state list and maintains a clustering of niques for the data examples produced by sensors. the most frequent states (as in DGClust) from the neighbors; 21 be computed: profiling, anomaly and event detection, outliers de- tections, trends, deviations, etc. In this paper, we have discussed read dimension data distributed clustering algorithm for data streams produced on wide sensor networks like smart grids. Furthermore, we have shown how smart grid problems can be addressed as clustering problems, and proposed a holistic approach to better extract knowledge from the sketch data flag grid cell grid. We believe that this holistic approach could be used to help solving some of the smart grid intelligent layer research problems. get nodes centroids get neighbors cells Current research focus on the integration of both algorithms into the L2GClust DGClust schema and its evaluation on real-world electrical networks data. check frequent items ensemble clustering ACKNOWLEDGEMENTS This work is funded by the ERDF data clustering through Programme COMPETE and by the Portuguese Government through FCT, projects PEst-C/SAU/UI0753/2011 and PTDC/EIA/098355/2008. The authors acknowledge the help of Luı́s Lopes and João Araújo. global nodes centroids neighbor data centroids c k REFERENCES [1] H. Allcott, ‘Rethinking real-time electricity pricing’, Resource and En- bi-clustering ergy Economics, 33(4), 820–842, (2011). [2] S.M. Amin, ‘Smart grid: Overview, issues and opportunities. advances and challenges in sensing, modeling, simulation, optimization and con- other dimensions trol’, European Journal of Control, 17(5-6), 547–567, (2011). other dimensions centroids [3] F. Benzi, N. Anglani, E. Bassi, and L. Frosini, ‘Electricity smart meters interfacing the households’, IEEE Transactions on Industrial Electron- ics, 58(10), 4487–4494, (2011). [4] Xi Fang, Satyajayant Misra, Guoliang Xue, and Dejun Yang, ‘Smart holistic clustering grid – the new and improved power grid: a survey’, IEEE Communica- tions Surveys & Tutorials, (2012). (to appear). HDClust [5] João Gama, Pedro Pereira Rodrigues, and Luı́s Lopes, ‘Clustering dis- tributed sensor data streams using local processing and reduced com- munication’, Intelligent Data Analysis, 15(1), 3–28, (January 2011). [6] Sudipto Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani, and Liadan O’Callaghan, ‘Clustering data streams: Theory and practice’, Figure 4. HDClust schema to be applied at each node, for each included IEEE Transactions on Knowledge and Data Engineering, 15(3), 515– dimension. Left branch applies L2GClust while right branch applies 528, (2003). DGClust using data from the neighbors, each node acting also as central [7] A. Iwayemi, P. Yi, X. Dong, and C. Zhou, ‘Knowing when to act: An clustering agent. Both clustering definitions are then combined and optimal stopping method for smart grid demand response’, IEEE Net- integrated with other measured dimensions. work, 25(5), 44–49, (2011). [8] J.A. Kavicky, ‘Impacts of smart grid data on parallel path and contin- gency analysis efforts’, in IEEE PES General Meeting, (2010). [9] R.H. Lasseter, ‘Smart distribution: Coupled microgrids’, Proceedings • to link clustering of sources with clustering of data, each node also of the IEEE, 99(6), 1074–1082, (2011). receives from the neighbors their self assignment to a cluster. [10] S. Muthukrishnan, Data Streams: Algorithms and Applications, Now Publishers Inc, New York, NY, 2005. In the resulting cluster structure, each sensor maintains C clusters of [11] A. Pasdar and H.H. Mehne, ‘Intelligent three-phase current balanc- ing technique for single-phase load based on smart metering’, Interna- data sources, and K clusters of data points. tional Journal of Electrical Power and Energy Systems, 33(3), 693–698, In a smart grid context, and taking advantage of the decomposable (2011). property of the grid network (microgrids), L2GClust and DGClust [12] B. Ramachandran, S.K. Srivastava, C.S. Edrington, and D.A. Cartes, can work together. Assume a microgrid of D sensors, and 4 dimen- ‘An intelligent auction scheme for smart grid market using a hybrid im- mune algorithm’, IEEE Transactions on Industrial Electronics, 58(10), sions or quantities of interest: power demand, power supply, energy 4603–4612, (2011). sell price and energy buy price. The resulting HDClust, the network [13] Pedro Pereira Rodrigues, Zoran Bosnić, João Gama, and Igor is summarized by C clusters of data sources, and K clusters of data Kononenko, ‘Estimating reliability for assessing and correcting individ- points, for each quantity of interest. In real-time and at each moment, ual streaming predictions’, in Reliable Knowledge Discovery, 267–287, each sensor is in a state hci , ki i in each dimension. Figure 4 presents Springer Verlag, (2012). [14] Pedro Pereira Rodrigues and João Gama, ‘A system for analysis and the global schema for a holistic approach to clustering, to be applied prediction of electricity load streams’, Intelligent Data Analysis, 13(3), at each node of a smart grid. The combination of the characteristics 477–496, (June 2009). both algorithms seems not only possible, but extremely relevant as [15] Pedro Pereira Rodrigues, João Gama, João Araújo, and Luı́s Lopes, complementary knowledge discovery in a holistic view of the grid. ‘L2GClust: Local-to-global clustering of stream sources’, in Proceed- ings of ACM SAC 2011, pp. 1011–1016, (March 2011). [16] Pedro Pereira Rodrigues, João Gama, and João Pedro Pedroso, ‘Hier- 4 REMARKS AND FUTURE PATHS archical clustering of time-series data streams’, IEEE Transactions on Knowledge and Data Engineering, 20(5), 615–627, (May 2008). Smart grids are a paradigmatic example of ubiquitous streaming data [17] K. Tanaka, A. Yoza, K. Ogimi, A. Yona, T. Senjyu, T. Funabashi, and sources. Data is produced at high speed, from a dynamic (time- C.-H. Kim, ‘Optimal operation of dc smart house system by control- lable loads based on smart grid topology’, Renewable Energy, 39(1), changing) environment. Meters are geographically distributed, form- 132–139, (2012). ing a network. On top of clustering algorithms, several tasks can 22