=Paper=
{{Paper
|id=Vol-1558/paper17
|storemode=property
|title=DeMand: A Tool for Evaluating and Comparing Device-Level Demand and Supply Forecast Models
|pdfUrl=https://ceur-ws.org/Vol-1558/paper17.pdf
|volume=Vol-1558
|authors=Bijay Neupane,Laurynas Siksnys,Torben Bach Pedersen
|dblpUrl=https://dblp.org/rec/conf/edbt/NeupaneSP16
}}
==DeMand: A Tool for Evaluating and Comparing Device-Level Demand and Supply Forecast Models==
DeMand: A Tool for Evaluating and Comparing Device-Level Demand and Supply Forecast Models

Bijay Neupane, Laurynas Šikšnys, Torben Bach Pedersen
Aalborg University
bn21@cs.aau.dk, siksnys@cs.aau.dk, tbp@cs.aau.dk

(c) 2016, Copyright is with the authors. Published in the Workshop Proceedings of the EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bordeaux, France) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

ABSTRACT
Fine-grained device-level predictions of both shiftable and non-shiftable energy demand and supply are vital in order to take advantage of Demand Response (DR) for the efficient utilization of Renewable Energy Sources. The selection of an effective device-level load forecast model is a challenging task, mainly due to the diversity of the models and the lack of proper tools and datasets that can be used to validate them. In this paper, we introduce the DeMand system for fine-tuning, analyzing, and validating device-level forecast models. The system offers several built-in device-level measurement datasets, forecast models, features, and error measures, thus semi-automating most of the steps of the forecast model selection and validation process. This paper presents the architecture and data model of the DeMand system, and provides a use-case example of how one particular forecast model for predicting a device state can be analyzed and validated using the DeMand system.

Figure 1: Balancing the demand and supply (the goal is to minimize the difference between total consumption and wind energy production over time; detection and prediction of energy at the device level reduce peaks, excess supply, and energy loss).

1. INTRODUCTION
Renewable Energy Sources (RES) are increasingly becoming an important part of the future power grid. However, the dependence of RES production on weather conditions, such as periods with wind and sunshine, creates challenges related to balancing electricity demand with intermittent RES supply. This makes RES more difficult to integrate into the power grid, compared to traditional (fossil-based) energy sources.

To address the RES integration challenges, there are numerous smart grid projects aiming at the efficient utilization of intermittent RES production. For example, the TotalFlex [1] project proposes a Demand Response (DR) technique to actively control the electricity consumption and production of individual household devices in order to confront the challenges of intermittent RES supply. The project utilizes shiftable aggregated demands from household devices (e.g., dishwashers, washing machines) to generate demand and supply schedules that minimize the differences between demand and supply, as shown in Figure 1. In TotalFlex (and many other DR projects), accurate and timely predictions of non-shiftable and shiftable energy are vital in order to generate effective schedules; otherwise, forecast errors might lead to high-cost imbalances in the power grid. Therefore, underlying device-level forecast models of both non-shiftable and shiftable energy demands are required for being able to effectively plan electricity consumption and production.

The selection of the best forecast models for a variety of devices, data granularities, and forecast horizons is a challenging and resource-intensive task, mainly due to (1) the diversity of the models, (2) the lack of proper tools, similar to [10], and (3) the unavailability of proper datasets that can be used to validate all these models.

First, the device-level energy demand highly depends on the device type and its functionality. For example, heating devices (e.g., a heat pump) operate for long durations, and the demand at each timestamp depends on various factors such as climate, temperature, room size, past demands, etc.
On the other hand, electric vehicles need to be charged for a couple of hours, and their energy demand depends on factors such as the current charge level, capacity, charging rate, etc. Further, the energy demand of household devices (e.g., a dishwasher) depends on the user's behavior and other external factors such as the time of use, duration of use, frequency of use, etc. In addition, the amount of energy and the concrete energy profiles depend on the configuration of a device and in some cases are user-specific [7]. Therefore, it is hard to design a single model that considers all these factors and handles the stochasticity associated with device-level forecasts. Thus, instead of a single generalizable model, a large variety of models is required to forecast device-level demands for different types of household devices.

Second, the forecast model selection and validation process includes a number of steps which are often time-consuming (see Figure 2). In this process, researchers have to spend an enormous amount of time, especially in data preprocessing, i.e., data extraction, cleaning, transformation, and the handling of outliers and observation gaps, and in feature generation, where a set of features is generated repeatedly until a sufficient model accuracy is obtained. Here, a feature is a variable, derived from input dataset values or additional external information, that is assumed to be helpful for improving forecasting accuracy, e.g., temperature, wind speed, or the day of the week, potentially influencing demand and supply.

Third, we can find a number of works dedicated to forecasting and analyzing device-level demand [3, 9, 2, 7]. However, work on device-level forecasting is still limited, because experiments are typically fine-tuned for a particular dataset and usage patterns, and are hardly reproducible with the reported level of accuracy. Furthermore, efficient and precise extraction of all relevant device-level data is a challenging and still ongoing research topic [8, 4, 5]. Consequently, there is a lack of proper datasets containing high-resolution measurements of a large number of devices that include all the relevant external influences. The experiments are typically performed with private datasets containing measurements that are collected within the scope of a project and are not freely available. Even the freely available datasets only include measurements for limited (short) time durations and are often too noisy to perform any detailed analysis [6]. Lastly, there are no effective tools designed specifically for tuning and validating various device-level demand and supply forecasting models based on real measurements.

In the light of these challenges, we present the DeMand system, which allows the user to analyze and validate various forecasting models using a number of provided datasets and built-in or user-defined functions. The system is designed to automate most of the preprocessing steps. At the same time, it provides flexibility for the user (a researcher or energy market player) to use either existing system modules or to plug in custom user-defined modules. Figure 3 shows the inputs and outputs of the DeMand system. The system offers the following features and functionality: i) a repository of available device-level datasets for evaluating and comparing forecast models, ii) access to existing (standard) forecasting algorithms, iii) dynamic generation of features for various forecast horizons and data resolutions, iv) support for various experimental configurations and generation of multiple forecast models, v) functionality to compare experiment results, and vi) easy integration of external features and learning algorithms.

The remainder of the paper is organized as follows. Section 2 describes the system architecture and functionalities. Section 3 describes its data model. Section 4 gives a use-case example of the system. Finally, Section 5 concludes the paper and provides future work directions.

Figure 2: The process of forecast model selection and validation (dataset, data preprocessing, feature generation, parameter selection, model execution, and model evaluation; the model is reconfigured until an acceptable accuracy is reached, yielding the final forecast model).

Figure 3: Inputs and outputs of the DeMand system (external data sources, existing and user-defined features and evaluators, historical demand, and user settings as inputs; future demand forecasts as output).
2. DeMand SYSTEM OVERVIEW
In this section, we present the architecture and the functionality of the DeMand system. The system is designed as a tool to automate experiments on device-level forecasting and to facilitate the comparison and re-evaluation of existing experiments. Further, the system provides flexibility in using the available resources or uploading user-defined resources such as datasets, forecast models, evaluation metrics, etc. The user can define all necessary parameters and the system configuration using the user interfaces. Once the user selects the time series and configures the experimental setup, the system provides all the available suggestions for the experiment, such as a list of features, evaluation metrics, etc. Furthermore, the DeMand system is also envisioned to provide an open repository of device-level datasets that will be accessible to the research community for further experiments. Therefore, in addition to the graphical display in the interface, the experimental results and datasets are stored in the system database.

The DeMand system with its most essential components (rectangles) and their dependencies is shown in Figure 4. Here, the use of independent components for feature extraction, evaluation, data management, etc. allows adding and removing system features in a plug-and-play fashion, making the system highly flexible and customizable for specific use-cases. We now present each of these components individually.

Figure 4: System components (Interface, Core Logic with its Iterator, Model Creator, Time Series Manager, and Model Executer sub-components, Data Manager, Feature Generator, Evaluator, and Result Analyzer, backed by the Device Data Schema and the Result Schema).
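To make the plug-and-play design concrete, the following minimal sketch shows how user-defined predictors, features, and error measures could be kept behind a common registry so that individual modules can be added or removed independently. This is an illustrative assumption, not the actual DeMand code; the class and function names (ModuleRegistry, register, get) are invented for the example.

# Minimal sketch of a plug-and-play module registry (illustrative only,
# not the actual DeMand implementation).
from typing import Callable, Dict

class ModuleRegistry:
    """Holds user-defined predictors, features, and error measures by name."""
    def __init__(self):
        self._modules: Dict[str, Dict[str, Callable]] = {
            'predictor': {}, 'feature': {}, 'error_measure': {}
        }

    def register(self, kind: str, name: str, func: Callable) -> None:
        # A module is just a callable; adding or removing one does not
        # affect the other components.
        self._modules[kind][name] = func

    def get(self, kind: str, name: str) -> Callable:
        return self._modules[kind][name]

# Usage: plug in a user-defined error measure without touching other components.
registry = ModuleRegistry()
registry.register('error_measure', 'mae',
                  lambda actual, predicted:
                      sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual))
print(registry.get('error_measure', 'mae')([1.0, 2.0], [1.5, 1.0]))  # 0.75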
2.1 Interface Component
This component is designed to simplify and speed up the process of setting up an experiment. Specifically, it offers a graphical user interface (GUI) and allows selecting a data source, a predictor, the values of configuration parameters, features, and the error measures to use. It also provides visualization of time-series data. The user can set multiple values for the parameters to submit multiple tasks in a single execution. It also offers a visual representation of the outcomes of the experiment, which plays a significant role in forecast model analysis. Thus, in the end, the interface plots graphs for the experimental outcomes, according to the selected settings. For multiple sets of parameters, the interface shows detailed plots for each configuration. If the user utilizes all the existing modules and functionality, the interface can reduce the model analysis time drastically.

2.2 Core Logic Component
Core Logic is the central component in the system, and it orchestrates data manipulations and data exchange between the other components according to the user settings. The Core Logic component includes four different sub-components, shown in Figure 4, that automate the data preprocessing and parameter selection steps shown in Figure 2.

Iterator. The Iterator sub-component is responsible for parsing all the input parameters and determining the number of tasks to be executed. Here, a task is an independent execution of a forecast model with a particular predictor, dataset, parameters, etc. For example, if the user has selected two values of the probability threshold, e.g., in a classification task, then the Iterator executes the same forecast model for each threshold value with the other parameters unchanged.
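As an illustration of this task expansion (an assumption about the mechanism, not the actual DeMand code; expand_tasks and the parameter names are invented for the example), the sketch below shows how a configuration in which one parameter has two values yields two tasks, each with the other parameters unchanged.

# Sketch of Iterator-style task expansion (illustrative, not DeMand's actual code).
from itertools import product

def expand_tasks(config):
    """Turn a configuration whose values may be lists into one task per combination."""
    keys = list(config)
    value_lists = [v if isinstance(v, list) else [v] for v in config.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]

config = {
    'predictor': 'Logistic_Regression',
    'forecast_horizon': '1 hour',
    'probability_threshold': [0.4, 0.5],   # two values -> two independent tasks
}
for task in expand_tasks(config):
    print(task)   # same forecast model, one probability threshold per task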
Model Creator. The Model Creator sub-component creates an object that contains all the required parameters for the tasks. Then, all other (sub-)components fetch the parameters from this object. For multiple tasks, the sub-component updates the object elements (parameters) that take more than a single value. The Model Creator sub-component is also responsible for the persistent storage of model parameters until the completion of the current task.

Time Series Manager. The Time Series Manager sub-component is responsible for the automation of all data preparation and manipulation tasks, such as data aggregation, noise filtration, filling observation gaps, etc. Further, it also acts as a communication bridge between the Data Manager and the Feature Generator components.

Model Executer. The Model Executer sub-component executes the task with the provided feature set and configuration. If the experimental configuration already exists in the database, i.e., the experiment is not unique, the Model Executer terminates the current task and extracts the prediction results from the database. Further, the Model Executer handles errors in user-defined predictors/classifiers by replacing them with the default predictor; otherwise, it terminates the current task and requests the Iterator sub-component to delete all remaining tasks in the queue. At the end of the execution, it passes the predicted values to the (model) Evaluator component.

2.3 Data Manager Component
The Data Manager component is responsible for extracting time-series data from a user-defined source, i.e., a database or files in a user-specified location, and feeding the extracted data to the Core Logic component. The Data Manager includes a data parser that transforms the raw data into the required format. After the data is successfully parsed, it is persistently stored in the database for further analysis. This gives the user a comprehensive repository of datasets (e.g., for different device types) for future experiments, where the validation of forecast models in homogeneous environments becomes possible. Although the database is confined to the energy domain, there are no methods to validate the domain of the dataset beyond the required format. Thus, it is possible to upload time series from any domain without getting an error. However, a user can manually validate the time series before submitting the task by using the graphical interface (plot) provided by the system. The database schemas are described in Section 3.

2.4 Feature Generator Component
The Feature Generator component is responsible for the generation of all features chosen by the user. As discussed earlier, features are variables (e.g., temperature or the day of the week), derived from the input dataset values (time series) or external information, specified to potentially improve forecasting accuracy. The features are generated by using pre-defined functions from the feature repository. Additionally, the user can define new functions for the specification of custom features.

Script 1: Python code for a user-defined feature

# INPUT: time series where the first column of each row is a date
# OUTPUT: binary feature (1 = weekend, 0 = weekday) for each data point
from datetime import date

def is_weekend(timeseries=None):
    if timeseries is None:
        raise TypeError('Data can not be null')
    elif not isinstance(timeseries[0][0], date):
        raise TypeError('First column must be a datetime.date, not a %s'
                        % type(timeseries[0][0]))
    else:
        feature_series = []
        for item in timeseries:
            day_of_the_week = item[0].isoweekday()
            if day_of_the_week <= 5:   # Monday-Friday: not weekend
                feature_series.append(0)
            else:                      # Saturday/Sunday: weekend
                feature_series.append(1)
        return feature_series          # binary feature values

For example, Script 1 shows a simple user-defined function written in Python, which, for each data point in the original time series, identifies whether the data point was recorded during the weekend (value 1) or during a weekday (value 0). The component calls this user-defined function with the time series dataset as the input parameter. The function computes the features for each row in the time series and returns an output. The Feature Generator calculates the size of the output, i.e., the number of attributes, and increases the size of the feature vector by the respective number. In some cases, the device-level measurement dataset contains sensitive context information, such as location, family size, age group, occupation, etc. Therefore, this information is only accessible through the Feature Generator and is never revealed to the user. This approach allows the user to use sensitive information as features without requiring the disclosure of such information.

2.5 Evaluator Component
The Evaluator component receives as input the output of the predictor/classifier and evaluates its performance according to the chosen error measure and parameters. Further, it writes the experiment attributes and results to the database in the Result Schema for future reference and queries. The Evaluator also invokes the functions to plot the graphical representation of the experimental results. The user can select existing error measures (out of many available ones), or define custom measures using a Python function, similarly to the feature specification.
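As an example of how a custom error measure could be supplied as a Python function, similarly to the feature specification in Script 1, the sketch below implements the Mean Absolute Percentage Error (MAPE). The function name and calling convention are assumptions for illustration; the built-in DeMand error measures may expect a different signature.

# Sketch of a user-defined error measure (illustrative; the exact calling
# convention expected by the Evaluator may differ).
def mape(actual, predicted):
    """Mean Absolute Percentage Error over paired actual/predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError('actual and predicted must be non-empty and equally long')
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(errors) / len(errors)

# Example: hourly demand in watts for three timestamps.
print(mape([120.0, 80.0, 60.0], [110.0, 90.0, 55.0]))  # roughly 9.7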
2.6 Result Analyzer Component
The Result Analyzer is a component that processes user queries and fetches the requested data from the system database. The user can write simple SQL queries to extract all the relevant data and results of an experiment satisfying certain conditions, such as the type of predictor, dataset, forecast horizon, etc. Further, the user can write queries to compare experiments using a particular error measure. For example, let us consider a binary classification task where the objective is to predict a device state, i.e., idle (0) or active (1), at a particular hour in the future. Using the SQL query shown in Script 2, a user can query the results of all classification tasks based on the logistic regression model, ordered in decreasing order of the Area Under the Curve (AUC) value. Further, from the list of available results, the user can select up to two experiments and compare their performance graphically. Finally, if needed, earlier experiments can be re-executed on a new dataset using the same or a new configuration of parameters.

Script 2: SQL script to extract data of earlier experiments

select a.*, AUC from Experiment a
left join (select experimentID, AUC from Result) b
  on a.experimentID = b.experimentID
where predictor = 'Logistic_Regression'
order by AUC desc
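The same query can also be issued programmatically against the results database, for example from a Python script. The sketch below uses pyodbc against the MSSQL database mentioned in Section 3; the server name, database name, and credentials are placeholders, not part of the DeMand system.

# Sketch of running Script 2 from Python (connection details are placeholders).
import pyodbc

QUERY = """
select a.*, AUC from Experiment a
left join (select experimentID, AUC from Result) b
  on a.experimentID = b.experimentID
where predictor = 'Logistic_Regression'
order by AUC desc
"""

conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=localhost;DATABASE=demand_results;'   # placeholder
                      'UID=user;PWD=password')                      # placeholder
cursor = conn.cursor()
for row in cursor.execute(QUERY):
    print(row)   # one row per experiment, best AUC first
conn.close()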
3. DATABASE SCHEMAS
The persistent storage of the device measurements and results is essential for being able to compare and (re-)evaluate different forecast models using multiple datasets and configuration parameters. Since the device-level datasets are stored in an MSSQL database, we currently use the same database system to store the experimental results. However, the DeMand system is envisioned to support diverse types of data, such as various time series, varying experiment configurations, results, etc., that cannot be confined to a strict schema. Therefore, the inclusion of non-relational and schema-less data models, such as NoSQL or JSON, would be a good choice as an add-on for the next version of the system. In the DeMand system, device measurements and results are separated and stored in the database using two different schemas, presented next.

3.1 Device Measurements Database
The schema of the device measurements database is shown in Figure 5. The House table stores all the information regarding the owner of the device, and the Device table stores device information. We consider that households can be represented by a large number of categories. As it is impractical to have columns for all possible categories, we have only a few columns for the generic categories, and the rest of the details are stored in the otherDetails column as a list of key-value pairs, such as (regularization, .01), (windowSize, 7). The amount of energy depends on the type of the device. Therefore, the types of devices are stored in the DeviceType table. Further, even similar devices can show significant variation in energy demand due to differences among their models; thus, the device models are stored in the DeviceModel table.

A new dataset uploaded by the user is stored in the same schema, after it is parsed and validated by the Data Manager component. The requirement for a new dataset to be eligible for storage is that it has to have a label for the device type and a timestamp value for each data point. Further, we know that for device-level datasets, additional context information is rarely available. Therefore, in the case of missing information, the tables are filled with default values in the mandatory (NOT NULL) fields and with generated unique values in the primary key fields. Further, even if the user provides values for the primary keys, such as houseID, deviceID, etc., the system automatically appends unique values to avoid primary key conflicts. At present, the device measurements database contains the energy consumption profiles of 200+ devices from 13 different households, representing 14 different types of devices.

Figure 5: The schema of the Device Measurements Database.

3.2 Results Database
The experiment results and configuration parameter values are stored in three tables, shown in Figure 6.

Figure 6: The schema of the Results Database.

Here, the Experiment table stores a record for each performed experiment, together with all used configuration parameter values. However, the actual parameters used in an experiment differ based on the type of predictor. Even the same family of predictors might require different sets of parameters depending on the implementation. For example, L1-regularized logistic regression has an extra penalizing parameter λ, unlike its simple implementation. Therefore, we have only a few columns in the Experiment table for the generic parameters, such as forecast horizon, data granularity, threshold values, etc. The remaining parameters are stored in the column otherParameters as a list of key-value pairs. Finally, the description of all the features used in an experiment is stored in the column featureList.

The Forecast table stores information about the complete time series or its fragment (defined by startDate and endDate) used in an experiment. It also stores the output of the predictor and the corresponding test labels, where the predictor output values are distinguished from the test data by using binary values in the isForecast column, i.e., the value 1 in the isForecast column indicates predictor output values (forecasts).

The Result table stores the values of various predictor performance measures. As before, there exists a large number of measures for performance evaluation. Therefore, we have columns for only a few frequently used performance measures; the values of new measures are stored in the otherMetrics column as a list of key-value pairs. A single experiment can produce more than one result depending on the chosen values of a parameter. For example, recall the classification task where the classifier produces two different results, one for each probability threshold value. Therefore, the Result table can have multiple rows for a single experiment.
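The paper does not fix an encoding for these key-value columns (otherDetails, otherParameters, otherMetrics), so the following is only a hedged sketch of how such pairs could be packed into, and recovered from, a single text column; the helper names and the exact textual format are assumptions.

# Sketch of packing/unpacking key-value pairs for a flexible column such as
# otherParameters (the exact format used by DeMand is not specified).
def pack_pairs(params):
    """Encode {'regularization': 0.01, 'windowSize': 7} as '(regularization,0.01),(windowSize,7)'."""
    return ','.join('(%s,%s)' % (k, v) for k, v in params.items())

def unpack_pairs(text):
    """Parse the encoded string back into a dict (all values as strings)."""
    pairs = [p.strip('()') for p in text.split('),(')]
    return dict(p.split(',', 1) for p in pairs)

encoded = pack_pairs({'regularization': 0.01, 'windowSize': 7})
print(encoded)                # (regularization,0.01),(windowSize,7)
print(unpack_pairs(encoded))  # {'regularization': '0.01', 'windowSize': '7'}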
4. DeMand USE-CASE EXAMPLE
In this section, we briefly walk through the DeMand functionality using a real-world use-case of device-level forecast model analysis. Here, we continue with the binary classification problem of predicting a device state for the next hour.

4.1 Execution of Forecasting
First, we start by choosing the dataset to use for the experiment. In our case, we select the consumption time series of a washer-dryer from the existing database. Alternatively, we can also select a new dataset using the window shown in Figure 8. The system then automatically plots the selected dataset in the main window and also provides some general statistics on the dataset, such as the minimum, average, and maximum demand (see Figure 7). This information is helpful in deciding the threshold value (in watts) for the segmentation of the data based on the device state (active or idle). In our example, we have selected two values, 10 watts and 100 watts, as the thresholds (see Figure 7). As a result, the system creates two experiments to generate a classification model for each of the threshold values.

Figure 7: The main window of the DeMand system.

Figure 8: Defining an external source in DeMand.

Next, we select the percentage of the time series to use as a test set (20% in our example) using the main DeMand system window, shown in Figure 7. The time series is sequentially split into training and test sets based on the selected split ratio. Then, we also choose a predictor to be used for classification, which, in our example, is Logistic Regression with L1 regularization (LR-L1). Further, we configure an hour-ahead forecast model by selecting a forecast horizon and data granularity of 1 hour. Afterwards, we select the set of features which we think will help to improve the performance of the classifier. As shown in Figure 9, the system automatically provides a list of available features that can be used with the selected dataset. In our example, as seen in Figure 9, we select all the available features. Further, we choose the precision-recall curve as the error measure for evaluating the classifier (see Figure 7).

Figure 9: Feature selection in DeMand.

Finally, we execute the experiments from the completely configured experiment. The system might still request some additional parameters specific to the predictor. In the case of user-defined predictors, all additional parameters have to be handled by the user. To set a parameter value, the user has to include a call to the add_parameter function with a parameter name and a value as input. In our example, the LR-L1 model requires the values of the penalty parameter λ and the probability threshold. After we provide the values of all required parameters, the system executes the experiment and stores the results in the database along with all forecasted values.
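As a hedged illustration of this step, the snippet below supplies the two predictor-specific parameters of the LR-L1 model through add_parameter calls. Only the add_parameter(name, value) call is described in the paper; the ExperimentStub class and the concrete values are assumptions added so that the snippet is self-contained.

# Illustrative only: a minimal stand-in for the experiment object, so that the
# add_parameter calls can run; in DeMand this object is created by the system.
class ExperimentStub:
    def __init__(self):
        self.parameters = {}
    def add_parameter(self, name, value):
        self.parameters[name] = value

experiment = ExperimentStub()
experiment.add_parameter('lambda', 0.01)                # L1 penalty (assumed value)
experiment.add_parameter('probability_threshold', 0.5)  # classification cut-off (assumed value)
print(experiment.parameters)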
4.2 Result Presentation
After the completion of all experiments, the results are presented in the main window, as shown in Figure 7. The system plots the values of the classifiers according to the chosen parameters. In our example, the system plots performance values in terms of error, precision, and recall of the two classifiers with different threshold watt parameter values. The system also provides a detailed textual description of the results in the main window. Further, if we click on the individual plots, the system automatically shows the precision-recall curves for each classifier, as seen in Figures 10a and 10b.

Figure 10: Precision-recall curves for various demand thresholds: (a) 10 watts, (b) 100 watts.

For a detailed comparison of the results, we can also use the Result Analyzer, as illustrated in Figure 11. In this example, we query all experiments that have been performed with the same deviceID and dataGranularity parameter values, sorted based on AUC values. The output of the query can be seen in the lower section of Figure 11. Here, the experiment at the top of the table has the best performance in terms of the selected error measure. Additionally, we can choose any two experiments for a visual side-by-side comparison, as shown in Figure 12.

Figure 11: Result Analyzer window of the DeMand system.

Figure 12: Comparison between experiments.
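The evaluation above is based on precision-recall curves and AUC values. For readers who want to reproduce this kind of evaluation outside the tool, the stand-alone sketch below computes both for a binary device-state classifier using scikit-learn; it is not part of DeMand, and the toy labels and probabilities are invented for illustration.

# Stand-alone sketch (not DeMand code): precision-recall curve and AUC for a
# binary device-state classification (0 = idle, 1 = active).
from sklearn.metrics import precision_recall_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 0, 1]                    # observed device states
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9]  # predicted P(active)

precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
auc = roc_auc_score(y_true, y_prob)

for p, r, t in zip(precision, recall, list(thresholds) + [None]):
    print('threshold=%s precision=%.2f recall=%.2f' % (t, p, r))
print('AUC = %.3f' % auc)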
As seen from the use-case example presented above, the DeMand system provides an analytical (decision-support) platform for comparing, validating, and choosing different forecast models and their parameter values. With the comprehensive model comparison information offered by the DeMand system, the user can decide on the best prediction model (algorithm) and select its parameters for a specific dataset or a given collection of datasets. As can be seen from this use-case, the DeMand system with all its features and built-in functionality allows significantly reducing the time needed to select and validate forecast models, compared to using general tools or hard-coded solutions, which typically require much more time for data preprocessing, system configuration, and result post-processing.

5. CONCLUSION AND FUTURE WORK
In this paper, we have presented the DeMand system for fine-tuning, analyzing, and validating device-level forecast models. The system offers a number of built-in device-level measurement datasets, forecast models, features, and error measures, and allows users to evaluate and compare different forecast models based on different parameters, making device-level forecasting more accessible and efficient.

We have presented the architecture and the data model of the DeMand system, and provided a use-case example of how a forecast model for predicting a device state can be analyzed using the system. Thus, we showed that DeMand is an easy-to-use system automating most of the steps of the forecast model selection and validation process.

In the future, we plan to integrate additional features such as i) an API-centric architecture for the Result Analyzer, ii) a model and parameter recommender system, iii) a more flexible and comprehensive data processor, and iv) ensemble learning. We foresee that the full potential of the DeMand system will be unleashed if the system is used repeatedly, possibly by multiple users, allowing a large repository of device-level data and predictors to be built up and utilized.

6. ACKNOWLEDGMENTS
This work was supported in part by the TotalFlex project sponsored by the ForskEL program of Energinet.dk.

7. REFERENCES
[1] The TotalFlex project, 2014. http://www.totalflex.dk/Forside/.
[2] A. Alhamoud, P. Xu, F. Englert, A. Reinhardt, P. Scholl, and R. Steinmetz. Extracting human behavior patterns from appliance-level power consumption data. In Wireless Sensor Networks, Lecture Notes in Computer Science. 2015.
[3] A. Barbato, A. Capone, M. Rodolfi, and D. Tagliaferri. Forecasting the usage of household appliances through power meter sensors for demand management in the smart grid. In Smart Grid Communications (SmartGridComm), 2011 IEEE International Conference on, pages 404–409, 2011.
[4] D. Egarter, V. Bhuvana, and W. Elmenreich. PALDi: Online load disaggregation via particle filtering. Instrumentation and Measurement, IEEE Transactions on, 64(2):467–477, 2015.
[5] S. Gupta, M. S. Reynolds, and S. N. Patel. ElectriSense: Single-point sensing using EMI for electrical event detection and classification in the home. In Proceedings of the 12th ACM International Conference on Ubiquitous Computing, UbiComp '10, pages 139–148, 2010.
[6] J. Z. Kolter and M. J. Johnson. REDD: A public data set for energy disaggregation research. In SustKDD Workshop, 2011.
[7] B. Neupane, T. Pedersen, and B. Thiesson. Towards flexibility detection in device-level energy consumption. In Proceedings of the Second ECML/PKDD Workshop, DARE '14, pages 1–16, 2014.
[8] O. Parson, S. Ghosh, M. Weal, and A. Rogers. An unsupervised training method for non-intrusive appliance load monitoring. Artificial Intelligence, pages 1–19, 2014.
[9] A. Reinhardt, D. Christin, and S. S. Kanhere. Can smart plugs predict electric power consumption? A case study. In Proceedings of the 11th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, pages 257–266, 2014.
[10] R. Ulbricht, U. Fischer, L. Kegel, D. Habich, H. Donker, and W. Lehner. Ecast: A benchmark framework for renewable energy forecasting systems. In EDBT/ICDT Workshops, pages 148–155, 2014.