=Paper=
{{Paper
|id=Vol-2742/short8
|storemode=property
|title=Interacting with Features: Visual Inspection of Black-box Fault Type Classification Systems in Electrical Grids
|pdfUrl=https://ceur-ws.org/Vol-2742/short8.pdf
|volume=Vol-2742
|authors=Carmelo Ardito,Yashar Deldjoo,Eugenio Di Sciascio,Fatemeh Nazary
|dblpUrl=https://dblp.org/rec/conf/aiia/ArditoDSN20
}}
==Interacting with Features: Visual Inspection of Black-box Fault Type Classification Systems in Electrical Grids==
Carmelo Ardito, Yashar Deldjoo, Eugenio Di Sciascio, and Fatemeh Nazary (*)

Politecnico di Bari, Italy, firstname.lastname@poliba.it

(*) Authors are listed in alphabetical order. Corresponding author: Fatemeh Nazary. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. Automatic fault type classification is an important ingredient of smart electrical grids. Like other machine-learning models, methods developed for fault classification suffer from a lack of transparency. This work reports preliminary insights from an ongoing study, in which we show how feature importance measurement and feature interaction visualization using partial dependence plots (PDPs) can aid the interpretability of the classification outcomes. While the former measures the role of each feature on the final predictions in isolation, the latter focuses on the mutual interaction between pairs of features. We show the merits of these two complementary feature analysis mechanisms in facilitating interpretability of the fault type classification task.

Keywords: Fault type classification · Interpretability · Visualization.

===1 Introduction and Context===
Smart grids (SGs) are recognized as power distribution systems (PDSs) that need to possess traits including high reliability, efficiency, and penetration of renewable energy sources [1]. PDSs, however, are susceptible to a variety of electrical abnormalities and occasional failures as a result of adverse weather conditions, equipment aging and degradation, and security attacks, among others. Over the last years, a set of machine-learning approaches have emerged that aim to detect and diagnose faults in a data-driven manner. This capability, known as self-healing, is important to make electrical grids reliable and smart.

In a nutshell, the goal of self-healing is to automatically restore electricity after an interruption in the electrical grid and to reduce the interruption period for customers [7] by performing fault detection, fault type classification and fault location identification. Fault type classification, the task we focus on in this work, classifies an electrical fault that has occurred in the three-phase electrical grid into one of the predefined classes: (i) symmetrical faults, such as LLL and LLLG, which are related to three-phase faults, and (ii) asymmetrical faults, such as LG, LL and LLG, which denote line-to-ground, line-to-line and line-to-line-to-ground faults, respectively.

A common characteristic of the prior literature is that the empirical experiments are oriented toward the prediction aspect of the fault event, aiming to answer questions such as “is it possible to detect a fault reliably using ML techniques?” or “which classification technique can predict a class type more accurately?” and so forth. Regretfully, such trends toward full automation of the PDS's self-healing capability are not designed to inform human operators, who have relied on manual/visual awareness for a long time. To keep humans involved in the control loop, it is crucial to design interpretable ML models that can replace these black-box prediction models and to produce rules that can be understood with little inspection.

Motivated by this observation, this work does not propose yet another classification method for fault prediction. Instead, it focuses on the central question: “Given popular classification techniques already recognized by the community, is it possible to exploit the results of predictions in order to obtain more interpretable outcomes?” The contributions of this work are two-fold:

1. Feature extraction and representation: we rely on features extracted from the three-phase voltage signals, represented in both the time and frequency (transform) domains. For feature representation, we compute the n-th moment of the probability distribution functions (PDFs) [11] (n ∈ {1, 2, 3, 4}) together with the energy and the maximum of the signals, on both time- and frequency-domain signals.

2. Interpretability: to facilitate interpretability, we utilize feature importance measurement by employing the model-dependent technique based on the decision tree [12], and further propose to utilize visual analytic techniques using partial dependence plots (PDPs) [8]. These two complementary visual analysis techniques measure and visualize the individual impact of features and their pairwise relationship on the final classification outcome, thereby helping the user interpret the results of the classification model at hand.

The results of our empirical study show that, in general, the features computed in this work are not only discriminative for our classification scenario but also easily interpretable, making the classification process transparent. While previous works have exploited features coming from the signal or transform domains [4, 10, 5], our approach of computing the n-th PDF moments of both the time and frequency signals extracts rich information from the signals that tends to be mutually complementary in some cases. In fact, by combining feature visualization (what is the relationship between features?) with attribution (how does it affect the output?), we can explore how the classifier decides between different fault types. The work presented in this paper is the preliminary result of a larger ongoing study that advances the interpretability of ML models in the context of SGs, providing new insights on how to interpret the results of fault prediction by proposing an inexpensive feature extraction, feature selection and visualization technique.

[Figure 1: processing pipeline. IEEE-13 node test feeder (distribution grid); fault injection in the faulty zone (FZ), 671-680; measurement of three-phase voltage signals from the FZ; extraction of signal-level features and transform-domain features; multi-class single-label classification into AG, BG, CG, AB, BC, AC, ABC; measurement of feature importance and visualization of feature interaction impact.]

Fig. 1. The main processing stages in our proposed system

===2 Proposed method===
The goal of the proposed method is two-fold: (i) fault-type classification, and (ii) interpretability achieved via feature importance measurement and data visualization. The main processing stages involved in the proposed system are presented in Figure 1. The input to the system is the IEEE-13 node test feeder, while the output is one of the seven fault types, namely: line-to-ground (AG, BG, CG), line-to-line (AB, AC, BC), and three-phase fault (ABC).

====2.1 Fault simulation and Feature Extraction====
We chose the IEEE-13 node test feeder, which includes a voltage generator of 4.16 kV and 13 buses, for the simulation of faults and the measurement of three-phase signals. One can divide this distribution system into four critical zones: zone 1: 632-671, zone 2: 632-633, zone 3: 692-675, and zone 4: 671-680. To collect data, faults were injected into one arbitrarily chosen zone, in this case zone 4, and features were then collected from the three-phase voltage signals of this zone. We injected all 7 different faults (i.e., AG, BG, CG, AB, BC, AC, ABC).
These faults were applied at a fixed start time t = 0.01 and revoked at time t = 0.02 in all of the fault simulations. Thus, t_f = [0.01, 0.02] represents the faulty period, while t_h = [0, 0.01] characterizes the non-faulty (healthy) period. All features extracted from the faulty period t_f were normalized by the same feature extracted from the healthy period t_h to obtain a relative score. The following two classes of features were extracted:

* Signal-level features: six features were extracted from the raw voltage data of the three phases. They include the 1st to 4th moments (mean, standard deviation, skewness, kurtosis) together with the energy and the maximum level of the signal.
* Transform-domain features: in addition, we extracted features based on the discrete Fourier transform (DFT) to obtain richer information about the frequency content of the signals. After applying the DFT, we extracted from the computed spectrum the same kinds of features as the signal-level features.

In total, 12 (6+6) features were collected to represent each sample in our labelled training dataset. These two sets of features constitute the backbone of many ML systems [3, 2]. To augment the training dataset with further data, the fault resistance value Rf in the fault detection module was varied by choosing 20 different values in the range of 0.001 to 2, as done in previous works [9, 6]. This resulted in 20 simulations for each of the fault types and a dataset of 140 samples (20 simulations × 7 fault types).

====2.2 Fault type classification and interpretability analysis====
Fault type classification was done using two main classifiers: decision tree and k-nearest neighbors. We model the classification task as multi-class single-label classification (instead of multi-label), since more classifier choices are available for the single-label classification task. For the interpretability experiment (see next section), we use only the decision tree to keep the discussion simple.
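As an illustration of the feature extraction in Section 2.1, the following minimal Python sketch computes the six statistics on one signal and on the magnitude of its DFT, then divides each faulty-period value by its healthy-period counterpart. The function names, the use of division for the normalization, the small eps guard, and the treatment of a single phase at a time are assumptions made for this sketch; the text above does not specify them. The feature names (e.g., mean_dft, energy_sig) follow the labels used in Fig. 2.

```python
import cmath
import math


def dft_magnitude(signal):
    """Magnitude spectrum of a real signal via a direct O(n^2) DFT."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def six_features(values):
    """Mean, standard deviation, skewness, kurtosis, energy and maximum."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # Standardized 3rd and 4th central moments; set to 0.0 for a constant input.
    skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3) if std else 0.0
    kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4) if std else 0.0
    return {
        "mean": mean, "std": std, "skewness": skew, "kurtosis": kurt,
        "energy": sum(v * v for v in values), "max": max(values),
    }


def relative_features(faulty, healthy, eps=1e-12):
    """12 features of the faulty window, each divided by its healthy value.

    Division is an assumed form of the normalization; eps avoids dividing
    by a healthy-period value that is (numerically) zero.
    """
    features = {}
    domains = {"sig": lambda s: s, "dft": dft_magnitude}
    for suffix, transform in domains.items():
        f = six_features(transform(faulty))
        h = six_features(transform(healthy))
        for name, value in f.items():
            denom = h[name] if abs(h[name]) > eps else eps
            features[f"{name}_{suffix}"] = value / denom
    return features


if __name__ == "__main__":
    healthy = [0.0, 1.0, 0.5, 3.0, 0.2, 0.1, 0.4, 2.0]  # toy samples, not grid data
    faulty = [2.0 * v for v in healthy]                  # toy "fault": doubled amplitude
    for name, value in relative_features(faulty, healthy).items():
        print(f"{name}: {value:.3f}")
```

Note that a relative score of this kind becomes unstable when the healthy-period statistic is close to zero (for example, the mean of a sinusoid over whole periods), which is why the sketch includes the eps guard.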
Finding important variables (features) helps to discover the main drivers in a supervised classification task. However, this approach alone does not produce information about the relationship between input variables and how this relationship impacts the ML model outcome (predictions). The approach envisioned in this work contemplates using: (i) a classical feature importance technique to show the contribution of each feature to the predictions individually, and (ii) a partial dependence plot (PDP) to understand the relationship between pairs of input variables and predictions. A PDP is calculated after the model is fitted on the training data; thus, it is a post-hoc analysis of the fitted model. For example, in our context a PDP can show whether the probability of a certain fault increases with the signal energy and the kurtosis of the frequency signal, a question whose answer does not seem to be trivial. Furthermore, a PDP can establish the type of relationship between two features: monotonic, linear, or not related. These are important cues that can help the human operator to better inspect and interpret the black-box fault classification predictions with little supervision.

===3 Results and discussions===
The discussion of results is organized into two parts. First, we describe the results of classification; next, we describe the impact of the two feature analysis techniques on the interpretability of the classification predictions.

Classification: Table 1 summarizes the classification results using two classifiers, namely decision tree and k-nearest neighbors, on the basis of a hold-out setting (80% training, 20% test). In all the considered experimental cases the classification accuracy is more than 92%, indicating the discriminative power of the chosen features. The best classification outcome is achieved by the decision tree, with an accuracy of 96.42%. Thus, we use the decision tree for the next step.
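The hold-out protocol described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the data are synthetic, well-separated clusters standing in for the real 12-dimensional feature vectors, the helper names are hypothetical, and only the k-nearest neighbors classifier is shown because it is compact to write without external libraries. With 140 samples, an 80%-20% split leaves 112 samples for training and 28 for testing.

```python
import math
import random
from collections import Counter

FAULT_TYPES = ["AG", "BG", "CG", "AB", "BC", "AC", "ABC"]


def synthetic_dataset(per_class=20, seed=1):
    """Well-separated 2-D clusters, one per fault type (toy data only)."""
    rng = random.Random(seed)
    samples, labels = [], []
    for c, fault in enumerate(FAULT_TYPES):
        for _ in range(per_class):
            samples.append([10.0 * c + rng.uniform(-1, 1), rng.uniform(-1, 1)])
            labels.append(fault)
    return samples, labels


def holdout_split(samples, labels, test_fraction=0.2, seed=0):
    """Shuffle the sample indices and split them into train and test parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)

    def pick(idx):
        return [samples[i] for i in idx], [labels[i] for i in idx]

    return pick(indices[n_test:]), pick(indices[:n_test])


def knn_predict(train_x, train_y, query, k=5):
    """Majority label among the k nearest training samples (Euclidean)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def holdout_accuracy(train, test, k=5):
    """Percentage of test samples whose predicted label is correct."""
    (train_x, train_y), (test_x, test_y) = train, test
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return 100.0 * hits / len(test_y)


if __name__ == "__main__":
    samples, labels = synthetic_dataset()
    train, test = holdout_split(samples, labels)
    print(len(train[0]), len(test[0]), holdout_accuracy(train, test))
```

The accuracy printed for this toy data says nothing about the accuracies reported for the real feature set; the sketch only shows the mechanics of the split and the evaluation.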
{| class="wikitable"
|+ Table 1. Classification accuracy (%) using 12 features and two classifiers. For the k-nearest neighbors, k = 5 was used.
! Classifier !! decision tree !! k-nearest neighbors
|-
! Accuracy
| 96.42 || 92.85
|}

Feature analysis and interpretability: Results of the feature importance analysis are shown in Fig. 2. In particular, Fig 2-a shows the impact of individual features on the fault type classification predictions. According to the results, the most informative features are (i) among the signal-level features: energy, mean and kurtosis, and (ii) among the frequency-level features: energy and mean. Thus, this analysis tells us that both signal-level and frequency-level features can play a role in the classification predictions.

Fig 2-b and Fig 2-c, however, provide a more fine-grained interpretation of the results. These plots are obtained with the PDP approach (see Section 2.2) and visualize the impact of mutual feature interactions on the classification outcome. We can note that the two features selected as an example in Fig 2-b, i.e., mean_dft and energy_sig, are not mutually informative; in other words, a change in the values of both of these features does not lead to an increase or decrease in the classification outcome. This is equivalent to saying that mean_dft carries all the necessary information encoded in the set {mean_dft, energy_sig}. Thus, we can safely use mean_dft for the classification task and expect to obtain good classification results. However, as shown in Fig 2-c, a different relation is obtained for the interaction between the features {mean_dft, kurtosis_sig}. In this case, both features monotonically impact the classification predictions. The highest classification outcome is achieved when the feature values are in the bottom-left portion of the figure.
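To make the idea behind the two-feature plots concrete, the following brute-force sketch follows the usual definition of partial dependence: the two features of interest are forced to each pair of grid values in turn, and the model output is averaged over all samples. It is not the implementation used to produce Fig. 2; the function name, the toy model and the toy data are assumptions made for illustration.

```python
def partial_dependence_2d(predict, data, feat_a, feat_b, grid_a, grid_b):
    """Average model output over the data with two features fixed.

    predict maps one feature vector to a number (e.g., the predicted
    probability of one fault type). Returns surface[i][j] for the pair
    (grid_a[i], grid_b[j]).
    """
    surface = []
    for a in grid_a:
        row = []
        for b in grid_b:
            total = 0.0
            for sample in data:
                modified = list(sample)   # copy, so the data stay untouched
                modified[feat_a] = a
                modified[feat_b] = b
                total += predict(modified)
            row.append(total / len(data))
        surface.append(row)
    return surface


def toy_model(x):
    """Toy stand-in for a fitted classifier: only feature 0 matters."""
    return 1.0 if x[0] > 0.5 else 0.0


if __name__ == "__main__":
    data = [[0.0, 5.0, 0.0], [1.0, 7.0, 0.2]]
    surface = partial_dependence_2d(toy_model, data, 0, 1,
                                    grid_a=[0.0, 1.0], grid_b=[1.0, 2.0, 3.0])
    for row in surface:
        print(row)
```

In this toy case the surface changes along the first feature only and is flat along the second, which is the kind of pattern that would indicate that the second feature adds no information once the first is known.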
We round off this discussion by highlighting that the information provided by the PDP analysis for the SG fault type classification task offers new insights that could not be obtained from the classical feature importance analysis technique shown in Fig 2-a. For example, while Fig 2-a reports on the impact of the 12 employed features as a group, it does not provide specific insights into whether the same results could be obtained when a smaller set of features is used. We can see that while some pairs of features are mutually complementary, such as mean_dft and energy_sig, there exist other feature pairs that are correlated. This information could eventually be used by the system designer to decide (i) which feature(s) to focus on in the extraction phase from the SG signals and (ii) how to represent the features to make them more informative (e.g., the n-th PDF moment we used), and (iii) by the human operator to understand the root of specific faults in the system.

[Figure 2: panel (a) feature importance scores; panels (b) and (c) PDP interaction plots over the feature pairs (mean_dft, energy_sig) and (mean_dft, kurtosis_sig); scale labels: Low, High.]

Fig. 2. Results of feature analysis: (a) feature importance scores for the 12 features by the decision tree; (b-c) PDP interaction plots using two dominant features from part (a).

===4 Conclusion and future work===
This work presented preliminary results of a larger study, in which we focused on the central question of the interpretability of ML models in the context of fault prediction for smart grids. First, we classified fault types using two different classifiers, k-nearest neighbors and decision tree, and identified the decision tree as the best choice; afterwards, for the interpretability task, we studied the role of two complementary feature analysis techniques, namely feature importance measurement and feature interaction visualization using partial dependence plots (PDPs).
We provided insights on the relationship between features that can be obtained from the PDP technique and that could not be found with the classical approach. Our study acknowledges the merits of the two complementary feature analysis mechanisms in facilitating explanations. As future work, we plan to extend our dataset by injecting faults into other critical zones and by using a wider set of features. We plan to experiment with larger electrical grids, e.g., IEEE-34, 37 and 123, that are commonly used in the literature [3]. Finally, we plan to study more interpretable models for the core prediction task.

===Acknowledgments===
This work has been partially funded by e-distribuzione S.p.A company, Italy, through a PhD scholarship granted to Fatemeh Nazary.

===References===
1. Cremer, J.L., Konstantelos, I., Strbac, G.: From optimization-based machine learning to interpretable security rules for operation. IEEE Transactions on Power Systems 34(5), 3826–3836 (2019)

2. Deldjoo, Y., Schedl, M., Cremonesi, P., Pasi, G.: Content-based multimedia recommendation systems: Definition and application domains. In: Proceedings of the 9th Italian Information Retrieval Workshop, Rome, Italy, May 28-30, 2018. CEUR Workshop Proceedings, vol. 2140. CEUR-WS.org (2018)

3. Gilanifar, M., Cordova, J., Wang, H., Stifter, M., Ozguven, E.E., Strasser, T.I., Arghandeh, R.: Multi-task logistic low-ranked dirty model for fault detection in power distribution system. IEEE Transactions on Smart Grid 11(1), 786–796 (2019)

4. Jamehbozorg, A., Shahrtash, S.: A decision tree-based method for fault classification in double-circuit transmission lines. IEEE Transactions on Power Delivery 25(4), 2184–2189 (2010)

5. Kashyap, K.H., Shenoy, U.J.: Classification of power system faults using wavelet transforms and probabilistic neural networks. In: Proceedings of the 2003 International Symposium on Circuits and Systems, 2003. ISCAS'03. vol. 3, pp. III–III. IEEE (2003)

6. Lwin, M., Min, K.W., Padullaparti, H.V., Santoso, S.: Symmetrical fault detection during power swings: An interpretable supervised learning approach. In: 2017 IEEE Power & Energy Society General Meeting. pp. 1–5. IEEE (2017)

7. Mohammadi-Hosseininejad, S.M., Fereidunian, A., Shahsavari, A., Lesani, H.: A healer reinforcement approach to self-healing in smart grid by PHEVs parking lot allocation. IEEE Transactions on Industrial Informatics 12(6), 2020–2030 (2016)

8. Molnar, C.: Interpretable Machine Learning. Lulu.com (2020)

9. Onaolapo, A.K., Akindeji, K.T., Adetiba, E.: Simulation experiments for faults location in smart distribution networks using IEEE 13 node test feeder and artificial neural network. In: Journal of Physics: Conference Series. vol. 1378, p. 032021. IOP Publishing (2019)

10. Saleh, K.A., Hooshyar, A., El-Saadany, E.F.: Hybrid passive-overcurrent relay for detection of faults in low-voltage DC grids. IEEE Transactions on Smart Grid 8(3), 1129–1138 (2015)

11. Spanos, A.: Probability Theory and Statistical Inference: Empirical Modeling with Observational Data. Cambridge University Press (2019)

12. Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Yu, P.S., et al.: Top 10 algorithms in data mining. Knowledge and Information Systems 14(1), 1–37 (2008)