<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Interacting with Features: Visual Inspection of Black-box Fault Type Classification Systems in Electrical Grids</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Carmelo Ardito</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yashar Deldjoo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eugenio Di Sciascio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fatemeh Nazary</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Bari</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Automatic fault type classification is an important ingredient of smart electrical grids. Like other machine-learning models, methods developed for fault classification suffer from a lack of transparency. This work presents preliminary insights from an ongoing study, in which we show how feature importance measurement and feature interaction visualization using partial dependence plots (PDPs) can aid the interpretability of classification outcomes. While the former measures the role of each feature in the final predictions in isolation, the latter focuses on the mutual interaction between pairs of features. We show the merits of these two complementary feature analysis mechanisms in facilitating interpretability of the fault type classification task.</p>
      </abstract>
      <kwd-group>
        <kwd>Fault type classification</kwd>
        <kwd>Interpretability</kwd>
        <kwd>Visualization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Smart grids (SGs) are recognized as power distribution systems (PDSs) that
need to possess traits including high reliability, efficiency, and penetration of
renewable energy sources [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. PDSs, however, are susceptible to a variety of
electrical abnormalities and occasional failures as the result of adverse weather
conditions, equipment aging and degradation, and security attacks, among others.
In recent years, a set of machine-learned approaches has emerged that aims
to detect and diagnose faults in a data-driven manner. This capability, known
as self-healing, is important to make electrical grids reliable and smart. In a
nutshell, the goal of self-healing is to automatically restore electricity after an
interruption in the electrical grid and to reduce the interruption period
for customers [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] by performing fault detection, fault type classification, and fault
location identification. Fault type classification, the task on which we focus
in this work, assigns an electrical fault that has occurred in the three-phase electrical
grid to one of the predefined classes: (i) symmetrical faults, such
as LLL and LLLG, which are three-phase faults, and (ii) asymmetrical
faults, such as LG, LL, and LLG, which denote line-to-ground, line-to-line, and
line-to-line-to-ground faults, respectively.
      </p>
      <p>Authors are listed in alphabetical order. Corresponding author: Fatemeh Nazary.
Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
      </p>
      <p>A common characteristic of the prior literature is that its
empirical experiments are oriented toward the prediction aspect of the
fault event, aiming to answer questions such as "is it possible to
detect a fault reliably using ML techniques?" or "which classification technique
can more accurately predict a class type?". Regrettably, such efforts
toward full automation of the PDS's self-healing capability are not designed to inform
human operators, who have long relied on manual/visual awareness.
To keep humans involved in the control loop, it is crucial to design interpretable
ML models that can replace these black-box prediction models and produce
rules that can be understood with little inspection.</p>
      <p>Motivated by this observation, the work at hand does not propose
another classification method for fault prediction; instead,
it focuses on the central question: "Given popular classification techniques
already recognized by the community, is it possible to exploit the results of
predictions in order to obtain more interpretable outcomes?"</p>
      <p>
        The contributions of this work are two-fold:
1. Feature extraction and representation: we rely on features extracted
from the three-phase voltage signals, represented in both the time and frequency
(transform) domains. For feature representation, we compute the n-th
moment of the probability distribution functions (PDFs) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] (n ∈ [1, 4])
together with the energy and maximum of both the time- and
frequency-domain signals.
2. Interpretability: to facilitate interpretability, we measure feature
importance with a model-dependent technique based
on decision trees [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and further propose visual analytics
using partial dependence plots (PDPs) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. These two complementary
analysis techniques measure and visualize the individual impact of features and
their pairwise relationship on the final classification outcome, thereby
helping the user interpret the results of the classification model at hand.
      </p>
      <p>
        The results of our empirical study show that, in general, the features computed
in this work are not only discriminative for our classification scenario
but also easily interpretable, making the classification process transparent.
While previous works have exploited features from the signal or transform
domains [
        <xref ref-type="bibr" rid="ref10 ref4 ref5">4, 10, 5</xref>
        ], our approach of computing the n-th PDF moments of both the
time and frequency signals extracts rich information from the signals that tends to
be mutually complementary in some cases. In fact, by combining feature
visualization (what is the relationship between features?) with attribution (how does
it affect the output?), we can explore how the classifier decides between different
fault types. The work presented in this paper is the preliminary result
of a larger ongoing study that advances the interpretability of ML models
in the context of SGs, providing new insights into how to interpret the results of fault
prediction by proposing an inexpensive feature extraction, feature selection, and
visualization technique.
      </p>
      <p>[Figure 1: main processing stages of the proposed system. Panel labels: IEEE-13 node test feeder (distribution grid); fault injection in the faulty zone (FZ) 671-680; measurement of three-phase voltage signals from the FZ; extraction of signal-level features; extraction of transform-domain features; multi-class single-label classification (AG, BG, CG, AB, BC, AC, ABC); measurement of feature importance; visualization of feature interaction impact.]</p>
      <p>
The goal of the proposed method is two-fold: (i) fault type classification, and (ii)
interpretability achieved via feature importance measurement and data
visualization. The main processing stages of the proposed system are presented
in Figure 1. The input to the system is the IEEE-13 node test feeder, while the
output is one of seven fault types, namely line-to-ground (AG, BG, CG),
line-to-line (AB, AC, BC), and three-phase (ABC) faults.
We chose the IEEE-13 node test feeder, which includes a 4.16 kV voltage generator
and 13 buses, for the simulation of faults and the measurement of three-phase
signals. This distribution system can be divided into four critical zones: zone
1: 632-671, zone 2: 632-633, zone 3: 692-675, and zone 4: 671-680. To collect
data, faults were injected into one arbitrarily chosen zone, in this case zone 4, and
features were then collected from the three-phase voltage signals of this zone. We
injected all 7 fault types (i.e., AG, BG, CG, AB, BC, AC, ABC). Each
fault was applied at start time t = 0.01 and cleared at time
t = 0.02 in all fault simulations. Thus, t_f = [0.01, 0.02] represents the
faulty period, while t_h = [0, 0.01] characterizes the non-faulty (healthy) period.
Every feature extracted from the faulty period t_f was
normalized by the same feature extracted from the healthy period t_h to obtain
a relative score. The following two classes of features were extracted:
- Signal-level features: six features were extracted from the raw voltage data
of the three phases. They comprise the 1st to 4th moments (mean, standard
deviation, skewness, and kurtosis) together with the energy and the maximum level
of the signal.
- Transform-domain features: in addition, we extracted features based
on the discrete Fourier transform (DFT) to obtain richer information about
the frequency content of the signals. After applying the DFT, we extracted
from the computed spectrum the same six features as at the signal level.</p>
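      <p>As an illustration only, the feature computation described above can be sketched in Python as follows. The function names (six_features, dft_magnitude, relative_features), the use of population moments, the naive DFT, and the small constant eps that guards the division are assumptions of this sketch and are not specified in this paper; each phase voltage is assumed to be available as a list of samples for the faulty and for the healthy period.</p>
      <preformat>
```python
import cmath
import math


def six_features(x):
    """Return mean, std, skewness, kurtosis, energy and max of a sequence."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n          # population variance
    std = math.sqrt(var)
    # Standardized 3rd and 4th central moments (0.0 for a flat signal).
    skew = sum((v - mean) ** 3 for v in x) / n / std ** 3 if std > 0 else 0.0
    kurt = sum((v - mean) ** 4 for v in x) / n / std ** 4 if std > 0 else 0.0
    energy = sum(v * v for v in x)
    return {"mean": mean, "std": std, "skewness": skew,
            "kurtosis": kurt, "energy": energy, "max": max(x)}


def dft_magnitude(x):
    """Naive O(n^2) DFT magnitude spectrum; adequate for short windows."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]


def relative_features(faulty, healthy, eps=1e-12):
    """12 relative scores: faulty-period feature / healthy-period feature."""
    out = {}
    for suffix, transform in (("sig", lambda s: s), ("dft", dft_magnitude)):
        f = six_features(transform(faulty))
        h = six_features(transform(healthy))
        for name in f:
            # Assumption of this sketch: clamp near-zero denominators to eps.
            denom = h[name] if abs(h[name]) > eps else eps
            out[f"{name}_{suffix}"] = f[name] / denom
    return out
```
      </preformat>
      <p>For one phase, relative_features(v_faulty, v_healthy) returns a dictionary with keys such as mean_dft, energy_sig, and kurtosis_sig. Features whose healthy-period value is close to zero, for instance the mean of a pure sinusoid, give unstable ratios, which is why the sketch clamps the denominator.</p>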
      <p>
        In total, 12 (6+6) features were collected to represent each sample in our labelled
training dataset. These two sets of features constitute the backbone of many
ML systems [
        <xref ref-type="bibr" rid="ref2 ref3">3, 2</xref>
        ]. To augment the training dataset with further data, the fault
resistance value R_f in the fault detection module was varied over 20
different values in the range of 0.001 to 2, as done in previous works [
        <xref ref-type="bibr" rid="ref6 ref9">9, 6</xref>
        ]. This
resulted in 20 simulations for each fault type and a training dataset of
7 × 20 = 140 samples across all 7 fault types.
      </p>
      <p><bold>2.2 Fault type classification and interpretability analysis</bold></p>
      <p>
Fault type classification was performed using two classifiers: decision tree and
k-nearest neighbors. We model the task as multi-class single-label
classification, rather than multi-label classification, since more classifier choices
are available for the single-label task. For the interpretability experiment
(see next section), we use only the decision tree to keep the discussion simple.</p>
      <p>Finding important variables (features) helps to discover the main drivers
in a supervised classification task. However, this approach does not
produce information about the relationship between input variables and how
this relationship impacts the ML model outcome (predictions). The approach
envisioned in this work uses: (i) a classical feature importance
technique to show the contribution of each feature to the predictions individually,
and (ii) a partial dependence plot (PDP) to understand the relationship between
pairs of input variables and the predictions. A PDP is calculated after the model is
fitted on the training data; its output is therefore specific to the fitted model,
although the procedure itself only queries the model's predictions and can be
applied to any classifier. For example, in our context a PDP
can show whether the probability of a certain fault increases with the signal energy
and the kurtosis of the frequency signal, a question whose answer does not seem
to be trivial. Furthermore, a PDP can establish the type of relationship between
two features: monotonic, linear, or not related. These are important cues that
can help the human operator to better inspect and interpret the black-box fault
classification predictions with little supervision.
</p>
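      <p>To make the computation behind such a plot concrete, the following Python sketch evaluates the partial dependence of one class probability on a pair of features over a grid of values. The function partial_dependence_2d, the predict_proba callable that returns a dictionary of class probabilities, and the toy classifier are hypothetical stand-ins introduced for illustration; they are not the implementation used in this study.</p>
      <preformat>
```python
def partial_dependence_2d(predict_proba, X, i, j, grid_i, grid_j, target):
    """Average predicted probability of class `target` over the samples in X
    while features i and j are forced to each combination of grid values."""
    surface = []
    for a in grid_i:
        row = []
        for b in grid_j:
            total = 0.0
            for x in X:
                x_mod = list(x)              # copy, leave the data untouched
                x_mod[i], x_mod[j] = a, b    # overwrite the two features
                total += predict_proba(x_mod)[target]
            row.append(total / len(X))       # marginalize over other features
        surface.append(row)
    return surface


def toy_predict_proba(x):
    """Hypothetical stand-in for a fitted classifier: uses feature 0 only."""
    p = 1.0 if x[0] > 0.5 else 0.0
    return {"AG": p, "other": 1.0 - p}


# Three made-up samples with two features each (illustrative values only).
X_demo = [[0.1, 0.3], [0.9, 0.7], [0.4, 0.2]]
surface_demo = partial_dependence_2d(
    toy_predict_proba, X_demo, 0, 1, [0.2, 0.8], [0.0, 0.5, 1.0], "AG")
```
      </preformat>
      <p>In this toy case each row of surface_demo is constant along the second feature, which is the pattern one would read as the second feature adding no information beyond the first. A surface that changes along both axes would instead indicate that both features influence the prediction.</p>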
    </sec>
    <sec id="sec-2">
      <title>Results and discussions</title>
      <p>The discussion of results is organized into two parts. First, we describe the
classification results; next, we describe the impact of the two feature analysis
techniques on the interpretability of the classification predictions.</p>
      <p>Classification: Table 1 summarizes the classification results of the two
classifiers, namely decision tree and k-nearest neighbors, under a hold-out
setting (80%-20%) for the training and test sets. In all the
considered experimental cases the average classification accuracy exceeds 92%,
indicating the discriminative power of the chosen features. The best classification
outcome is achieved by the decision tree, with an accuracy of 96.42%. Thus, we
use the decision tree for the next step.</p>
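      <p>The hold-out protocol can be sketched in Python as follows, using k-nearest neighbors because it is short to write out. The per-class (stratified) split, the random seed, the value k = 3, and the Euclidean distance are assumptions of this sketch and are not specified above.</p>
      <preformat>
```python
import random
from collections import Counter, defaultdict


def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Return (train, test) index lists with the same class mix in both."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return train, test


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)   # squared Euclidean
        for row, y in zip(train_X, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```
      </preformat>
      <p>With 140 samples, an 80%-20% split leaves 28 test samples, so a single misclassified test sample would correspond to an accuracy of 27/28, about 96.4%.</p>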
      <p>Feature analysis and interpretability: Results of the feature importance
analysis are shown in Fig. 2. In particular, Fig. 2-a shows the impact of
individual features on the fault type classification predictions. According to the results,
the most informative features are (i) among the signal-level features: energy, mean, and
kurtosis, and (ii) among the frequency-level features: energy and mean. This
analysis therefore indicates that both signal-level and frequency-level
features play a role in the classification predictions.</p>
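      <p>The idea behind tree-based importance can be illustrated with a deliberately simplified Python sketch that scores each feature by the largest Gini impurity decrease a single threshold split on it can achieve, that is, by how well it would serve as the root split of a decision tree. This single-split proxy, the function names, and the normalization to a sum of one are assumptions of this sketch; a full decision tree accumulates such decreases over all of its nodes.</p>
      <preformat>
```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest Gini impurity decrease from one threshold on one feature."""
    n = len(labels)
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    best = 0.0
    for cut in range(1, n):
        if pairs[cut - 1][0] == pairs[cut][0]:
            continue                     # no threshold separates equal values
        left = [y for _, y in pairs[:cut]]
        right = [y for _, y in pairs[cut:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def root_split_importance(X, y, names):
    """Per-feature best-split gain, normalized so the scores sum to one."""
    gains = {name: best_split_gain([row[k] for row in X], y)
             for k, name in enumerate(names)}
    total = sum(gains.values()) or 1.0   # avoid 0/0 if no feature helps
    return {name: g / total for name, g in gains.items()}
```
      </preformat>
      <p>A feature that separates the fault classes with one threshold receives a high score, whereas a feature that is constant across samples receives zero.</p>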
      <p>Fig. 2-b and Fig. 2-c, however, provide a more fine-grained interpretation of the
results. These plots are obtained with the PDP approach (see Section 2.2)
and visualize the impact of mutual feature interactions on the classification
outcome. The two features selected as an example in Fig. 2-b,
i.e., mean_dft and energy_sig, are not mutually informative; in other words,
a change in the values of both of these features does not lead to an increase
or decrease in the classification outcome. This is equivalent to saying that mean_dft
encodes all the necessary information in the set {mean_dft, energy_sig}.
Thus, we can safely use mean_dft alone for the classification task and expect to
obtain good classification results. However, as shown in Fig. 2-c,
the interaction between the features {mean_dft, kurtosis_sig} exhibits a different relation.
In this case, both features monotonically
impact the classification predictions. The highest classification outcome is achieved when
the feature values are in the bottom-left portion of the figure.</p>
      <p>We round off this discussion by highlighting that the information
provided by the PDP analysis for the SG fault type
classification task offers new insights that could not be obtained from the
classical feature importance analysis shown in Fig. 2-a. For example,
while Fig. 2-a reports the impact of the 12 employed features as a group, it
does not indicate whether the same results could be obtained with a
smaller set of features. We can see that while some pairs of features are
mutually complementary, such as mean_dft and energy_sig, there exist other
feature pairs that are correlated. This information could eventually be used by
the system designer to know (i) which feature(s) to focus on in the extraction
phase from the SG signals and (ii) how to represent the features to make them more
informative (e.g., the n-th PDF moments we used), and (iii) by the
human operator of the system to understand the root of specific faults in the system.</p>
      <p>[Figure 2: (a) feature importance; (b), (c) partial dependence plots with mean_dft on the horizontal axis, ranging from Low to High.]</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusion and future work</title>
      <p>
        This work presented preliminary results of a larger study, in which we focused
on the central question of the interpretability of ML models in the context of fault
prediction for smart grids. First, we classified fault types using two different
classifiers, k-nearest neighbors and decision tree, and identified the decision tree as
the best choice; afterwards, for the interpretability task, we studied the role
of two complementary feature analysis techniques, namely feature importance
measurement and feature interaction visualization using partial dependence plots
(PDPs). We provided insights obtainable from the PDP technique
on the relationship between features that could not be found with the classical
approach. Our study acknowledges the merits of the two complementary feature
analysis mechanisms in facilitating explanations. In future work,
we plan to extend our dataset by injecting faults into other critical zones and using
a wider set of features. We also plan to experiment with larger electrical grids, e.g.,
IEEE-34, 37, and 123, which are commonly used in the literature [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Finally, we
plan to study more interpretable models for the core prediction task.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work has been partially funded by e-distribuzione S.p.A company, Italy,
through a PhD scholarship granted to Fatemeh Nazary.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cremer</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Konstantelos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strbac</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>From optimization-based machine learning to interpretable security rules for operation</article-title>
          .
          <source>IEEE Transactions on Power Systems</source>
          <volume>34</volume>
          (
          <issue>5</issue>
          ),
          <fpage>3826</fpage>
          -
          <lpage>3836</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Deldjoo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schedl</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cremonesi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Content-based multimedia recommendation systems: Definition and application domains</article-title>
          .
          <source>In: Proceedings of the 9th Italian Information Retrieval Workshop</source>
          , Rome, Italy, May 28-30, 2018.
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2140</volume>
          .
          <publisher-name>CEUR-WS.org</publisher-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gilanifar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cordova</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stifter</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozguven</surname>
            ,
            <given-names>E.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strasser</surname>
            ,
            <given-names>T.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arghandeh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Multi-task logistic low-ranked dirty model for fault detection in power distribution system</article-title>
          .
          <source>IEEE Transactions on Smart Grid</source>
          <volume>11</volume>
          (
          <issue>1</issue>
          ),
          <fpage>786</fpage>
          -
          <lpage>796</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Jamehbozorg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shahrtash</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A decision tree-based method for fault classification in double-circuit transmission lines</article-title>
          .
          <source>IEEE Transactions on Power Delivery</source>
          <volume>25</volume>
          (
          <issue>4</issue>
          ),
          <fpage>2184</fpage>
          -
          <lpage>2189</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kashyap</surname>
            ,
            <given-names>K.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shenoy</surname>
            ,
            <given-names>U.J.</given-names>
          </string-name>
          :
          <article-title>Classification of power system faults using wavelet transforms and probabilistic neural networks</article-title>
          .
          <source>In: Proceedings of the 2003 International Symposium on Circuits and Systems, 2003. ISCAS '03</source>
          . vol.
          <volume>3</volume>
          , pp. III-III.
          <publisher-name>IEEE</publisher-name>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lwin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Min</surname>
            ,
            <given-names>K.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Padullaparti</surname>
            ,
            <given-names>H.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santoso</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Symmetrical fault detection during power swings: An interpretable supervised learning approach</article-title>
          .
          <source>In: 2017 IEEE Power &amp; Energy Society General Meeting</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
          <publisher-name>IEEE</publisher-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mohammadi-Hosseininejad</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fereidunian</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shahsavari</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lesani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>A healer reinforcement approach to self-healing in smart grid by PHEVs parking lot allocation</article-title>
          .
          <source>IEEE Transactions on Industrial Informatics</source>
          <volume>12</volume>
          (
          <issue>6</issue>
          ),
          <fpage>2020</fpage>
          -
          <lpage>2030</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Molnar</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Interpretable Machine Learning</article-title>
          . Lulu.com (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Onaolapo</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akindeji</surname>
            ,
            <given-names>K.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adetiba</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Simulation experiments for faults location in smart distribution networks using IEEE 13 node test feeder and artificial neural network</article-title>
          .
          <source>In: Journal of Physics: Conference Series</source>
          . vol.
          <volume>1378</volume>
          , p.
          <fpage>032021</fpage>
          . IOP Publishing (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Saleh</surname>
            ,
            <given-names>K.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hooshyar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>El-Saadany</surname>
            ,
            <given-names>E.F.</given-names>
          </string-name>
          :
          <article-title>Hybrid passive-overcurrent relay for detection of faults in low-voltage dc grids</article-title>
          .
          <source>IEEE Transactions on Smart Grid</source>
          <volume>8</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1129</fpage>
          -
          <lpage>1138</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Spanos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Probability Theory and Statistical Inference: Empirical Modeling with Observational Data</article-title>
          . Cambridge University Press (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quinlan</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghosh</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motoda</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McLachlan</surname>
            ,
            <given-names>G.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Philip</surname>
            ,
            <given-names>S.Y.</given-names>
          </string-name>
          , et al.:
          <article-title>Top 10 algorithms in data mining</article-title>
          .
          <source>Knowledge and Information Systems</source>
          <volume>14</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>37</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>