=Paper= {{Paper |id=Vol-2322/DARLIAP_11 |storemode=property |title=Transparently Mining Data from a Medium-voltage Distribution Network: A Prognostic-diagnostic Analysis |pdfUrl=https://ceur-ws.org/Vol-2322/DARLIAP_11.pdf |volume=Vol-2322 |authors=Matteo Nisi,Daniela Renga,Daniele Apiletti,Danilo Giordano,Tao Huang,Yang Zhang,Marco Mellia,Elena Baralis |dblpUrl=https://dblp.org/rec/conf/edbt/NisiRAGHZMB19 }} ==Transparently Mining Data from a Medium-voltage Distribution Network: A Prognostic-diagnostic Analysis== https://ceur-ws.org/Vol-2322/DARLIAP_11.pdf
           Transparently mining data from a medium-voltage
          distribution network: a prognostic-diagnostic analysis
            Matteo Nisi                          Daniela Renga                       Daniele Apiletti                 Danilo Giordano
    Department of Electronics              Department of Electronics            Department of Control and         Department of Control and
    and Telecommunications                 and Telecommunications                 Computer Engineering             Computer Engineering
    Politecnico di Torino, Italy           Politecnico di Torino, Italy         Politecnico di Torino, Italy      Politecnico di Torino, Italy
     m.nisi@studenti.polito.it              daniela.renga@polito.it              daniele.apiletti@polito.it       danilo.giordano@polito.it

            Tao Huang                              Yang Zhang                         Marco Mellia                      Elena Baralis
      Department of Energy                   Department of Energy               Department of Electronics         Department of Control and
    Politecnico di Torino, Italy           Politecnico di Torino, Italy         and Telecommunications             Computer Engineering
       tao.huang@polito.it                   yang.zhang@polito.it               Politecnico di Torino, Italy      Politecnico di Torino, Italy
                                                                                 marco.mellia@polito.it             elena.baralis@polito.it

ABSTRACT
With the shift from the traditional electric grid to the smart grid paradigm, huge amounts of data are collected during system operations. Data analytics becomes fundamentally important in power networks to enable predictive maintenance, to perform effective diagnosis, and to reduce related expenditures. The final goal is to improve the efficiency and reliability of the electric service, to the benefit of both the citizens and the grid operators themselves.

This paper considers a dataset collected over 6 years in a real-world medium-voltage distribution network by the Supervisory Control And Data Acquisition (SCADA) system. A transparent, exploratory, and exhaustive data-mining approach, based on association rule extraction, is applied to automatically identify correlations among SCADA events occurring before and after specific service interruptions, i.e., distribution network faults of interest. Both the prognostic and the diagnostic potential of the dataset are thus investigated with respect to the occurrence of permanent service interruptions. Our results highlight a limited predictive capability of the available set of SCADA events, while they can be effectively exploited for diagnostic purposes.

1    INTRODUCTION
Electric grid operators welcome predictive maintenance to avoid the costs of scheduled inspections and reactive maintenance interventions. To this aim, datasets describing the electric grid operations, with historical data about failures and alarm signals, are under investigation. Although this data has been collected for different purposes, companies are interested in determining its predictive maintenance capability: to reduce management costs, to speed up intervention time, and to improve efficiency and reliability.

For our study, we rely on a large dataset spanning 6 years, collected by a leading Italian electric grid operator. The dataset describes the operations of a medium-voltage distribution network in northeastern Italy, and it records events and failures through the Supervisory Control And Data Acquisition (SCADA) system. Our aim is to assess whether this dataset could be exploited to (i) predict future electric network failures (predictive maintenance) and/or (ii) effectively diagnose the failures after they are reported by the maintenance system. Since the predictive capability of such a dataset, and its capability to model system degradation, are unknown, we address the predictive task by means of an exploratory predictive maintenance analysis. To this aim, two exploratory approaches are applied: a statistical data characterisation approach, and a transparent exhaustive method based on association rule mining. The latter automatically extracts all correlations, above specific thresholds, among SCADA events occurring before each fault of interest (prognostic) and, separately, after the faults (diagnostic). Quality metrics are exploited to highlight the most meaningful correlations. Finally, human-readable patterns describing such correlations are investigated.

To the best of our knowledge, our work is the first study that investigates both the prognostic and diagnostic capabilities of a real-world historical dataset collected by a SCADA system in an electric grid, with respect to the occurrence of severe service interruptions. By applying an exhaustive analysis methodology that extracts association rules among faults and events, we provide smart grid operators with an assessment of the exploitation potential of currently available datasets for predictive maintenance and diagnosis. The proposed methodology can be applied to similar datasets from any grid operator.

© 2019 Copyright held by the author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2019 Joint Conference (March 26, 2019, Lisbon, Portugal), on CEUR-WS.org.

2    DATASET
The dataset under analysis contains events recorded by the SCADA system of a leading Italian grid operator on its medium-voltage distribution network. The dataset was recorded over a period of 6 years (2010-2016) and covers two northeastern Italian regions (Veneto and Friuli-Venezia-Giulia). It is characterised by 3,901 faults of interest, 30 different affected components, and 153,094 general SCADA events of network operations. The SCADA events are divided into 67 different event types, with the generic failure event type accounting for 79,833 events. The faults of interest are those (i) lasting more than 180 seconds, (ii) with the location in the network identified, and (iii) with the cause determined. These events are named Permanent Service Interruptions (SIPs); each is tagged with a cause among 45 different reasons and linked to one among the 30 affected components.

We briefly characterise the dataset by analysing the distribution of SIP causes and types of SCADA events.

Figure 1a reports the probability distribution of the most frequent causes of SIPs among the 45 available: the top 4 causes account for 75% of the SIPs, with “electric fault” being the most
DARLI-AP 2019, ,                                                                                                                                                                                                                           M. Nisi et al.


frequent cause (45%). More than 20% of SIPs are due to natural causes, such as weather issues, plant falls, snow overload, wind, and animal contact. All these causes are unpredictable without contextual knowledge from outside the operational events of the electrical grid. Furthermore, another 20% of SIPs are due to unknown “other causes” (second most frequent value).

Figure 1b reports the probability distribution of the most common SCADA event types. The distribution is skewed, with about 75% of SCADA events belonging to just 6 different types, and the most frequent one having a frequency above 30%.
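
The three criteria that define a fault of interest in this section, and the cause frequencies reported for Figure 1a, can be illustrated with a minimal Python sketch. The record layout (`Interruption`, `duration_s`, `feeder`, `cause`) and the toy records are assumptions made for illustration only; the paper does not describe the actual SCADA schema, and this is not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interruption:
    """Hypothetical record layout (the real SCADA schema is not given in the paper)."""
    duration_s: float        # interruption duration, in seconds
    feeder: Optional[str]    # network location, None if not identified
    cause: Optional[str]     # cause label, None if not determined


def is_sip(rec: Interruption, min_duration_s: float = 180.0) -> bool:
    """Criteria from Section 2: (i) longer than 180 s, (ii) located, (iii) cause known."""
    return (
        rec.duration_s > min_duration_s
        and rec.feeder is not None
        and rec.cause is not None
    )


def cause_distribution(records):
    """Return (cause, relative frequency) pairs over SIPs, most frequent first."""
    causes = Counter(r.cause for r in records if is_sip(r))
    total = sum(causes.values())
    if total == 0:
        return []
    return [(cause, n / total) for cause, n in causes.most_common()]


# Toy records for illustration only (not taken from the dataset).
records = [
    Interruption(600, "F1", "electric fault"),
    Interruption(240, "F2", "electric fault"),
    Interruption(900, "F1", "other causes"),
    Interruption(300, "F3", "plant fall"),
    Interruption(60, "F1", "electric fault"),   # too short: not a SIP
    Interruption(400, None, "wind"),            # location unknown: not a SIP
    Interruption(500, "F2", None),              # cause unknown: not a SIP
]
dist = cause_distribution(records)
for cause, freq in dist:
    print(f"{cause}: {freq:.2f}")
```

Summing the first few frequencies of `dist` gives the kind of cumulative share quoted above (for example, the share covered by the top 4 causes).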
                                                                                                                                                                                               0.1
                  0.5
                                                                                                                                                                                               0.0
                  0.4                                                                                                                                                                                0    5      10   15     20           25      30
      Frequency




                  0.3                                                                                                                                                                                             SCADA Events
                  0.2
                  0.1
                   0         El       Ot      Pl      Th At      Cu Sn Th       Me W        An Flo Un
                                                                                                                                                                                      Figure 2: Complementary Cumulative Distribution Func-
                               ec        h       a          h                         i
                                   tric er c nt f ird p mo stom ow o ird p cha nd                im
                                                                                                   al
                                                                                                        od ce
                                                                                                               rta                                                                    tion (CCDF) of the number of SCADA events registered
                                       fau aus all       ar      s              a   n
                                                            t fa phe er f verl rt d ical              co ing       in
                                                   es                                                   nta
                                           lt                   ult ric ault oad am       fau
                                                                       ev
                                                                          en
                                                                                     ag
                                                                                        e
                                                                                              lt
                                                                                                            ct
                                                                                                                                                                                      during various lengths of PFWs and AFWs.
                                                                             ts


                                                                Type of SCADA event
                                          (a) Causes of faults (SIPs)                                                                                                                     Figure 2 reports the Complementary Cumulative Distribution
                                                                                                                                                                                      Function (CCDF) of the number of SCADA events registered dur-
                  0.3
      Frequency




                                                                                                                                                                                      ing the PFWs (continuous curves) and the AFWs (dotted curves).
                  0.2
                                                                                                                                                                                      Comparing the CCDFs of PFWs and AFWs, in almost 90% of the
                  0.1
                                                                                                                                                                                      AFWs at least one SCADA event is observed, even within 1 hour;
                   0    Pe
                          ter
                             so
                                Op
                                   en
                                          He
                                              a
                                      ing vy g rven
                                                      Int
                                                          e
                                                                  Op
                                                                       en
                                                                              Op
                                                                                 e
                                                                                         Op
                                                                                             e
                                                                          ing ning ning ning ning
                                                                                                     Op
                                                                                                         e
                                                                                                                Op
                                                                                                                    e
                                                                                                                          Pe
                                                                                                                               rm
                                                                                                                                   an
                                                                                                                                        RT
                                                                                                                                            U
                                                                                                                                              de
                                                                                                                                                                                      instead, in 50% of the 7-day PFWs and in 60% of the 1-day PFWs,
                                nc         of         ro          tio                     of          o          w          of         en         tec
                                               MV und                 nn                     MV f MV ith                        MV t op
                                  oil
                                      int
                                          er
                                             ve
                                                    lin
                                                        ef
                                                               d isp
                                                                          on
                                                                             −r
                                                                                e
                                                                                                   lin
                                                                                                       ef
                                                                                                              lin
                                                                                                                  ef
                                                                                                                       D R A          lin
                                                                                                                                         ef
                                                                                                                                                en
                                                                                                                                                       tv
                                                                                                                                                          olt
                                                                                                                                                    ing age
                                                                                                                                                                                      no SCADA events are registered at all. Furthermore, PFW curves
                                                ntio        or        e rsio     s olv                    or         or        n ot          or         gr         ab
                                                     n         m   ax
                                                                       cu
                                                                             n2
                                                                                 (los
                                                                                       in g
                                                                                                             g ro
                                                                                                                  u n df
                                                                                                                        m ax
                                                                                                                                cu
                                                                                                                                    w  or
                                                                                                                                          k ing
                                                                                                                                                 g ro
                                                                                                                                                      un d2
                                                                                                                                                           ou
                                                                                                                                                              nd
                                                                                                                                                                      se
                                                                                                                                                                  1s ce
                                                                                                                                                                         n            show a more gradual descent with respect to the AFW: SCADA
                                                                          rre         so                                 au        rre                       nd t thr
                                                                              nt
                                                                                 3r
                                                                                    dt
                                                                                          f in
                                                                                               su
                                                                                                  la
                                                                                                                            lt 1
                                                                                                                                 st
                                                                                                                                    t hr
                                                                                                                                        nt
                                                                                                                                           2 nd                 t h re
                                                                                                                                                                      s
                                                                                                                                                                          es
                                                                                                                                                                              ho
                                                                                                                                                                                 ld
                                                                                                                                                                                      events are more likely to follow a SIP rather than preceding the
                                                                                       hr
                                                                                          es         tio
                                                                                                         n)                             es thre                         ho
                                                                                                                                                                           ld
                                                                                                                                                       sh
                                                                                             ho
                                                                                                ld
                                                                                                                                            ho
                                                                                                                                               ld         old                         fault of interest. This data-driven intuition is also confirmed by
                                        (b) Types of SCADA events                                                                                                                     domain knowledge: many types of SCADA events are known to
                                                                                                                                                                                      be triggered by a SIP.
Figure 1: Frequency distribution of the values of (a) causes                                                                                                                              Finally, the 1-hour AFW curve shows a steeper descend than
of faults and (b) types of SCADA events.                                                                                                                                              the longer-lasting AFWs, but with the same starting (leftmost)
                                                                                                                                                                                      values: most SCADA events are typically observed within the
                                                                                                                                                                                      first hour after a SIP, and then few events are collected after 1 or
                                                                                                                                                                                      7 days. On the contrary, the curves of the 7-day AFW and the
3     PROGNOSTIC-DIAGNOSTIC APPROACH                                                                                                                                                  30-day AFW show larger differences, since few events are col-
Since this work aims at investigating both the prognostic and                                                                                                                         lected in the immediately preceding days of a SIP. Most SCADA
diagnostic potential of SCADA events with respect to SIPs, we                                                                                                                         events occurring before a SIP are registered in the previous 1-7
focus on the analysis of those events occurring both before and                                                                                                                       days. Although few additional events are observed considering
after a SIP, in the same portion of the network, under the as-                                                                                                                        a 30-day-PFW, we also note that a higher number of SCADA
sumption that the time and space correlations might capture                                                                                                                           events in the PFW correlates with a higher probability of regis-
causalities of the system.                                                                                                                                                            tering another non-permanent service interruption during the
                                                                                                                                                                                      same PFW (results missing due to space limitations, partially
3.1           Pre-Fault and After-Fault Windows                                                                                                                                       discussed in Section 3.2), so a significant portion of the 30-day-
In the time dimension, we define a time window preceding the                                                                                                                          PFW events could be ideally associated to AFWs of those minor
occurrence of a SIP, denoted as Pre-Fault Window (PFW), and a                                                                                                                         service interruptions.
time window immediately following the SIP, denoted as After-                                                                                                                              All considerations tend to suggest a limited prognostic poten-
Fault Window (AFW). In the space dimension, we consider only                                                                                                                          tial of the SCADA events with respect to SIPs due to fewer events,
SCADA events observed in the same portion of the network                                                                                                                              more time-unrelated, also considering the high variety of SCADA
where the SIP occurs, i.e., reported by the same feeder as origin                                                                                                                     event types. Conversely, the diagnostic exploitation seems better
of the collected data, since according to the domain experts they                                                                                                                     supported by more data, nearer to the event of interest.
are more likely to be correlated to the considered SIP.
   Considering that the grid operator is interested in predicting                                                                                                                     3.2      Inter-Fault Window
future SIPs occurring within the next month at most, the time                                                                                                                         We define the Inter-Fault Window (IFW) as the time interval
windows are defined with the following variable lengths: 1, 7 or 30                                                                                                                   between two consecutive faults on the same portion of the network.
days for PFW, and 1 hour, 1 day or 7 days for AFW. These values                                                                                                                       The aim of this analysis is to determine how many events
result from wider preliminary analyses, with the aim of capturing                                                                                                                     following a SIP, i.e., in its AFW and inherently diagnostic, are
behaviours of the distribution network at different time scales of                                                                                                                    also included in a PFW before another SIP, thus being modelled
interest for domain experts of the electric grid company.                                                                                                                             also as prognostic features. Both SIPs and other minor Service
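The window definitions of Sections 3.1 and 3.2 can be illustrated with a minimal sketch. It is not the implementation used in this work: the record layout (feeder, timestamp, event type), the event type names, and the choice of which window boundaries are inclusive are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical record layout: (feeder_id, timestamp, event_type).

def events_in_pfw(events, sip_feeder, sip_time, length):
    """SCADA events on the SIP's feeder in [sip_time - length, sip_time)."""
    start = sip_time - length
    return [e for e in events
            if e[0] == sip_feeder and start <= e[1] < sip_time]

def events_in_afw(events, sip_feeder, sip_time, length):
    """SCADA events on the SIP's feeder in (sip_time, sip_time + length]."""
    end = sip_time + length
    return [e for e in events
            if e[0] == sip_feeder and sip_time < e[1] <= end]

def inter_fault_windows(fault_times):
    """IFW lengths between consecutive faults on the same feeder."""
    ordered = sorted(fault_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

# Toy data (invented): one SIP on feeder F1.
sip_time = datetime(2019, 3, 10, 12, 0)
events = [
    ("F1", datetime(2019, 3, 5, 8, 0), "overcurrent"),      # 5 days before
    ("F1", datetime(2019, 3, 10, 12, 20), "breaker_open"),  # 20 minutes after
    ("F2", datetime(2019, 3, 10, 11, 0), "overcurrent"),    # different feeder
    ("F1", datetime(2019, 2, 1, 0, 0), "voltage_dip"),      # more than 30 days before
]

pfw_7d = events_in_pfw(events, "F1", sip_time, timedelta(days=7))
afw_1h = events_in_afw(events, "F1", sip_time, timedelta(hours=1))
ifws = inter_fault_windows([
    datetime(2019, 1, 1),
    datetime(2019, 3, 10, 12, 0),
    datetime(2019, 1, 20),
])
```

Passing only SIP timestamps to inter_fault_windows mirrors the consecutive-SIP intervals of Case A in Section 3.2. Case B differs in that, for each SIP, it looks back to the closest preceding Service Interruption of any type.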
Transparently mining data from a medium-voltage distribution network                                                               DARLI-AP 2019, ,

                                                                                 current work, the attribute is either a SCADA event type, or an
                                                                                 alleged cause, or a failed component, and the value is 1 if that
                                                                                 attribute is true in the time window under exam (e.g., the SCADA
                                                                                 event is present, the component failed, or the specific cause was
                                                                                 determined), or 0 otherwise. Note that a SCADA event might
                                                                                 represent another SIP or a minor fault occurring before or after
                                                                                 the analyzed SIP. An itemset I is a set of co-occurring events,
                                                                                 failed components, and alleged causes among the records r in the
[Plot: Frequency (0 to 1) vs. Time interval [days] (0 to 100); Case A, Case B]   dataset D. Such set of items I in a PFW or, separately, in an AFW
                 Figure 3: IFW lengths of various types of faults.               constitutes the input feature vector of the rule mining extraction.
                                                                                    The support count of an itemset I is the number of records r
                                                                                 containing I . The support s(I ) of an itemset I is the percentage of
Interruptions generate diagnostic SCADA events in their AFWs,                    records r containing I with respect to the total number of records
hence different IFWs can be defined, depending on the type of                    r in the full dataset D. An itemset is frequent when its support is
faults considered (SIPs only or all Service Interruptions). Figure 3             greater than or equal to a minimum support threshold MinSup.
shows the probability distribution of the duration of two types                     Association rule mining aims at identifying collections of item-
of IFWs:                                                                         sets (i.e., sets of co-occurring events) that are frequently present
     • Case A (dotted green curve): IFW between each pair of                     in the dataset under analysis, according to statistically relevant
        consecutive SIPs.                                                        metrics. The extracted rules are all and only those adhering to
     • Case B (continuous red curve): IFW between each regis-                    the thresholds of statistical relevance defined as parameters of
        tered SIP and the immediately preceding Service Interrup-                the mining process, hence being an exhaustive, thus powerful,
        tion of any type (either SIP or not).                                    exploratory approach within the boundaries of the problem for-
   In 80% of cases, the IFW between two consecutive SIPs lasts                   mulation (i.e., itemset definition and threshold settings).
more than 40 days, and there is only a 7% probability that two                      Association rules are usually represented in the form X → Y ,
SIPs are separated by an interval of less than 7 days (Case A).                  where X (rule antecedent) and Y (rule consequent) are disjoint
Hence, with a 7-day PFW, we limit the interference of AFWs                       itemsets (i.e., they include different attributes). To identify the
of other SIPs into the PFW of the current SIP under analysis,                    most meaningful rules among those extracted by the mining
by guaranteeing that prognostic and diagnostic events are kept                   process, quality measures can be exploited as ranking criteria.
separate for different SIPs.                                                     The following popular quality measures are used in the current
   However, in Case B, the duration of the IFW between a SIP                     work: rule support, confidence, and lift. Rule support s(X , Y ) is the
and the immediately preceding Service Interruption lasts up to                   percentage of records containing both X and Y . It represents the
30 days in almost 60% of the cases, with the probability of having               prior probability of X ∪ Y , i.e., the support of the corresponding
an IFW shorter than 7 days risen to 26%, three-fold with respect                 itemset I = X ∪Y in the dataset. Rule confidence is the conditional
to Case A. Hence, there exist SCADA events registered during a                   probability of finding Y given X . It describes the strength of the
PFW preceding a SIP that are generated as a consequence, i.e., in                implication and is given by c(X → Y) = s(X ∪ Y) / s(X) [5].
the AFW, of a previously occurring Service Interruption.                            All and only association rules with support and confidence
                                                                                 above (or equal to) a support threshold MinSup and a confidence
3.3               Challenges                                                     threshold MinConf are to be extracted. Among those surviv-
From the time-window-based data characterization, the following                  ing the thresholds, a rank based on descending support, con-
takeaways can be identified:                                                     fidence and lift values can drive the attention to focus on the
    • 60% of the SIPs have no SCADA events in their 7-day PFW.                   most statistically-relevant patterns. The lift [5] of a rule X → Y
    • 10% of the SIPs have no SCADA events in their 1-day AFW.                   measures the (symmetric) correlation between antecedent and
    • Most diagnostic events occur in the 1-hour AFW.                            consequent, and it is defined as follows.
    • Many apparently-prognostic events occur more than 1
      week before the SIP (PFW); however, they include events                    lift(X, Y) = c(X → Y) / s(Y) = s(X → Y) / (s(X) · s(Y))   (1)
      generated as a consequence of other minor faults, i.e., they
      are in the AFW of non-permanent Service Interruptions,                     In Equation (1), c(X → Y ) and s(X → Y ) are the rule confidence
      in 60% of the cases for a 30-day PFW, and in 26% of cases                  and support; s(X ) and s(Y ) are the supports of the rule antecedent
      for a 7-day PFW.                                                           and consequent, respectively. If lift(X ,Y )=1, itemsets X and Y are
                                                                                 not correlated, i.e., they are statistically independent. Lift values
4               RULE MINING                                                      below 1 show a negative correlation between itemsets X and Y ,
To address challenges identified in Section 3.3, we exploited a                  while values above 1 indicate a positive correlation, with higher
transparent, exhaustive and exploratory data mining approach:                    lift indicating stronger rules, hence typically more meaningful
association rule mining. The technique and its evaluation metrics,               and interesting correlations.
as required by the scope of the current work, are defined as
follows.                                                                         4.2    Rule quality analysis
                                                                                 The analysis of the extracted rules has been performed for various
4.1               Association Rule Extraction                                    parameter values. Due to space constraints, we report only the
Let D be a dataset whose generic record r consists of a set of co-               most meaningful results based on the rules obtained by (i) setting
occurring events, i.e., events that occur in the same time window.               MinSup = 0.02, then focusing on rules (ii) whose lift is higher than
Each event, also called item, is a pair (attribute, value). In the               1.5, and (iii) having a cause or component as conclusion.
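The metrics of Section 4.1 and the selection criteria of Section 4.2 can be illustrated with a minimal sketch on toy data. It is not the mining procedure used in this work: it only enumerates rules with a single item on each side, whereas an exhaustive miner also considers larger itemsets. The item names, the "cause:" and "component:" prefixes, and the toy records are assumptions made for illustration.

```python
def support(itemset, records):
    """Fraction of records that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= r for r in records) / len(records)

def rule_metrics(antecedent, consequent, records):
    """Return (rule support, confidence, lift) for antecedent -> consequent."""
    s_xy = support(set(antecedent) | set(consequent), records)
    s_x = support(antecedent, records)
    s_y = support(consequent, records)
    confidence = s_xy / s_x
    lift = confidence / s_y  # equals s_xy / (s_x * s_y), as in Equation (1)
    return s_xy, confidence, lift

def select_rules(records, min_sup=0.02, min_lift=1.5):
    """Single-item rules with a cause or component as conclusion.

    Keeps rules with support >= min_sup and lift strictly above min_lift.
    """
    items = sorted(set().union(*records))
    selected = []
    for x in items:
        for y in items:
            # Hypothetical naming convention for causes and components.
            if x == y or not y.startswith(("cause:", "component:")):
                continue
            s_xy, conf, lift = rule_metrics({x}, {y}, records)
            if s_xy >= min_sup and lift > min_lift:
                selected.append((x, y, s_xy, conf, lift))
    return selected

# Toy records (invented): each one is the item set of one time window.
records = [
    frozenset({"event:E1", "cause:C1"}),
    frozenset({"event:E1", "cause:C1"}),
    frozenset({"event:E1"}),
    frozenset({"event:E2", "cause:C2"}),
    frozenset({"cause:C1"}),
    frozenset({"event:E2"}),
    frozenset({"event:E2", "cause:C2"}),
    frozenset({"event:E1", "cause:C1"}),
]

s, c, l = rule_metrics({"event:E1"}, {"cause:C1"}, records)
selected = select_rules(records)
```

On these records the rule event:E1 → cause:C1 has support 3/8, confidence 3/4 and lift 1.5, so it is discarded by the strict lift filter, while event:E2 → cause:C2 (lift 8/3) is retained.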
   The rules resulting from this selection are reported in                     6     CONCLUSIONS
Figures 4a-4b for a 7-day PFW. They are scatter-plotted                        The work analysed 6 years of data recorded from a medium-
according to support, confidence and lift values. For comparison,              voltage distribution network, with the purpose of estimating
the same results have been reported in Figures 4c-4d for an AFW                both the prognostic and diagnostic potential for severe faults, i.e.,
of 1 day. The diagnostic potential (AFW) is confirmed by a larger              permanent service interruptions. Time-window data characteri-
number of correlations with better quality metrics with respect                sation and exhaustive rule-mining results confirm the capability
to the prognostic capability (PFW):                                            of the collected data to support diagnostic tasks, whereas their
              • 45 rules extracted in the AFW vs 3 in the PFW.                 prognostic potential is limited since only a few weak predictive
              • 50% max rule confidence in AFW vs 25% in PFW.                  correlations are present in the data. Future work includes wider
              • 2.73 max lift value in AFW vs 1.9 in PFW.                      analyses of the rules for different thresholds and changes to
              • 8% max support in AFW vs 4.5% in PFW.                          the transactional dataset derived from the raw data to enable the
                                                                               extraction of additional correlations. Finally, further investiga-
Finally, the top rules according to lift, confidence and support               tions of the predictive capability will be performed by testing the
have been inspected by domain experts from the grid company,                   effectiveness of the obtained rules in detecting actual failures.
allowing a transparent evaluation of the correlation model and the
prognostic-diagnostic approach.                                                ACKNOWLEDGMENT
                                                                               The research leading to these results has been funded by Enel
                                                                               Italia, e-distribuzione, and the SmartData@PoliTO center for
                                                                               Data Science technologies and applications.

[Plots: x-axis lift−1 (log scale); y-axis confidence or support]               REFERENCES
    (a) PFW: Confidence vs lift−1                 (b) PFW: Support vs lift−1    [1] Y. Cai and M. Chow. 2009. Exploratory analysis of massive data for distribution
    (c) AFW: Confidence vs lift−1                 (d) AFW: Support vs lift−1        fault diagnosis in smart grids. In 2009 IEEE Power Energy Society General
                                                                                    Meeting. 1–6.
Figure 4: Association rules extracted from the 7-day PFW                        [2] Q. Cui, K. El-Arroudi, and G. Joos. 2017. An effective feature extraction
(a-b) and from the 1-day AFW (c-d), with causes or components                       method in pattern recognition based high impedance fault detection. In 2017
as conclusion (x-axis in log scale).                                                19th International Conference on Intelligent System Application to Power Systems
                                                                                    (ISAP). 1–6.
                                                                                [3] Enrico De Santis, Lorenzo Livi, Alireza Sadeghian, and Antonello Rizzi. 2015.
                                                                                    Modeling and Recognition of Smart Grid Faults by a Combined Approach
                                                                                    of Dissimilarity Learning and One-class Classification. Neurocomput. 170, C
                                                                                    (Dec. 2015), 368–383.
                                                                                [4] Huaiguang Jiang, Xiaoxiao Dai, Wenzhong Gao, Jun Zhang, Yingchen Zhang,
                                                                                    and Eduard Muljadi. 2016. Spatial-Temporal Synchrophasor Data Characteri-
                                                                                    zation and Analytics in Smart Grid Fault Detection, Identification and Impact
                                                                                    Causal Analysis. IEEE Transactions on Smart Grid 7 (09 2016), 1–1.
                                                                                [5] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. 2005. Introduction to
                                                                                    Data Mining. Addison-Wesley.
                                                                                [6] Chunming Tu, Xi He, Zhikang Shuai, and Fei Jiang. 2017. Big data issues in
                                                                                    smart grid – A review. Renewable and Sustainable Energy Reviews 79 (2017),
                                                                                    1099 – 1107.
                                                                                [7] Jian Wang. 2016. Early warning method for transmission line galloping based
5             RELATED WORK                                                          on SVM and AdaBoost bi-level classifiers. IET Generation, Transmission and
                                                                                    Distribution 10 (November 2016), 3499–3507(8). Issue 14.
With the shift from the traditional electric grid to the Smart Grid             [8] Xiaoyu Wang, Stephen McArthur, Scott Strachan, John D. Kirkwood, and
paradigm, data analytics and related applications are becoming                      Bruce Paisley. 2017. A Data Analytic Approach to Automatic Fault Diagnosis
                                                                                    and Prognosis for Distribution Automation. IEEE Transactions on Smart Grid
of fundamental importance in power networks, as shown by the                        PP (05 2017), 1–1. https://doi.org/10.1109/TSG.2017.2707107
several studies available in the literature focusing on this topic              [9] Yang Zhang, Tao Huang, and Ettore Francesco Bompard. 2018. Big data
[6, 9]. However, few research efforts have been specifically de-                    analytics in smart grids: a review. Energy Informatics 1, 1 (2018), 8.
                                                                               [10] Y. Zhang, Y. Xu, Z. Y. Dong, Z. Xu, and K. P. Wong. 2017. Intelligent Early
voted to predictive maintenance. Some studies aim at performing                     Warning of Power System Dynamic Insecurity Risk: Toward Optimal Accuracy-
fault detection in power networks, based on historical weather                      Earliness Tradeoff. IEEE Transactions on Industrial Informatics 13, 5 (Oct 2017),
data mining [7], on extreme learning machine models [10], or                        2544–2554.
on electrical feature extraction techniques [2]. Authors in [4]
deploy an effective method to detect faults in smart grids, trading
off the need for reducing the huge volume of available collected
data, related to phasor measurement units, and the need for
keeping critical information. Other studies aim not only to detect
faults, but also to further characterise them by identifying and
exploiting significant features. Classifiers based on clustering
and dissimilarity learning techniques [3] or on feature extraction
algorithms [1] are used to analyse massive data to perform fault
recognition or distribution fault diagnosis. The deployment of
fault detection methods with prognostic purposes is not well
investigated in the literature. Authors in [8] aim at reducing the
outages in Medium Voltage distribution networks by exploiting
rule-based, data mining and clustering techniques to design a
method providing diagnostic and prognostic functions for Distri-
bution Automation systems.