<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Application of Naive Bayes and Decision Tree in the Prediction of Power Transformers Faults based on DGA</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yassine Mahamdi</string-name>
          <email>yassine.mahamdi@g.enp.edu.dz</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ahmed Boubakeur</string-name>
          <email>ahmed.boubakeur@g.enp.edu.dz</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abdelouahab Mekhaldi</string-name>
          <email>abdelouahab.mekhaldi@g.enp.edu.dz</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Youcef Benmahamed</string-name>
          <email>youcef.benmahamed@g.enp.edu.dz</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Power transformers are basic elements of the power grid, and their condition is directly related to the reliability of the electrical system. Many techniques have been used to prevent power transformer failures, but Dissolved Gas Analysis (DGA) remains the most effective one. Based on the DGA technique, this paper describes the use of two of the most effective machine learning algorithms, Naive Bayes (NB) and Decision Tree (DT), to identify power transformer faults. In our investigation, we developed 9 different input vectors from widely known DGA techniques. We used 481 samples and considered 6 types of faults. The proposed methods achieved an accuracy of up to 86.25% in power transformer fault diagnosis.</p>
      </abstract>
      <kwd-group>
        <kwd>DGA</kwd>
        <kwd>Decision Tree</kwd>
        <kwd>Naive Bayes</kwd>
        <kwd>Input vectors</kwd>
        <kwd>Fault diagnosis</kwd>
        <kwd>Accuracy rate</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Dissolved Gas Analysis (DGA) is the most common and effective method for detecting transformer faults.
It can predict internal transformer failures early, which generally avoids huge economic losses.</p>
      <p>
        A transformer in service is exposed to two types of stresses: electrical and thermal [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Due to
these stresses, the transformer oil and paper decompose, releasing a set of gases that reduce their
dielectric strength. The nature and quantity of each dissolved gas produced in transformer oil can
indicate the internal condition of the transformer.
      </p>
      <p>
        The most common gases produced by the decomposition of oil are ethane (C2H6), ethylene
(C2H4), acetylene (C2H2), methane (CH4) and hydrogen (H2) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]; these gases differ mainly in the intensity
of the energy dissipated by the fault that produces them [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In addition, carbon dioxide (CO2) and carbon
monoxide (CO) are formed as a result of the decomposition of paper [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], while nitrogen (N2)
and oxygen (O2) are non-fault gases.
      </p>
      <p>
        Many approaches have been developed for analyzing dissolved gases in transformer oil and
interpreting their meaning, including the IEC Ratio, Dornenburg Ratio, Rogers Ratio, Duval Triangle
and Pentagon, and Key Gas methods. However, these techniques have certain limitations, such as the
existence of non-decision areas and erroneous results [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. To overcome these limitations, several artificial
intelligence techniques have been used to improve the diagnostic accuracy of power transformers,
such as fuzzy logic inference systems [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], artificial neural networks [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], hybrid grey wolf optimization
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], support vector machines and K-nearest neighbors [
        <xref ref-type="bibr" rid="ref8 ref9">8-9</xref>
        ], and they have shown impressive performance
[
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10-12</xref>
        ].
      </p>
      <p>In this paper, we examine the use of the Naive Bayes and the Decision Tree algorithms in fault
identification. The originality comes from the introduction of several input vectors formed using
widely known DGA techniques, in order to identify the input data that gives the best
performance for each algorithm and achieves the best prediction of faults in power transformers.</p>
      <p>This article is arranged as follows. In the second section, we describe the collection of DGA data,
then the construction of the proposed input space, followed by a brief presentation of the two
classification algorithms used: Decision Tree (DT) and Naïve Bayes (NB). The results of implementing the two algorithms using our proposed input vectors are discussed in the third section, where the best input vector for each technique is identified. Finally, the conclusions from this work are summarized and potential future work is mentioned.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. Data collection</title>
      <p>
        The construction of our proposed input space requires gas concentration values. For this purpose,
samples of transformer oil are taken periodically to check the gases formed [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Generally, a mixture of
all gases is present in an oil sample, and the relative amount of each can indicate the
existing fault: partial discharges (PD), thermal faults &gt; 700 °C (T3), thermal faults of 300 °C
to 700 °C (T2), thermal faults &lt; 300 °C (T1), high energy discharges (D2) and low energy discharges
(D1) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
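      <p>
        In this work, a database of 481 samples has been used to train and test the proposed
methods. This database has been extracted from the literature [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The distribution of the training and
the testing samples according to their fault type is shown in Table 1.
      </p>
      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>Distribution of the training and testing samples according to fault type.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Abbreviation</th><th>Samples for training</th><th>Samples for testing</th></tr>
          </thead>
          <tbody>
            <tr><td>PD</td><td>32</td><td>16</td></tr>
            <tr><td>T3</td><td>57</td><td>28</td></tr>
            <tr><td>T2</td><td>32</td><td>16</td></tr>
            <tr><td>T1</td><td>63</td><td>32</td></tr>
            <tr><td>D2</td><td>84</td><td>42</td></tr>
            <tr><td>D1</td><td>53</td><td>26</td></tr>
            <tr><td>Total</td><td>321</td><td>160</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-4">
      <title>2.2. Proposed Input vectors:</title>
      <p>The following attributes have been considered in the construction of our proposed input
vectors:</p>
      <p>1. Using the concentration of the usual five key gases in ppm:</p>
      <disp-formula><label>(1)</label><tex-math>X = [\mathrm{H_2},\ \mathrm{CH_4},\ \mathrm{C_2H_2},\ \mathrm{C_2H_4},\ \mathrm{C_2H_6}]</tex-math></disp-formula>
      <p>2. Using the ratios between key gases (the IEC ratios):</p>
      <disp-formula><label>(2)</label><tex-math>X = \left[\frac{\mathrm{CH_4}}{\mathrm{H_2}},\ \frac{\mathrm{C_2H_2}}{\mathrm{C_2H_4}},\ \frac{\mathrm{C_2H_4}}{\mathrm{C_2H_6}}\right]</tex-math></disp-formula>
      <p>3. Using the relative percentages of gases:</p>
      <disp-formula><label>(3)</label><tex-math>X = [\%\mathrm{C_2H_6},\ \%\mathrm{C_2H_4},\ \%\mathrm{C_2H_2},\ \%\mathrm{CH_4},\ \%\mathrm{H_2}]</tex-math></disp-formula>
      <p>4. Using Rogers' four ratios:</p>
      <disp-formula><label>(4)</label><tex-math>X = \left[\frac{\mathrm{C_2H_6}}{\mathrm{CH_4}},\ \frac{\mathrm{C_2H_4}}{\mathrm{C_2H_6}},\ \frac{\mathrm{C_2H_2}}{\mathrm{C_2H_4}},\ \frac{\mathrm{CH_4}}{\mathrm{H_2}}\right]</tex-math></disp-formula>
      <p>5. Using Dornenburg's four ratios:</p>
      <disp-formula><label>(5)</label><tex-math>X = \left[\frac{\mathrm{CH_4}}{\mathrm{H_2}},\ \frac{\mathrm{C_2H_2}}{\mathrm{C_2H_4}},\ \frac{\mathrm{C_2H_2}}{\mathrm{CH_4}},\ \frac{\mathrm{C_2H_6}}{\mathrm{C_2H_2}}\right]</tex-math></disp-formula>
      <p>6. Using Duval’s triangle coordinates:</p>
      <disp-formula><label>(6)</label><tex-math>X = [C_a,\ C_b]</tex-math></disp-formula>
      <p>where</p>
      <disp-formula><label>(7)</label><tex-math>C_a = \frac{1}{3}\,\frac{\sum_{i=0}^{n-1}(a_i + a_{i+1})(a_i b_{i+1} - a_{i+1} b_i)}{\sum_{i=0}^{n-1}(a_i b_{i+1} - a_{i+1} b_i)}</tex-math></disp-formula>
      <p>and</p>
      <disp-formula><label>(8)</label><tex-math>C_b = \frac{1}{3}\,\frac{\sum_{i=0}^{n-1}(b_i + b_{i+1})(a_i b_{i+1} - a_{i+1} b_i)}{\sum_{i=0}^{n-1}(a_i b_{i+1} - a_{i+1} b_i)}</tex-math></disp-formula>
      <p>Here n is the number of vertices (n = 3 for the triangle) and the indices are taken cyclically, so that the last vertex is followed by the first one. The a_i and b_i are the coordinates of the vertices, obtained from the relative gas percentages with cosine terms for the a_i and sine terms for the b_i, using α = 2π/3, in the same way as for the pentagon coordinates of item 7.</p>
      <p>7. Using Duval’s pentagon coordinates:</p>
      <disp-formula><label>(9)</label><tex-math>X = [C_a,\ C_b]</tex-math></disp-formula>
      <p>where</p>
      <disp-formula><label>(10)</label><tex-math>C_a = \frac{1}{6A}\sum_{i=0}^{n-1}(a_i + a_{i+1})(a_i b_{i+1} - a_{i+1} b_i)</tex-math></disp-formula>
      <p>and</p>
      <disp-formula><label>(11)</label><tex-math>C_b = \frac{1}{6A}\sum_{i=0}^{n-1}(b_i + b_{i+1})(a_i b_{i+1} - a_{i+1} b_i)</tex-math></disp-formula>
      <p>with n = 5 and A the area of the polygon, A = (1/2) Σ (a_i b_{i+1} − a_{i+1} b_i), the sum running over i = 0, ..., n − 1. The a_i are calculated using the following equations:</p>
      <disp-formula><tex-math>\begin{array}{l}
a_0 = \%\mathrm{H_2}\,\cos\left(\frac{\pi}{2}\right) \\
a_1 = \%\mathrm{C_2H_6}\,\cos\left(\frac{\pi}{2} + \alpha\right) \\
a_2 = \%\mathrm{CH_4}\,\cos\left(\frac{\pi}{2} + 2\alpha\right) \\
a_3 = \%\mathrm{C_2H_4}\,\cos\left(\frac{\pi}{2} + 3\alpha\right) \\
a_4 = \%\mathrm{C_2H_2}\,\cos\left(\frac{\pi}{2} + 4\alpha\right)
\end{array}</tex-math></disp-formula>
      <p>Also, the b_i are obtained by replacing “cos” with “sin” in the last equations, with α = 2π/5.</p>
      <p>8. Using a combination of two of the previously mentioned input vectors, Rogers' and Dornenburg's ratios.</p>
      <p>9. To further improve fault recognition by expanding the proposed input space, another
combination was made for this input vector, Duval’s triangle-pentagon:</p>
      <disp-formula><label>(12)</label><tex-math>X = [C_{a1},\ C_{b1},\ C_{a2},\ C_{b2}]</tex-math></disp-formula>
      <p>where {Ca1, Cb1} are calculated using the triangle method, while {Ca2, Cb2} are calculated
according to the pentagon one.</p>
    </sec>
    <sec id="sec-2-3">
      <title>2.3. AI techniques:</title>
    </sec>
    <sec id="sec-5">
      <title>2.3.1. Naive Bayes</title>
      <p>
        The Naive Bayes algorithm is a simple probabilistic classifier that uses Bayes' theorem, which
is given by the following equation [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]:
      </p>
      <disp-formula><label>(13)</label><tex-math>P(x \mid y) = \frac{P(y \mid x) \times P(x)}{P(y)}</tex-math></disp-formula>
      <p>where P(x|y) refers to the posterior probability of the hypothesis x conditioned on some
evidence y, and P(x) is the prior probability of x.</p>
    </sec>
    <sec id="sec-6">
      <title>2.3.2. Decision tree</title>
      <p>
        The decision tree algorithm is a non-parametric supervised machine learning classifier used to
split data into a set of branches. The tree is constructed from top to bottom in a
recursive divide-and-conquer manner. The training of the Decision Tree classifier is based on finding the
best split at each node as long as the full data set is not analyzed [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. This principle leads to
partitioning the feature space until the stopping criterion is satisfied in each leaf, or until all
points in a given leaf belong to one class. Figure 1 illustrates the basic structure of a decision tree.
The best split is selected using the information gain:
      </p>
      <disp-formula><label>(14)</label><tex-math>\mathrm{Gain}(T, A) = \mathrm{Entropy}(T) - \sum_{v \in \mathrm{Values}(A)} \frac{|T_v|}{|T|}\,\mathrm{Entropy}(T_v)</tex-math></disp-formula>
      <p>where Gain(T, A) is the information gain of set T (training data) on an attribute A, and T_v is the
subset of T for which attribute A takes the value v. The entropy is given by:</p>
      <disp-formula><label>(15)</label><tex-math>\mathrm{Entropy}(T) = -\sum_{i=1}^{c} p(i) \log p(i)</tex-math></disp-formula>
      <p>where p(i) is the proportion of T belonging to a class i and c is the number of classes.</p>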
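      <p>As a rough illustration of the split criterion described above, the following Python sketch computes the entropy of a set of fault labels and the information gain of a candidate attribute. It is only a sketch under stated assumptions: the function names <monospace>entropy</monospace> and <monospace>information_gain</monospace>, the base-2 logarithm, the discrete attribute values and the toy labels are illustrative choices that do not come from the original work, which was implemented in MATLAB.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2, an assumption) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(values, labels):
    """Information gain obtained by splitting `labels` on a discrete attribute."""
    total = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    # Weighted entropy of the subsets T_v created by the split.
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four samples, two fault types, two candidate attributes.
labels = ["T1", "T1", "D1", "D1"]
attr_separating = ["low", "low", "high", "high"]   # splits the classes perfectly
attr_uninformative = ["low", "high", "low", "high"]  # mixes the classes evenly

print(information_gain(attr_separating, labels))
print(information_gain(attr_uninformative, labels))
```
]]></preformat>
      <p>In this toy case the first attribute separates the two fault types completely, so its gain equals the entropy of the full set, while the second attribute brings no gain. Continuous DGA features, such as gas ratios or centroid coordinates, would first have to be turned into threshold tests before a gain of this kind can be evaluated.</p>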
    </sec>
    <sec id="sec-7">
      <title>3. Results and discussion:</title>
      <p>To evaluate the performance of the Naïve Bayes and Decision Tree algorithms using our proposed
input vectors for six types of transformer faults, a set of 481 samples has been used to train
and test the two methods; 67% of the dataset was used for training and 33% for testing, using
the MATLAB software. Table 2 shows the results of the implementation of the two classifiers using
all the proposed input vectors.</p>
      <p>Table 2. Fault diagnostic results in percent using the Naïve Bayes and the Decision Tree algorithms with all
the proposed input vectors.</p>
      <p>The performance of each classifier
was evaluated based on the accuracy of each fault type diagnosis (Figure 2).</p>
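      <p>The evaluation protocol described above, a hold-out split of roughly two thirds for training and one third for testing followed by a per-fault accuracy, could be sketched in Python as shown here. This is a sketch and not the MATLAB code used for the reported results: the helper names, the per-class (stratified) splitting, the random seed and the class sizes, which are taken as the training plus testing counts of Table 1, are all assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_fraction)
        train_idx.extend(idxs[:cut])
        test_idx.extend(idxs[cut:])
    return sorted(train_idx), sorted(test_idx)


def per_class_accuracy(y_true, y_pred):
    """Percentage of correctly diagnosed samples for each true fault type."""
    correct, total = defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    return {c: 100.0 * correct[c] / total[c] for c in total}


def overall_accuracy(y_true, y_pred):
    """Percentage of correctly diagnosed samples over all fault types."""
    hits = sum(int(truth == pred) for truth, pred in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


# Assumed class sizes (training + testing samples per fault type, 481 in total).
sizes = {"PD": 48, "T3": 85, "T2": 48, "T1": 95, "D2": 126, "D1": 79}
labels = [fault for fault, n in sizes.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```
]]></preformat>
      <p>With these class sizes the split yields 321 training and 160 testing samples. The predictions of any classifier trained on the training indices can then be passed to <monospace>per_class_accuracy</monospace> and <monospace>overall_accuracy</monospace> together with the true labels of the testing samples.</p>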
    </sec>
    <sec id="sec-8">
      <title>4. Conclusion:</title>
      <p>The Naïve Bayes and the Decision Tree classification algorithms were used to identify power
transformer faults. A dataset of 481 samples was employed and 9 different input vectors were
considered. The Naive Bayes algorithm achieved a diagnostic accuracy of 86.25% when using the 9th
input vector (Duval’s triangle-pentagon coordinates combination), compared to 83.75% for
the Decision Tree using the 4th input vector (Rogers' four ratios). These diagnostic results show an
improvement in the identification of transformer faults over other traditional DGA methods.
Significant differences in diagnostic accuracy were obtained when using the same classification
algorithm with different input vectors; this investigation therefore identifies the appropriate input vector for the
diagnosis of power transformers using the Naive Bayes and the Decision Tree algorithms.</p>
      <p>In future work, we will extend the proposed input space using other input vectors with an
improved machine learning algorithm.</p>
    </sec>
    <sec id="sec-9">
      <title>5. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] “
          <article-title>Mineral Oil-Filled Electrical Equipment in Service - Guidance on the Interpretation of Dissolved and Free Gases Analysis”</article-title>
          ,
          <source>IEC Standard IEC 60599</source>
          , IEC, Geneva, Switzerland, Edition 2.1, May
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Jakob</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Dukarm</surname>
          </string-name>
          , “
          <article-title>Thermodynamic estimation of transformer fault severity”</article-title>
          ,
          <source>IEEE Trans. on Power Delivery</source>
          , vol.
          <volume>30</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>1941</fpage>
          -
          <lpage>1948</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Duval</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>dePablo</surname>
          </string-name>
          , “
          <article-title>Interpretation of gas-in-oil analysis using new IEC publication 60599 and IEC TC 10 databases”</article-title>
          ,
          <source>IEEE Electrical Insulation Magazine</source>
          , vol.
          <volume>17</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>41</lpage>
          , Mar.
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hoballah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Mansour</surname>
          </string-name>
          and
          <string-name>
            <given-names>I. B. M.</given-names>
            <surname>Taha</surname>
          </string-name>
          , “
          <article-title>Hybrid grey wolf optimizer for transformer fault diagnosis using dissolved gases considering uncertainty in measurements”</article-title>
          ,
          <source>IEEE Access</source>
          , vol.
          <volume>8</volume>
          , pp.
          <fpage>139176</fpage>
          -
          <lpage>139187</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. S. M.</given-names>
            <surname>Ghoneim</surname>
          </string-name>
          and
          <string-name>
            <given-names>I. B. M.</given-names>
            <surname>Taha</surname>
          </string-name>
          , “
          <article-title>A new approach of DGA interpretation technique for transformer fault diagnosis”</article-title>
          ,
          <source>Int. J. Electr. Power Energy Syst.</source>
          , vol.
          <volume>81</volume>
          , pp.
          <fpage>265</fpage>
          -
          <lpage>274</lpage>
          , Oct.
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Islam</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ledwich</surname>
          </string-name>
          , G. '
          <article-title>'A novel fuzzy logic approach to transformer fault diagnosis''</article-title>
          .
          <source>IEEE Trans. Dielectr. Electr. Insul</source>
          .
          <year>2000</year>
          ,
          <volume>7</volume>
          ,
          <fpage>177</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Souahlia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bacha</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Chaari</surname>
          </string-name>
          , “
          <article-title>MLP neural network-based decision for power transformers fault diagnosis using an improved combination of Rogers and Doernenburg ratios DGA”</article-title>
          ,
          <source>Int. Journal of Electrical Power &amp; Energy Systems</source>
          , vol.
          <volume>43</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>1346</fpage>
          -
          <lpage>1353</lpage>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Benmahamed</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Teguar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Boubakeur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          '
          <article-title>'Application of SVM and KNN to Duval Pentagon 1 Transformer Oil Diagnosis''</article-title>
          .
          <source>IEEE Trans. Dielectr. Electr. Insul</source>
          .
          <year>2017</year>
          ,
          <volume>24</volume>
          ,
          <fpage>3443</fpage>
          -
          <lpage>3451</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Kherif</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Benmahamed</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Teguar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Boubakeur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ghoneim</surname>
            ,
            <given-names>S. S. M.</given-names>
          </string-name>
          <article-title>"Accuracy Improvement of Power Transformer Faults Diagnostic Using KNN Classifier With Decision Tree Principle,"</article-title>
          in
          <source>IEEE Access</source>
          , vol.
          <volume>9</volume>
          , pp.
          <fpage>81693</fpage>
          -
          <lpage>81701</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Yang</surname>
          </string-name>
          , M.-T.;
          <string-name>
            <surname>Hu</surname>
          </string-name>
          , L.-S. ''
          <article-title>Intelligent fault types diagnostic system for dissolved gas analysis of oil-immersed power transformer''</article-title>
          .
          <source>IEEE Trans. Dielectr. Electr. Insul</source>
          .
          <year>2013</year>
          ,
          <volume>20</volume>
          ,
          <fpage>2317</fpage>
          -
          <lpage>2324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. I.</given-names>
            <surname>Aizpurua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. M.</given-names>
            <surname>Catterson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. G.</given-names>
            <surname>Stewart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D. J.</given-names>
            <surname>McArthur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lambert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ampofo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pereira</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Cross</surname>
          </string-name>
          , '
          <article-title>'Power transformer dissolved gas analysis through Bayesian networks and hypothesis testing''</article-title>
          ,
          <source>IEEE Trans. Dielectrics Electr. Insul.</source>
          , vol.
          <volume>25</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>494</fpage>
          -
          <lpage>506</lpage>
          , Apr.
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Mirowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          '
          <article-title>'Statistical Machine Learning and Dissolved Gas Analysis: A Review''</article-title>
          .
          <source>IEEE Trans. Power Deliv</source>
          .
          <year>2012</year>
          ,
          <volume>27</volume>
          ,
          <fpage>1791</fpage>
          -
          <lpage>1799</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Taha</surname>
            ,
            <given-names>I.B.M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hoballah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ghoneim</surname>
            ,
            <given-names>S.S.M.</given-names>
          </string-name>
          '
          <article-title>'Optimal ratio limits of Roger's four-ratios and IEC 60599 code methods using particle swarm optimization fuzzy-logic approach''</article-title>
          .
          <source>IEEE Trans. Dielectr. Electr. Insul</source>
          .
          <year>2020</year>
          ,
          <volume>27</volume>
          ,
          <fpage>222</fpage>
          -
          <lpage>230</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Dimitoglou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adams</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Jim</surname>
            ,
            <given-names>C. M.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>''Comparison of the C4.5 and a Naïve Bayes classifier for the prediction of lung cancer survivability''</article-title>
          .
          <source>arXiv preprint arXiv:1206.1121</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>John Ross</given-names>
            <surname>Quinlan</surname>
          </string-name>
          ,
          <source>C4.5: Programs for Machine Learning</source>
          , Morgan Kaufmann Publishers,
          <year>1993</year>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>