<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multidimensional Analysis of Aluminum Production Monitoring Data in Basic Operation Modes*</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computational Modeling of the Siberian Branch of the Russian Academy of Sciences</institution>
          ,
          <addr-line>50/44 Akademgorodok, Krasnoyarsk, 660036</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>This paper presents a comprehensive analysis of the technological parameters of aluminum production for three test sites by applying data mining techniques. Based on the principal component analysis and cluster analysis, structural features of the multidimensional monitoring data space were investigated, regularities in the aluminum production complex operation in its basic operating modes were detected, and typical conditions leading to technological disorders were determined. New knowledge and "analytical portraits" of the operation of an aluminum production complex can be used to develop algorithms for the prevention of technological disorders.</p>
      </abstract>
      <kwd-group>
        <kwd>Multidimensional Data Analysis</kwd>
        <kwd>Principal Component Analysis</kwd>
        <kwd>Cluster Analysis</kwd>
        <kwd>Aluminum Production</kwd>
        <kwd>Prevention of Technological Disorders</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        High technical and economic indicators in the aluminum industry are largely
determined by the quality of technology and timely assessment of the technological
state of an aluminum production complex and its separate units: reduction cells,
potrooms and series [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Comprehensive analysis of monitoring data makes it possible to
explain the reasons for a decrease in productivity and to determine the conditions that
trigger "disorders" in the technological process. The complexity of the object and the
features of the data (a large number of parameters, high process inertia, gaps, and
noise) call for modern big data processing technologies and methods.
      </p>
      <p>
        This paper studies the features and patterns in the operation of the
aluminum production complex in its basic operating modes. The study applies two data mining
techniques, principal component analysis and cluster analysis, to the
monitoring data of the process control system. Data mining techniques provide an
effective tool for discovering previously unknown, nontrivial, practically useful, and
interpretable knowledge needed for decision-making [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>* Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        The paper describes the results of a comprehensive multidimensional analysis of
technological parameters in aluminum production for three experimental areas:
Khakas aluminum smelter (KhAZ) with RA-300 technology (potrooms No. 9 and 10,
336 reduction cells) for the period from 2014 to 2019; Boguchansky aluminum
smelter (BoAZ) with RA-300 technology (potrooms No. 1 and 2, 336 reduction cells)
for 2019; and Bratsk aluminum smelter (BrAZ) with Soderberg technology (potroom No. 8,
90 reduction cells) for the period from 2015 to 2019. For each experimental area,
the authors investigated the structural features of the multidimensional monitoring
data space and detected regularities in the operation and typical conditions leading to
technological disorders. Analysis of the individual years showed that the data
structure and the behavior of the studied facilities were the same in all
periods. The analysis and visualization of the multidimensional
data were performed using Python and ViDaExpert [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Data Description and Preprocessing</title>
      <p>In multidimensional data analysis, the original data are
represented as a set of objects and a set of attributes. Here, each object is a
moment in the operation of a reduction cell, and the attributes are the
technological parameters registered by the Automated Process Control System. The
composition of the attributes is determined by the features of the technology and
its typical disorders. The key data attributes are listed in Table 1.</p>
      <p>
        The most common technological disorders in the aluminum production process
include the anode effect and distortion of the anode surface relief [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. The anode
effect is a polarization phenomenon characterized by a significant increase in the cell
voltage. Distortion of the anode surface relief can be presented as “cracking”, “corner
shedding” or signs of buildups, such as a “spike” – a formation of regular cylindrical
or conical shape at the anodes; “lagging” – a rectangular protrusion or unevenness at
the base of an anode, occupying up to 50-60% of the anode area; “overglow” – a
formation at any face of the anode block (e.g. “ball”, “mushroom”, “chunk”, etc.).
      </p>
      <p>Before the analysis, the data were preprocessed.
The original formats were converted, and the data were merged by time. For the
chemical composition indicators, gaps in the data were filled by
interpolation with algebraic polynomials. Additional parameters were calculated,
including statistical indicators of technological disorders (Number of "rolled anodes",
Number of anode effects, Number of "spikes", Number of "laggings", etc.) and
generalized indicators (Consumption of alumina and Consumption of aluminum
fluoride). Finally, entries with empty parameter values were excluded (approximately
30-40% of the entries).</p>
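      <p>The gap-filling and filtering steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation. The function names (lagrange, fill_gaps, drop_incomplete), the choice of a local polynomial through the three nearest known samples, the assumption that the samples are sorted by time, and the sample values are all assumptions, since the paper does not state the polynomial degree or the interpolation window.</p>
      <preformat>
```python
from typing import List, Optional, Sequence


def lagrange(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fill_gaps(times: Sequence[float], values: Sequence[Optional[float]],
              n_points: int = 3) -> List[Optional[float]]:
    """Fill None entries lying between known samples by local interpolation.

    Assumptions: times are sorted in ascending order, and the polynomial
    passes through the n_points known samples nearest in time.
    Gaps outside the known range are left as None (no extrapolation).
    """
    known = [(t, v) for t, v in zip(times, values) if v is not None]
    if not known:
        return list(values)
    first_t, last_t = known[0][0], known[-1][0]
    filled: List[Optional[float]] = []
    for t, v in zip(times, values):
        if v is None and last_t >= t >= first_t:
            nearest = sorted(known, key=lambda kv: abs(kv[0] - t))[:n_points]
            v = lagrange([k[0] for k in nearest], [k[1] for k in nearest], t)
        filled.append(v)
    return filled


def drop_incomplete(rows: List[dict]) -> List[dict]:
    """Keep only the entries in which every parameter has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Hypothetical series of one indicator sampled at regular time steps.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
indicator = [1.0, None, 5.0, 10.0, None]
filled = fill_gaps(times, indicator)
rows = [{"time": t, "indicator": v} for t, v in zip(times, filled)]
complete_rows = drop_incomplete(rows)
```
      </preformat>
      <p>In this sketch, gaps before the first or after the last known sample are left empty rather than extrapolated, so such entries are removed by the final filtering step.</p>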
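      <table-wrap>
        <label>Table 1</label>
        <caption>
          <title>Key data attributes</title>
        </caption>
        <table>
          <thead>
            <tr><th>No.</th><th>Attribute</th><th>Unit</th></tr>
          </thead>
          <tbody>
            <tr><td/><td>Velocity ratio</td><td>kg/cm</td></tr>
            <tr><td/><td>Electrolyte temperature</td><td>°C</td></tr>
            <tr><td/><td>Electrolyte level</td><td>cm</td></tr>
            <tr><td/><td>Cryolite ratio</td><td/></tr>
            <tr><td/><td>CaF2 concentration</td><td>%</td></tr>
            <tr><td/><td>MgF2 concentration</td><td>%</td></tr>
            <tr><td/><td>Fe concentration</td><td>%</td></tr>
            <tr><td/><td>Si concentration</td><td>%</td></tr>
            <tr><td/><td>Alumina dose</td><td>kg</td></tr>
            <tr><td/><td>Number of alumina doses</td><td>pcs.</td></tr>
            <tr><td/><td>Aluminum fluoride dose</td><td>kg</td></tr>
            <tr><td/><td>Number of aluminum fluoride doses</td><td>pcs.</td></tr>
            <tr><td/><td>Coefficient of the anode-cathode distance</td><td>mV/s</td></tr>
            <tr><td/><td>Cell voltage</td><td>V</td></tr>
            <tr><td/><td>Back EMF</td><td>V</td></tr>
            <tr><td>18</td><td>Service life</td><td>months</td></tr>
            <tr><td>19</td><td>Anode consumption rate</td><td>cm/day</td></tr>
            <tr><td>20</td><td>Number of pins not on the horizon</td><td>pcs.</td></tr>
            <tr><td>21</td><td>Distance from the anode base</td><td>cm</td></tr>
            <tr><td>22</td><td>Sintering cone</td><td>cm</td></tr>
            <tr><td>23</td><td>Leg size</td><td>cm</td></tr>
            <tr><td>24</td><td>CPC level</td><td>cm</td></tr>
            <tr><td>25</td><td>CPC temperature</td><td>°C</td></tr>
            <tr><td>26</td><td>Number of "cracks"</td><td>pcs.</td></tr>
            <tr><td>27</td><td>Number of anode effects</td><td>pcs.</td></tr>
            <tr><td>28</td><td>Number of "corner sheddings"</td><td>pcs.</td></tr>
            <tr><td>29</td><td>Number of "chunks"</td><td>pcs.</td></tr>
            <tr><td>30</td><td>Number of "rolled anodes"</td><td>pcs.</td></tr>
            <tr><td>31</td><td>Number of "spikes"</td><td>pcs.</td></tr>
            <tr><td>32</td><td>Number of "laggings"</td><td>pcs.</td></tr>
          </tbody>
        </table>
      </table-wrap>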
      <p>In addition, the original data set was subjected to a correlation analysis, which
revealed fairly strong relationships between the following parameters. For KhAZ
and BoAZ: Alumina dose and Number of alumina doses (r = -0.88), Electrolyte
temperature and Cryolite ratio (r = 0.7), Duration of pouring and Velocity ratio
(r = -0.65). For BrAZ: Alumina feed time in automatic mode and Alumina feed time
in manual mode (r = -1.00), Distance from the anode base and Hollow of the anode
(r = 0.94), Metal level and Service life (r = 0.76), Metal level and Cell voltage
(r = 0.71), Composite level and Sintering cone (r = -0.69). The established
dependences are largely explained by the features of the technology and the nature of the
physical processes, which helps to understand the general regularities of
the aluminum production complex.</p>
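      <p>The coefficients r reported above are presumably Pearson correlation coefficients; the paper does not name the estimator, so this is an assumption. A minimal sketch of the computation in plain Python is given below. The function name pearson_r and the sample values are hypothetical. The example uses two feed times that add up to a constant total, which is one situation in which a coefficient of -1.00, such as the one reported for the automatic and manual feed times, would arise.</p>
      <preformat>
```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or 2 > n:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical feed times: the two modes add up to a constant total,
# so the two series are perfectly anti-correlated.
auto_feed = [10.0, 20.0, 30.0, 40.0]
manual_feed = [90.0, 80.0, 70.0, 60.0]
r = pearson_r(auto_feed, manual_feed)
```
      </preformat>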
    </sec>
    <sec>
      <title>Multidimensional Analysis of Monitoring Data</title>
      <p>
        For the multidimensional data analysis, a dataset was formed for each experimental area:
the KhAZ dataset contains 200,434 objects and 15 attributes; the BoAZ
dataset contains 95,497 objects and 15 attributes; and the BrAZ dataset contains 103,207
objects and 25 attributes. To reduce the dimension of the data space and identify
patterns in the data structure, principal component analysis (PCA) was
applied [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. The combination of Kaiser’s rule and the broken-stick model
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] made it possible to identify five principal components (PC1, PC2, PC3, PC4, PC5) for
all the experimental areas. Further analysis and interpretation of the results were
performed in terms of the principal components.
      </p>
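      <p>The two stopping rules named above can be sketched as follows. The sketch takes the eigenvalues of the correlation matrix as given and is only an illustration: the function names and the eigenvalues are hypothetical, and the paper does not state how the two rules were combined. Kaiser's rule retains the components whose eigenvalue exceeds the mean eigenvalue (1 for a correlation matrix). The broken-stick rule retains the leading components whose share of variance exceeds the expected share of the k-th longest piece of a stick broken at random into p pieces.</p>
      <preformat>
```python
from typing import List, Sequence


def kaiser_rule(eigenvalues: Sequence[float]) -> int:
    """Number of components whose eigenvalue exceeds the mean eigenvalue."""
    mean = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for ev in eigenvalues if ev > mean)


def broken_stick(p: int) -> List[float]:
    """Expected variance shares b_k = (1/p) * sum_{i=k..p} 1/i, for k = 1..p."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def broken_stick_rule(eigenvalues: Sequence[float]) -> int:
    """Number of leading components whose variance share exceeds b_k."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    count = 0
    for ev, expected in zip(ordered, broken_stick(len(ordered))):
        if ev / total > expected:
            count += 1
        else:
            break  # stop at the first component that fails the test
    return count


# Hypothetical eigenvalues for a 15-attribute correlation matrix (sum = 15).
eigenvalues = [3.6, 2.4, 1.8, 1.4, 1.15, 0.9, 0.8, 0.7,
               0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1]
n_kaiser = kaiser_rule(eigenvalues)
n_broken_stick = broken_stick_rule(eigenvalues)
```
      </preformat>
      <p>The two rules need not agree on the same data, which is one reason to consider them together rather than rely on either alone.</p>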
      <p>
        To identify the structure and reveal patterns within the data, cluster analysis
was performed using a density-based spatial clustering algorithm (DBSCAN) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. As a result,
for each experimental area, we identified the operating features of the aluminum
production complex and the characteristic conditions triggering technological disorders.
      </p>
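      <p>For orientation, the idea of density-based clustering can be sketched with the textbook DBSCAN procedure. The cited work describes a revised variant, so the code below is not the algorithm used in the study. The function names, the parameters eps and min_pts, and the sample points are hypothetical. The brute-force neighbor search is quadratic in the number of objects and is meant only for illustration.</p>
      <preformat>
```python
import math
from typing import List, Optional, Sequence, Tuple

NOISE = -1
Point = Tuple[float, ...]


def region_query(points: Sequence[Point], idx: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[idx], itself included."""
    return [j for j, q in enumerate(points) if eps >= math.dist(points[idx], q)]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Classic DBSCAN: returns a cluster label per point, NOISE for outliers."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if min_pts > len(neighbors):
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: expand
                queue.extend(j_neighbors)
        cluster += 1
    return [NOISE if label is None else label for label in labels]


# Hypothetical objects in the plane of two principal components.
points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
          (5.0, 5.0), (5.0, 5.1), (5.1, 5.0),
          (10.0, 0.0)]
labels = dbscan(points, eps=0.3, min_pts=3)
```
      </preformat>
      <p>Unlike centroid-based methods, this approach does not require the number of clusters in advance and leaves isolated objects unassigned, which suits data where one dominant mode coexists with small groups of atypical states.</p>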
      <p>The results of clustering on the PCA plot and in the internal coordinates of the
elastic map, obtained from the data of KhAZ for 2018, are presented in Fig. 1.
The analyzed objects are divided into four clusters: Cluster 1 (blue) – 93% of the
objects, Cluster 2 (red) – 4% of the objects, Cluster 3 (green) – 2% of the objects and
Cluster 4 (yellow) – 1% of the objects. Distinctive features and the nature of each
cluster are determined by the average values of the key parameters of the cluster
objects (Table 2). Cluster 1 describes the basic operating mode of the complex when
the specified technical conditions are maintained, and the parameters have standard
values. This cluster is characterized by high productivity and minimal risk of
technological disorders. Clusters 2, 3 and 4 stand out from Cluster 1 in terms of
changes in the technical conditions. Cluster 2 is characterized by a more frequent
occurrence of the “anode effect” and an increase in the Consumption of alumina while
maintaining a good level of metal production. Cluster 3 is characterized by
a decrease in the Amperage and Electrolyte level, which may
correspond to expected cell outages or emergency situations.
Cluster 4 is characterized by the presence of more serious technological disorders –
formation of “spikes” which is accompanied by an increase in the Electrolyte
temperature and a decrease in the Consumption of alumina. It should also be noted
that the Number of “rolled anodes” increases significantly. As a consequence of the
occurring disorders, we observed a decrease in the Metal level.</p>
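      <p>The cluster shares and the per-cluster average values used to characterize the clusters can be computed along the following lines. This is a sketch under stated assumptions: the function name cluster_profile, the parameter keys, and the sample rows are hypothetical and do not reproduce the values of Table 2.</p>
      <preformat>
```python
from collections import defaultdict
from typing import Dict, List, Sequence


def cluster_profile(rows: Sequence[Dict[str, float]],
                    labels: Sequence[int]) -> Dict[int, dict]:
    """For each cluster label, return its share of objects and parameter means."""
    groups: Dict[int, List[Dict[str, float]]] = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    n = len(rows)
    profile = {}
    for label, members in groups.items():
        params = members[0].keys()
        profile[label] = {
            "share": len(members) / n,
            "means": {p: sum(m[p] for m in members) / len(members) for p in params},
        }
    return profile


# Hypothetical objects: electrolyte temperature and number of anode effects.
rows = [
    {"electrolyte_temperature": 950.0, "anode_effects": 0.0},
    {"electrolyte_temperature": 952.0, "anode_effects": 0.0},
    {"electrolyte_temperature": 954.0, "anode_effects": 0.0},
    {"electrolyte_temperature": 970.0, "anode_effects": 2.0},
]
labels = [1, 1, 1, 2]
profile = cluster_profile(rows, labels)
```
      </preformat>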
      <p>In addition, cluster analysis in terms of the parameters Alumina dose and
Number of alumina doses revealed two patterns of alumina consumption among the
cells: some cells receive alumina less often but in large
doses, whereas others receive it more often but in small doses.
The study of technological disorders showed that “spike” formation occurred
less frequently in the second case than in the first. This
suggests that the Consumption of alumina is one of the key factors affecting the
occurrence of this type of disorder.</p>
      <p>The results of clustering on the PCA plot and in the internal coordinates of the
elastic map, obtained from the data of BoAZ for 2019, are presented in Fig. 2. The
distribution of the average values of the key parameters by the clusters is presented in
Table 3.</p>
      <p>The analyzed objects are divided into three clusters: Cluster 1 (blue) – 93% of the
objects, Cluster 2 (red) – 4% of the objects and Cluster 3 (green) – 3% of the objects.
Cluster 1 describes the basic operating mode of the complex with high productivity
and minimal risk of technological disorders. Clusters 2 and 3 are characterized by a
more frequent occurrence of technological disorders. Cluster 2 corresponds to the
formation of "lagging", which is accompanied by a decrease in the Electrolyte
temperature and an increase in the Coefficient of the anode-cathode distance. There is
also an increase in the Number of “rolled anodes” and a decrease in the Metal
pouring interval. Cluster 3 corresponds to the conditions with the frequent occurrence
of the "anode effect". At the same time, we observe an increase in the Electrolyte
temperature, a decrease in the Consumption of alumina and a decrease in the
Coefficient of the anode-cathode distance. Technological disorders associated with
"spike" formation are observed quite rarely, and most of them fall into Cluster 3.</p>
      <p>Also, the results of the cluster analysis revealed the features associated with the
age distribution of the cells. The cells are divided into two groups: group 1 – cells
numbered 1001-1084 have a service life of 40-50 months, group 2 – cells numbered
1085-1168 have a service life of 1-9 months. The study of these groups showed that
the age did not significantly affect the occurrence of technological disorders and the
level of productivity.</p>
      <p>The results of clustering on the PCA plot and in the internal coordinates of the elastic
map, obtained from the data of BrAZ for 2019, are presented in Fig. 3. The
distribution of the average values of the key parameters by the clusters is shown in
Table 4.</p>
      <p>The analyzed objects are divided into four clusters: Cluster 1 (blue) – 88% of the
objects, Cluster 2 (red) – 11% of the objects, Cluster 3 (yellow) – 0.8% of the objects
and Cluster 4 (green) – 0.2% of the objects. Clusters 1, 2, and 3 are located along the
same axis and differ significantly from Cluster 4. Moving from Cluster 1 to Cluster 3,
there is an increase in the Service life, an increase in the Electrolyte temperature, a
significant decrease in the Consumption of alumina, and a significant increase in the
Metal level. Cluster 1 covers most of the objects and represents the
main operating mode of the complex, whereas Cluster 3 covers a small percentage of the
objects with a special mode of operation.</p>
      <p>Table 4. Key parameters: Metal level; Electrolyte level; Electrolyte temperature; Fe concentration; Si concentration; Number of alumina doses; Consumption of aluminum fluoride; Service life; Distance from the anode base; Sintering cone; Leg size; CPC level; CPC temperature; Number of “cracks”; Number of anode effects; Number of “corner sheddings”; Number of “chunks”. Average values for Cluster 1 (blue): 37.334; 17.755; 953.071; 0.238; 0.047; 2217.663; 44.841; 27.445; 0.00; 134.651; 18.410; 36.381; 136.506; 3.159; 0.754; 0.198.</p>
      <p>Cluster 4 corresponds to events with atypical conditions in the technological process.
The objects of this cluster are characterized by average values of the Service life
and Consumption of aluminum fluoride and low values of the Metal level. Apart from
this, there is an increase in the Consumption of aluminum fluoride and a significant
increase in the values of the following parameters: Sintering cone, Leg size and
Distance from the anode base. Technological disorders do not form clear-cut clusters
with certain conditions; however, the average values show that Cluster 3 is
characterized by a more frequent occurrence of "cracks", whereas
Cluster 3 and Cluster 4 are characterized by the occurrence of "anode effects", and Cluster 1 by
the occurrence of "corner sheddings".</p>
      <p>A detailed analysis of the events with technological disorders confirmed the
cluster analysis results, suggesting that the conditions and nature of the occurring
disorders are different. In "lagging"-type disorders, the Electrolyte
temperature decreases and the Consumption of alumina increases, whereas in "spike"-type
disorders, on the contrary, the Electrolyte temperature goes up and the Consumption of
alumina drops. The formation of "chunks" is usually accompanied by an increase in
the Electrolyte temperature and Consumption of aluminum fluoride, as well as a
decrease in the Consumption of alumina and Average voltage of anode effect.</p>
      <p>Thus, the study of the structural features of the multidimensional monitoring data
space made it possible to obtain new knowledge on the operation of the complex and
basic regularities, as well as to determine the characteristic conditions for the
occurrence of technological disorders.</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>This paper presents a study of the features inherent to the aluminum production complex,
based on applying multidimensional analysis methods to the monitoring data for three
experimental areas. The preliminary correlation analysis allowed us to establish
relations between the key parameters, determine the strength of these relations,
and identify the general characteristic patterns. The principal component
analysis and cluster analysis allowed us to reveal the structural features and
dependences in the multidimensional space of monitoring data.</p>
      <p>The clustering of the KhAZ objects revealed certain technological features
associated with the consumption of alumina in a group of cells, identified conditions
when the energy balance was violated in the post-start-up period and emergency
operation. Also, it determined conditions for the occurrence of typical technological
disorders – the anode effect and formation of "spikes". The clustering of the BoAZ
objects revealed certain technological features associated with the age distribution of
the cells. In addition, it determined conditions for the occurrence of typical
technological disorders in the process – the anode effect and formation of "laggings".
The clustering of the BrAZ objects elucidated the age characteristics of the cells and
the effect of the service life on productivity.</p>
      <p>The results of this study made it possible to confirm many hypotheses of engineers
and to create the so-called "analytical portraits" of how separate units operate in the
aluminum production complex, which can, in turn, serve as a basis for algorithms to
prevent the occurrence and development of technological disorders.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Mikhalev</surname>
            ,
            <given-names>Yu.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polyakov</surname>
            ,
            <given-names>P.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yasinskiy</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shakhray</surname>
            ,
            <given-names>S.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bezrukikh</surname>
            ,
            <given-names>A.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zavadyak</surname>
            ,
            <given-names>A.V.</given-names>
          </string-name>
          :
          <article-title>Anode processes malfunctions. An overview</article-title>
          .
          <source>J. Sib. Fed. Univ. Eng. technol</source>
          .
          <volume>10</volume>
          (
          <issue>5</issue>
          ).
          <fpage>593</fpage>
          -
          <lpage>606</lpage>
          (
          <year>2017</year>
          ). doi:
          <pub-id pub-id-type="doi">10.17516/1999-494X-2017-10-5-593-606</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>G.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simoff</surname>
            ,
            <given-names>S. J.</given-names>
          </string-name>
          :
          <source>Data Mining: Theory, Methodology, Techniques, and Applications</source>
          . Springer (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Penkova</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korobko</surname>
            ,
            <given-names>A.V.</given-names>
          </string-name>
          :
          <article-title>Investigation of hydropower equipment functioning features using data mining techniques</article-title>
          .
          <source>In: Lecture Notes in Computer Science. Part I</source>
          , Vol.
          <volume>11619</volume>
          . pp.
          <fpage>434</fpage>
          -
          <lpage>446</lpage>
          Springer (
          <year>2019</year>
          ). doi:
          <pub-id pub-id-type="doi">10.1007/978-3-030-24289-3_32</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gorban</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pitenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zinovyev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>ViDaExpert: user-friendly tool for nonlinear visualization and analysis of multidimensional vectorial data</article-title>
          . Cornell University Library (
          <year>2014</year>
          ), http://arxiv.org/abs/1406.5550
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Raschka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Python Machine Learning</article-title>
          . Birmingham, UK: Packt Publishing Ltd (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mikhalev</surname>
            ,
            <given-names>Y.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polyakov</surname>
            ,
            <given-names>P.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yasinskiy</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polyakov</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Spikes generation on anode of aluminium reduction cell</article-title>
          .
          <source>Tsvetnye Metally</source>
          <volume>9</volume>
          .
          <fpage>43</fpage>
          -
          <lpage>48</lpage>
          (
          <year>2018</year>
          ). doi:
          <pub-id pub-id-type="doi">10.17580/tsm.2018.09.06</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Sadler</surname>
            ,
            <given-names>B.A.</given-names>
          </string-name>
          :
          <article-title>Critical issues in anode production and quality to avoid anode performance problems</article-title>
          .
          <source>J. Sib. Fed. Univ. Eng. technol. 5</source>
          (
          <issue>8</issue>
          ).
          <fpage>546</fpage>
          -
          <lpage>568</lpage>
          (
          <year>2015</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.17516/1999-494X2015-8-5-546-568</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Abdi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Principal Components Analysis</article-title>
          .
          <source>Wiley Interdisciplinary Reviews. Computational Statistics</source>
          <volume>2</volume>
          (
          <issue>4</issue>
          ).
          <fpage>439</fpage>
          -
          <lpage>459</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Gorban</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zinovyev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Principal manifolds and graphs in practice: from molecular biology to dynamical systems</article-title>
          .
          <source>International Journal of Neural Systems</source>
          .
          <volume>20</volume>
          (
          <issue>3</issue>
          ).
          <fpage>219</fpage>
          -
          <lpage>232</lpage>
          (
          <year>2010</year>
          ). doi:
          <pub-id pub-id-type="doi">10.1142/S0129065710002383</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Peres-Neto</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jackson</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Somers</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>How many principal components? Stopping rules for determining the number of non-trivial axes revisited</article-title>
          .
          <source>Computational Statistics &amp; Data Analysis</source>
          <volume>49</volume>
          (
          <issue>4</issue>
          ).
          <fpage>974</fpage>
          -
          <lpage>997</lpage>
          (
          <year>2005</year>
          ). doi:
          <pub-id pub-id-type="doi">10.1016/j.csda.2004.06.015</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>N.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drab</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daszykowski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Revised DBSCAN algorithm to cluster data with dense adjacent clusters</article-title>
          .
          <source>Chemometrics and Intelligent Laboratory Systems</source>
          .
          <volume>120</volume>
          .
          <fpage>92</fpage>
          -
          <lpage>96</lpage>
          (
          <year>2013</year>
          ). doi:
          <pub-id pub-id-type="doi">10.1016/j.chemolab.2012.11.006</pub-id>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>