<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>ANOMALY DETECTION AND BREAKDOWN PREDICTION IN RF POWER SOURCE OUTPUT: A REVIEW OF APPROACHES</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Y. Donon</string-name>
          <email>yann.donon@cern.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Kupriyanov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D. Kirsh</string-name>
          <email>kirshdv@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Di Meglio</string-name>
          <email>alberto.di.meglio@cern.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R. Paringer</string-name>
          <email>rusparinger@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>P. Serafimovich</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Syomic</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CERN</institution>
          ,
          <addr-line>Espl. des Particules 1, 1211 Meyrin, Genève</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Image Processing Systems Institute of the Russian Academy of Sciences, Branch of the FSRC “Crystallography and Photonics” RAS</institution>
          ,
          <addr-line>Molodogvardeyskaya 151, Samara 443001</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>Moskovskoye shosse 34, Samara 443086</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Yann Donon</institution>
          ,
          <addr-line>Alexander Kupriyanov, Dmitriy Kirsh, Alberto Di Meglio, Rustam Paringer, Pavel Serafimovich, Sergey Syomic</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>4</volume>
      <issue>2019</issue>
      <fpage>99</fpage>
      <lpage>104</lpage>
      <abstract>
        <p>The need for reliable operation of linear accelerators is critical for the spread of this technique in medical environments. At CERN, where LINACs are used for particle research, similar issues are encountered, such as the appearance of jitters in plasma sources (2 MHz RF generators), which can have a significant impact on the subsequent beam quality in the accelerator. The “SmartLINAC” project was established as an effort to increase LINAC reliability by means of early anomaly detection and prediction in their operation, down to the component level. This article reviews the different techniques used to detect anomalies from their earliest signals, using data from 2 MHz RF generators. This research is an important step forward in the SmartLINAC project but represents only its beginning. The authors used four different techniques in an effort to determine the most appropriate one for detecting anomalies in the generators' data. The main challenges came from the nature of the data, a noisy signal presenting several kinds of anomalies from different sources, and from the lack of exhaustive and precise labelling. This research allowed us to better understand the nature of the data we are working with and to start addressing the project's objectives: not only identifying and differentiating possible anomalies, but also forecasting potential breakdowns.</p>
      </abstract>
      <kwd-group>
        <kwd>Anomaly detection</kwd>
        <kwd>time series</kwd>
        <kwd>big data</kwd>
        <kwd>data analysis</kwd>
        <kwd>statistics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>
        In this project, we investigate jitters in the forward-power history of LINAC4’s 2 MHz RF plasma
generators. LINAC4 is a linear accelerator designed to become the source of proton beams for CERN’s
Large Hadron Collider (LHC) after its 2019-2020 shutdown. It is designed to accelerate negative
hydrogen ions to 160 MeV for the LHC’s injection chain [
        <xref ref-type="bibr" rid="ref1">1, 2</xref>
        ]. 2 MHz RF sources are used to create
the plasma from which particles are extracted to form the proton beam. This source is one of several
alternatives but is used as a reference in the framework of this specific research. Forward power,
measured in watts, allows jitters from the source to be measured. These jitters are high-intensity
variations in the periodicity of the signal over a period of time. They heavily influence beam quality
and availability. Therefore, periods of jittering should be identified and, if possible, predicted in
order to carry out preventive maintenance. This paper is based on the operation of LINAC4, but it is
part of a larger project, SmartLINAC, which aims to create a support platform for medical and scientific
linear accelerators allowing anomaly detection and maintenance planning, powered by artificial
intelligence. Indeed, the need for medical LINACs that are simpler to maintain and operate was strongly
stressed by the International Cancer Expert Corps (ICEC) and STFC in October 2017 [3]. Currently,
jitters are first perceived through their symptoms and are not immediately labelled as jitters; they
usually appear after long periods of operation, and their cause is unknown. As such, the first step in
the SmartLINAC project was to identify them automatically and, to do so, to analyze the signal obtained
from the forward power of the RF sources. These signals are noisy and present a few periods of jitter
over time. These periods sometimes originate from human manipulations and sometimes from environmental
factors. It is the second category that provokes uncontrolled, long-term jitters. The noise, the human
interactions and the overall sensitivity of the signals make it challenging even to identify periods of
jitter with certainty. In this paper we present and compare the results of the different methods we
used to approach the problem of jitter identification and prediction. One of the key challenges is the
relative rarity of those jitters: a few periods may appear over several months, or none at all.
Furthermore, those that do appear vary in intensity and size. These elements made modelling anomalies
challenging. The big data specialists participating in the project therefore coordinated so that each
applied a different technique they were experienced with, in order to identify jittering periods and
select the most appropriate approach during the project.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Data description</title>
      <p>Several sets of data, presenting different kinds and amounts of anomaly periods, were
used in the framework of the experiments described in this paper. The series comprised about 9
million entries from different RF sources, including about 30 jittering periods caused by human
manipulations and 3 jittering periods of the kind investigated here. The data were separated into
training and test sets. In this chapter, the nature of the data is described using the training set as
a reference. Readings are captured every 1.2 seconds; each contains a date and a power in watts, mostly
between 30,000 W and 50,000 W depending on the current configuration of the source. A few rare and
isolated readings lie between 0 W and 30,000 W for unexplained reasons; sometimes the source was
captured while powered down, registering 0 W.</p>
      <p>Relatively frequently, the source presents especially violent jittering; these episodes are
power scans resulting from human manipulations and are referred to as such in the present document.
Power scans are intentional changes of power made in order to observe their effects on the source. The
prime concern of this article is anomalies that appear over time and present constant, long jitter
periods.</p>
    </sec>
    <sec id="sec-4">
      <title>Approaches</title>
      <sec id="sec-4-1">
        <title>Filtering, smoothing and distance from average</title>
        <p>In this approach, referred to hereafter as the “first approach”, we chose to first divide the
data into two categories, “significant” and “noise”. The data were then smoothed to normalize them over
a period of time. From there, it was possible to highlight the shift from the average over time during
and before long jitter periods. Furthermore, it was possible to understand the structure of a jitter
based on the provided samples. Smoothing was used to understand tendencies in the data; this step was
necessary to differentiate isolated peaks from increasing tendencies. Graphically, outside periods of
anomalies, the tendency varies in different shades of dark red and black, as represented in Figure 1.
Conversely, a higher deviation value appears in shades of bright red, as represented in Figure 2.</p>
        <p>This approach allowed us to detect jitter periods but not to efficiently differentiate their
nature (systemic or human manipulation). However, it proved informative in another way. Indeed, jitters
do not appear suddenly but progressively, with symptoms as early as days before. Short periods of higher
power delta are frequent at any time, but their density increases systematically before periods of
jitter. The first symptoms were identified by the technique at 1), and jitters were first observed at
2). In this example, the early symptoms precede the jitters by more than two days. A low delta is
represented by darker shades of red; the higher the delta, the brighter the color. The empirical rule
for flagging noisy areas is that deltas exceeding the average over the last 100 samples by 100%, and
remaining so for at least one hour, are anomalies. This approach proved efficient at detecting and
predicting jitter periods.</p>
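        <p>The empirical rule above can be sketched in a few lines of Python. This is only an illustrative reading of the rule, not the project's implementation: it assumes that a “delta” is the absolute difference between consecutive power readings, that “higher than average by 100%” means a rolling mean delta of more than twice a reference value, and that the reference value (the hypothetical parameter baseline_delta) is supplied by the caller. The function name flag_anomalies and all parameter names are likewise hypothetical. With one reading every 1.2 seconds, one hour corresponds to 3000 samples.</p>
        <preformat><![CDATA[
```python
from collections import deque
from typing import List, Sequence, Tuple

SAMPLE_PERIOD_S = 1.2     # one reading every 1.2 s (see the data description)
WINDOW = 100              # "the last 100 samples"
MIN_DURATION_S = 3600.0   # "at least one hour"


def flag_anomalies(power: Sequence[float],
                   baseline_delta: float,
                   window: int = WINDOW,
                   ratio: float = 2.0,
                   min_duration_s: float = MIN_DURATION_S,
                   sample_period_s: float = SAMPLE_PERIOD_S) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) of flagged periods.

    Assumptions (not stated in the paper): a delta is the absolute difference
    between consecutive readings, and a sample is "high" when the mean delta
    over the last `window` deltas exceeds `ratio` times `baseline_delta`.
    """
    min_len = int(round(min_duration_s / sample_period_s))
    recent = deque(maxlen=window)   # last `window` deltas
    running_sum = 0.0
    periods: List[Tuple[int, int]] = []
    run_start = None                # index where the current high run began
    for i in range(1, len(power)):
        delta = abs(power[i] - power[i - 1])
        if len(recent) == window:
            running_sum -= recent[0]        # value about to be evicted
        recent.append(delta)
        running_sum += delta
        full = len(recent) == window
        high = full and running_sum / window > ratio * baseline_delta
        if high and run_start is None:
            run_start = i
        elif not high and run_start is not None:
            if i - run_start >= min_len:    # keep only long-lasting runs
                periods.append((run_start, i))
            run_start = None
    if run_start is not None and len(power) - run_start >= min_len:
        periods.append((run_start, len(power)))
    return periods
```
]]></preformat>
        <p>Short bursts of high delta are discarded by the minimum-duration test, which mirrors the observation that isolated periods of higher delta are frequent at any time and only sustained ones are treated as anomalies.</p>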
      </sec>
      <sec id="sec-4-2">
        <title>Label-related clustering</title>
        <p>
          This approach is characterized by the fact that no information on LINAC4 internal
maintenance processes (unsupervised), for example on the possible causes of jitter (human
manipulations or environmental factors), was used. Thus, only the output of the RF power sources and
four problematic time intervals manually marked by a domain expert were used. The method is
based on the search for features that distinguish the marked problem intervals of jitter from the rest of
the data. A feature is a selected subsequence of the data together with a threshold: the subsequences
whose distance to the selected subsequence does not exceed the threshold determine its “proximity zone”.
The Euclidean metric is chosen as the measure of the distance between subsequences. Algorithm 1
describes in detail the steps of the method. First, a subsequence s of a given length k is randomly
selected. Then, for different values of the threshold t, the positions of the subsequences that are
close to the selected subsequence are determined. The resulting set X is clustered by the kernel density
estimation (KDE) method [6, 7].
The Adjusted Rand Index (ARI) [
          <xref ref-type="bibr" rid="ref2">8</xref>
          ] is then calculated between the clustered set X and the labelled
set L. The best ARI value so far is stored together with its corresponding subsequence s and
threshold t. The advantages of this method are its scalability with respect to the number of extracted
features, the ability to use domain experts to refine the results, and the possibility of stopping the
calculation at any time while still having a result. Indeed, firstly, it is possible to vary the number
of selected subsequences depending on the specifics of the data. Secondly, domain experts can analyze
the correctness of the selected subsequences; the complexity of this analysis is significantly lower
than that of direct data analysis, yet it should clarify important features of the problem under
consideration. Thirdly, at any moment during the operation of the method there is a subsequence with the
best ARI value, which can be accepted as the result of the method.
        </p>
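        <p>The search loop described above can be sketched as follows. This is a minimal, pure-Python illustration under stated assumptions, not a reproduction of Algorithm 1: the KDE clustering step is simplified to labelling a time index as “inside a cluster” when a Gaussian kernel density of the matched positions reaches a fixed fraction of its maximum, and the names search, proximity_positions, kde_labels, bandwidth and level are all hypothetical. The Adjusted Rand Index is computed from the usual pair-counting formula.</p>
        <preformat><![CDATA[
```python
import math
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def comb2(n: int) -> int:
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def adjusted_rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Adjusted Rand Index between two labelings of the same samples."""
    sum_ij = sum(comb2(c) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb2(c) for c in Counter(a).values())
    sum_b = sum(comb2(c) for c in Counter(b).values())
    expected = sum_a * sum_b / comb2(len(a))
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:        # degenerate case: nothing to compare
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def proximity_positions(data: Sequence[float], s: Sequence[float], t: float) -> List[int]:
    """Start positions of subsequences within Euclidean distance t of s."""
    k = len(s)
    return [i for i in range(len(data) - k + 1)
            if math.dist(data[i:i + k], s) <= t]


def kde_labels(positions: Sequence[int], n: int, bandwidth: float,
               level: float = 0.5) -> List[int]:
    """Assumed clustering rule: label index i as 1 when the Gaussian kernel
    density of the positions at i is at least `level` times its maximum."""
    density = [sum(math.exp(-0.5 * ((i - p) / bandwidth) ** 2) for p in positions)
               for i in range(n)]
    peak = max(density)
    if peak == 0.0:
        return [0] * n
    return [1 if d >= level * peak else 0 for d in density]


def search(data: Sequence[float], labels: Sequence[int], k: int,
           thresholds: Sequence[float], n_trials: int, bandwidth: float,
           seed: int = 0) -> Tuple[float, Optional[List[float]], Optional[float]]:
    """Random search for the subsequence s and threshold t with the best ARI."""
    rng = random.Random(seed)
    best: Tuple[float, Optional[List[float]], Optional[float]] = (-math.inf, None, None)
    for _ in range(n_trials):
        start = rng.randrange(len(data) - k + 1)
        s = list(data[start:start + k])
        for t in thresholds:
            x = proximity_positions(data, s, t)
            ari = adjusted_rand_index(kde_labels(x, len(data), bandwidth), labels)
            if ari > best[0]:        # the best result so far is always available
                best = (ari, s, t)
    return best
```
]]></preformat>
        <p>Because the best triple is updated inside the loop, the search can be interrupted after any trial and still return a usable result, which corresponds to the third advantage listed above. The density evaluation here is quadratic in the worst case and would need a faster estimator on series with millions of entries.</p>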
      </sec>
      <sec id="sec-4-3">
        <title>Sequence analysis using statistical features</title>
        <p>
          This method consists of processing the sequence with a sliding window and calculating
statistical features for the fragments of the sequence located in this window [
          <xref ref-type="bibr" rid="ref3">9</xref>
          ]. The idea behind the
approach is based on the assumption that some statistical characteristics allow the appearance of
abnormal periods (anomalies) in a time series to be predicted. The transition between the normal and
abnormal states does not occur instantly, meaning that the sequence contains not only normal and
abnormal intervals but also transition stages. The detection of such transition stages can therefore be
used to predict anomalies. The exact number of transition stages being unknown, clustering algorithms
must be used to determine their number and characteristics. Thus, the problem is reduced to the
division of the initial sequence into N clusters based on the values of statistical features. The
sequence is processed using a sliding window of size L with a shift K. The features are
statistical features of the sequences: mean, variance, asymmetry (skewness), kurtosis and percentile [
          <xref ref-type="bibr" rid="ref4">10</xref>
          ].
        </p>
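        <p>As an illustration of the feature extraction step, the sketch below computes the five listed features over a sliding window of size L with shift K. It is a minimal example under assumptions: the paper does not say which percentile is used or whether kurtosis is reported as excess kurtosis, so the 95th percentile (parameter q) and excess kurtosis are arbitrary choices here, and the function names are hypothetical. The clustering of the resulting feature vectors into N clusters is not shown.</p>
        <preformat><![CDATA[
```python
import math
from typing import Dict, List, Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Percentile by linear interpolation; q is in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def window_features(window: Sequence[float], q: float = 95.0) -> Dict[str, float]:
    """Mean, variance, skewness, excess kurtosis and q-th percentile."""
    n = len(window)
    mean = sum(window) / n
    m2 = sum((x - mean) ** 2 for x in window) / n   # population variance
    m3 = sum((x - mean) ** 3 for x in window) / n
    m4 = sum((x - mean) ** 4 for x in window) / n
    if m2 == 0.0:
        # Skewness and kurtosis are undefined for a constant window;
        # reporting 0.0 is a convention chosen for this sketch.
        skew, kurt = 0.0, 0.0
    else:
        skew = m3 / m2 ** 1.5
        kurt = m4 / m2 ** 2 - 3.0
    return {
        "mean": mean,
        "variance": m2,
        "skewness": skew,
        "kurtosis": kurt,
        "percentile": percentile(sorted(window), q),
    }


def sliding_features(series: Sequence[float], L: int, K: int,
                     q: float = 95.0) -> List[Tuple[int, Dict[str, float]]]:
    """Features of every window of size L, moving by K samples each time.

    Returns (start_index, features) pairs; a trailing fragment shorter
    than L is ignored.
    """
    return [(start, window_features(series[start:start + L], q))
            for start in range(0, len(series) - L + 1, K)]
```
]]></preformat>
        <p>Each window thus becomes one point in a five-dimensional feature space, which is the input a clustering algorithm would need in order to separate normal, transition and abnormal stages.</p>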
      </sec>
      <sec id="sec-4-4">
        <title>Kalman filter and rolling metrics</title>
        <p>
          In this approach, statistical calculations are based on the statistical metrics of the time
series. The Kalman filter usually performs well at describing the random structure of experimental
measurements [
          <xref ref-type="bibr" rid="ref6">12</xref>
          ].
This filter is able to take into account quantities that may be neglected by other techniques [
          <xref ref-type="bibr" rid="ref7">13</xref>
          ], such
as the variance of the initial state estimate and the model error variance [
          <xref ref-type="bibr" rid="ref8">14</xref>
          ]. It provides information
about the quality of the estimation by representing the estimation error probability. This type of filter
is well suited to real-time digital processing [
          <xref ref-type="bibr" rid="ref9">15</xref>
          ] because its recursive structure allows
execution without storing observations or past estimates [
          <xref ref-type="bibr" rid="ref10">16</xref>
          ].
        </p>
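        <p>The recursive structure mentioned above can be illustrated with a scalar Kalman filter. The sketch assumes a simple local-level model in which the true power follows a random walk with process variance q and is observed with measurement variance r; the paper does not specify the state model actually used, so this model, the function name kalman_level and the parameter names are assumptions made for illustration only.</p>
        <preformat><![CDATA[
```python
from typing import Iterable, Iterator, Tuple


def kalman_level(measurements: Iterable[float], q: float, r: float,
                 x0: float, p0: float) -> Iterator[Tuple[float, float]]:
    """Scalar Kalman filter for an assumed random-walk (local-level) model.

    q  : process (model error) variance
    r  : measurement noise variance
    x0 : initial state estimate
    p0 : variance of the initial state estimate
    Yields (estimate, error_variance) after each measurement. Only the
    current estimate and its variance are kept between steps.
    """
    x, p = x0, p0
    for z in measurements:
        p = p + q                 # predict: uncertainty grows by q
        gain = p / (p + r)        # Kalman gain
        x = x + gain * (z - x)    # update with the innovation z - x
        p = (1.0 - gain) * p      # posterior error variance
        yield x, p
```
]]></preformat>
        <p>The filter consumes one reading at a time and keeps only two numbers of state, which is why it suits streaming data. The yielded error variance is the quantity that expresses the quality of each estimate, and it depends explicitly on the initial variance p0 and the model error variance q.</p>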
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Review</title>
      <p>All approaches proved able to detect anomalies as they were occurring, each
bringing its own information. 3.1 Filtering, smoothing and distance from average highlighted the
first signals of an anomaly and showed the structure of its growth. 3.2 Label-related clustering showed
the possibility of solving the problem using a machine learning approach with excellent scalability,
which is essential for the adaptability of the solution to our project. 3.3 Sequence analysis using
statistical features highlighted the possible clustering in the data, giving us an opportunity to
differentiate in depth the states and stages of anomalies. Finally, 3.4 Kalman filter and rolling
metrics clarified for us the nature of the noise present in the data and allowed jittering to be
differentiated by origin. Figure 3 shows the jittering period labelling.</p>
      <p>As shown in Figure 3, all the techniques developed detect the first symptoms of jittering
before it was signaled in the original dataset (labelling represented on the last line of the figure).
As discussed in chapter 3.1, the first method presented detects anomalies well ahead of their
appearance. The two methods using noise filtering techniques (3.1 and 3.4) are less prone to isolated
false positive labelling and, more importantly, their results do not seem to be altered by the absence
of the filtered-out information, which suggests that the data removed were indeed non-informative, as
assumed. These characteristics will allow us to develop an adaptable and scalable technique to detect,
identify and, to some extent, forecast anomalies using machine learning, in order to maximize the
adaptability of our method. The statistical method that will be used is, however, still to be
defined.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>
        The initial approach of this study, to use different approaches and focus on their respective
contributions, proved rewarding as it allowed us to discover and understand previously only supposed
features in data we initially had very little information about. The core objective of the SmartLINAC
project is to carry out predictive maintenance, in other words to predict anomalies. Although this was
done successfully in this study, the application is still far from what is needed in the project. What
we observed in the framework of this study is the informativeness of one specific data source. This
study should now be adapted for use in production in LINAC4’s facilities, thus also allowing its
abilities to be tested. While the breakdown forecasts obtained in this study may be of interest at
CERN’s facilities, where maintenance for the instruments we are working with is available day and night
with a Mean Time To Repair of under an hour [
        <xref ref-type="bibr" rid="ref11">17</xref>
        ], they are not sufficient for hospitals or radiotherapy
stations deployed in countries with a shortage of qualified personnel for LINAC maintenance. It will
be necessary in the future to model and study in depth the LINAC environment, in order to discover
not only symptoms of breakdowns but also their possible sources and patterns. In conclusion, this study
is a success in itself, with results beyond our initial objectives, and it gives a strong push forward
to the SmartLINAC project.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was carried out in collaboration with CERN openlab. It was partially financially
supported by the Russian Foundation for Basic Research under grants # 19-29-01135, # 18-37-00418 and #
17-01-00972, and by the Ministry of Science and Higher Education within the State assignment to the
FSRC “Crystallography and Photonics” RAS No. 007-GZ/Ch3363/26 (theoretical results).</p>
      <p>[2] CERN, "Linear accelerator 4," CERN, [Online]. Available:
https://home.cern/science/accelerators/linear-accelerator-4. [Accessed 17 August 2019].</p>
      <p>[3] V. Greco, "A partnership-mentorship approach," in ICEC workshop, CERN, Geneva, 2017.</p>
      <p>[4] G. P. M. S. V. K. H. Xiong, "Enhancing data analysis with noise removal," IEEE Transactions
on Knowledge and Data Engineering, vol. 18, no. 3, pp. 304-319, 2006.</p>
      <p>[5] R. P. Y. G. A. K. Yann Donon, "Key point detection on images: A new polyvalent method," in
ITNT proceedings, Samara, 2019.</p>
      <p>[6] M. Rosenblatt, "Remarks on Some Nonparametric Estimates of a Density Function," The Annals
of Mathematical Statistics, vol. 27, no. 3, pp. 832–837, 1956.</p>
      <p>[7] E. Parzen, "On Estimation of a Probability Density Function and Mode," The Annals of
Mathematical Statistics, vol. 33, no. 3, pp. 1065–1076, 1962.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L. t. G.</given-names>
            <surname>Bellodi</surname>
          </string-name>
          ,
          <article-title>"LINAC4 Commissioning status and challenges to nominal operation,"</article-title>
          <source>in 61st ICFA ABDW on High-Intensity and High-Brightness Hadron Beams</source>
          , Daejeon,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L. H. a. P.</given-names>
            <surname>Arabie</surname>
          </string-name>
          ,
          <article-title>"Comparing partitions,"</article-title>
          <source>Journal of Classification</source>
          , vol.
          <volume>2</volume>
          , no.
          <issue>1</issue>
          , p.
          <fpage>193</fpage>
          -
          <lpage>218</lpage>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A. G. P. I. R. M.</given-names>
            <surname>Mayur Datar</surname>
          </string-name>
          ,
          <article-title>"Maintaining Stream Statistics over Sliding Windows,"</article-title>
          <source>SIAM Journal on Computing</source>
          , vol.
          <volume>31</volume>
          , no.
          <issue>6</issue>
          , p.
          <fpage>1794</fpage>
          -
          <lpage>1813</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A. S. B. S.</given-names>
            <surname>Everitt</surname>
          </string-name>
          , The Cambridge Dictionary of Statistics, Cambridge, UK New York: Cambridge University Press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Salkind</surname>
          </string-name>
          , Encyclopedia of Research Design, Thousand Oaks: Sage,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [12]
          <string-name>
            <surname>R. G. P. Y. H. Brown</surname>
          </string-name>
          ,
          <article-title>Introduction to random signals and applied Kalman filtering</article-title>
          , vol.
          <volume>3</volume>
          , New York: Wiley,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [13]
          <string-name>
            <surname>S. D. Shumway R.H.</surname>
          </string-name>
          ,
          <article-title>Time series analysis and its applications: with R examples</article-title>
          , New York: Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [14]
          <string-name>
            <surname>G. M. S.</surname>
          </string-name>
          ,
          <source>Kalman Filtering</source>
          , Heidelberg: Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <article-title>"Fading Kalman filter-based real-time state of charge estimation in LiFePO4 battery-powered electric vehicles,"</article-title>
          <source>Applied Energy</source>
          , no.
          <year>2016</year>
          , pp.
          <fpage>40</fpage>
          -
          <lpage>48</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [16]
          <string-name>
            <surname>K. C.</surname>
          </string-name>
          ,
          <article-title>"Optimization approach to adapt Kalman filters for the real-time application of accelerometer and gyroscope signals' filtering,"</article-title>
          <source>Digital Signal Processing</source>
          , vol.
          <volume>21</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>131</fpage>
          -
          <lpage>140</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [17]
          <string-name>
            <surname>A. A. G. G. B. M. S. S</surname>
          </string-name>
          .-E. a. J.
          <string-name>
            <surname>U. O. Rey Orozco</surname>
          </string-name>
          ,
          <article-title>"Performance evaluation of LINAC 4 during the reliability run,"</article-title>
          <source>in 9th International Particle Accelerator Conference</source>
          , Vancouver,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>