Proceedings of the 27th International Symposium Nuclear Electronics and Computing (NEC’2019) Budva, Becici, Montenegro, September 30 – October 4, 2019
ANOMALY DETECTION AND BREAKDOWN PREDICTION IN RF POWER SOURCE OUTPUT: A REVIEW OF APPROACHES
Y. Donon 1,2,3,*, A. Kupriyanov 1,3, D. Kirsh 1,3, A. Di Meglio 2, R. Paringer 1,3, P. Serafimovich 3, S. Syomic 1
1 Samara National Research University, Moskovskoye shosse 34, Samara 443086, Russia
2 CERN, Espl. des Particules 1, 1211 Meyrin, Genève, Switzerland
3 Image Processing Systems Institute of the Russian Academy of Sciences – Branch of the FSRC “Crystallography and Photonics” RAS, Molodogvardeyskaya 151, Samara 443001, Russia
E-mail: a yann.donon@cern.ch, b akupr@ssau.ru, c kirshdv@gmail.com, d alberto.di.meglio@cern.ch, e rusparinger@gmail.com, e serp@smr.ru, e ssyomik@ya.ru
Reliable operation of linear accelerators (LINACs) is critical for the spread of this technique in medical environments. At CERN, where LINACs are used for particle research, similar issues are encountered, such as the appearance of jitters in plasma sources (2MHz RF generators), which can significantly affect the subsequent beam quality in the accelerator. The “SmartLINAC” project was established to increase LINACs’ reliability through early detection and prediction of anomalies in their operation, down to the component level. The research described in this article reviews the different techniques used to detect anomalies from their earliest signals, using data from 2MHz RF generators. This research is an important step forward for the SmartLINAC project but represents only its beginning. The authors used four different techniques in an effort to determine the most appropriate one for detecting anomalies in the generators’ data.
The main challenges came from the nature of the data, a noisy signal presenting several kinds of anomalies from different sources, and from the lack of exhaustive and precise labelling. This research allowed us to better understand the nature of the data we are working with and to start addressing the project’s objectives: not only identifying and differentiating possible anomalies, but also forecasting potential breakdowns.
Keywords: Anomaly detection, time series, big data, data analysis, statistics
Yann Donon, Alexander Kupriyanov, Dmitriy Kirsh, Alberto Di Meglio, Rustam Paringer, Pavel Serafimovich, Sergey Syomic
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
1 Introduction
In this project, we investigate jitters in the forward-power history of LINAC4’s 2MHz RF plasma generators. LINAC4 is a linear accelerator designed to become the source of proton beams for CERN’s Large Hadron Collider (LHC) after the LHC’s 2019-2020 shutdown. It is designed to accelerate negative hydrogen ions to 160 MeV for the LHC’s injection chain [1, 2]. 2MHz RF sources are used to create the plasma from which particles are extracted to form the proton beam. This source is one of several alternatives but serves as the reference for this specific research. Forward power, measured in watts, allows jitters from the source to be measured. These jitters are high-intensity variations in the periodicity of the signal over a period of time. They heavily influence beam quality and availability. Therefore, periods of jittering should be identified and, if possible, predicted in order to enable preventive maintenance.
This paper is based on LINAC4’s operation, but it is part of a larger project, SmartLINAC, which aims to create a support platform for medical and scientific linear accelerators that enables anomaly detection and maintenance planning, powered by artificial intelligence. The need for medical LINACs that are simpler to maintain and operate was strongly emphasized by the International Cancer Expert Corps (ICEC) and STFC in October 2017 [3].
Currently, jitters are first perceived through their symptoms and are not immediately labelled as jitters; they usually appear after long periods of operation, and their cause is unknown. The first step in the SmartLINAC project was therefore to identify them automatically, which required analysing the forward-power signal of the RF sources. These signals are noisy and, over time, present a few periods of jitter. These periods originate sometimes from human manipulations and sometimes from environmental factors; it is the latter that cause uncontrolled, long-term jitters. The noise, the human interventions and the overall sensitivity of the signals make it challenging even to identify periods of jitter with certainty. In this paper, we present and compare the results of the different methods we used to approach the problem of jitter identification and prediction.
One of the key challenges is the relative rarity of these jitters: a few periods may appear over several months, or none at all. Furthermore, those that do appear vary in intensity and size. Because these elements made modelling anomalies challenging, the big data specialists participating in the project coordinated so that each applied a different technique they were experienced with to identify jittering periods, with the aim of selecting the most appropriate approach during the project.
2 Data description
Several data sets, presenting different kinds and numbers of anomaly periods, were used in the experiments described in this paper.
The series contained about 9 million entries from different RF sources, with about 30 jittering periods caused by human manipulations and 3 periods of the kind of jittering under investigation. The data were split into training and test sets. In this section, the nature of the data is described using the training set as a reference.
Samples are captured every 1.2 seconds; each contains a date and a power value in watts, mostly between 30’000 W and 50’000 W depending on the current configuration of the source. Some rare, isolated values lie between 0 W and 30’000 W for unexplained reasons; sometimes the source was captured as powered down, registering 0 W. Relatively frequently, the source shows especially violent jittering: these are power scans, resulting from human manipulations, and are referred to as such in the present document. Power scans are intentional changes of power made to observe their effects on the source. The prime concern of this article is anomalies that appear over time and present constant, long jitter periods.
3 Approaches
3.1 Filtering, smoothing and distance from average
In this approach, hereafter referred to as the “first approach”, we first divided the data into two categories, “significant” and “noise”. The data were then smoothed to normalize them over a period of time. From there, it was possible to highlight the shift from the average over time, during and before long jitter periods, and to understand the structure of a jitter from the provided samples. Smoothing was used to reveal tendencies in the data; this step was necessary to differentiate isolated peaks from increasing tendencies.
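As a rough illustration of the first approach, the following Python sketch flags sustained high-delta periods. It is a minimal sketch under stated assumptions, not the authors' implementation: the "delta" is taken as the absolute distance between consecutive samples, it is smoothed with a trailing mean over 100 samples, and a period is flagged when the smoothed delta stays more than 100% above the average delta for at least one hour, i.e. 3000 samples at one sample every 1.2 s. These three parameters are the empirical ones reported for this approach; using the mean delta of the whole series as the "average", skipping the unspecified "significant"/"noise" split, and the name `flag_noisy_areas` are assumptions.

```python
import numpy as np

SAMPLE_PERIOD_S = 1.2                         # one sample every 1.2 s (Section 2)
WINDOW = 100                                  # "the last 100 samples"
MIN_DURATION = round(3600 / SAMPLE_PERIOD_S)  # one hour = 3000 samples


def flag_noisy_areas(power, window=WINDOW, min_duration=MIN_DURATION, factor=2.0):
    """Boolean mask of samples lying in periods where the smoothed point-to-point
    delta stays `factor` times above the average delta for `min_duration` samples.
    Hypothetical helper illustrating the first approach."""
    power = np.asarray(power, dtype=float)
    # distance between consecutive points (0 for the first sample)
    delta = np.abs(np.diff(power, prepend=power[0]))
    # assumption: the "average" is the mean delta over the whole series
    baseline = delta.mean()
    # trailing mean over the last `window` deltas
    # (under-estimated for the first window - 1 samples)
    smoothed = np.convolve(delta, np.ones(window) / window)[: len(delta)]
    above = smoothed > factor * baseline      # factor 2.0 = "higher by 100%"
    mask = np.zeros(len(delta), dtype=bool)
    start = None
    for i, flag in enumerate(np.append(above, False)):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_duration:     # keep only runs lasting long enough
                mask[start:i] = True
            start = None
    return mask
```

The persistence requirement is what separates the short bursts that occur at any time from the sustained periods of interest: a burst shorter than `min_duration` samples is never flagged, however violent it is.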
Graphically, outside periods of anomalies, the tendency varies in different shades of dark red and black, as represented in Figure 1. In contrast, a higher deviation value appears in shades of bright red, as represented in Figure 2.
Figure 1. Fragment without anomaly: the shades are between dark red and black, which signifies a low delta from the average distance between points.
Figure 2. Fragment representing an anomaly period: the shades are bright red, which signifies a high delta from the average distance between points.
This approach allowed us to detect jitter periods but not to differentiate their nature efficiently (systemic or human manipulation). However, it proved informative in another way: jitters do not appear suddenly but progressively, with symptoms as early as days before. Short periods of higher power delta are frequent at any time, but their density increases systematically before periods of jitter. The first symptoms were identified by the technique at 1), and jitters were first observed at 2). In this example, the early symptoms precede the jitters by more than two days. A low delta is represented by darker shades of red; the higher the delta, the brighter the color. The empirical rule for estimating noisy areas is the following: deltas that exceed the average by 100% over the last 100 samples, and remain so for at least one hour, are considered anomalies. This approach proved efficient at detecting and predicting jitter periods.
3.2 Label-related clustering
This approach is characterized by the fact that no information on LINAC4’s internal maintenance processes was used (unsupervised), for example on the possible causes of jitter: human manipulations or environmental factors. Thus, only the output of the RF power sources and four problematic time intervals manually marked by a domain expert were used.
The method is based on the search for features that distinguish the marked problem intervals of jitter from the rest of the data. A feature is a subsequence of the data; the subsequences whose distance to the selected subsequence does not exceed a threshold form its “proximity zone”. The Euclidean metric is chosen as the measure of distance between subsequences. Algorithm 1 describes the steps of the method in detail. First, a subsequence of a given length k is randomly selected. Then, for different values of the threshold t, the positions of the subsequences close to the selected subsequence are determined. The resulting set X is clustered by the kernel density estimation (KDE) method [6, 7]. The Adjusted Rand Index (ARI) [8] is then calculated between the clustered set X and the labelled set L. The best ARI value obtained so far is stored together with its corresponding subsequence s and threshold t.
The advantages of this method are its scalability to the number of extracted features, the ability to involve domain experts in refining the results, and the possibility of stopping the calculation at any time while still having a result. First, the number of selected subsequences can be varied depending on the specifics of the data. Second, domain experts can check the correctness of the selected subsequences; this analysis is significantly less complex than direct data analysis, yet it can clarify important features of the problem under consideration. Third, at any moment of the method’s operation there is a subsequence with the best ARI value so far, which can be accepted as the result of the method.
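The search described above can be sketched as a random search over subsequences and thresholds. This is a minimal sketch under stated assumptions, not the authors' code: the "clustering by KDE" step is approximated by a Gaussian kernel density of the matched window positions along the time axis, thresholded at `level` to give a two-cluster labelling; each window is compared with the expert label of its first sample; the ARI is computed directly from Hubert and Arabie's formula [8]. The parameters `bandwidth` and `level` and all function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_distances(x, s):
    """Euclidean distance between s and every window of len(s) in x."""
    return np.linalg.norm(sliding_window_view(x, len(s)) - s, axis=1)


def kde_labels(indicator, bandwidth, level):
    """Gaussian kernel density of matched positions on the time axis,
    thresholded into a two-cluster (0/1) labelling."""
    grid = np.arange(-3 * bandwidth, 3 * bandwidth + 1)
    kernel = np.exp(-0.5 * (grid / bandwidth) ** 2)
    kernel /= kernel.sum()
    density = np.convolve(indicator, kernel, mode="same")
    return (density > level).astype(int)


def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two labellings (Hubert and Arabie)."""
    def pairs(v):
        return v * (v - 1) / 2.0

    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)               # contingency table
    index = pairs(table).sum()
    sum_a = pairs(table.sum(axis=1)).sum()
    sum_b = pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(float(len(a)))
    max_index = (sum_a + sum_b) / 2.0
    if max_index == expected:                   # degenerate case: identical trivial labellings
        return 1.0
    return (index - expected) / (max_index - expected)


def search(x, labels, k, thresholds, n_iter, bandwidth=25, level=0.5, seed=0):
    """Random search over subsequences s (length k) and thresholds t;
    returns (best ARI, start of s, t)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x) - k + 1                          # number of candidate windows
    target = np.asarray(labels)[:n]             # assumption: label of each window start
    best = (-np.inf, None, None)
    for _ in range(n_iter):
        start = int(rng.integers(0, n))
        dist = sliding_distances(x, x[start:start + k])
        for t in thresholds:
            # windows inside the proximity zone of s form the set X
            clusters = kde_labels((dist <= t).astype(float), bandwidth, level)
            score = adjusted_rand_index(clusters, target)
            if score > best[0]:                 # keep the best result found so far
                best = (score, start, t)
    return best
```

Because `best` is updated after every evaluated pair (s, t), the loop can be interrupted at any point and still return the best subsequence found so far, which mirrors the anytime property described above. The ARI is also invariant to swapping the two cluster labels, so a subsequence taken from a normal period can describe a jitter interval as well as one taken from inside it.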
3.3 Sequence analysis using statistical features
This method consists in processing the sequence with a sliding window and calculating statistical features for the fragments of the sequence located in this window [9]. The idea behind the approach is based on the assumption that some statistical characteristics make it possible to predict the appearance of abnormal periods (anomalies) in time series. The transition between the normal and abnormal states does not occur instantly, so the sequence contains not only normal and abnormal intervals but also transition stages. Detecting such transition stages can therefore be used to predict anomalies. As the exact number of transition intervals is unknown, clustering algorithms must be used to determine their number and characteristics. Thus, the problem is reduced to dividing the initial sequence into N clusters based on the values of the statistical features. The sequence is processed using a sliding window of size L with a shift K. The features are statistical features of the sequences: mean, variance, asymmetry (skewness), kurtosis and percentile [10].
3.4 Kalman filter and rolling metrics
In this approach, the calculations are based on statistical metrics of the time series. The Kalman filter usually performs well at describing the random structure of experimental measurements [12]. This filter can take into account quantities that may be neglected by other techniques [13], such as the variance of the initial state estimate and the model error variance [14]. It provides information about the quality of the estimate by representing the probability of the estimation error. This type of filter is well suited to real-time digital processing [15] because its recursive structure allows it to run without storing past observations or estimates [16].
4 Review
All approaches proved able to detect anomalies as they occurred, each contributing its own information.
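The sliding-window feature extraction of Section 3.3 can be sketched as follows. This is a minimal illustration under assumptions: the 95th percentile stands in for the unspecified percentile feature, kurtosis is taken as excess kurtosis, and, since the paper does not name its clustering algorithm, plain k-means with a fixed number of clusters N and a deterministic farthest-point initialization is used as a stand-in; estimating N itself (for example by comparing several values) is left out. The function names are hypothetical.

```python
import numpy as np


def window_features(x, L, K, q=95.0):
    """Mean, variance, skewness, excess kurtosis and q-th percentile of each
    window of size L, taken every K samples (one row per window)."""
    x = np.asarray(x, dtype=float)
    rows = []
    for start in range(0, len(x) - L + 1, K):
        w = x[start:start + L]
        mu, sd = w.mean(), w.std()
        if sd > 0:
            z = (w - mu) / sd
            skew, kurt = np.mean(z ** 3), np.mean(z ** 4) - 3.0
        else:
            skew, kurt = 0.0, 0.0      # constant window: shape moments undefined
        rows.append([mu, sd ** 2, skew, kurt, np.percentile(w, q)])
    return np.array(rows)


def kmeans(F, n_clusters, n_iter=100):
    """Plain k-means on standardized features (illustrative stand-in)."""
    # standardize each feature so that no single one dominates the distance
    Z = (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-12)
    # farthest-point initialization, starting from the most typical window
    centers = [Z[np.argmin(np.linalg.norm(Z, axis=1))]]
    while len(centers) < n_clusters:
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)   # assign each window to its nearest center
        new = np.array([Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(n_clusters)])
        if np.allclose(new, centers):  # converged
            break
        centers = new
    return labels
```

For example, `kmeans(window_features(x, L=100, K=50), 2)` assigns one cluster label per window. With three or more clusters, windows lying between the normal and jittering groups may surface as candidate transition stages.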
Filtering, smoothing and distance from average (3.1) highlighted the first signals of an anomaly and showed how it grows. Label-related clustering (3.2) showed that the problem can be solved with a machine learning approach with excellent scalability, which is essential for adapting the solution to our project. Sequence analysis using statistical features (3.3) highlighted possible clusters in the data, giving us the opportunity to differentiate in depth the states and stages of anomalies. Finally, the Kalman filter and rolling metrics (3.4) clarified the nature of the noise present in the data and allowed jittering to be differentiated by its origin.
Figure 3 shows the labelling of jittering periods by each method. As shown in Figure 3, all the techniques developed detect the first symptoms of jittering before it was signalled in the original dataset (labelling represented on the last line of the figure).
Figure 3. Comparison of methods with the period of anomaly they detect, over a fragment of data. From top to bottom: raw data representation, 1) 3.1 Filtering, smoothing and distance from average, 2) 3.2 Label-related clustering, 3) 3.3 Sequence analysis using statistical features, 4) 3.4 Kalman filter and rolling metrics, and manual labelling of the data.
As developed in Section 3.1, the first method detects anomalies well ahead of their onset. The two methods using noise filtering techniques (3.1 and 3.4) are less prone to isolated false-positive labels and, more importantly, their results do not seem to be altered by the absence of the filtered-out data, which suggests that the removed data were indeed non-informative, as assumed.
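The recursive noise filtering at the heart of the fourth approach (Section 3.4) can be illustrated with a minimal Python sketch. It assumes a scalar random-walk model of the forward power, x_t = x_{t-1} + w_t observed as z_t = x_t + v_t, with hypothetical variances `q` (model error) and `r` (measurement noise); the paper does not state its model or parameters. As a "rolling metric", it computes a trailing-window standard deviation of the filter residuals, which is also an illustrative choice, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def kalman_random_walk(z, q, r, p0=1.0):
    """Scalar Kalman filter for an assumed random-walk state
    x_t = x_{t-1} + w_t observed as z_t = x_t + v_t, Var(w) = q, Var(v) = r."""
    z = np.asarray(z, dtype=float)
    x, p = z[0], p0                  # only the current estimate and its variance are kept
    estimates = np.empty_like(z)
    variances = np.empty_like(z)
    for i, zi in enumerate(z):
        p = p + q                    # predict: uncertainty grows by the model error variance
        k = p / (p + r)              # Kalman gain
        x = x + k * (zi - x)         # update the estimate with the new measurement
        p = (1.0 - k) * p            # posterior (estimation error) variance
        estimates[i], variances[i] = x, p
    return estimates, variances


def rolling_std(v, window):
    """Trailing-window standard deviation; NaN until the window is full."""
    v = np.asarray(v, dtype=float)
    out = np.full(v.shape, np.nan)
    out[window - 1:] = sliding_window_view(v, window).std(axis=1)
    return out
```

The loop never revisits past measurements, which is the recursive property cited in Section 3.4. In use, large rolling deviations of the residuals `power - est` point to jittering periods, while `variances` holds the estimation-error variance that the filter reports about its own estimate.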
The characteristics of these approaches will allow us to develop an adaptable and scalable technique to detect, identify and, to some extent, forecast anomalies using machine learning. The statistical method to be used, however, remains to be defined.
5 Conclusion
The initial strategy of this study, applying different approaches and focusing on their respective contributions, proved rewarding: it allowed us to discover and understand previously only hypothesized features in data about which we initially had very little information. The core objective of the SmartLINAC project is predictive maintenance, in other words, predicting anomalies. Although this was achieved in the present study, this application is still far from what the project requires: what we observed here is the informativeness of one specific data source. This study should now be adapted for production use in LINAC4’s facilities, which will also allow its abilities to be tested. While the breakdown forecasts obtained in this study may be useful at CERN’s facilities, where maintenance for the instruments we are working with is available day and night with a Mean Time To Repair of under an hour [17], they are not sufficient for hospitals or radiotherapy stations deployed in countries with a shortage of qualified personnel for LINAC maintenance. In the future, it will be necessary to model and study the LINAC environment in depth, in order to discover not only the symptoms of breakdowns but also their possible sources and patterns. In conclusion, this study is a success in itself, with results beyond our initial objectives, and it gives the SmartLINAC project a significant push forward.
Acknowledgments
This work was carried out in collaboration with CERN openlab.
It was partially supported by the Russian Foundation for Basic Research under grants No. 19-29-01135, 18-37-00418 and 17-01-00972, and by the Ministry of Science and Higher Education within the State assignment to the FSRC “Crystallography and Photonics” RAS No. 007-GZ/Ch3363/26 (theoretical results).
References
[1] G. Bellodi, "LINAC4 Commissioning status and challenges to nominal operation," in 61st ICFA ABDW on High-Intensity and High-Brightness Hadron Beams, Daejeon, 2018.
[2] CERN, "Linear accelerator 4," CERN. [Online]. Available: https://home.cern/science/accelerators/linear-accelerator-4. [Accessed 17 August 2019].
[3] V. Greco, "A partnership-mentorship approach," in ICEC workshop, CERN, Geneva, 2017.
[4] H. Xiong, G. Pandey, M. Steinbach and V. Kumar, "Enhancing data analysis with noise removal," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 3, pp. 304–319, 2006.
[5] Y. Donon et al., "Key point detection on images: A new polyvalent method," in ITNT proceedings, Samara, 2019.
[6] M. Rosenblatt, "Remarks on Some Nonparametric Estimates of a Density Function," The Annals of Mathematical Statistics, vol. 27, no. 3, pp. 832–837, 1956.
[7] E. Parzen, "On Estimation of a Probability Density Function and Mode," The Annals of Mathematical Statistics, vol. 33, no. 3, pp. 1065–1076, 1962.
[8] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.
[9] M. Datar, A. Gionis, P. Indyk and R. Motwani, "Maintaining Stream Statistics over Sliding Windows," SIAM Journal on Computing, vol. 31, no. 6, pp. 1794–1813, 2002.
[10] A. S. B. S. Everitt, The Cambridge Dictionary of Statistics, Cambridge, UK; New York: Cambridge University Press, 1998.
[11] N. Salkind, Encyclopedia of Research Design, Thousand Oaks: Sage, 2010.
[12] R. G. Brown and P. Y. C. Hwang, Introduction to Random Signals and Applied Kalman Filtering, vol. 3, New York: Wiley, 1992.
[13] R. H. Shumway and D. S. Stoffer, Time Series Analysis and Its Applications: With R Examples, New York: Springer, 2017.
[14] G. M. S., Kalman Filtering, Heidelberg: Springer, 2011.
[15] K. Lim, "Fading Kalman filter-based real-time state of charge estimation in LiFePO4 battery-powered electric vehicles," Applied Energy, pp. 40–48, 2016.
[16] K. C., "Optimization approach to adapt Kalman filters for the real-time application of accelerometer and gyroscope signals' filtering," Digital Signal Processing, vol. 21, no. 1, pp. 131–140, 2011.
[17] O. Rey Orozco et al., "Performance evaluation of LINAC 4 during the reliability run," in 9th International Particle Accelerator Conference, Vancouver, 2018.