Exploration of Anomalies in Cyclic Multivariate Industrial Time Series Data for Condition Monitoring

Josef Suschnigg, Pro2Future, AVL List, Graz, Austria
Belgin Mutlu, Pro2Future, Know-Center GmbH, Graz, Austria
Anna Katharina Fuchs, AVL List, Graz, Austria
Vedran Sabol, Know-Center GmbH, Graz, Austria
Stefan Thalmann, University of Graz, Graz, Austria
Tobias Schreck, Graz University of Technology, Graz, Austria

EDBT 2020, March 30-April 2, 2020, Copenhagen, Denmark. Suschnigg et al.

© 2020 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2020 Joint Conference (March 30-April 2, 2020, Copenhagen, Denmark) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

ABSTRACT

Industrial product testing is frequently performed in cycles, resulting in cycle-dependent test data. Monitoring the condition of products under test involves analysis of large and complex test data sets. The main tasks are to detect anomalies and dependencies between observation variables, which is challenging for engineers. In this paper, we present a flexible and extendable visual analytics approach for anomaly detection focusing on cycle-dependent data. It is based on a glyph representation to visualize anomaly scores of cycles with respect to interactively selected reference data. Our approach is built on a design study in collaboration with an industrial engineering corporation, and is demonstrated on real data from engines tested on automotive testbeds. Based on findings from evaluation results, we provide a discussion and an outlook for future work.

1 INTRODUCTION

The trend of digitization in industry (often used synonymously with so-called Industry 4.0) generates large amounts of data by sensors and data recordings from almost all machines and devices of the production process, with the promise of creating new usage opportunities [39]. Over all stages of the industrial product life cycle (PLC), extracting valuable knowledge from generated data can lead to an improvement regarding costs, quality and increased flexibility (incl. safety, durability, reliability) [3][31]. However, for human perception, it can be overwhelming to observe and analyze large industrial data sets. Another important requirement of analyzing industrial data is that extensive professional and domain-specific knowledge of users is required [42]. Addressing those challenges, visual analytics research proposes tools supporting domain experts to explore large and complex data sets [26]. Further, to identify and address the industry's needs, our case study has been conducted in close collaboration with an industry partner from the automotive sector, focusing, among others, on engine testbeds. The main driver of the focus on anomaly detection is the common occurrence of real-world problems in automotive condition monitoring. Due to the large amount of testbed data, data analysis is time-consuming and there is a risk that events which lead to the failure of an engine under test may not be anticipated, or may be overlooked, by engineers. We hence research the development of visual analytics approaches to support test engineers in creating hypotheses and early warnings of potential failure cases.

As the first contribution of this paper, we characterize testbed data and study how engineers achieve their data analysis goals. Based on the design study, we propose a (1) extendable and (2) versatile glyph-based visual analytics approach for anomaly detection in multivariate sensor data as our second contribution. The extendibility and versatility are achieved by making the underlying data analysis methods exchangeable and by supporting a variable number of anomaly detectors. Additionally, we applied a technique enabling users to visually identify conspicuous sensor data by a matrix representation for drill-down and further comparative analysis. As a constraint, the concept of this work is only applicable for cyclic (also periodic or seasonal) data. Nevertheless, cyclic data is related to the repetitive behavior of many industrial applications. The approach has been designed for automotive testbed data with sensor-intensive technology, where anomalous or erroneous events should be anticipated by visual data analysis. The third contribution of this paper comprises the results of the pair analytics evaluation [2], which has been conducted in collaboration with the target user group on the given use case data set. Results are encouraging and open promising directions for future work.

2 RELATED WORK

This section discusses the related work conducted in analyzing data using either algorithmic data analysis or visual analytics approaches. Furthermore, we detail the glyph representation, as this technique has been proven to be an effective manner to represent time series.

2.1 Automated data analysis approaches for anomaly detection

One property of many typical industrial applications is the repetition of specific tasks. To give an example, Maier et al. [21] emphasized reoccurring processes (cycles) for automation and production. In theory, data generated within such cycles should be highly comparable for anomaly detection. Anomalies are generally understood to represent patterns in data that do not conform to a well-defined normal behavior [1]. The literature provides a comprehensive collection of algorithms for the detection of anomalies in multivariate time series data. Anomaly detection also often refers to the term novelty detection [23] or semi-supervised learning [6], whereas for those methods the definition of normal or rather reference data is needed. Anomaly detection in time series [16] found attention from industry for several applications, such as predictive maintenance [17], condition monitoring [40] or decision support systems [28]. Two groups of anomaly detection algorithms are used in this work. The first group are correlation-based approaches, which have been effectively applied on industrial sensor data [40]. The idea behind this approach is that changes of the bivariate correlation between two sensors can be interpreted as an anomaly. The second group are regression-based methods [12] or reconstruction-based novelty detection methods [23]. The basic idea of this type of anomaly detection is to build a regression model on reference (normal) data. Estimation errors of the model in comparison to measured data are rated as anomalies if they exceed a predefined threshold.

2.2 Visual analytics for industrial application

Recently, a survey on visualization and visual analytics applications for smart manufacturing has been published [42]. It reveals the diversity of several studies performed for industrial applications and the need for visual analytics. A few examples are available on how to solve the problem of finding anomalies in multivariate time series data by visual analytics. An application for finding anomalies in the power consumption of buildings has been proposed by Janetzko et al. [18]. It suggests a model-based and a similarity-based anomaly score and visualizes them in several visualization techniques, such as recursive patterns [19], spiral graphs [34] and line charts. In the work of Wu et al. [35], anomalies are detected for condition monitoring by a model-based approach. The deviation of estimated and real values is visualized in a river plot view [9]. As an ongoing challenge, the authors outline the problem of analysts to trust and make use of the algorithms for condition monitoring. Many different algorithms are available for several applications, and finding appropriate models and parameters is a hard task. Considering this problem, Xia et al. [36] proposed a visual analytics application to support users in finding the right model for dimensionality reduction. Another work addressing this challenge is presented by the EnsembleLens [38]. It is a visual analytics system to help data mining experts to evaluate, compare and select available anomaly detection algorithms.

2.3 Glyph representation of cyclic time series data

Besides the major summaries and surveys [7] [25], recently a systematic review of experimental studies on data glyphs has been presented by Fuchs et al. [14]. To visualize multivariate time series data, glyphs are an appropriate choice and can enable quick visual comparison of data values over time [13]. In Ward and Lipchak [33], the visualization of a circular glyph for recognition of the evolution of a measurement of interest has been proposed. Another glyph-based design for outlier detection in social networks has been proposed by Cao et al. [10]. In their work, glyphs visualize suspicious behavior of users, based on the z-score of several attributes, in a star-glyph like design. The anomaly scores of entities are visualized by the intensity of the red color in the cyclic glyph center. As examples for glyph-based time series visualizations, a few techniques for glyph designs for comparison purposes are evaluated in the work of Fuchs et al. [13].

3 BACKGROUND ON AUTOMOTIVE TESTBEDS AND USE CASE

Our work has been conducted for automotive engines in the context of the validation and verification phase of the industrial PLC [3] [30]. After an engine has been developed, its requirements are verified and validated in automotive testbed environments. Those requirements can be of functional nature, such as the engine power density, speed and durability, or of legal nature, such as, along with others, fuel economy, noise pollution and exhaust gas emissions. For this research work, we analyzed data from a durability test of an internal combustion engine. The main goal of the test is to ensure the durability, reliability and life time expectations of the engine. Therefore, durability tests are conducted to let the engine undergo sufficiently high mechanical and thermal loads (stresses) and a sufficient number of fatigue cycles (e.g., hundreds of hours) [37]. During durability tests, a vast amount of data is collected by sensors which either are commonly built into modern vehicles and accessed through the engine control unit (ECU), or have been mounted on the engine and the testbed for testing purposes.

Throughout durability tests, engineers are observing the test, and are responsible for the performance and the condition of the testee. For that condition monitoring task, engineers generally monitor a few familiar sensors for threshold violations, manually selected and defined by their given domain knowledge or by the customer. However, for a novel engine design, which has been recently developed, there is no knowledge on all sensors and their thresholds, and it is often not appropriate to apply earlier experiences. In fact, an important task in testing of novel engine designs is the comparison for differences with previous designs. Therefore, our work is motivated by making use of the time series data of all sensors and the information they might contain.

To give a practical example of the challenges engineers face during durability tests, Figure 1 shows a line plot describing the problem of a use case. After 1,200 hours of a 2,000-hour durability test a fatal error occurred, which increased the temperature of a critical component part of the engine by up to 8 °C over time. In the end, the durability test failed because of the temperature increase of the critical component. The failure could not be anticipated because of the large number of sensors. Consequently, it was a hard task to define rule-based thresholds or measurements of concern indicating the failure upfront. However, in comparison to a simple rule-based anomaly detection approach, more complex data analysis and models which also take the interplay of sensor data into account may lead to better analysis results. Domain experts assume that it should be possible to anticipate such failures by advanced data analysis and visualization techniques. Therefore, to understand the current data analysis workflow of engineers, we carried out a design study as basis for our proposed visual analytics application. More details on the data and the design study will be given in the next section.

Figure 1: Temperature incident. Temperature of the critical component over all cycles. For each cycle over the whole durability test time the mean temperature has been calculated for trend analysis. (Line plot; x-axis: Hours, 0 to 2,000; y-axis: Temperature in °C, 110 to 120.)

4 DESIGN STUDY

As the first contribution of this paper, we characterize testbed data and study how engineers fulfill their condition monitoring task through data analysis. This section relates to Miksch and Aigner's "design triangle" [22] and is generally based on the design study methodology of Sedlmair et al. [27]. For the task aspects of the design triangle we bridge from goals to tasks with the design study analysis report as proposed by Lam et al. [20].

4.1 Data

Input data used in our proposed visual analytics approach for anomaly detection is taken from an automotive engine testbed. One common task within engine development is to carry out durability tests. For such a durability test, a test cycle is specified to verify the durability of an engine. The test cycle, which is defined by a given engine speed and engine torque profile over time, is repeated over a period of several months until the target operating hours are reached. During a durability test, hundreds of sensor measurement signals are acquired and stored continuously, while the engine drives the given profile. In the automotive domain, those sensor-measured time series are called channels, a term which we adopted throughout our research. Among others, channels mainly record several engine speed, engine torque, temperature, pressure and exhaust gas measures.

One cycle is stored within one file and can be seen as an N × n-dimensional matrix, where N is the number of channels and n the length of the time series. All signals originally record numerical values at a frequency of 10 Hz. Note that channels are aligned according to the given engine speed profile and therefore cycles of the same length can be extracted. The target data contains records of c = 860 cycles of a 2,000-hour durability test, in which each cycle has a duration of 140 minutes. Overall, the dataset has 860 cycles × 480 channels × 84,000 numerical values. To conclude, the dataset can be characterized as a cyclic, piece-wise equi-length segmented, multivariate streaming time series (according to the characterization of Shurkhovetskyy et al. [29]). For our research work, data could be seen as stationary, since the durability test had finished and the entire dataset was available for our work. Nevertheless, we kept a streaming time series scenario in mind, where engineers use our proposed visual analytics approach throughout a durability test.

4.2 Users

Users of our proposed visual analytics application are development engineers with a mechanical engineering background, working with powertrains and engines on a regular basis. They have long-standing experience with engines in testbed environments and, as front-line analysts, also analyze data in practice to achieve their analysis goals (i.e., condition monitoring). Three users collaborated in our project systematically by participating in the design study and the pair analytics evaluation (see section 4 and section 7). In general, the work with testbed data is essential for development engineers and offers the opportunity to measure indicators regarding functional or legal requirements and engine performance. During our research work, we also collaborated with data scientists who work with testbeds and powertrains on a daily basis. They constantly provided informal feedback from a different view throughout our work.

4.3 Tasks

This design study is based on the domain question of whether an engine is in a non-critical condition during a durability test. For that purpose, we define test cycles as the population unit (or entity, or unit of analysis) [20]. Furthermore, engineers have a high level of understanding of using cycles as a granularity level for their analysis. Due to the repetitive behavior of cycles, they are highly comparable, and therefore we define the data analysis goals of engineers as multiple population analysis. Consequently, we identified that engineers are pursuing all three multiple population goals defined in the design study analysis report framework [20]: (a) compare entities, (b) explain differences and (c) evaluate hypothesis. In the following, we further investigate the characterization of those goals by their input, output and analysis steps.

4.3.1 Compare Entities. Engineers attempt to detect population differences as their top level analysis goal. In order to achieve that, engineers observe trends of several familiar channels by calculating the mean values of channels per cycle and visually explore changes over cycles in a time-ordered line plot (see Figure 1). In trend charts, up to ten time series are compared in juxtaposed and/or superpositioned line plots [15]. The output of the compare entities analysis goal is the observation of a conspicuous trend or anomaly, which is investigated in the explain differences goal.

4.3.2 Explain Differences. Regarding the domain question, the output of the explain differences goal can be either that the observation is not relevant for the engine's condition, or a hypothesis that one specific component of the engine is in a bad condition. In the next analysis goal, the hypothesis needs to be evaluated.

4.3.3 Evaluate Hypothesis. Evidence that a component or part of the engine is in a bad condition needs to be evaluated by engineers. The analysis steps of that goal do not differ from the preceding goal. Domain experts explore channel time series using their domain knowledge and attempt to find differences and interesting patterns by comparing different channel line plots of multiple populations. The final confirmation or rejection of hypotheses is made after evidence has been collected through data analysis and, if required, investigated directly on the engine.

Through that characterization of higher level analysis goals, we can derive lower level task definitions T1 - T5 to address them in our visual analytics design considerations and automated data analysis:

T1 Identify population contrasts. Test cycles are the unit of analysis, or population, for engineers. As the first task, population contrasts or differences are explored. This is achieved by trend analysis of a few familiar channels. The problem engineers face in dealing with a large number of channels should be considered in the design by taking all channels into account.

T2 Application and visualization of semi-supervised anomaly detection methods. Engineers detect interesting patterns and anomalies mainly by visually exploring line plot trends of channels. Comparing past cycles to current cycles is related to a semi-supervised learning scenario and should be considered for the choice of automated data analysis and the visualization. First, the automation of data analysis to highlight interesting or conspicuous channels should be included into the visualization. Second, we assume that the combination of several anomaly detection algorithms leads to more significant findings, for which reason an ensemble method [38] should be considered for the visualization.

T3 Examine conspicuous channels in multiple populations. After conspicuous channels have been detected by trend analysis, engineers drill down to examine and find differences between channel line plots of different populations. The comparison of interesting channels in different populations should be considered in the design.

T4 Detection of conspicuous channel relations. With the exception of visualizing multiple trend line plots in juxtaposition, engineers generally detect anomalies by univariate time series analysis of multiple channels. They also compare line plots with the given engine speed and engine torque. Relations between channels at a broader scale should be considered for the automated data analysis and visualization.

T5 Reduce amount of data. To address the large number of channels, data reduction techniques should be taken into account for the visual analytics design. The main consideration for the visual analytics design is that interesting data should be highlighted to support engineers in their decision making.

Overall, data can only be analyzed by including extensive domain knowledge of users in the data analysis. Modern powertrains and engines are highly complex machines and therefore domain experts are necessary to interpret results of automated data analysis through a visual analytics approach. In the next section we introduce the anomaly detection methods we applied to the visual analytics approach.

5 AUTOMATED DATA ANALYSIS: ANOMALY DETECTION METHODS

In research, anomaly detection often refers to a two-class classification problem, in which data either is classified as an anomaly or not. In general, a model is built on normal data, so that the model can calculate an anomaly score on unseen data sets (apart from unsupervised methods). If the anomaly score exceeds a predefined threshold, the data record or the entire set is classified as an anomaly. We consider the application of semi-supervised anomaly detection methods for our design (T2). Most techniques are specific to different observational features, in consequence of which we assume that an ensemble-based approach obtains more robust anomaly scores [38]. Therefore, we propose to map results of different anomaly detection algorithms to a unified value for comparison purposes and describe two anomaly detection methods used in the visual analytics approach.

5.1 Unified anomaly score

Test cycles are the engineers' unit of analysis, for which reason we choose them as the granularity level for data analysis (T1). To make different anomaly detection methods comparable in an ensemble-based approach, we propose the following to map anomaly scores to unified values between 0 and 1: (1) Interactively select a reference cycle as input data for the training of the anomaly detection model. (2) A baseline cycle is selected to calculate a baseline anomaly score. (3) Define a threshold anomaly score, based on prior knowledge, domain knowledge or historical data. (4) Further, the anomaly scores of cycles are calculated by linear scaling from 0 (baseline) to 1 (threshold). Therefore, our approach needs the definition of a reference cycle for model training, a cycle for baseline definition and the definition of a threshold. The baseline anomaly score is used to account for a training error and therefore is taken as the lower limit of the unified anomaly score calculation. Note that the baseline anomaly score is calculated from a baseline cycle, whose data should be recorded temporally close to the reference cycle. Hence, we specify the cycle subsequent to the reference as the baseline cycle. Defining the upper limit (threshold) is critical and can be changed interactively in the visualization. In the following, we discuss two methods which have been applied on industrial sensor data in prior research. Another criterion for selecting those two methods is the capability of identifying conspicuous channels separately for further drill-down and comparative analysis (T3).

5.2 Correlation-based anomaly score

Inspired by the approach presented by Zhao et al. [40], we assume that the change of linear correlations between two sensors over time refers to an anomaly. During our research work, we investigated the application of correlation-based anomaly detection on testbed data. Despite its limitations, the method also has strengths that are beneficial in solving the tasks defined in the design study. First, we highlight the main limitation: anomalies cannot be detected by the change of linear sensor correlations if, throughout the durability test, no linear correlation between two specific sensors exists. Nevertheless, we observed that in testbed data many linear correlations between sensors exist. For example, data from several temperature channels are likely to correlate. Experiments demonstrated that this method can detect anomalies in testbed data, and therefore it has been applied to our visual analytics approach.

The basis for the correlation-based anomaly score is the correlation difference matrix, which represents the deviation of linear channel relations between two testbed cycles. The correlation matrices over all sensor combinations of the reference cycle, the baseline cycle and the unseen cycle are calculated by using Pearson's correlation coefficient. Then, the correlation matrix of the reference cycle is subtracted from the correlation matrix of the unseen cycle, which results in the correlation difference matrix. As a result, the anomaly score is calculated as the average of all values in the difference matrix and is mapped to the unified anomaly score according to the method explained before.

5.3 Regression-based anomaly score

As the second anomaly score, we make use of regression models for regression-based anomaly detection [12]. For this research work, we train regression models to estimate a time series. Subsequently, the model is applied to an unseen data set, in which the difference between the estimation and the real values (residuals) can be interpreted as anomalies. Considering this method for T1 and T2, an anomaly score between populations or cycles needs to be calculated in a semi-supervised manner. Therefore, regression models are trained with data of a user-defined reference cycle for all channels separately. To make those channel regression models comparable, it is necessary to standardize data first, i.e. standardization of the entire time series to values between 0 and 1. The regression models can now be used to estimate all channel time series for unseen cycles, and the anomaly score of one cycle can be calculated by the average mean average error over all channels of a cycle. Subsequently, it can be mapped to a unified anomaly score according to the method explained above.

We chose the Random Forest regressor, as suggested by Breiman [8], considering that this model has been proven to perform well in many domains [41]. As input data for training channel regression models, engine speed and engine torque are chosen, since those two channels are given by the test and strongly relate to the majority of the channels (T4). Also, sliding window features for those two channels are extracted, whereas sliding windows contain differences and mean values of three seconds into the past. We assume that in this time frame the most relevant information can be extracted for our models. The aim of this approach is not to estimate each channel as accurately as possible, but to detect changes of anomaly scores between populations. As with the correlation-based method, we are aware of the limitation that this method may not return a decent estimation for all channels, but it may be effective for some types of anomalies. This consideration should also emphasize the choice of an ensemble method.

6 VISUAL ENCODING AND CONSIDERATIONS

This section explains how we use two anomaly scores for a glyph-based visualization. Also, an example on how to identify conspicuous channels within a cycle, either in a matrix representation or in a ranked channel list, is given. The visual considerations explained in this section will be brought together in the prototype, describing the visual analytics approach by the prototype implementation.

6.1 Cycle anomaly glyph

Figure 2: Proposed glyph design. The glyph visualizes two anomaly scores and their ensemble (aggregate). Anomaly Score 1 visualizes a value of 0.8, Anomaly Score 2 visualizes a value of 0.2, whereas the equally weighted center visualizes the Ensemble Anomaly Score of 0.5.

The proposed glyph in Figure 2 is flexible and independent of the underlying analytical methods for anomaly detection, as long as each method implements the framework for calculating unified anomaly scores between 0 and 1 (subsection 5.1). Anomaly scores are visualized in the outer circular segments of the glyph representation as the opacity value of the red background color. When designing the glyph, our premise was that no single algorithm can detect all kinds of anomalies relevant for different applications. As a result, we choose an extensible glyph design, achieved by its circular shape, which offers the capability of adding and removing anomaly scores in their according circular segments. The main visual focus of the glyph stays at the center circle, which represents an equally weighted average of the anomaly scores combined, labeled as the ensemble anomaly score.

During our work, the main concern of visualization experts regarding the presented glyph design was the benefit compared to simpler visualizations, such as line plots. As stated in the design study, line plots are a well-known visualization type and comprehensible to the target group. However, our approach has advantages over line-plot-based visualizations. In general, testbed cycles as granularity level are highly comprehensible for engineers. Therefore, cycles are visualized as individual and complete entities, whereas the glyph design offers the following opportunities: (1) As a visual entity it can be clearly selected by users for further exploration, reasoning and drill-down. Also, glyphs can be selected interactively to be defined as reference for the underlying semi-supervised learning algorithms to identify contrasts between populations (T1, T2). (2) The glyph design can be extended with several anomaly detection algorithms by adding additional outer circular segments. (3) Similar glyphs can be aggregated, clustered and arranged for further interpretation by domain experts and to save screen space. (4) The glyph can be visualized on its own as a quick overview of an engine's condition. This is also related to the idea of the involved engineers of having a simple "traffic light like" system, which also encouraged us to develop the presented glyph-based approach.

6.2 Identification of anomalous channels

As stated above, we choose two anomaly scores by their capability to further explore single anomalous channels. After an anomalous channel has been identified in the glyph representation, users are interested in the cause of that anomaly. Therefore, we visually represent anomalies for both anomaly scores, as follows:

Matrix-based identification of anomalous channels. The correlation deviation matrix calculated for the correlation-based anomaly score is shown in step (c) of Figure 3. Basically, in this symmetrical matrix, deviations of correlations of channels within a given cycle with respect to the selected reference cycle are visualized. More specifically, we compute the difference of the correlation matrices of these two cycles, and show the result by color-coding the cells of the difference matrix. Hence, levels of red represent the anomaly score of channel correlations. This matrix representation supports the analysis goal to determine and quantify visual patterns for pattern-driven visual exploration. Together with appropriate matrix reordering methods, we can use this display to search for typical patterns in matrix visualizations, including line patterns and block patterns [4]. Most importantly, if one sensor shows an anomalous behavior, its correlation difference values to many or all other sensors will be rather large, leading to line patterns. Such visual patterns attract the attention of the analyst and are a starting point for drilling down into the respective sensor data (Figure 3 (e2)).

Ranked mean average error list. The regression-based anomaly score can be explored by the ranked mean average error list as proposed in Figure 3 (d). Channels that deviate from the reference are listed and ranked by their anomaly score. This enables a guided approach for exploring anomalies and simplifies data analysis. By clicking on channel names, users can explore the reference and the anomalous channel time series by visual comparison in juxtaposition for hypothesis generation (Figure 3 (e1)).

6.3 Prototype

Figure 3: Proposed visual analytics tool. (a) Differences between the selected reference cycle and other cycles can be explored, whereas glyphs are positioned in a time-ordered grid. (b) Besides some filter capabilities, the anomaly score threshold can be interactively changed by a ruler. (c) Anomalies found by the correlation-based anomaly score can be explored in the matrix representation. (d) Anomalous channels found by the regression-based anomaly score can be explored by the ranked mean average error list. (e) Hypotheses can be evaluated by comparing channel time series in the reference cycle and the cycle of interest.

The workflow of the approach, applied to data of the given use case, is exhibited and briefly described in Figure 3. It shows screenshots of the implemented prototype, whereas further explanations are given in the following: In (a) glyphs are placed in a grid, with each cell representing a test cycle in chronological order (from top left to bottom right, inspired by the calendar-based view [32]). Note that the reference cycle is interactively selectable and represented by a white circle, as visible in the top left, or first, glyph. In (b) three interaction possibilities are visible: (i) to add flexibility to the visual comparison of glyphs, the user can interactively change the anomaly score threshold to values between 50% and 200% of the original value; (ii) the user can change the number of displayed glyphs by filtering them with a 'from - to' range slider; (iii) glyphs can be filtered so that only every x-th glyph is visualized. Both anomaly scores of interesting glyphs can be selected for further exploration by a drill-down in (c) and (d): In (c) a drill-down example to inspect and identify one or more conspicuous channels within the selected cycle by a matrix representation visualizing the correlation-based anomaly score is shown. An example for a visually perceptible line pattern is outlined, representing a possibly conspicuous channel. Further, the conspicuous channel can be selected in the matrix for exploration and comparative analysis with the reference line plot in (e2). In (d) an example of the ranked mean average error

7 EVALUATION

We conducted a pair analytics evaluation [2] with three subject matter experts (SME), who represent the target user group identified in subsection 4.2, and the dataset described in subsection 4.1. The main target was to evaluate both the comprehensibility of the different views and the underlying automated data analysis, and the capabilities and limitations in supporting users with their daily condition monitoring analysis goals. According to the pair analytics protocol, the evaluation is done by a human-to-human interaction of one SME and one visual analytics expert (VAE), in which the SME acts as the navigator and the VAE as the driver (operator) of the visual analytics tool. In general, all three SME participants stated that the visual analytics tool can be of great benefit to support them in their daily work for two reasons: The visual analytics tool supports engineers in analyzing testbed data (1) more efficiently by highlighting interesting data on different granularity levels and (2) more effectively by enabling the analysis of the entire dataset and not only a subset of well-known channels.
To give evidence to that statement, we connect partici- representation of the regression-based anomaly scores is given, pants comments and actions during the pair analytics evaluation in which its drill-down capabilities are visible in (e1). In general, to the task definition of subsection 4.3 in the following: drill-down information needs to be investigated and interpreted Each evaluation session started by an introduction to the visual by domain experts. However, our approach supports users in the analytics approach and a short demonstration of the prototype. It identification of interesting data by visually highlighting deviat- is notable, that all three participants (P1, P2, P3) gained a quick ing cycles and sensors. As a side note, interactive line plots and understanding of the concept for two reasons: First, we conducted heatmaps in the prototype have been created with the JavaScript the design study with the same engineers and connected findings visualization library Plotly.js [24] and are anonymized in Figure 3 of the study with explanations of our visual analytics tool. Second, screenshots. the design study clearly identifies tasks and goals of engineers, EDBT 2020, March 30-April 2, 2020, Copenhagen, Denmark therefore the visual analytics prototype accurately addresses the of the proposed visual analytics approach and are interested in needs of engineers. In general, participants appreciated our effort using the prototype in a productive scenario. In the next section, in developing a decision support system supporting engineers in we discuss some of the aspects of the evaluation in greater detail handling the big amount of data for their condition monitoring and also consider generalizability and future work. tasks. The actual pair analytics evaluation sessions started by defin- 8 DISCUSSION AND FUTURE WORK ing a reference cycle in the glyph-based overview (T2). 
P1 and P3 Our visual analytics approach has been designed and evaluated appreciate the capability of selecting the reference cycle interac- on testbed data, but we emphasize that it is not limited to the tively in the visualization. However, P2 questioned the necessity automotive domain. At least the glyph-based overview should of interactively selecting the reference cycle, because the testee is be applicable on any other cyclic multivariate data set, as long as likely to be in a good condition before the first cycle, considering the underlying automated data analysis methods and dependent that the testee runs through an extensive health check at the visualization techniques are adapted to the specific domain. The beginning of the entire durability test. We are aware of the fact visual analytics prototype has been evaluated to be useful for that selecting the first cycle may be an appropriate default choice, collaborators and they clearly identified advantages in terms of but we wanted to keep the analysis more flexible. efficiency and effectiveness in comparison to their current work- After the reference cycle has been selected, other glyphs in the flow. Also, the evaluation opened up many directions for future overview turned red regarding their anomaly scores (see Figure 1 work: Analyzing up to a thousand of cycles can be critical regard- (a)). All participants were immediately curious in exploring those ing the screen space. Applied filtering techniques in Figure 3 (b) anomalies by the visualization and easily identified cycles that can be improved by a more scalable solution and clearly needs appear interesting to them (T1). P2 and P3 pointed at cycles that further attention. For example, similar glyphs can be aggregated had a more intensive color of red then the majority of all cycles, to save space on the display. We also address the scalability for whereas P1 mentioned that all glyphs that visualize at least a the matrix-representation for future work. 
As a generic abstrac- small anomaly score are interesting. However, in a productive tion of anomalies in the glyph-based overview, the calculations use scenario the exploring strategy may differ since not all cy- of anomaly scores are exchangeable and more extensive research cles are available from the beginning, and new data would be on additional available anomaly detection methods for the use explored incrementally on a regular basis as it becomes available. case needs to be done. The evaluation demonstrated that anom- We emphasize that at this stage of the visual data analysis we alies can be characterized in three manners. Hence, engineers successfully reduced the amount of data (T5) and enabled the should be able to provide feedback on their findings, i.e., to clas- further exploration of anomalies in the succeeding views. sify the relevance of anomalies consequently for analyzing data T3 and T4 are both achieved by exploring one of the two anom- in further iterations more efficiently. Even if the target users are aly scores of a specific cycle: (1) The correlation-based anomaly not data mining experts, we experienced that they gain a quick score and the correlation difference visualization in Figure 1 (b) understanding of the proposed workflow. However, for future were comprehensible to the participants as they were able to work we will address guidance for visual analytics [11] to reduce identify conspicuous channels. However, participants articulated system complexity from the user perspective and support users the need of a more guided approach to engage engineers using to further achieve their analysis goals. One promising venue we the matrix visualization, because it appears overloaded and thus see to this end is the application of visual interestingness mea- overwhelming to engineers. 
(2) The regression-based anomaly sures [5] to automatically select cycle pairs from the database score was also highly comprehensible to participants, since they showing significant visual patterns. have a general understanding of regression models. On the other hand, we avoided explaining the actual regression model Random 9 CONCLUSION Forest to participants in detail. In comparison, to the correlation We propose a visual analytics approach to improve the engineer’s difference matrix, participants commented that the exploration daily work by the glyph-based visual analytics workflow. We have of conspicuous channels is easier by the ranked mean average found promising results on the given use case, but the concept error list (Figure 2 (c)). Also, they expressed their interest of ad- still needs to be proven with other datasets. We contribute to ditional guided approaches in the other views, considering that the need of visual analytics approaches for condition monitor- such rankings represent a clear order on what channels to focus ing or anomaly detection in cyclic time series data. Further, our on, especially if they are short of time during their analysis. approach devises a methodology to reduce large amounts of in- As the last step of the visual analysis, participants evaluated dustrial data, by drawing attention to anomalous cycles on a hypothesis of channels being anomalous by comparing anoma- higher granularity level to increase efficiency and effectiveness lous line plots with their reference cycle equivalents (see Figure 1 of engineers’ data analysis work. (e1 + e2)). From a data perspective, engineers approved that all explored anomalies are interesting, because they highlight a sig- ACKNOWLEDGMENTS nificant difference to the reference. From a domain perspective, This research work is done by Pro2 Future and AVL List. 
Pro2 Future some of the anomalies were interesting, but others were expli- is funded within the Austrian COMET Program-Competence cable and irrelevant for the condition monitoring task. Another Centers for Excellent Technologies- under the auspices of the type of anomaly, that has been detected during evaluation are Austrian Federal Ministry of Transport, Innovation and Tech- defect or unconnected sensors. Line plots of those anomalies nology, the Austrian Federal Ministry for Digital and Economic visualize a constant or noisy signal. Therefore, we characterize Affairs and of the Provinces of Upper Austria and Styria. COMET three types of anomalies that have been found during evaluation: is managed by the Austrian Research Promotion Agency FFG. (1) Domain irrelevant (2) Domain relevant and (3) Defect sensors. Overall, we evaluated that the visual analytics prototype re- ceive acceptance from all participants. They confirm the benefit EDBT 2020, March 30-April 2, 2020, Copenhagen, Denmark Suschnigg et al. REFERENCES on Visualization and Computer Graphics 18, 12 (2012), 2431–2440. [1] Shikha Agrawal and Jitendra Agrawal. 2015. Survey on Anomaly Detection [28] Maxim Shcherbakov, Adriaan Brebels, N.L. Shcherbakova, V.A. Kamaev, O.M. using Data Mining Techniques. In Procedia Computer Science, Vol. 60. Elsevier, Gerget, and D. Devyatykh. 2017. Outlier detection and classification in sen- 708 – 713. sor data streams for proactive decision support systems. Journal of Physics: [2] Richard Arias-Hernandez, Linda T Kaastra, Tera M Green, and Brian Fisher. Conference Series 803 (2017). 2011. Pair Analytics: Capturing Reasoning Processes in Collaborative Visual [29] Georgiy Shurkhovetskyy, N Andrienko, G Andrienko, and Georg Fuchs. 2018. Analytics. In 44th Hawaii International Conference on System Sciences. IEEE, Data Abstraction for Visualizing Large Time Series. Computer Graphics Forum 1–10. 37, 1 (2018), 125–144. [3] Eric Armengaud. 2017. 
Industry 4.0 as Digitalization over the Entire Product [30] Stefan Thalmann, Gursch Heimo, Josef Suschnigg, Milot Gashi, Helmut Enns- Lifecycle: Opportunities in the Automotive Domain. In Systems, Software and brunner, Anna Katharina Fuchs, Tobias Schreck, Belgin Mutlu, Jürgen Mangler, Services Process Improvement, 24th European Conference, EuroSPI 2017. Springer, Gerti Kappl, Christian Huemer, and Stefanie Lindstaedt. 2019. Cognitive Deci- 334–351. sion Support for Industrial Product Life Cycles: A Position Paper. In Cognitive [4] Michael Behrisch, Benjamin Bach, Nathalie Henry Riche, Tobias Schreck, and 2019, Vol. 11. IARIA, 3–9. Jean-Daniel Fekete. 2016. Matrix Reordering Methods for Table and Network [31] Stefan Thalmann, Juergen Mangler, Tobias Schreck, Christian Huemer, Marc Visualization. Computer Graphics Forum 35, 3 (2016), 693–716. Streit, Florian Pauker, Georg Weichhart, Stefan Schulte, Christian Kittl, [5] Michael Behrisch, Benjamin Bach, Michael Hund, Michael Delz, Laura von Christoph Pollak, Matej Vukovic, Gerti Kappel, Milot Gashi, Stefanie Rinderle- Rüden, Jean-Daniel Fekete, and Tobias Schreck. 2017. Magnostics: Image- Ma, Josef Suschnigg, Nikolina Jekic, and Stefanie N. Lindstaedt. 2018. Data Based Search of Interesting Matrix Views for Guided Network Exploration. Analytics for Industrial Process Improvement A Vision Paper. In IEEE 20th Transactions on Visualization and Computer Graphics 23, 1 (2017), 31–40. Conference on Business Informatics (CBI), Vol. 02. 92–96. [6] Gilles Blanchard, Gyemin Lee, and Clayton Scott. 2010. Semi-supervised [32] Jarke J Van Wijk and Edward R Van Selow. 1999. Cluster and calendar based novelty detection. Journal of Machine Learning Research 11 (2010), 2973–3009. visualization of time series data. In Proceedings IEEE Symposium on Information [7] Rita Borgo, Johannes Kehrer, David H. S. Chung, Eamonn Maguire, Robert S. Visualization. 4–9. Laramee, Helwig Hauser, Matthew Ward, and Min Chen. 2013. 
Glyph-based [33] Matthew O. Ward and Benjamin N. Lipchak. 2000. A visualization tool for Visualization: Foundations, Design Guidelines, Techniques and Applications. exploratory analysis of cyclic multivariate data. Metrika 51 (2000), 27–37. Eurographics 2013 - State of the Art Reports (2013), 39–63. [34] Marc Weber, Marc Alexa, and Wolfgang Müller. 2001. Visualizing time-series [8] Leo Breiman. 2001. Random Forrest. Machine Learning 45 (2001), 5–32. on spirals. In IEEE Symposium on Information Visualization, 2001. 7–13. [9] P. Buono, C. Plaisant, A. Simeone, A. Aris, B. Shneiderman, G. Shmueli, and W. [35] Wenchao Wu, Yixian Zheng, Kaiyuan Chen, Xiangyu Wang, and Nan Cao. Jank. 2007. Similarity-Based Forecasting with Simultaneous Previews: A River 2018. A Visual Analytics Approach for Equipment Condition Monitoring in Plot Interface for Time Series Forecasting. In 11th International Conference Smart Factories of Process Industry. In IEEE Pacific Visualization Symposium. Information Visualization. IEEE, 191–196. 140–149. [10] Nan Cao, Conglei Shi, Sabrina Lin, Jie Lu, Yu Ru Lin, and Ching Yung Lin. [36] Jiazhi Xia, Fenjin Ye, Wei Chen, Yusi Wang, Weifeng Chen, Yuxin Ma, and 2016. TargetVue: Visual Analysis of Anomalous User Behaviors in Online Anthony K.H. Tung. 2018. LDSScanner: Exploratory Analysis of Low- Communication Systems. IEEE Transactions on Visualization and Computer Dimensional Structures in High-Dimensional Datasets. IEEE Transactions Graphics 22 (2016), 280–289. on Visualization and Computer Graphics 24 (2018), 236–245. [11] Davide Ceneda, Theresia Gschwandtner, Thorsten May, Silvia Miksch, Hans- [37] Qianfan Xin. 2013. 2 - Durability and reliability in diesel engine system design. Jörg Schulz, Marc Streit, and Christian Tominski. 2017. Characterizing Guid- In Diesel Engine System Design, Qianfan Xin (Ed.). Woodhead Publishing, 113 ance in Visual Analytics. IEEE Transactions on Visualization and Computer – 202. Graphics 23 (2017), 111–120. 
[38] Ke Xu, Meng Xia, Xing Mu, Yun Wang, and Nan Cao. 2019. EnsembleLens: [12] Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly Detec- Ensemble-based Visual Exploration of Anomaly Detection Algorithms with tion: A Survey. ACM Computer Surveys 41, 3, Article 15 (2009), 58 pages. Multidimensional Data. IEEE Transactions on Visualization and Computer [13] Johannes Fuchs, Fabian Fischer, Florian Mansmann, Enrico Bertini, and Petra Graphics 25 (2019), 109–119. Isenberg. 2013. Evaluation of Alternative Glyph Designs for Time Series Data [39] Shen Yin and Okyay Kaynak. 2015. Big Data for Modern Industry: Challenges in a Small Multiple Setting. In Proceedings of the SIGCHI Conference on Human and Trends [Point of View]. Proc. IEEE 103, 2 (2015), 143–146. Factors in Computing Systems (CHI ’13). ACM, 3237–3246. [40] Pushe Zhao, Masaru Kurihara, Junichi Tanaka, Tojiro Noda, Shigeyoshi [14] Johannes Fuchs, Petra Isenberg, Anastasia Bezerianos, and Daniel Keim. 2017. Chikuma, and Tadashi Suzuki. 2017. Advanced correlation-based anomaly A Systematic Review of Experimental Studies on Data Glyphs. IEEE Transac- detection method for predictive maintenance. In IEEE International Conference tions on Visualization and Computer Graphics 23 (2017), 1863–1879. on Prognostics and Health Management. 78–83. [15] Michael Gleicher. 2018. Considerations for Visualizing Comparison. IEEE [41] Xun Zhao, Yanhong Wu, Dik Lun Lee, and Weiwei Cui. 2018. iforest: Inter- Transactions on Visualization and Computer Graphics 24, 1 (2018), 413–423. preting random forests via visual analytics. IEEE transactions on visualization [16] Manish Gupta, Jing Gao, Charu C Aggarwal, and Jiawei Han. 2014. Outlier and computer graphics 25, 1 (2018), 407–416. Detection for Temporal Data: A Survey. IEEE Transactions on Knowledge and [42] Fangfang Zhou, Xiaoru Lin, Chang Liu, Ying Zhao, Panpan Xu, Liu Ren, Data Engineering 26, 9 (2014), 2250–2267. Tingmin Xue, and Lei Ren. 2019. 
A survey of visualization for smart manufac- [17] Clemens Gutschi, Nikolaus Furian, Josef Suschnigg, Dietmar Neubacher, and turing. Journal of Visualization 22 (2019), 419–435. Siegfried Voessner. 2019. Log-based predictive maintenance in discrete parts manufacturing. In 12th CIRP Conference on Intelligent Computation in Manu- facturing Engineering, Vol. 79. Elsevier, 528 – 533. [18] Halldór Janetzko, Florian Stoffel, Sebastian Mittelstädt, and Daniel A. Keim. 2014. Anomaly detection for visual analytics of power consumption data. Computers & Graphics 38 (2014), 27–37. [19] Daniel Keim, H.-P Kriegel, and Mihael Ankerst. 1995. Recursive pattern: A technique for visualizing very large amounts of data. In Proceedings of the IEEE Visualization Conference. 279–286. [20] Heidi Lam, Melanie Tory, and Tamara Munzner. 2018. Bridging from Goals to Tasks with Design Study Analysis Reports. IEEE Transactions on Visualization and Computer Graphics 24 (2018), 435–445. [21] Alexander Maier, Tim Tack, and Oliver Niggemann. 2012. Visual Anomaly De- tection in Production Plants. In Proceedings of the 9th International Conference on Informatics in Control, Automation and Robotics. SciTePress, 67–75. [22] Silvia Miksch and Wolfgang Aigner. 2014. A matter of time: Applying a data–users–tasks design triangle to visual analytics of time-oriented data. Computers & Graphics 38 (2014), 286 – 290. [23] Marco AF Pimentel, David A Clifton, Lei Clifton, and Lionel Tarassenko. 2014. A review of novelty detection. Signal Processing 99 (2014), 215–249. [24] Plotly Technologies Inc. 2015. Collaborative data science. Montreal, QC. https://plot.ly [25] Timo Ropinski, Steffen Oeltze, and Bernhard Preim. 2011. Survey of glyph- based visualization techniques for spatial multivariate medical data. Computers & Graphics 35 (2011), 392–401. [26] Dominik Sacha, Andreas Stoffel, Florian Stoffel, Bum Chul Kwon, Geoffrey Ellis, and Daniel A. Keim. 2014. Knowledge generation model for visual analytics. 
IEEE Transactions on Visualization and Computer Graphics 20 (2014), 1604–1613. [27] Michael Sedlmair, Miriah Meyer, and Tamara Munzner. 2012. Design Study Methodology: Reflections from the Trenches and the Stacks. IEEE Transactions
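
As a supplementary illustration of the matrix-based drill-down of section 6 and the ensemble anomaly score of subsection 6.1, the following minimal Python sketch computes the cell-wise difference of two channel correlation matrices and an equally weighted average of anomaly scores. It is not the prototype's implementation. All function names are hypothetical, and three choices are assumptions made only for this example: taking the absolute value of the difference, excluding the diagonal, and ranking channels by their mean row value as a simple stand-in for spotting a line pattern.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sample lists.

    Returns 0.0 if either input is constant (assumption: avoids a
    division by zero for flat signals).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(cycle, names):
    """cycle maps channel name -> samples; names fixes the row/column order."""
    return [[pearson(cycle[a], cycle[b]) for b in names] for a in names]


def correlation_difference(cycle, reference, names):
    """Absolute cell-wise difference of the two correlation matrices (assumption)."""
    c = correlation_matrix(cycle, names)
    r = correlation_matrix(reference, names)
    return [[abs(cv - rv) for cv, rv in zip(c_row, r_row)]
            for c_row, r_row in zip(c, r)]


def rank_channels(diff, names):
    """Rank channels by mean off-diagonal difference (hypothetical heuristic).

    A channel whose row is large almost everywhere would appear as a
    line pattern in the matrix view. Requires at least two channels.
    """
    n = len(names)
    means = [sum(row[j] for j in range(n) if j != i) / (n - 1)
             for i, row in enumerate(diff)]
    return sorted(zip(names, means), key=lambda item: item[1], reverse=True)


def ensemble_score(scores):
    """Equally weighted average of anomaly scores that already lie in [0, 1]."""
    return sum(scores) / len(scores)


# Toy data (invented): channel "c" reverses its relation to "a" and "b".
names = ["a", "b", "c"]
reference = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [5, 4, 3, 2, 1]}
cycle = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [1, 2, 3, 4, 5]}

diff = correlation_difference(cycle, reference, names)
ranking = rank_channels(diff, names)
print(ranking)
print(ensemble_score([0.25, 0.75]))
```

In this toy example only channel "c" changes its behavior, so its row and column in the difference matrix carry the large values and it is ranked first. How the differences are mapped to a unified score between 0 and 1 is defined in subsection 5.1 and is deliberately left out here.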