<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Explainable Anomaly Detection in Renewable Energy Power Plants by Learning Multidimensional Normality Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Carsten Kleiner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Applied Sciences &amp; Arts Hannover, Faculty IV</institution>
          ,
          <addr-line>Ricklinger Stadtweg 120, 30459 Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Renewable energy production is one of the fastest-growing markets, and further strong growth can be anticipated given the desire for increased sustainability in many parts of the world. With the rising adoption of renewable power production, such facilities are increasingly attractive targets for cyber attacks. At the same time, the requirements for reliable production are rising. In this paper we propose a concept that improves the monitoring of renewable power plants by detecting anomalous behavior. The system not only detects an anomaly, it also explains it: based on a specific mathematical model of the expected behavior, it reports detailed information about the influential factors that caused the alert. The set of influential factors can be configured in the system before normal behavior is learned. The concept is based on multidimensional analysis and has been implemented and successfully evaluated on actual data from different providers of wind power plants.</p>
      </abstract>
      <kwd-group>
        <kwd>Anomaly detection</kwd>
        <kwd>Attack detection</kwd>
        <kwd>Resiliency</kwd>
        <kwd>Multidimensional analysis</kwd>
        <kwd>Wind power plant</kwd>
        <kwd>Normality model</kwd>
        <kwd>Explainable anomaly detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivation</title>
      <p>Published in the Proceedings of the Workshops of the EDBT/ICDT 2024 Joint Conference (March 25-28, 2024), Paestum, Italy. Contact: ckleiner@acm.org (C. Kleiner). ORCID: 0000-0001-9497-0312 (C. Kleiner).</p>
      <p>© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org.</p>
      <p>[...]ing the system itself as well as its application scope will be presented in section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Several papers in the context of anomaly detection for renewable energy systems can be found in the literature. In a more generalized context, [<xref ref-type="bibr" rid="ref2">2</xref>] describes a learning approach similar to the one in this paper, applicable to any type of IoT system. Whereas this approach could also be applied to renewable power plants, it is not clear which part of the learning can be carried out in an automated fashion. Similarly, results do not provide explanations for anomalies. A focus on attacks, more specifically intrusion detection, is described in [<xref ref-type="bibr" rid="ref3">3</xref>]. However, the approach is not extensible to outage detection and only provides non-explainable alert messages. More specifically for power plants, [<xref ref-type="bibr" rid="ref4">4</xref>] uses many very general input parameters. However, this approach also does not provide explainable anomalies as results.</p>
      <p>Other interesting wind power specific concepts include [<xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>]. However, these approaches also do not provide explainable results. The first, in addition, requires a semi-supervised learning approach which is not feasible for previously unknown attack types. Also, annotated training data is often not available. The second approach focuses on system failure detection rather than attacks.</p>
      <p>On the other hand, [<xref ref-type="bibr" rid="ref7">7</xref>] focuses on attacks and is specific to wind power plants. It is not extensible to other types of energy sources and the degree of explainability of the results is not obvious. Papers [<xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>] also only focus on specific attacks for wind power plants and thus do not achieve the general detection capabilities of our concept. The latter is concerned with false data injection attacks, which are also the focus of several other publications. Moreover, [<xref ref-type="bibr" rid="ref10">10</xref>] provides a good overview of the security challenges from attacks that have to be considered, but it does not present a comprehensive solution.</p>
      <p>Finally, there are also papers with a concept quite similar to ours, but with different detection approaches, such as Markov chains in [<xref ref-type="bibr" rid="ref11">11</xref>] and a more complex detection model in [<xref ref-type="bibr" rid="ref12">12</xref>]. However, in both cases the approach is specific to wind power plants, extensibility is not documented, and the explainability of the generated alerts is uncertain. This is also true for [<xref ref-type="bibr" rid="ref13">13</xref>], which also uses a correlation based approach, yet it is only one-dimensional and requires many specific sensors, so that it is also tied to the domain of wind turbines only. Even more specific to wind turbine gearboxes is [<xref ref-type="bibr" rid="ref14">14</xref>]. The authors do not limit their approach to attacks, also use a multidimensional analysis and generate at least partially explainable alerts. However, it is not obvious whether and how this can be extended beyond gearboxes.</p>
      <p>In summary, none of the discussed references is able to provide the comprehensive features of our approach (covering attacks and outages, generating explainable alerts, being capable of detecting unknown attacks and being usable for different types of power generation).</p>
    </sec>
    <sec id="sec-3">
      <title>3. Concept</title>
      <sec>
        <title>3.1. Requirements and Context</title>
        <p>Based on the research project SecDER<sup>1</sup>, which aims to increase the resilience of renewable virtual and physical power plants, the requirements for an anomaly detection system have been identified as follows:</p>
        <p><bold>Reason agnostic</bold> Both anomalies originating from known and unknown attacks as well as non-attack based anomalies shall be detected, ideally based on a single detection system.</p>
        <p><bold>Explainable alerts</bold> The identified anomalies should be used to raise alerts that can be handled by human domain experts. In order to simplify and substantiate the decisions by the experts, explainable alerts should be provided, detailing the reason and context why the alert has been issued.</p>
        <p><bold>Adaptability</bold> The concept shall be usable for different types of wind power plants as well as different types of renewable power plants in general. The learned normality models can be specific for each plant; however, the concept to learn the model should be generic.</p>
        <p><bold>General normality model</bold> While a single set of normality models for all plants is not a goal, it is preferable if normality models can be learned for groups of similar plants. This way the model becomes more stable, and the number of extensive learning processes can be reduced.</p>
        <p><bold>Continuous learning and adjustment</bold> The system should be capable of adjusting the learned system behaviour continuously, thus improving the quality of the normality models over time. It can thus also update the models in cases of concept drift over time.</p>
        <p>The system described in the following part of the paper will satisfy all of these requirements. On the other hand, there are also limitations of the approach that have been accepted in order to keep the complexity manageable. In particular, detection is only considered up to explainable alert generation; alert handling itself is not in scope. Handling can be considered orthogonal as long as explainability of the generated alerts is secured. For alert handling, generic procedures and manual update concepts can be considered as an extension, see e. g. [<xref ref-type="bibr" rid="ref15">15</xref>] for an approach based on rule-based anomaly detection. Similarly, we only consider anomaly-based detection concepts, since most attack patterns (and even some of the non-attack-based outage patterns) are previously unknown, so rule- or pattern-based detection will not be powerful enough to detect these. As attacks on virtual power plants are executed by designated experts, advanced attacks will be used which are unique to the specific target and thus typically not previously known.</p>
      </sec>
      <sec>
        <title>3.2. Multidimensional Normality Models (MNM)</title>
        <p>The basic concept for anomaly detection is learning multidimensional normality models (MNM) based on historic data of the power plant (or a set of similar power plants) and then assessing the deviation from this MNM for current readings of a logical record of the plant. The concept called cellwise estimator (CE) of the MNM has already been described in [<xref ref-type="bibr" rid="ref16">16</xref>] in detail; thus, we will only present a high level description here. Originating from online analytical processing (OLAP) cubes, the idea is to describe normal behaviour of certain metrics (such as power production in a windmill) based on several orthogonal dimensions (such as weather conditions, plant sensor readings and others). The reason for this multidimensional treatment is that measurements of the metrics may be within a permissible range when looking at them globally, whereas they may be an anomaly when considering the specific context in more detail. The context is described by the dimensions which are used in learning the MNMs. Conversely, potentially abnormal measurements on the global level may actually be normal when looking at their specific context. Thus, it is important to be able to base a decision whether a logical record constitutes an anomaly on both global as well as contextual, i. e. dimensional, information. To account for these challenges a specific normality model is learned for each of the cube cells, i. e. every contextual situation.</p>
        <p>Unfortunately, the higher the number of dimensions and the number of values within a dimension, the larger the number of combinations to consider becomes. Since the growth is exponential, these numbers have to be limited. In addition, the concept of iceberg cubes ([<xref ref-type="bibr" rid="ref17">17</xref>]) known from the OLAP domain can also be used to restrict the number of cubes to consider to relevant ones.</p>
        <p>In order to deal with continuous data streams as needed for monitoring a power plant, the cubes are computed per timeslice with a configurable timeslice length. The metric attribute whose normal behavior is to be learned is aggregated by some configurable aggregation function over all readings within a timeslice. For the domain of wind power plants, for instance, the power production output of a mill is a logical choice as a metric, with multiple readings being aggregated by using the average over a timeslice. Typical dimensions for this metric can be wind speed, wind direction, rotor position and outside temperature. Since the dimensions are used to form an OLAP-like cube, all dimensions must be of discrete types.</p>
        <p>Thus, continuous readings such as wind speed and temperature need to be assigned to a set of classes in order to be used as dimensions. As known from OLAP rollups, there is also a symbolic value of * in each dimension that aggregates all classes in that dimension and thus provides a cube cell where the class is irrelevant.</p>
        <p>The goal of the learning process by looking at historical data is to compute a statistical description of the metric attribute for each cell of the cube. This is done by assuming a normal distribution for the metric readings in each cell and approximating that normal distribution by estimating mean and standard deviation for the metric attribute based on learning from historical data. For current readings the anomaly score is computed as the difference to the mean of each relevant cell, expressed as a number of standard deviations. The higher this factor, the more likely the current reading is an outlier. As known from statistics, a factor of 3 is a natural choice as a threshold to generate an alert. As will be seen in section 4, solely looking at this factor as an anomaly measure is not sufficient, though, to properly assess the importance of an alert.</p>
        <p>In summary, each cell's normality model in our concept consists of an estimation of normal distributions (with mean and standard deviation each) of one or more measurements per cube cell over a timeslice. Cube cells are defined by combinations of discrete values of relevant dimensions, with wildcards allowed for cells with irrelevant values in a dimension. The anomaly score is then computed based on the number of standard deviations that any current reading of a measure deviates from the expected mean. Alerts are typically only raised for cube cells with anomaly scores higher than a threshold of 3. In addition to the anomaly score, the computed normality model as distribution estimation is also provided with the alert, along with information about the cell's dimensional values that caused the alert. This combination of information (metric measurement, anomaly score, contextual values, normality model) comprises the explanation for the human expert. Thus, an informed decision about proper reaction to the alert is facilitated.</p>
      </sec>
      <sec>
        <title>3.3. Application of MNM to Wind Power Plants</title>
        <p>In order to apply our concept as explained in section 3.2 to renewable energy plants in general and wind power plants in particular, we have to define the metrics with aggregation functions for which normality models shall be learned as well as the discrete influential dimensions that might influence the metrics and be important for assessing an alert. Candidates for choosing the metrics are any elements of a monitoring reading that can be used to describe the operational behaviour of a windmill. The assumption is that attacks or outages will lead to unexpected behavior in this metric. Primarily, this is the effective electrical power production of the mill computed as an average over a timeslice. For consistency checks the number of measurement readings per timeslice can also be used as a metric. Alternative options that have not been evaluated in the experiments described in section 4 could be the positions of the pod or the blades of the windmill or other operational features.</p>
        <p>There are many more options for choosing the dimensions than the metrics. In the evaluation in section 4 we have experimented with different choices, but there are actually many more. Obvious dimensions include wind speed, wind direction, pod position, air temperature and air pressure. Further possible options include power factor, pitch angles of each blade, angle between pod and wind direction and anemometer readings. The choice of discretization of each of these factors (cf. 3.2) can be considered another hyperparameter of the application.</p>
        <p>Specific choices for the dimensions and discretizations for the experiments will be explained in section 4, but it has to be pointed out that those are only initial selections and many more experiments will have to be carried out in the future to optimize the approach, cf. section 5.2.</p>
      </sec>
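      <p>The learning and scoring steps of section 3.2 can be illustrated with a minimal sketch. It is not the implementation from [<xref ref-type="bibr" rid="ref16">16</xref>]: the function names, the toy data, the <monospace>min_count</monospace> cutoff (a simple stand-in for iceberg-style pruning), the use of the sample standard deviation and the use of the absolute deviation as score are all assumptions made for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import product
from statistics import mean, stdev

WILDCARD = "*"  # roll-up value: the class in this dimension is irrelevant


def cell_keys(dims):
    """Return every cube cell a record falls into (class or wildcard per dimension)."""
    return list(product(*[(value, WILDCARD) for value in dims]))


def learn_cells(records, min_count=2):
    """Estimate (mean, standard deviation) of the metric for each cube cell.

    records: iterable of (dims, metric), one aggregated reading per timeslice.
    min_count: hypothetical cutoff that drops sparsely filled cells.
    """
    values = {}
    for dims, metric in records:
        for cell in cell_keys(dims):
            values.setdefault(cell, []).append(metric)
    # stdev needs at least two readings, hence the lower bound of 2.
    return {cell: (mean(v), stdev(v))
            for cell, v in values.items() if len(v) >= max(min_count, 2)}


def score_reading(model, dims, metric, threshold=3.0):
    """Return one explanation per cell whose anomaly score exceeds the threshold."""
    alerts = []
    for cell in cell_keys(dims):
        if cell not in model:
            continue  # context never (or too rarely) seen in training
        mu, sigma = model[cell]
        if sigma == 0:
            continue  # degenerate cell, no meaningful score
        score = abs(metric - mu) / sigma  # deviation in standard deviations
        if score > threshold:
            alerts.append({"cell": cell, "value": metric, "score": score,
                           "mean": mu, "std": sigma})
    return alerts


# Toy training data: dims = (wind speed class, wind direction class), metric in kW.
training = [((2, 2), kw) for kw in (100.0, 120.0, 140.0, 160.0, 180.0)]
training += [((8, 2), kw) for kw in (1800.0, 1900.0, 2000.0, 2100.0, 2200.0)]
model = learn_cells(training)

# 900 kW is unremarkable globally but far off for low wind speed (class 2).
alerts = score_reading(model, (2, 2), 900.0)
for alert in alerts:
    print(alert["cell"], round(alert["score"], 1))
```
]]></preformat>
      <p>In this toy example the reading is flagged only in the cells restricted to wind speed class 2, while the global cell stays below the threshold. Each returned entry carries the metric value, the score, the cell and its learned mean and standard deviation, i. e. the elements that section 3.2 names as the explanation of an alert.</p>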
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>In order to evaluate the capabilities of the concept in detail, we used historical data from actual wind power plants that are operated by project partners in the SecDER project. We had two different datasets, one from each operator. Data did not contain any known attacks, yet some anomalies due to maintenance or unusual weather conditions.</p>
      <p>The first dataset consists of operational log data from a single wind mill over the time range from January 2020 to August 2021 at a sampling rate of 15 minutes. Each log reading consists of 22 attributes in total, one of which is the timestamp and the others can be used as metrics or dimensions as will be explained in section 4.1.</p>
      <p>The second dataset provides operational log data from 9 different wind parks, comprising 42 windmills in total at a sampling rate of 5 minutes. Data provides 30 attributes per reading and readings were available for the year 2020.</p>
      <p>In both cases, a first part of the data has been used for training and the remainder for testing. In the sequel, results will be presented based on output from a specifically developed GUI tool. In the figures the testing period is shown on the horizontal axis to display the results for individual test instances. Each timeslice's reading can be considered a test case. The graph shows the results for a specific cell of our cube, as selected from different dimensions, values and combinations at the top. Within a figure the red curve shows the computed metric value (scale on the left) whereas the blue curve shows the anomaly score (i. e. the number of standard deviations that the value is from the mean in this particular cell, scale on the right). Typically, scores above 3 can be considered anomalous. In addition, a yellow line displays the learned mean value for the metric for this cell, and green and light blue lines show mean +/- 3 standard deviations.</p>
      <sec id="sec-4-1">
        <title>4.1. Validation of Concept</title>
        <p>As an initial validation we used the data from 2020 of the first dataset as training set and the readings from 2021 for testing. We chose the average electrical power production over timeslices of 4 hours as primary metric. We experimented with some attributes as dimensions; the results in this subsection have been achieved with wind speed, wind direction and difference between gondola angle and wind direction. The continuous values in these dimensions have been linearly assigned to 9, 12 and 5 classes, respectively. The number of classes of the first two features has been determined heuristically by assigning equally sized intervals of the total range of values to classes. For the third feature, where original data had a strongly non-linear distribution, we decided to use fewer classes to primarily account for major and medium outliers in each of the two directions and have most data in the no difference class.</p>
        <p>Figure 1 shows the test results for the global cell, i. e. no fixed value in any of the dimensions. As we can see, there are only few significant anomaly scores, primarily those on January 20th, March 11th and March 29th. At this general level (no fixed dimensional values), this behavior can be expected as the threshold for raising an alert is around 1900 kW, which is already pretty close to the 2400 kW nominal power of the mill. However, the first two of those scores will not be reported by an alert as all subcells into the wind speed direction do not have an anomalous score. This means that the power production seemed unusually high from a global point of view (which is information that could have been observed without our approach but would have raised a false positive), yet in reality it is simply explainable by the rather high wind speed on those days. For the remaining high anomaly score the dimensional analysis shows reduced anomaly scores the further detailed the cells become, yet it remains above 3, thus raising an alert. Looking at the data in detail in the evaluation, this score can be considered a false positive. The reason is that this specific context situation had not been observed in the whole training period. Such errors can be remedied by increasing the training data set.</p>
        <p>Even more interesting is the analysis looking into some of the dimensions, as the learned normality behavior is much more specific in those cases as seen in figure 2. In that figure we have focused the display on the wind speed class 2 (pretty low speed) and the wind direction class 2. The figure shows that the learned model with mean around 140 kW and 80 kW standard deviation is very specific. Still, the only remaining alert with an anomaly score of 3.1 shows up on April 11th. This could be a false positive due to a too specific cell model or a true alert due to a malfunction with too high generated power. A human operator seeing the alert would be able to classify this alert based on their domain knowledge. Due to space constraints we only present these exemplary results here.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Common Model for Plant Groups</title>
        <p>For the second validation, data from the set of wind parks has been used. Here, January to August 2020 has been used as training data and September to December 2020 for testing. Metrics and dimensions shown are identical to the ones in the previous subsection for comparability purposes. In addition, the specific wind mill has also been used as another dimension in order to be able to analyze the outcome per mill and over all mills together.</p>
        <p>Data from 17 of the mills with identical nominal power production of 2300 kW have been used.</p>
        <p>Figure 3 again shows the overall view of the scores with no fixed dimensional values. We can see that the learned normality model is much more specific than the one in figure 1 due to the extended training set (standard deviation around 200 kW as opposed to 500 kW).</p>
        <p>Two cases with higher anomaly scores can be identified, namely Nov 2nd and Nov 19th/20th. The first of those shows a similar behavior as already noted in the previous subsection, i. e. an anomaly score that does not show up in any of the dimensionally restricted models and thus would not be reported as an alert. The latter anomaly score would be tied to two of the four wind parks as well as specific wind speed and wind direction, all showing anomalous scores in one alert as those are all dependent cells in the cube. This shows that the score is indeed an anomaly for these mills (cf. figure 4) and should thus be reported as an anomaly alert. This can be considered a true positive that is recognized by the system. It can be further explained to the human expert by providing the specific wind park, speed and direction that cause the alert to be raised.</p>
        <p>Figure 4: Effective Power and Anomaly Scores (Plant group, dimensionally restricted view, speed class 6, direction class 8)</p>
        <p>In general, the increased size of the training data leads to more precisely learned models in the cells. This potentially increases the number of false positives, since anomaly scores are more likely with smaller standard deviation. However, by judging an anomaly score in combination with the standard deviation of its cell, most of the false positives can be identified easily and thus do not lead to raising alerts. On the other hand, the benefit of the more precise models is that false negatives are much less likely in that case.</p>
        <p>Also, only precise cell models facilitate discovery of anomalies in cases with unusually low power production, which is particularly relevant in case of attacks. This is due to the fact that low production is only observed as an anomaly if the learned mean - 3 standard deviations is above 0 kW. This can only be achieved with rather precise cell models which need large training datasets.</p>
      </sec>
      <sec id="sec-4-1-1">
        <title>4.3. Evaluation against Known Outages</title>
        <p>The evaluations in the previous subsections were only able to show that anomalous behavior can be detected in principle, since the data did not contain any known attacks or outages of the power plants. In order to get a qualitative impression of how well the detected anomalies correspond with actual unusual behavior, we evaluated the concept against data from a single windmill that was available over a 2.5 years time frame. In addition, for this plant information from the plant management system (PMS) was available that listed all known and recorded system problems during that time.</p>
        <p>It should be noted that this evaluation is not well suited for a thorough quantitative analysis of the algorithm since the dataset only provides information about events affecting the operation of the mill that were known to the PMS. Thus, since no attacks are known, there are no attack labels and no evaluation against attack detection is possible. Similarly, anomalous situations due to an unusual behavior of the mill unknown to the PMS are not labeled as anomalous in the ground truth. Thus we can expect some (seemingly) false positives for the anomalous situations not recorded in the PMS and thus labeled as normal. This will lead to a rather low precision when comparing our anomaly messages with the events recorded in the plant management system as ground truth.</p>
        <p>In addition, the events in the PMS record any unusual situation in the windmill regardless of their impact on the actual power production. Since we consider output power production as our analysis target, it is obvious that we will not be able to detect events that have no or minimal influence on the power production<fn id="fn2"><label>2</label><p>From a practical point of view detecting such events with our algorithm is not necessary, as these have only minimal impact on the power production and are already known from the PMS and thus do not require advanced detection.</p></fn>. Such situations will be recorded as seemingly false negatives in the comparison, impacting the recall negatively. However, we do not anticipate too many of such messages so that aiming for a high recall is still a desirable target.</p>
        <p>Both effects mentioned previously will also impact other measures such as accuracy (to some degree) and F1 score (to a large degree). Still, a good, albeit not perfect, accuracy score is also a valid goal to target.</p>
        <sec>
          <title>4.3.1. Evaluation Setup</title>
          <p>For this evaluation we used windmill data from 2 years as training set for our algorithm and data from the remaining 0.5 years as a test set. We used an algorithm configuration similar to the one in section 4.1. We had to clean training data by removing the readings for times which had been recorded in the PMS as anomalous in order to only learn normal behavior of the system.</p>
          <p>Since the events recorded in the PMS used timestamps with 5 minute difference, we first need to align the time resolution, i. e. define how many anomalous events within a 4 hour timeslice make such a timeslice anomalous in total. While it is desirable on one hand to even recognize anomalies that only occur at a single instance in time, on the other hand it is questionable whether a full timeslice shall be considered anomalous just based on a single event. For the following evaluation we used thresholds of 40 and 5 minutes within a 4 hour timeslice as a condition for an anomalous timeslice. Note that an anomaly due to an outage is rarely a very short incident.</p>
          <p>Another aspect is the management of missing readings from the windmill, which are often caused by anomalous operation. If no data readings are present for a whole timeslice the CE algorithm will not detect an anomaly for the power production, since missing data does not get any anomaly score. However, with the second metric (number of readings per timeslice) we can easily detect timeslices where no power readings are present and thus report them as an anomaly as well. Finally, a single anomalous cube cell per timeslice will make the entire timeslice anomalous. This is one of the primary strengths of the algorithm: it also detects specific anomalies within a large set of other cells that seem non-anomalous at the same time. The explanation of the anomaly for a timeslice will contain all anomalous cube cells for that timeslice together with the additional data, so that the human expert can further examine the incident.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3.2. Exemplary results</title>
        <p>With the setup as described before and a 40 minute anomaly threshold we achieved a recall of 0.72 and an accuracy of 0.89 as the primary targets of the algorithm. The precision was low at 0.27 as expected and explained above; this makes an F1 score of 0.39. The matrix in table 1 summarizes the results.</p>
        <p>Table 1: Confusion matrix for outage anomaly detection (at least 40 minute outage per timeslice considered anomalous); axes: PMS issue and CE anomaly alert, each false/true.</p>
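        <p>The alignment of 5 minute PMS events with 4 hour timeslices from the evaluation setup can be sketched as follows. This is an illustration under assumptions, not the code used in the evaluation: the function names, the alignment of timeslices to midnight, the rule that each distinct event timestamp counts as 5 anomalous minutes, and the example dates are all hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from datetime import datetime, timedelta

SLICE_HOURS = 4     # timeslice length used in the evaluation
EVENT_MINUTES = 5   # resolution of the PMS event timestamps


def slice_start(ts):
    """Start of the timeslice containing ts (assumes slices aligned to midnight)."""
    return ts.replace(hour=(ts.hour // SLICE_HOURS) * SLICE_HOURS,
                      minute=0, second=0, microsecond=0)


def anomalous_slices(event_times, min_minutes):
    """Timeslices whose PMS events add up to at least min_minutes."""
    counts = Counter(slice_start(ts) for ts in event_times)
    needed = -(-min_minutes // EVENT_MINUTES)  # ceiling division
    return {start for start, n in counts.items() if n >= needed}


# Hypothetical events: a 40 minute outage from 08:00 and a single event at 13:10.
day = datetime(2020, 1, 1)
events = [day + timedelta(hours=8, minutes=5 * i) for i in range(8)]
events.append(day + timedelta(hours=13, minutes=10))

strict = anomalous_slices(events, min_minutes=40)  # 40 minute threshold
loose = anomalous_slices(events, min_minutes=5)    # single-event threshold
print(sorted(strict))
print(sorted(loose))
```
]]></preformat>
        <p>With the 40 minute threshold only the timeslice containing the longer outage is labeled anomalous in the ground truth; with the 5 minute threshold the timeslice with the single event is labeled as well.</p>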
        <p>Again, the seemingly high number of false positives
is due to the fact that the CE detects anomalies that are
not part of the PMS failure ground truth, either because
they are attacks or because they did not lead to events
in the PMS. As another baseline an auto-encoder based
algorithm trying to detect only outages on the same data
set only achieved a 0.31 F1 score, mainly because of a
higher number of false negatives.</p>
        <p>If we reduce the threshold for how many anomalous events in the ground truth make a timeslice anomalous to a single event (i. e. 5 minutes of the 4 hour timeslice), the recall drops somewhat to 0.60, whereas accuracy and precision remain pretty much the same, such that the F1 score reduces to 0.37 (cf. table 2). This behavior is due to an increased number of false negatives, which could be expected as some minor issues in plant operation do not necessarily cause anomalous power production. The auto-encoder baseline increased its F1 score to 0.33 in this case.</p>
        <p>A final evaluation shows that there is still potential in
the CE based algorithm by fine tuning the learned cell
models. Increasing the threshold anomaly score for alerts
to 4 standard deviations, we obtain the confusion matrix
in table 3. This increases the accuracy to 0.94 and
specifically the precision to 0.43. The recall is slightly reduced
to 0.68 for an overall F1 score of 0.53. This improvement
is primarily due to the reduced number of seemingly false
positives in situations where no outage is recorded in
the PMS. However, it remains unclear whether this is an
actual improvement in practice or not. It simply leads
to a reduction of detected anomaly candidates. Yet from
the data provided it is unknown whether these situations
would actually belong to anomalous or regular behavior.</p>
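        <p>As a consistency check, the reported F1 scores can be recomputed from the reported precision and recall values. The sketch below uses only numbers stated in the text; the precision of 0.27 for the 5 minute configuration is an assumption based on the statement that precision remains pretty much the same, and the third configuration is assumed to use the 40 minute ground truth.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (configuration, precision, recall, F1 as reported in the text)
reported = [
    ("40 min ground truth, alert threshold 3", 0.27, 0.72, 0.39),
    ("5 min ground truth, alert threshold 3", 0.27, 0.60, 0.37),  # precision assumed unchanged
    ("40 min ground truth, alert threshold 4", 0.43, 0.68, 0.53),
]

recomputed = [round(f1_score(p, r), 2) for _, p, r, _ in reported]
for (name, _, _, f1), value in zip(reported, recomputed):
    print(f"{name}: recomputed {value}, reported {f1}")
```
]]></preformat>
        <p>Under these assumptions the recomputed values agree with the reported F1 scores to two decimal places.</p>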
        <p>In summary, the evaluation in this section has shown that the algorithm introduced in section 3 is capable of detecting unusual system behavior of a wind power plant which had also been recorded in a PMS, particularly with good accuracy and recall. Precision and thus F1 score are somewhat lower, which can be attributed to the algorithm also detecting anomalous behavior that had not been recorded in the PMS, e. g. because it was due to a specific wind condition. This is exactly the main advantage of the CE algorithm, namely also detecting anomalous behavior in specific conditions which could be caused by an attack. We have also shown that optimizing some of the hyperparameters of the approach (such as message thresholds and timeslice aggregation) might improve the detection quality further, in addition to larger training sets and more dimensions.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <sec id="sec-5-1">
        <title>5.1. Summary</title>
        <p>In this paper we have presented a concept and
implementation to detect anomalous behavior in renewable power
plants. The concept is based on learning the normal
behavior of key performance figures such as effective power
production. The normal behavior is learned for many
specific situations, which can be expressed as
multidimensional cells in an OLAP-like data cube. On the one hand,
this reduces the number of false negatives by learning
very specific models for the individual cells
representing specific situations. On the other hand, the number
of false positives can still be kept low by using larger
training data sets. In addition, the specificity of the
learned model can be assessed to put a mere anomaly score into context
and thus facilitate appropriate treatment before raising
alerts; this assessment can be done by a human inspector and, to some
degree, even automatically, as in section 4.3. This is
an important advantage of the explainability achieved by
the learned behavior models for each cell. The concept
has been successfully evaluated on actual data from wind
power plants, as shown in section 4, both in general and
on a set of known outages as one possible reason for
anomalous behavior.</p>
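        <p>The idea of learning one normality model per cell can be illustrated by the following minimal sketch. It assumes that each cell model consists of the mean and sample standard deviation of the metric and that the anomaly score is the distance from the mean in standard deviations. The dimension values (wind speed bin, time of day), the power values and all function names are hypothetical and serve only as an example.</p>
        <preformat>
```python
from collections import defaultdict
from statistics import mean, stdev


def learn_cell_models(records):
    """Learn (mean, standard deviation) of the metric for every cell.

    records: iterable of (cell, value) pairs, where a cell is a tuple
             of dimension values.
    """
    values = defaultdict(list)
    for cell, value in records:
        values[cell].append(value)
    # A sample standard deviation needs at least two observations.
    return {c: (mean(v), stdev(v)) for c, v in values.items() if len(v) >= 2}


def anomaly_score(models, cell, value):
    """Distance from the cell mean in standard deviations.

    Returns None if no model has been learned for the cell.
    """
    if cell not in models:
        return None
    mu, sigma = models[cell]
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


# Hypothetical training data: (wind speed bin, time of day) -> power in kW.
training = [
    (("8-10 m/s", "day"), 1480.0),
    (("8-10 m/s", "day"), 1500.0),
    (("8-10 m/s", "day"), 1520.0),
    (("2-4 m/s", "night"), 90.0),
    (("2-4 m/s", "night"), 110.0),
]
models = learn_cell_models(training)

score = anomaly_score(models, ("8-10 m/s", "day"), 1400.0)
print(score, score > 4.0)  # 5.0 True
```
        </preformat>
        <p>Because the score is tied to the mean and standard deviation of one specific cell, an alert can be explained by naming the cell and how far the observed value lies from what is normal in that situation.</p>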
        <p>In summary, the concept presented in this paper offers
a promising approach to detect anomalous behavior in
renewable power plants by learning specific models
according to a configurable set of dimensions reflecting
relevant circumstances for power production. The anomaly
scores based on learned mathematical models provide
traceable explanations for the detected anomalies, which
may originate from attacks or regular operational issues.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Outlook</title>
        <p>While the evaluation presented in section 4 already showed
the usefulness of the concept, many more experiments
are needed to reveal its full potential. In particular, more
analysis is required to identify interesting and relevant
dimensions in the base data to be used for the cube.
Some promising dimensions such as temperature,
air pressure and power factor have not been included
yet. Moreover, using larger time ranges for the training
data will be one of the next steps to further verify the
positive impact of more precisely learned models. This
should also mitigate some issues in detecting unusually
low power production, which arise from normality models whose
standard deviations are so large that even zero power production
does not raise a sufficiently high anomaly score in certain
situations.</p>
        <p>Also, some experiments have shown that using a
normal distribution as the basis for estimating cell models
is not always appropriate. We saw several cases where
most of the metric's training data lies around a rather small value,
with a few high outliers. A normal
distribution is not a good estimator for such data. Instead,
alternative models should be used, which will be added to our
implementation soon.</p>
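        <p>The effect of a few high outliers on a mean and standard deviation model can be seen in the following sketch. The cell values are made up for illustration. The median and median absolute deviation (MAD) score is one common robust alternative chosen here only as an example; it is not necessarily the alternative model planned for the implementation.</p>
        <preformat>
```python
from statistics import mean, median, stdev


def z_score(values, x):
    """Distance of x from the mean in sample standard deviations."""
    return abs(x - mean(values)) / stdev(values)


def robust_score(values, x):
    """Distance of x from the median in scaled MAD units.

    The factor 1.4826 makes the MAD comparable to the standard
    deviation for normally distributed data.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return 0.0 if x == med else float("inf")
    return abs(x - med) / (1.4826 * mad)


# Hypothetical cell: most values close to 10, two high outliers.
cell_values = [8, 9, 10, 10, 10, 11, 12, 200, 220]

# The outliers inflate the standard deviation, so zero production
# stays below an alert threshold of 4 with the normal model,
# while the robust score exceeds it.
print(z_score(cell_values, 0) > 4, robust_score(cell_values, 0) > 4)  # False True
```
        </preformat>
        <p>This mirrors the situation described above, in which normality models with overly large standard deviations fail to raise sufficiently high anomaly scores even for zero power production.</p>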
        <p>Finally, we have so far only evaluated the concept
on wind power production. We have similar datasets
from photovoltaics, which we plan to use for a second
evaluation. The metric will again be the effective power
production, but an extensive evaluation will be needed to
determine which dimensions are most promising.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. . I. S.</given-names>
            <surname>Agency</surname>
          </string-name>
          ,
          <source>Russian Government Cyber Activity Targeting Energy and Other Critical Infrastructure Sectors</source>
          ,
          <year>2018</year>
          . URL: https://www.cisa.gov/uscert/ncas/alerts/TA18-074A.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Onuchowska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samtani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Jank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wolfram</surname>
          </string-name>
          ,
          <article-title>Machine learning for automated industrial iot attack detection: An efficiency-complexity trade-off</article-title>
          ,
          <source>ACM Trans. Manage. Inf. Syst</source>
          .
          <volume>12</volume>
          (
          <year>2021</year>
          ). URL: https://doi.org/10.1145/3460822.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K. N.</given-names>
            <surname>Junejo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Goh</surname>
          </string-name>
          ,
          <article-title>Behaviour-based attack detection and classification in cyber physical systems using machine learning</article-title>
          ,
          <source>in: Proc. of the 2nd ACM Int. Workshop on Cyber-Physical System Security, CPSS '16</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA,
          <year>2016</year>
          , p.
          <fpage>34</fpage>
          -
          <lpage>43</lpage>
          . URL: https://doi.org/10.1145/2899015.2899016.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Wind turbine anomaly detection using normal behavior models based on scada data</article-title>
          ,
          <source>in: 2014 ICHVE International Conference on High Voltage Engineering and Application</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . URL: https://doi.org/10.1109/ICHVE.2014.7035504.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Min</surname>
          </string-name>
          , et al.,
          <article-title>A semi-supervised anomaly detection method for wind farm power data preprocessing</article-title>
          ,
          <source>in: 2017 IEEE Power &amp; Energy Society General Meeting</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . URL: https://doi.org/10.1109/PESGM.2017.8273883.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>McKinnon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carroll</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McDonald</surname>
          </string-name>
          , et al.,
          <article-title>Investigation of anomaly detection technique for wind turbine pitch systems</article-title>
          ,
          <source>in: The 9th Renewable Power Generation Conference</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>277</fpage>
          -
          <lpage>282</lpage>
          . URL: https://doi.org/10.1049/icp.2021.1401.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Badihi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jadidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Smart cyber-attack diagnosis and mitigation in a wind farm network operator</article-title>
          ,
          <source>IEEE Transactions on Industrial Informatics</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . URL: https://doi.org/10.1109/TII.2022.3228686.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <article-title>Cyber threat analysis framework for the wind energy based power system</article-title>
          ,
          <source>in: Proc. of the 2017 Workshop on Cyber-Physical Systems Security and PrivaCy</source>
          , CPS '17,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA,
          <year>2017</year>
          , p.
          <fpage>81</fpage>
          -
          <lpage>92</lpage>
          . URL: https://doi.org/10.1145/3140241.3140247.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Guibene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Messai</surname>
          </string-name>
          , et al.,
          <article-title>A data mining-based intrusion detection system for cyber physical power systems</article-title>
          ,
          <source>in: Proc. of the 18th ACM Int. Symposium on QoS and Security for Wireless and Mobile Networks</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA,
          <year>2022</year>
          , p.
          <fpage>55</fpage>
          -
          <lpage>62</lpage>
          . URL: https://doi.org/10.1145/3551661.3561367.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jindal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Marnerides</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Scott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hutchison</surname>
          </string-name>
          ,
          <article-title>Identifying security challenges in renewable energy systems: A wind turbine case study</article-title>
          ,
          <source>in: Proc. of the 10th ACM Int. Conf. on Future Energy Systems</source>
          , ACM, New York, NY, USA,
          <year>2019</year>
          , p.
          <fpage>370</fpage>
          -
          <lpage>372</lpage>
          . URL: https://doi.org/10.1145/3307772.3330154.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>McMillan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rimoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Analyzing wind speed data through markov chain based profiling and clustering</article-title>
          ,
          <source>in: Proc. of the 2nd Workshop on Machine Learning for Sensory Data Analysis, MLSDA'14</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA,
          <year>2014</year>
          , p.
          <fpage>67</fpage>
          -
          <lpage>73</lpage>
          . URL: https://doi.org/10.1145/2689746.2689756.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Anomaly detection of wind turbine generator based on temporal information</article-title>
          ,
          <source>in: Proceedings of the 2019 7th Int. Conference on Information Technology: IoT and Smart City</source>
          ,
          <source>ICIT '19</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA,
          <year>2020</year>
          , p.
          <fpage>477</fpage>
          -
          <lpage>482</lpage>
          . URL: https://doi.org/10.1145/3377170.3377271.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.-W.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-G.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-T.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>An approach for utilizing correlation among sensors for unsupervised anomaly detection of wind turbine system</article-title>
          ,
          <source>in: 2021 Int. Conf. on Information and Communication Tech. Convergence</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>104</fpage>
          -
          <lpage>109</lpage>
          . URL: https://doi.org/10.1109/ICTC52510.2021.9621198.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Condition monitoring of wind turbine gearbox using multidimensional hybrid outlier detection</article-title>
          ,
          <source>in: Int. Conf. on Smart-Green Technology in Electrical and Inf. Systems</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>112</fpage>
          -
          <lpage>117</lpage>
          . URL: https://doi.org/10.1109/ICSGTEIS53426.2021.9650387.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Renners</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Heine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kleiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Dreo-Rodosek</surname>
          </string-name>
          ,
          <article-title>Concept and practical evaluation for adaptive and intelligible prioritization for network security incidents</article-title>
          ,
          <source>International Journal on Cyber Situational Awareness</source>
          <volume>4</volume>
          (
          <year>2019</year>
          )
          <fpage>99</fpage>
          -
          <lpage>127</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>F.</given-names>
            <surname>Heine</surname>
          </string-name>
          ,
          <article-title>Outlier detection in data streams using OLAP cubes</article-title>
          ,
          <source>in: New Trends in Databases and Information Systems - ADBIS Short Papers and Workshops</source>
          , Nicosia, Cyprus, volume
          <volume>767</volume>
          of Communications in Computer and Information Science, Springer,
          <year>2017</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>36</lpage>
          . URL: https://doi.org/10.1007/978-3-319-67162-8_4.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
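          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,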
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Efficient computation of iceberg cubes with complex measures</article-title>
          ,
          <source>SIGMOD Rec</source>
          .
          <volume>30</volume>
          (
          <year>2001</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . URL: https://doi.org/10.1145/376284.375664.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>