               UDC 004.4'24



PREDICTING THE PROBABILITY OF EXCEEDING CRITICAL SYSTEM THRESHOLDS

                                       Peter Krammer, Marcel Kvassay, Ladislav Hluchý
       In this paper we show how regression modelling can be combined with a special kind of data transformation technique that improves model
       precision and produces several “preliminary” estimates of the target value. These preliminary estimates can be used for interval estimates of
the target value as well as for predicting the probability that it has exceeded or will exceed arbitrary predefined thresholds. Our approach can be
       combined with various regression models and applied in many domains that need to estimate the probability of system malfunctions or other
       hazardous states brought about by system variables exceeding critical safety thresholds. We rigorously derive the formulas for the probability
       of crossing an upper bound and a lower bound both separately (one-sided intervals) and together (a two-sided interval), and verify the
       approach experimentally on a real dataset from the electric power industry.
       Key words: regression, data transformation, interval estimation, probability, statistical modelling.


Introduction
        Our lives are directly or indirectly influenced by information technologies in a variety of ways, and data
collection and analysis, as well as modelling and prediction of important variables, are now routinely performed in
domains as diverse as electric power industry, hydrology, public health or banking. In all these areas there exists a
need to increase the precision and reliability of existing models. More precise and robust models can improve not
only the productivity of existing systems but also their security and safety, thus helping to save human lives in cases
of emergency. Modelling tasks in these domains often include the prediction of error, risk and various hazardous
states. This paper focuses on the estimation of the probability that a given system variable has exceeded or will exceed predefined
safety thresholds, as a result of which the operation of the system could be severely compromised. We also touch upon
the problem of improving model precision since our regression model is used in conjunction with a special data
transformation technique formulated in [1–3] for that purpose.
        This data transformation technique was inspired by ensemble learning methods of machine learning. In
machine learning, model accuracy and robustness are typically enhanced through various forms of ensemble learning
[1–5], such as Boosting, Bagging, Dagging, Stacking, Additive regression, etc. These methods exploit techniques
like aggregation of different types of models, multiple training phases, submodels voting and weighting of data
records. The first ensemble methods were intended primarily for classification; those for regression appeared later [6].
Some of the more recent ones include evolutionary ensembles [8], multiple network fusion [9] and hybrid ensembles
[10]. Several studies, e.g. [4, 10], analyze the suitability of ensemble methods with respect to the type of data or
properties of submodels.
        Let us consider a homogeneous data table whose rows represent measurement records and whose columns
represent their attributes. Each attribute describes the quantity or quality of some physical variable (pressure,
throughput, voltage, etc.) and each row comprises attributes measured at the same time or place. Let us further
assume that all the attributes (the input ones as well as the target one) are numerical and continuous, i.e. real-valued.
Moreover, the input attributes have already been normalized and selected for their relevance with respect to the
target one. The regression task then consists in modelling the target attribute as a suitable function of the input ones.
It is typically approached by training various types of regression models on the available data.
        In our previous work [3] we proposed a data transformation technique enhancing the precision of machine
learning models and predictors for real-valued target variables. Its application to several realistic data sets

considerably improved prediction accuracy. Moreover, it could be easily combined with various types of regression
models. These benefits, however, had to be “paid for” in terms of longer calculations when compared to regression
on the original untransformed data. One positive side-effect of this technique is the generation of several
“preliminary” target estimates in an interim step. The final target estimate is then calculated as their simple
arithmetic mean although, in principle, we could use a weighted mean too. While the standard version of our
technique uses only the final estimate and discards the preliminary ones, these preliminary estimates contain further
valuable information besides the mean, which can be extracted through more advanced statistical principles and
formulas, as we explain further below.

            1. A Brief Outline of the Data Transformation Technique
         The basic data transformation was already published in [3, 11], which demonstrated both its advantages and
disadvantages on several synthetic and real datasets. We therefore sketch only its main idea here and do not elaborate
on its properties, limitations or parameter settings. The essence of the transformation consists in the creation of all
possible pairs of records from the original dataset except the identical pairs (i.e. we do not pair any record with
itself). In this way we transform the original dataset (shown schematically in Tab. 1) into a new one shown in Tab. 2.

                                                         Table 1. Structure of the original data set


                       Record ID     Input Attribute Z     Input Attribute Y     Target Attribute O

                           {1}               z1                   y1                         o1
                           {2}               z2                   y2                         o2
                           {3}               z3                   y3                         o3
                           {4}               z4                   y4                         o4


                                                             Table 2. Structure of the transformed data set


                ID of Used Records     Input Attribute Z     Input Attribute Y      ∆Z             ∆Y        ∆O

                      {1}, {2}                    z1                   y1          z1 - z2        y1 - y2   o1 - o2
                      {1}, {3}                    z1                   y1          z1 - z3        y1 - y3   o1 - o3
                      {1}, {4}                    z1                   y1          z1 - z4        y1 - y4   o1 - o4
                      {2}, {1}                    z2                   y2          z2 - z1        y2 - y1   o2 - o1
                         ...                      ...                  ...           ...            ...       ...
                      {4}, {3}                    z4                   y4          z4 - z3        y4 - y3   o4 - o3


       Each of N records in the original dataset is paired with the remaining N-1 records, and all these pairs are
added to the transformed dataset. The size of the transformed dataset is then N² − N records (note that the pairing is not
symmetrical, because [{i}, {k}] does not equal [{k}, {i}]). The number of input attributes in the transformed dataset
doubles, because for each original attribute a new one is added containing the difference in its value between the two paired records.
Moreover, the difference between the two target values becomes the new target attribute for prediction (the last
column ∆O in Tab. 2). This reification of attribute differences into new standalone attributes emphasizes their
similarity or dissimilarity and, in effect, highlights the resemblance (or lack of it) between the original data records.
Model training is thus more sensitive to attribute differences compared to classical training without transformation,
in which attribute dynamics are not emphasized and attribute values from different records have no means to “meet”
and influence each other. In the next step, a chosen regression model is trained on the transformed data. Since the
regression model trained on the transformed data predicts the difference in the target value, we need to apply a
correction (inverse transformation) in order to arrive at the prediction of the original target value. Essentially, each
transformed pair involving a given original data record can be used to convert (or “correct”) the prediction of the
target difference into that of its original target value. Because our transformed training set contains several such pairs



for each original data record, we can produce several “preliminary” estimates of its target value. We describe this
process in more detail in [3, 11].
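
To make the pairing and the inverse correction concrete, the following is a minimal sketch under assumptions of our own (NumPy arrays for the data and an sklearn-style model exposing fit()/predict(); none of the names below come from the original papers):

```python
import numpy as np

def transform(X, o):
    """Pair every record with every other record (i != k), as in Tab. 2."""
    n = len(o)
    rows, targets = [], []
    for i in range(n):
        for k in range(n):
            if i == k:
                continue  # identical pairs are excluded
            # original attributes of record i, followed by the attribute
            # differences (Delta-Z, Delta-Y, ...) against record k
            rows.append(np.concatenate([X[i], X[i] - X[k]]))
            targets.append(o[i] - o[k])  # the new target Delta-O
    return np.array(rows), np.array(targets)  # N^2 - N rows

def preliminary_estimates(model, X, o, x_new):
    """Invert the transformation: each pairing of x_new with a known record k
    turns the predicted difference (o_new - o_k) into one 'preliminary'
    estimate of the target value o_new."""
    pairs = np.array([np.concatenate([x_new, x_new - X[k]])
                      for k in range(len(o))])
    return model.predict(pairs) + o  # one preliminary estimate per known record
```

The final point estimate is then simply the mean of preliminary_estimates(...), in line with the arithmetic mean used above.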

              2. Interval Estimation
       As mentioned above, one advantage of our data transformation is the production of several “preliminary”
estimates of the target value. Statistical principles enable us to extract from them further important information, e.g.
an interval estimate or the probability that a predefined threshold has been or will be exceeded.
       An interval estimate consists of the lower and the upper bounds within which the target value should stay with
a given probability. It represents the uncertainty of our calculations, because even a well-trained model’s predictions
will have some margin of error. In some cases the predictions will be close to the real value, in others they may be
quite far. A narrower interval estimate for a given probability signals lower uncertainty and vice versa.
       In our context the interval estimates should not be calculated using the formula for confidence intervals [12]
because these are meant for population parameters, such as the mean or the variance, whereas we now need to bound
a certain proportion of the population itself, i.e. our individual data points or measurement records. For this purpose,
a tolerance region [13], defined by formula (1), is an appropriate method:

$X_{INF,SUP} = \bar{x} \mp t_{inv}(1 - \alpha/2,\ M-1) \cdot s_X \cdot \sqrt{1 + 1/M}$ .          (1)

       In our context, the meaning of variables and functions in formula (1) is as follows. (Please note that we also
explain here some additional variables used in subsequent derivations further below.) Function symbols tcdf() and tinv()
follow the convention used in Matlab [14].
X – a variable whose values are the “preliminary” estimates of the target value
$X_{INF}$, $X_{SUP}$ – the bounds of the tolerance region
M – the number of available “preliminary” estimates of the target value
$\bar{x}$ – the average of the M “preliminary” estimates
$s_X$ – the standard deviation of the M “preliminary” estimates
$\alpha$ – the significance level, which determines the probability $1 - \alpha$ that the tolerance interval will encompass the target value.
For a 95 % interval, $\alpha = 0.05$
$\gamma$ = Pr(X < $X_{THR}$) – the probability that X takes on a smaller value than the threshold $X_{THR}$
tcdf(Z, M-1) – Student's t cumulative distribution function (https://www.mathworks.com/help/stats/tcdf.html) for value Z, with M-1 degrees of freedom.
tinv(p, M-1) – the inverse of Student's t cumulative distribution function (https://www.mathworks.com/help/stats/tinv.html) for probability p, with M-1 degrees of freedom.
         Tolerance regions calculated in this way provide information about the bounds for the value of X at a given
significance level $\alpha$. This can be tested by repeated model training and interval calculation: the ratio of “successful”
cases (in which the tolerance region does encompass the actual value of X) should converge towards $1 - \alpha$.
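
As an illustration of formula (1), a 90 % tolerance interval can be computed from the M preliminary estimates as follows; this is a minimal sketch assuming Python with SciPy, whose t.ppf plays the role of Matlab's tinv() (the function name is ours):

```python
import numpy as np
from scipy.stats import t

def tolerance_interval(est, alpha=0.10):
    """Bounds X_INF, X_SUP of the tolerance region per formula (1);
    `est` is a NumPy vector of the M preliminary estimates."""
    M = len(est)
    x_bar = est.mean()        # average of the M estimates
    s_x = est.std(ddof=1)     # their sample standard deviation
    half = t.ppf(1 - alpha / 2, M - 1) * s_x * np.sqrt(1 + 1 / M)
    return x_bar - half, x_bar + half
```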
        Nevertheless, practical application of this information is rather limited and many domain-oriented applications
tend towards the inverse task – the calculation of the probability that a certain predefined threshold level of X has been
or will be exceeded. We need to keep in mind that even when the prediction model’s final estimate of X (calculated as a
mean of several “preliminary” estimates) does not cross the threshold, this estimate is not error-free and so, in fact, X
may have crossed the threshold anyway. The calculation of the probability that X has exceeded the threshold therefore
needs to consider not only the distance between the threshold and the final estimate, but also the variance of the
“preliminary” estimates from which it was derived.

              3. Derivation of Probability of Exceeding a Threshold
        Formula (1) determines both ends of the tolerance region at once. For our purposes a one-sided simplification
will be more helpful: formula (2) gives only the upper bound $X_{THR}$ for a pre-specified probability $\gamma$ defined as
$\gamma$ = Pr(X < $X_{THR}$):

$X_{THR} = \bar{x} + t_{inv}(\gamma,\ M-1) \cdot s_X \cdot \sqrt{1 + 1/M}$ .          (2)

In the next step we isolate the function tinv() on the right-hand side by shifting all the other terms to the left-hand side:

$\dfrac{X_{THR} - \bar{x}}{s_X \sqrt{1 + 1/M}} = t_{inv}(\gamma,\ M-1)$ .          (3)


We can then apply the function tcdf(), with M-1 degrees of freedom, to both sides:

$t_{cdf}\!\left(\dfrac{X_{THR} - \bar{x}}{s_X \sqrt{1 + 1/M}},\ M-1\right) = t_{cdf}\big(t_{inv}(\gamma,\ M-1),\ M-1\big)$ .          (4)


Because the function tinv() is the inverse of tcdf() and both share the same number of degrees of freedom (M-1), they
cancel each other and leave just the desired probability $\gamma$ = Pr(X < $X_{THR}$) on the right-hand side:

$t_{cdf}\!\left(\dfrac{X_{THR} - \bar{x}}{s_X \sqrt{1 + 1/M}},\ M-1\right) = \gamma$ .          (5)


This result can be more conveniently expressed as formula (6):

$\Pr(X < X_{THR}) = t_{cdf}\!\left(\dfrac{X_{THR} - \bar{x}}{s_X \sqrt{1 + 1/M}},\ M-1\right)$ .          (6)


      If we are interested in the lower bound, we can easily derive the formula from the probability of the
complementary event:

$\Pr(X > X_{THR}) = 1 - t_{cdf}\!\left(\dfrac{X_{THR} - \bar{x}}{s_X \sqrt{1 + 1/M}},\ M-1\right)$ .          (7)


       Finally, by combining the two, we can derive the probability that the actual value of X is confined between a
lower bound $X_{THR1}$ and an upper bound $X_{THR2}$. This is expressed by formula (8):

$\Pr(X_{THR1} < X < X_{THR2}) = t_{cdf}\!\left(\dfrac{X_{THR2} - \bar{x}}{s_X \sqrt{1 + 1/M}},\ M-1\right) - t_{cdf}\!\left(\dfrac{X_{THR1} - \bar{x}}{s_X \sqrt{1 + 1/M}},\ M-1\right)$          (8)


       In general, formulas (6), (7) and (8) enable us to calculate the probability that the actual value of some predicted
variable X is confined within some predefined region. Conversely, through formulas (1) and (2) we can calculate the
bounds for a given predefined probability or significance level.
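
To make these formulas concrete, the sketch below implements them in Python, with SciPy's t.cdf standing in for Matlab's tcdf(); the function names are ours, and x_bar, s_x and M are as defined for formula (1):

```python
from math import sqrt
from scipy.stats import t

def prob_below(x_thr, x_bar, s_x, M):
    """Formula (6): Pr(X < x_thr)."""
    z = (x_thr - x_bar) / (s_x * sqrt(1 + 1 / M))
    return t.cdf(z, M - 1)

def prob_above(x_thr, x_bar, s_x, M):
    """Formula (7): Pr(X > x_thr), via the complementary event."""
    return 1.0 - prob_below(x_thr, x_bar, s_x, M)

def prob_between(x_thr1, x_thr2, x_bar, s_x, M):
    """Formula (8): Pr(x_thr1 < X < x_thr2)."""
    return prob_below(x_thr2, x_bar, s_x, M) - prob_below(x_thr1, x_bar, s_x, M)
```

Note that prob_between() follows directly as the difference of two one-sided probabilities, mirroring the derivation of formula (8).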
       In practical applications the threshold XTHR would typically represent some structural limit, the crossing of which
might result in system damage, compromised security or danger to human lives. The formulas derived above help us to
quantify the probability of such undesirable developments.
       The proposed approach is primarily suited for:
      A.) Modelling and prediction of future values of X from its past and present values, e.g. in various early warning
systems;
       B.) Modelling and prediction of some variable X whose direct measurement would be dangerous,
technologically demanding, or costly in terms of energy, time, money, etc. In these situations it is preferable to measure
other related variables, formulate and train on them a good regression model, and use that to predict X.
       We therefore do not focus on one specific scenario but try to keep a more general perspective. Our goal is to
formulate an approach applicable in various situations that require modelling and prediction of one or more system
variables in combination with a warning system that guards a set of predefined system constraints.

            4. Experiments
       We tested our approach experimentally on a publicly available dataset called Energy Efficiency [6], which
contained 768 records and 9 numeric attributes including the target one. We deliberately chose this smaller dataset
because our data transformation (with 15-fold repetition of experiments in order to make them more representative)
took quite long to compute. For regression modelling we used a feedforward neural network of perceptrons with one
hidden layer and sigmoid activation function. Learning rate was set to 0.3 and max. training epochs to 500. Table 3
shows some prediction examples for this dataset as well as the corresponding threshold-crossing probabilities and 90 %
tolerance intervals.
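
A roughly equivalent model configuration could be set up as sketched below; we use scikit-learn's MLPRegressor purely as a stand-in, since the paper does not state which implementation was used, and the hidden-layer size here is an illustrative assumption:

```python
from sklearn.neural_network import MLPRegressor

model = MLPRegressor(
    hidden_layer_sizes=(10,),   # one hidden layer; its size is assumed
    activation='logistic',      # sigmoid activation function
    solver='sgd',
    learning_rate_init=0.3,     # learning rate 0.3, as stated above
    max_iter=500,               # max. 500 training epochs
)
# The model would then be trained on the transformed data, e.g.
# model.fit(X_transformed, delta_o).
```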

       Table 3. Sample predictions of X ($\bar{x}$) along with their corresponding threshold-crossing probabilities Pr(X < 8.0)
             and Pr(X > 40.0) and 90 % tolerance intervals. The real value of X is given in the last column (XREAL)


      Record Number   $\bar{x}$    sX       Pr(X < 8.0)    Pr(X > 40.0)    90 % interval        XREAL

            1          7.5901    1.1125    0.638447       0.0             [5.6189, 9.5612]      6.04
            2          8.9423    4.5804    0.421507       1.241341E-6     [0.8266, 17.0580]     8.50
            3         10.4178    1.2074    0.032779       5.551115E-16    [8.2784, 12.5570]    10.64
            4         22.3606    1.3488    1.415263E-9    4.545264E-11    [19.9708, 24.7505]   23.75
            5         22.3789    4.6541    0.003560       7.687263E-4     [14.1326, 30.6251]   24.77
            6         36.9891    1.5087    5.122037E-14   0.033197        [34.3159, 39.6623]   36.45
            7         39.7402    0.9383    1.519116E-18   0.394970        [38.0776, 41.4028]   39.04
            8         39.7640    1.0177    6.820442E-18   0.411707        [37.9608, 41.5673]   39.83
            9         43.1567    0.9128    1.334947E-19   0.998410        [41.5394, 44.7739]   41.73

       We performed this particular experiment only once, because our goal was just to illustrate the process for a
few concrete target values. The table lists the final estimate $\bar{x}$, calculated as an average of M = 20 “preliminary” estimates
produced by the neural network, as well as the standard deviation $s_X$ of these preliminary estimates. It next shows the
threshold-crossing probabilities Pr(X < 8.0) and Pr(X > 40.0) calculated from (6) and (7), and the corresponding 90
% tolerance interval. The real value of X in the last column (XREAL) is shown just for verification – it was not
available to the regression models.
       By comparing the individual rows in Table 3 we can see the effect of the changing average $\bar{x}$ on the threshold-
crossing probabilities, which increase or decrease sharply as $\bar{x}$ enters or leaves each guarded region. For example, as
the average $\bar{x}$ changes from 36.9891 to 39.7402 between rows six and seven, the probability Pr(X > 40.0) increases
more than tenfold from 0.033197 to 0.394970. Similarly, as $\bar{x}$ changes from 8.9423 to 10.4178 between rows two
and three, Pr(X < 8.0) sharply decreases from 0.421507 to 0.032779.
       Another important fact can be seen in rows four and five – a surprisingly large influence of the standard
deviation $s_X$ on the threshold-crossing probabilities. Both rows share a very similar value of $\bar{x}$, yet their threshold-
crossing probabilities for both thresholds differ by several orders of magnitude. This is caused by the difference in
the deviation $s_X$: the higher the $s_X$, the higher the probability of exceeding the threshold.
       In order to cross-check the soundness of our calculations, we have substituted the bounds of our 90 %
tolerance intervals from the sixth column of the table into formula (8), expecting to get the same result (0.9) for each
row. The values that we obtained were indeed very close to 0.9 – if we denote the error as $\varepsilon$, our results were 0.9 +
$\varepsilon$, with the maximum absolute value of the error $|\varepsilon|_{MAX}$ < 2.6E-13, which we interpret as the confirmation of
sufficient precision in our calculations.
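
This cross-check can be reproduced with the prob_between() sketch from Section 3, e.g. for the first row of Table 3:

```python
# Substituting the tabulated 90 % bounds of row 1 into formula (8);
# the result should be 0.9 up to the rounding of the printed bounds.
p = prob_between(5.6189, 9.5612, x_bar=7.5901, s_x=1.1125, M=20)
print(p)  # approximately 0.9
```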
       As can be seen from our derivations above, the task of predicting the probability of exceeding a pre-defined
threshold is inverse to that of estimating the bounds of an interval within which the target value should reside with a
given probability. Both exploit the same mathematical relationship between the bounds and the probability, but they
approach it from the opposite sides. This entitled us to verify the correctness of our technique by testing the validity
of our 90 % tolerance intervals, which is much easier to do than to verify the inverse task. Accordingly, we have

trained regression models on various subsets of our transformed data and repeatedly calculated the bounds for 90 %
tolerance intervals, each time also noting whether or not they did contain the actual value of X (which was known to
us but hidden from the regression models). We display these results in Table 4.
       Table 4 lists averaged values for model precision, elapsed calculation time and the success rate of interval
estimates for each size of the training set (specified in the column NumRec). Each row in the table shows the values
averaged over 15 independent experiment runs with different random seeds. The seeds governed the inclusion of the
records in the training set as well as random initialization of neural networks generating the models.
       The average precision of our trained regression models is expressed through Correlation Coefficients (column
CorrelCoef) and Root Mean Squared Error (column RMSE). Column Time shows the average time in seconds needed for
one modelling cycle, i.e. for training the model on NumRec randomly selected records in the training set and then
predicting the target value for the remaining records in the dataset. It should be noted that regression on the
transformed data takes considerably longer than traditional regression on the original untransformed dataset.
       Column Intervals shows the total number of interval estimates performed and column Correct the number of
those that did encompass the real value XREAL. Column Ratio then shows their success rate, which is defined as the
ratio of Correct / Intervals. Since these estimates were meant to represent 90% tolerance intervals, the values in
column Ratio should be no less than 0.90. We can see that it is indeed so for all the rows except the last two, where
the small size of the training set (less than 100 records) negatively impacted the success rate as well as precision
(lowering the Correlation Coefficients and increasing the Root Mean Squared Errors).
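
A hedged sketch of this evaluation loop is given below, building on the transform(), preliminary_estimates() and tolerance_interval() sketches from earlier sections; the model object is assumed to be sklearn-like, and unlike the paper, which used M = 20 preliminary estimates, this simplified version derives one preliminary estimate from every training record:

```python
import numpy as np

def coverage_ratio(model, X, o, num_rec, runs=15, alpha=0.10, seed=0):
    """Train on num_rec random records, build 90 % tolerance intervals for
    the held-out records, and return the success rate (column Ratio)."""
    rng = np.random.default_rng(seed)
    correct = intervals = 0
    for _ in range(runs):
        idx = rng.permutation(len(o))
        train, test = idx[:num_rec], idx[num_rec:]
        Xt, dt = transform(X[train], o[train])
        model.fit(Xt, dt)                       # train on transformed pairs
        for j in test:
            est = preliminary_estimates(model, X[train], o[train], X[j])
            lo, hi = tolerance_interval(est, alpha)
            correct += int(lo <= o[j] <= hi)    # interval covers X_REAL?
            intervals += 1
    return correct / intervals                  # should be close to 1 - alpha
```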
       Nevertheless, we consider the experimental evaluation a success, because for sufficiently representative
training sets (with more than 100 records) the success rate of interval estimates reached or crossed 90 %, as
expected. In fact, for training sets with 120 records or more the success rate consistently exceeded 94 %. We thus
feel entitled to conclude that the proposed approach can be practically deployed in various experimental scenarios. In
the future we plan to test our approach on other datasets and conduct an in-depth analysis of experimental results.
We also intend to investigate the lower bound for the training set size below which the interval estimates fail to reach
the expected success rate.


  Table 4. Dependence of model precision (CorrelCoef, RMSE), calculation time (Time) and the success rate of 90 %
               tolerance interval estimates (Ratio) on the number of records in the training set (NumRec)


                   NumRec       CorrelCoef     RMSE        Time       Correct    Intervals   Ratio

                      260        0.999317      0.7364       67.3       7440        7620      0.9764
                      250        0.998838      0.7929       65.7       7572        7770      0.9745
                      240        0.999166      0.7501       60.8       7707        7920      0.9731
                      230        0.998847      0.7848       56.3       7886        8070      0.9772
                      220        0.998761      0.7919       51.7       7969        8220      0.9695
                      210        0.998387      0.8271       47.1       8065        8370      0.9636
                      200        0.998571      0.8058       42.8       8246        8520      0.9678
                      190        0.998135      0.8442       39.5       8392        8670      0.9679
                      180        0.997246      0.9321       35.0       8496        8820      0.9633
                      170        0.998015      0.8550       31.5       8615        8970      0.9604
                      160        0.997539      0.9104       28.4       8736        9120      0.9579
                      150        0.997075      0.9581       25.0       8759        9270      0.9449
                      140        0.997196      0.9439       22.0       8973        9420      0.9525
                      130        0.996817      0.9785       18.9       9079        9570      0.9487
                      120        0.994856      1.1462       16.2       9237        9720      0.9503
                      110        0.989257      1.3735       13.7       9143        9870      0.9263
                      100        0.989170      1.3626       11.2       9263       10020      0.9245

                       90         0.974928      1.9311        9.6        9122       10170      0.8970
                       80         0.976517      1.9920        7.6        9103       10320      0.8821

Conclusions
        In this paper we have presented a new approach exploiting a special data transformation for regression tasks. The
transformation improves model precision through the production and utilization of a number of “preliminary” estimates
of the target value. Moreover, it can be combined with different types of regression models. We have shown that this
technique is suitable for tolerance interval estimates as well as for predicting the probability of exceeding arbitrary pre-
defined thresholds. We have rigorously derived the formulas for the probability of crossing an upper bound and a lower
bound both separately (one-sided intervals) and together (a two-sided interval). Our experimental evaluation confirmed
a satisfactory performance of the proposed technique in terms of model precision and success rate for sufficiently large
training sets. The proposed approach can be applied in various domains for predicting the probability of hazardous
situations brought about by important system variables exceeding predefined safety thresholds.
        In the future we plan to investigate in more detail and for various datasets how the minimum required size of the
training set depends on the total size of the dataset and on the required model precision.

This work was supported by projects: VEGA 2/0167/16 (2016 - 2019) and PROCESS EU H2020-777533 (2017-2020).


References
1.    Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer, 2009.
      P. 463–470, 605–622. http://web.stanford.edu/~hastie/Papers/ESLII.pdf
2.    Witten I.H., Frank E. Data Mining: Practical Machine Learning Tools and Techniques. 2nd ed. Elsevier. P. 315–334, 414–418.
3.    Krammer P., Kvassay M., Hluchý L. Improved regression method with interval estimation. In: ICNC-FSKD 2017: 13th International
      Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery. Guilin, China: IEEE, 2017. P. 2402–2408.
4.    Jain A.K., Duin R.P.W., Mao J. Statistical Pattern Recognition: A Review. IEEE Transactions on Pattern Analysis and
      Machine Intelligence, 2000. P. 4–37.
5.    Dietterich T.G. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and
      Randomization. Machine Learning, 2000, 40(2). P. 139–157.
6.    UCI Machine Learning Repository: Energy Efficiency. Center for Machine Learning and Intelligent Systems.
      https://archive.ics.uci.edu/ml/datasets/Energy+efficiency
7.    Krishnamoorthy K. Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, 2009. P. 1–6.
8.    Liu Y., Yao X., Higuchi T. Evolutionary Ensembles with Negative Correlation Learning. IEEE Transactions on Evolutionary Computation,
      2000. P. 380–387.
9.    Cho S.-B., Kim J.H. Multiple Network Fusion Using Fuzzy Logic. IEEE Transactions on Neural Networks, 1995, 6(2). P. 497–501.
10.   Chandra A., Yao X. Evolving Hybrid Ensembles of Learning Machines for Better Generalisation. Neurocomputing, 2006, 69(7–9).
      P. 686–700.
11.   Krammer P., Habala O., Hluchý L. Transformation regression technique for data mining. In: IEEE International Conference on
      Intelligent Engineering Systems, 2016, art. no. 7555134. P. 273–277.
12.   Sahoo P. Probability and Mathematical Statistics. University of Louisville, 2013. P. 497–584.
      http://www.math.louisville.edu/~pksaho01/teaching/Math662TB-09S.pdf
13.   Krishnamoorthy K. Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, 2009. P. 1–6.
14.   Matlab, Statistics and Machine Learning Toolbox Functions. https://www.mathworks.com/help/stats/functionlist-alpha.html




About Authors:

Peter Krammer,
graduated from the Faculty of Electrical Engineering and Information Technology,
Slovak University of Technology in Bratislava,
and is currently Researcher at the Institute of Informatics of the Slovak Academy of Sciences.
His research interests include data mining and machine learning.
He is (co-)author of several scientific papers and has participated in international and national research projects.

Marcel Kvassay,
is a research scientist at the Institute of Informatics of the Slovak Academy of Sciences.
He graduated from the Faculty of Electrical Engineering and Information Technology in 1991 and in 2017 earned his
PhD in applied informatics from the Faculty of Informatics and Information Technologies of the Slovak University of
Technology in Bratislava.
Prior to joining the Institute in 2009 he worked at various positions as a software engineer, software design coach and
software process improvement manager. His research interests include causal analysis, complex systems, intelligent and
knowledge-based technologies, data mining and machine learning.

Ladislav Hluchý,
is the Head of the Department of Parallel and Distributed Information Processing
at the Institute of Informatics of the Slovak Academy of Sciences.
He received M. Sc. and Ph.D. degrees, both in Computer Science.
He is R&D Project Manager and Work-package Leader in a number of FP4, FP5, FP6 and FP7 projects,
as well as in Slovak national R&D projects.


Organization:

Institute of Informatics
Slovak Academy of Sciences
Dúbravská cesta 9
845 07 Bratislava, Slovakia.
E-mails: peter.krammer@savba.sk,
           marcel.kvassay@savba.sk,
           ladislav.hluchy@savba.sk



