Application of a Nine-Variate Prediction Ellipsoid for Normalized Data and Machine Learning Algorithms for Keystroke Dynamics Recognition

Sergiy Prykhodko1,2,* and Artem Trukhov1

1 Admiral Makarov National University of Shipbuilding, Heroes of Ukraine Ave., 9, Mykolaiv, 54007, Ukraine
2 Odesa Polytechnic National University, Shevchenko Ave., 1, Odesa, 65044, Ukraine

Abstract
Keystroke dynamics recognition is a crucial element in enhancing security, enabling personalized user authentication, and supporting various identity verification systems. This study offers a comparative analysis of a nine-variate prediction ellipsoid for normalized data and machine learning algorithms, specifically an autoencoder, isolation forest, and one-class support vector machine, for keystroke dynamics recognition. Traditional methods often assume a multivariate normal distribution. However, real-world keystroke data typically deviate from this assumption, negatively impacting model performance. To address this, the dataset was normalized using the multivariate Box-Cox transformation, allowing the construction of a decision rule based on a nine-variate prediction ellipsoid for normalized data. The results revealed that the application of the Box-Cox transformation significantly enhanced both the accuracy and robustness of the prediction ellipsoid. Although all models demonstrated strong performance, the nine-variate prediction ellipsoid for normalized data consistently outperformed the machine learning algorithms. The study highlights the importance of careful feature selection and multivariate normalizing transformations in keystroke dynamics recognition. Future studies could benefit from broader datasets that include a wider range of user behaviors, such as variations in environmental factors and longer key sequences.

Keywords
keystroke dynamics, multivariate normal distribution, Box-Cox transformation, machine learning
Information Technology and Implementation (IT&I-2024), November 20-21, 2024, Kyiv, Ukraine
* Corresponding author. These authors contributed equally.
sergiy.prykhodko@nuos.edu.ua (S. Prykhodko); artem.trukhov@gmail.com (A. Trukhov)
ORCID: 0000-0002-2325-018X (S. Prykhodko); 0000-0002-7160-8609 (A. Trukhov)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

In recent years, keystroke dynamics recognition has become an effective method for biometric authentication. By analyzing the unique patterns and rhythms individuals demonstrate while typing, such as keystroke duration and the intervals between key presses [1], it becomes possible to create a distinctive typing profile for each user. Unlike traditional biometric methods like fingerprint or facial recognition, keystroke dynamics offers a non-intrusive and continuous form of authentication [2]. This makes it especially appealing for secure applications such as online banking, login systems, and access control.

The keystroke recognition process involves several essential stages to ensure accurate user authentication. It begins with the collection of a dataset, typically consisting of timestamps for keypress and key release events. From this raw data, key attributes such as hold times and inter-key intervals are extracted, which reveal the unique typing behavior of the user. A critical preprocessing step is the detection and removal of outliers: data points that significantly deviate from the expected behavior and could otherwise distort the results [3]. This step is vital for creating a cleaner dataset and improving model accuracy. Once preprocessing is complete, classification models are applied to recognize new data inputs. In traditional recognition tasks, classification typically involves assigning an object to one of several predefined categories.
However, in the context of keystroke dynamics, one-class classification is more frequently employed. Unlike standard classification methods, which rely on a balanced dataset with both positive and negative examples, one-class classification focuses on modeling the target class alone, without the need for negative samples. This approach is particularly beneficial in authentication systems, where the goal is to continuously verify that the current user matches the known profile, rather than distinguishing between multiple users [4]. Closely related to outlier detection, one-class classification evaluates new data to determine if it aligns with the target profile, flagging any deviations as potential anomalies [5]. Prediction ellipsoids [6] and machine learning algorithms [7] are commonly utilized in the field of pattern recognition. The study aims to compare these models in keystroke dynamics recognition, assessing their performance, robustness, and applicability in real-world settings.

2. Literature review

Mathematical modeling techniques are pivotal in the field of keystroke dynamics recognition, aimed at improving accuracy and reliability. Recent advancements have integrated a range of approaches. Tree-based models, like random forests [8], classify data by constructing hierarchical structures and learning feature splits that effectively differentiate between classes. Support vector-based methods [9] focus on maximizing the margin between classes to create optimal decision boundaries, while neural network models [10] capture complex patterns in keystroke data by processing information through multiple interconnected layers of nodes. However, for user authentication systems, one-class classification is more commonly employed [11]. Among the leading techniques are prediction ellipsoids and machine learning algorithms such as the one-class support vector machine (OCSVM) [12, 13], isolation forest (IF) [14], and autoencoder (AE) [15, 16].
OCSVM learns a decision boundary that separates target data points from outliers while maximizing the margin within the feature space. IF is an ensemble method that isolates anomalies by randomly selecting features and partitioning the data until anomalous points are isolated in smaller partitions, requiring fewer splits than target points. AE, as a neural network, learns an efficient representation of data by encoding inputs into a lower-dimensional space and reconstructing them. Anomalies are flagged by evaluating reconstruction errors, where higher discrepancies suggest potential outliers.

The use of prediction ellipsoids relies on the assumption that data conforms to a multivariate normal distribution [17]. In practice, however, this assumption often does not hold for real-world keystroke data [18]. To address this, normalization transformations are applied, adjusting the data to more closely align with a multivariate normal distribution and thereby improving the model's accuracy and robustness [19, 20]. Univariate transformations (e.g., the logarithmic or Box-Cox transformation) operate on individual features, while multivariate transformations, such as the multivariate Box-Cox transformation, consider relationships between features for a more holistic normalization approach.

This study focuses on comparing the prediction ellipsoid for normalized data and machine learning algorithms such as OCSVM, IF, and AE, which are widely used and offer distinct approaches to one-class classification. In the context of keystroke dynamics recognition, accuracy and efficiency are critical, making it essential to evaluate the effectiveness of different approaches. While the prediction ellipsoid offers interpretability and computational efficiency, it can encounter limitations when dealing with non-Gaussian data distributions.
On the other hand, machine learning algorithms such as OCSVM, IF, and AE provide alternative techniques, each with its own advantages and challenges when applied to keystroke dynamics.

3. Materials and methods

3.1. Keystroke dynamics dataset

In keystroke dynamics recognition, the quality and structure of the dataset play a crucial role in determining the performance and accuracy of the applied algorithms. A typical keystroke dynamics dataset records various temporal characteristics of an individual's typing behavior, including metrics such as the duration of key presses and the intervals between consecutive key events.

This study utilizes the CMU keystroke dynamics dataset, which records various keystroke timing features in seconds, including how long each key is pressed and the intervals between key presses. Data collection was conducted over eight distinct sessions per subject, with at least one day between sessions. Each session required subjects to type the password 50 times, resulting in 400 samples per individual and a total of 20,400 samples across all participants.

The dataset is organized by subject identifier, session number, repetition count, and 31 timing features. Columns are labeled to reflect specific keystroke metrics: H.key denotes the hold time for a particular key, measuring the duration from key press to release. DD.key1.key2 represents the keydown-keydown interval, i.e., the time between pressing two consecutive keys, while UD.key1.key2 indicates the keyup-keydown interval, measuring the time between releasing one key and pressing the next. Notably, UD times can be negative in some cases, and the sum of the H and UD times corresponds to the DD time for a given digraph. To simplify the modeling process, this study focuses on 9 key properties (the hold times of individual keys), forming the feature vector: X = { H.t, H.i, H.e, H.5, H.R, H.o, H.a, H.n, H.l }.
3.2. Outlier removal

After extracting feature vectors, the subsequent step involves detecting and removing outliers. This process is crucial because outliers can distort the analysis and undermine the performance of recognition models. By eliminating these anomalies, the dataset is refined, ensuring that the data better reflects typical user behavior, which in turn enhances model training.

One commonly used method for outlier detection is based on the squared Mahalanobis distance (SMD). However, SMD assumes that the data follows a multivariate normal distribution, which might not always be the case. To verify this assumption, it is necessary to assess the data's normality through statistical tests like the Mardia test, which is used in this study [21]. This test evaluates two aspects of multivariate normality: skewness β1 and kurtosis β2. The Mardia test calculates skewness scaled by N/6, which follows a chi-square distribution with p(p+1)(p+2)/6 degrees of freedom, where p is the number of variables and N is the sample size. Kurtosis is compared to the normal distribution, with a mean of p(p+2) and a variance of 8p(p+2)/N. By comparing the calculated skewness and kurtosis values with those expected under a normal distribution, the test helps identify significant deviations from multivariate normality. If the data deviates significantly, normalization is required to transform a non-Gaussian vector X = (X1, X2, ..., X9)^T into a Gaussian vector Z = (Z1, Z2, ..., Z9)^T.

Normalization transformations are essential in data analysis and machine learning, as they help stabilize variance, reduce skewness, and better align data with a multivariate Gaussian distribution. Univariate transformations, such as logarithmic transformations and the univariate BCT, are typically applied to individual features.
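To make the Mardia test described above concrete, the two statistics and their critical values can be sketched in Python as follows (a minimal illustration; the function name and the NumPy/SciPy usage are ours, not from the paper):

```python
import numpy as np
from scipy import stats

def mardia_test(X, alpha=0.005):
    """Mardia's multivariate skewness and kurtosis statistics with their
    critical values (skewness: chi-square; kurtosis: normal approximation)."""
    N, p = X.shape
    Xc = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))
    D = Xc @ S_inv @ Xc.T                  # N x N generalized inner products
    b1 = float((D ** 3).sum()) / N ** 2    # multivariate skewness beta_1
    b2 = float((np.diag(D) ** 2).mean())   # multivariate kurtosis beta_2
    skew_stat = N * b1 / 6.0
    skew_crit = stats.chi2.ppf(1 - alpha, df=p * (p + 1) * (p + 2) / 6)
    kurt_crit = stats.norm.ppf(1 - alpha, loc=p * (p + 2),
                               scale=np.sqrt(8.0 * p * (p + 2) / N))
    return skew_stat, skew_crit, b2, kurt_crit
```

For p = 9 and N = 400 this reproduces the critical values 215.53 (skewness, 165 degrees of freedom) and 102.62 (kurtosis) quoted in Section 4.1.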
The logarithmic transformation is effective for stabilizing variance in positively skewed data, while the univariate BCT can handle both positive and negative skewness by optimizing a transformation parameter for each feature. However, univariate transformations ignore the fact that features may be interdependent, and the BCT can be sensitive to outliers due to the complexity of parameter estimation.

In contrast, multivariate transformations like the multivariate BCT consider the relationships between multiple features. The multivariate Box-Cox transformation builds upon the principles of the univariate Box-Cox transformation but applies it across multiple variables at once:

Zj = (Xj^λj − 1)/λj, if λj ≠ 0;  Zj = ln(Xj), if λj = 0.  (1)

While it is more computationally demanding, this transformation preserves correlations between variables, offering a more robust approach to normalizing complex datasets. The multivariate BCT improves the alignment of data with a multivariate normal distribution by optimizing parameters through methods such as maximizing the log-likelihood of the transformed data, as discussed in the study [21]. Once the multivariate BCT is applied, the Mardia test should be repeated to verify the success of the normalization. After normalization, outlier removal is performed iteratively using the SMD method, removing one data point per iteration based on the largest distance. This ensures that the most extreme values are eliminated first, leading to a cleaner and more representative dataset for subsequent analysis.

3.3. Prediction ellipsoid

A prediction ellipsoid is a multivariate tool used to assess whether a data point belongs to a specific target class. It operates by calculating the SMD for each point, which forms the left side of the comparison equation. This distance is then measured against a critical value derived from the chi-square distribution, which serves as the right side of the equation [22]:

(X − X̄)^T S_X^{-1} (X − X̄) = χ²_{9,0.005}.  (2)
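Numerically, the decision rule behind this equation reduces to a few lines of Python (a sketch; the function and variable names are ours, and the significance level 0.005 is the one used throughout the paper):

```python
import numpy as np
from scipy.stats import chi2

def in_prediction_ellipsoid(x, mean, cov, alpha=0.005):
    """Target instance if the squared Mahalanobis distance does not exceed
    the chi-square critical value for p degrees of freedom."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    smd = float(d @ np.linalg.inv(cov) @ d)
    return smd <= chi2.ppf(1 - alpha, df=d.size)
```

With p = 9 and alpha = 0.005 the critical value is chi2.ppf(0.995, 9), approximately 23.59, which is the threshold quoted below.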
The SMD follows a chi-square distribution with degrees of freedom corresponding to the number of features in the data, which in this case is 9. This allows for the calculation of a critical value based on the desired significance level, commonly set at 0.005 for one-class classification tasks. If a data point's SMD exceeds the critical value, the point is classified as an anomaly, meaning it is likely part of a different class. If the SMD falls below the threshold, the point is considered an instance of the target class. In cases where the data does not follow a normal distribution, a normalization process is implemented before constructing the nine-variate prediction ellipsoid, which is represented by the equation:

(Z − Z̄)^T S_Z^{-1} (Z − Z̄) = χ²_{9,0.005}.  (3)

For 9 degrees of freedom at a significance level of 0.005, the chi-square distribution provides a critical value of 23.59. Any data point with an SMD below this value is deemed to lie within the ellipsoid, signifying its membership in the target class.

3.4. Machine learning algorithms

3.4.1. One-class support vector machine

OCSVM constructs a decision boundary that separates target data from the rest of the feature space by finding a hyperplane with the maximum margin. This boundary is optimized by maximizing the distance between the hyperplane and the origin within a high-dimensional feature space. The OCSVM employs an implicit transformation function, denoted as φ(·), which is a non-linear projection evaluated through a kernel function. This kernel function maps the original feature space into a potentially higher-dimensional one: k(x, y) = φ(x) · φ(y) [23].

Several kernel functions are commonly used in OCSVM. The linear kernel computes dot products in the original feature space, making it ideal for linearly separable data. The polynomial kernel captures non-linear relationships by raising dot products to specific powers, allowing it to model more complex decision boundaries.
The radial basis function kernel, using a Gaussian function, effectively captures intricate relationships, particularly in cases where data is not linearly separable. The sigmoid kernel, based on the hyperbolic tangent function, excels at capturing non-linear patterns, making it useful for handling complex relationships between features and classes.

The decision boundary that OCSVM learns is defined by the following equation: g(x) = ω^T φ(x) − ρ, where ω represents the normal vector of the hyperplane, and ρ is the bias term. OCSVM is formulated as a quadratic optimization problem, aiming to minimize the weight vector ω while maximizing the margin, subject to specific constraints. The optimization problem can be expressed as:

min_{ω,ξ,ρ} ||ω||²/2 − ρ + (1/(νN)) Σ_{i=1..N} ξ_i,

subject to: ω^T φ(x_i) ≥ ρ − ξ_i, ξ_i ≥ 0,

where ξ_i are slack variables that account for separation errors, and ν ∈ (0, 1] is the regularization parameter, which controls the balance between the number of outliers and the number of support vectors. The optimization problem is typically solved in its dual form, producing a decision function that classifies new data points as either belonging to the target class or as anomalies. The final decision function is: f(x) = sgn(g(x)). The function returns a positive value for data points belonging to the target class and a negative value for anomalies.

3.4.2. Isolation forest

Unlike traditional methods that rely on modeling target points, IF takes a distinctive approach by focusing directly on isolating anomalies. This technique works by constructing isolation trees, where internal nodes represent features and their split values, and leaf nodes represent individual data points. The construction of isolation trees begins by randomly selecting a feature and a corresponding split value within its range.
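Returning to the OCSVM of Section 3.4.1, a minimal scikit-learn sketch might look as follows. The synthetic training data and the nu value are illustrative assumptions; the rbf kernel and gamma="auto" match the configuration reported later in the implementation section:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in for the nine hold-time features (195 training vectors);
# the location/scale here are arbitrary, not taken from the CMU dataset.
rng = np.random.default_rng(1)
X_train = rng.normal(loc=0.075, scale=0.01, size=(195, 9))

# nu upper-bounds the fraction of training errors (an assumed value here).
clf = OneClassSVM(kernel="rbf", gamma="auto", nu=0.05).fit(X_train)
labels = clf.predict(X_train)  # +1 = target class, -1 = anomaly
```

A point far from the training cloud receives label -1, matching the sign convention of the decision function f(x) = sgn(g(x)).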
This random process continues until each data point is isolated in its own leaf node or until a specified maximum tree depth is reached [24]. Anomalies, being easier to isolate since they reside in sparser regions of the feature space, require fewer splits from root to leaf nodes compared to normal data points. Accordingly, the average path length from the root to the leaf node for each data point is calculated across all trees in the forest. The anomaly score for each data point is derived from its average path length using the following formula:

s(x, n) = 2^(−E(h(x)) / c(n)),

where E(h(x)) is the average path length of data point x across t isolation trees:

E(h(x)) = (1/t) Σ_{i=1..t} h_i(x),

and c(n) represents the average path length of an unsuccessful search in a binary tree:

c(n) = 2H(n − 1) − 2(n − 1)/n,

where H(i) = ln(i) + γ, and γ is the Euler-Mascheroni constant. Data points with shorter path lengths, closer to the root of the tree, are more likely to be anomalies, while those with longer paths are considered targets. Based on the anomaly scores, a threshold is set to classify data points as either anomalies or normal. Points with scores above the threshold are flagged as anomalies, while those below are classified as normal data points.

3.4.3. Autoencoder

An autoencoder is a type of artificial neural network designed for learning efficient data representations, dimensionality reduction, and anomaly detection. As an unsupervised learning method, it consists of two key components: an encoder and a decoder. The primary goal of an autoencoder is to learn a compressed and meaningful representation of the input data. The encoder's function is to map the input data into latent space, effectively compressing the data into a lower-dimensional form. This is typically achieved through a series of layers, where each layer applies non-linear transformations to the input data.
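Returning to the isolation-forest scoring formulas above, they can be implemented directly (a sketch; the function names are ours):

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def c(n):
    """Average path length of an unsuccessful search in a binary tree."""
    if n <= 1:
        return 0.0
    return 2.0 * (np.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def anomaly_score(avg_path_length, n):
    """s(x, n) = 2^(-E(h(x)) / c(n)); higher scores mean easier isolation."""
    return 2.0 ** (-avg_path_length / c(n))
```

Note the two reference points of the score: when the average path length equals c(n), the score is exactly 0.5, and shorter paths push it toward 1, marking likely anomalies.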
The resulting latent space captures the most relevant features and patterns of the input, condensing its essential information. The decoder, on the other hand, is tasked with reconstructing the original input data from its latent space representation. Its architecture generally mirrors that of the encoder, but in reverse, and it applies a series of non-linear transformations to transform the latent representation back into the original data format [25]. During training, the autoencoder aims to minimize reconstruction error, which quantifies the difference between the original input and the reconstructed output. This is typically achieved by optimizing a loss function, such as mean squared error or binary cross-entropy, using gradient-based methods like backpropagation.

In recognition tasks, the autoencoder is trained using only instances of the target class, allowing it to learn the typical patterns and structure of normal data. When the autoencoder encounters new data, it will reconstruct the input with a low error if it belongs to the target class. However, if the input represents an anomaly, the reconstruction error will be higher, as the autoencoder is not well-equipped to accurately reconstruct unfamiliar instances. By establishing a threshold for the reconstruction error, anomalies can be detected and distinguished from normal instances.

3.5. Evaluation metrics

In one-class classification, where the objective is to differentiate between target instances and anomalies, evaluation metrics such as specificity, recall, precision, F1 score, and accuracy are crucial for assessing model performance [26].
These metrics are derived from the classification outcomes, which can be categorized into four groups: true positives (TP), representing correctly identified anomalies; false positives (FP), indicating instances mistakenly classified as anomalies; true negatives (TN), denoting correctly identified target instances; and false negatives (FN), reflecting actual anomalies that were misclassified as target instances.

Specificity measures the proportion of accurately identified target instances out of all target instances:

Specificity = TN / (TN + FP).

Recall gauges the model's ability to detect all actual anomalies, measuring the proportion of true anomalies correctly identified out of all existing anomalies:

Recall = TP / (TP + FN).

Precision assesses the model's reliability when identifying anomalies, showing the proportion of true anomalies among all instances classified as anomalies:

Precision = TP / (TP + FP).

F1 score provides a balanced evaluation by calculating the harmonic mean of precision and recall, offering a single metric that accounts for both aspects:

F1 score = 2 * (Precision * Recall) / (Precision + Recall).

Finally, the accuracy metric measures the overall correctness of the classification, taking both target instances and anomalies into account:

Accuracy = (TP + TN) / (TP + TN + FP + FN).

After constructing the models, they will be evaluated using these metrics, enabling a comprehensive analysis of their performance.

4. Experiments

4.1. Data preparation and outlier removal

For the experiments, data with the identifier s015 was randomly selected for analysis, while data from s004 was used as a test set to evaluate the recognition of keystroke dynamics from a different individual. Outlier detection began by assessing whether the s015 dataset adhered to a multivariate normal distribution.
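The five metrics defined in Section 3.5 follow mechanically from the four confusion counts; a small helper makes this explicit (a sketch with hypothetical counts; the function name is ours):

```python
def one_class_metrics(tp, fp, tn, fn):
    """Specificity, recall, precision, F1 and accuracy from confusion counts,
    where positives are anomalies and negatives are target instances."""
    specificity = tn / (tn + fp)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return specificity, recall, precision, f1, accuracy
```

For example, with hypothetical counts TP = 95, FP = 5, TN = 190, FN = 10, precision is 0.95 and accuracy is 285/300 = 0.95.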
The Mardia test revealed significant deviations: the test statistic for multivariate skewness, Nβ1/6, at 391.54, exceeded the chi-square critical value of 215.53 for 165 degrees of freedom at a significance level of 0.005. Similarly, the multivariate kurtosis statistic β2, with a value of 113.32, surpassed the critical value of 102.62 (for a mean of 99, a variance of 1.98, and a significance level of 0.005), indicating non-normality and necessitating normalization.

Normalization parameters were estimated using the maximum likelihood method, yielding the following estimates for the multivariate BCT: λ̂1 = 0.9939, λ̂2 = 1.3605, λ̂3 = 1.2202, λ̂4 = 1.7521, λ̂5 = 2.2965, λ̂6 = 1.0447, λ̂7 = 1.6466, λ̂8 = 1.3512, λ̂9 = 2.0599. After applying the nine-variate Box-Cox transformation with components (1), the Mardia test was performed again. The skewness statistic Nβ1/6 was reduced to 212.07, which is below the chi-square threshold of 215.53, but the kurtosis statistic β2 remained slightly elevated at 109.01, still above the critical value of 102.62. Despite some remaining non-normality, primarily due to outliers, the transformed dataset better approximated a multivariate normal distribution, improving the conditions for using SMD.

Subsequently, SMD was computed for each feature vector to identify potential outliers. These distances were compared to the chi-square critical value of 23.59 for 9 degrees of freedom at a 0.005 significance level. Any vectors with SMD exceeding this value were classified as outliers. The most extreme outlier, vector number 295 with an SMD of 37.44, was removed. This process of outlier removal was iteratively repeated until all extreme points were excluded. After eliminating 6 outliers, the multivariate kurtosis statistic finally fell below the critical value, confirming that outliers had a substantial impact on the dataset's distribution.
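The preprocessing pipeline just described, applying the component-wise transform (1) and then iteratively removing the largest-SMD point, can be sketched as follows. This is a simplified illustration: the joint maximum-likelihood estimation of the lambda parameters is omitted, and all names are ours:

```python
import numpy as np
from scipy.stats import chi2

def box_cox(x, lam):
    """Component-wise Box-Cox transform, eq. (1); x must be positive."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def multivariate_box_cox(X, lams):
    """Apply the transform with a separate lambda per column."""
    return np.column_stack([box_cox(X[:, j], l) for j, l in enumerate(lams)])

def iterative_outlier_removal(Z, alpha=0.005):
    """Repeatedly drop the single point with the largest SMD until no point
    exceeds the chi-square critical value (one removal per iteration)."""
    Z = np.asarray(Z, dtype=float)
    crit = chi2.ppf(1 - alpha, df=Z.shape[1])
    removed = []
    while True:
        d = Z - Z.mean(axis=0)
        S_inv = np.linalg.inv(np.cov(Z, rowvar=False))
        smd = np.einsum("ij,jk,ik->i", d, S_inv, d)  # SMD of every point
        worst = int(np.argmax(smd))
        if smd[worst] <= crit:
            return Z, removed
        removed.append(float(smd[worst]))
        Z = np.delete(Z, worst, axis=0)
```

Recomputing the mean and covariance after every removal, as done here, mirrors the one-point-per-iteration procedure described above.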
Table 1 lists the SMD values and the corresponding indices for each outlier that was removed. This iterative process continued until no further significant outliers were detected, resulting in a refined dataset that was less affected by extreme values. To mitigate any potential bias related to the order of the data, the final sample was randomly shuffled to ensure an even distribution across the training and test sets. The shuffled data was then split into two equal parts, with 195 vectors in each set. The training set was utilized to build both the prediction ellipsoid and the machine learning models, allowing them to capture the underlying patterns and relationships within the data. Meanwhile, the test set was reserved to assess the performance of the models on data not previously encountered during training.

Table 1
Removed anomalies

 #   SMD      Vector number     #   SMD      Vector number
 1   37.44    295               6   26.963   323
 2   36.962   160               7   26.868   45
 3   30.742   306               8   25.776   263
 4   28.833   388               9   24.515   294
 5   28.662   214              10   23.972   204

Following this outlier removal process, the final set was obtained with the following vector of means: X̄ = {0.07525; 0.07022; 0.07823; 0.063; 0.06911; 0.08829; 0.08605; 0.07505; 0.0751}. Table 2 presents the covariance matrix.
Table 2
The covariance matrix of the final set

      X1       X2       X3       X4       X5       X6       X7       X8       X9
X1   0.0319   0.0426   0.0454   0.0413   0.0549   0.0412   0.0426  -0.056   -0.0417
X2   0.0426   0.0321  -0.0419   0.0551   0.0582   0.0414   0.044   -0.041    0.0514
X3   0.0450  -0.0419   0.0325   0.061    0.0535   0.0511  -0.0426   0.0427   0.061
X4   0.0413   0.0551   0.061    0.0319   0.0416  -0.0552   0.0431  -0.0418  -0.0557
X5   0.0549   0.0582   0.0535   0.0416   0.0313   0.0414  -0.0562   0.0556   0.0548
X6   0.0412   0.0414   0.0511  -0.0552   0.0414   0.0312  -0.0536   0.0536   0.0586
X7   0.0426   0.044   -0.0426   0.0431  -0.0562  -0.0536   0.0335  -0.0438   0.0519
X8  -0.056   -0.041    0.0427  -0.0418   0.0556   0.0536  -0.0438   0.0327   0.057
X9  -0.0417   0.0514   0.061   -0.0557   0.0548   0.0586   0.0519   0.057    0.032

Table 3
The covariance matrix of the training set

      X1       X2       X3       X4       X5       X6       X7       X8       X9
X1   0.032    0.0416   0.0462   0.0442   0.0553   0.0571   0.0438   0.0514  -0.0575
X2   0.0416   0.0321  -0.0419  -0.0526  -0.0635   0.0414   0.0462  -0.041    0.0515
X3   0.0462  -0.0419   0.0326  -0.0684   0.0411   0.0571  -0.0451   0.0599   0.0417
X4   0.0442  -0.0526  -0.0684   0.0321   0.0424  -0.0675   0.0437  -0.0428  -0.0587
X5   0.0553  -0.0635   0.0411   0.0424   0.0313   0.0412  -0.0536  -0.0553   0.0572
X6   0.0571   0.0414   0.0571  -0.0675   0.0412   0.0312   0.041    0.0516   0.0418
X7   0.0438   0.0462  -0.0451   0.0437  -0.0536   0.041    0.0336  -0.0461   0.0669
X8   0.0514  -0.041    0.0599  -0.0428  -0.0553   0.0516  -0.0461   0.0327   0.0423
X9  -0.0574   0.0515   0.0417  -0.0587   0.0572   0.0418   0.0669   0.0423   0.0322

Table 3 presents the covariance matrix of the training set, which has the mean vector X̄ = {0.07635; 0.07052; 0.07875; 0.06254; 0.06955; 0.08806; 0.08752; 0.07447; 0.07495}.

4.2. Prediction ellipsoid construction

The prediction ellipsoid should be constructed using data that follows a normal distribution, so verifying the data's normality is a necessary first step. Based on the Mardia test results, the multivariate distribution of this training sample deviates from normality.
The test statistic for multivariate skewness Nβ1/6 is 286.99, exceeding the critical value of 215.53 from the chi-square distribution for 165 degrees of freedom at a 0.005 significance level. Additionally, the test statistic for multivariate kurtosis β2 is 105.43, also exceeding the critical value of 104.19, given a mean of 99, a variance of 4.062, and a 0.005 significance level.

To address this non-normality, the training set is normalized using a nine-variate BCT. The optimal parameters for this transformation were estimated using the maximum likelihood method: λ̂1 = 1.3676, λ̂2 = 1.4807, λ̂3 = 1.078, λ̂4 = 1.7393, λ̂5 = 2.1004, λ̂6 = 1.1498, λ̂7 = 1.566, λ̂8 = 1.1685, λ̂9 = 2.1146. After applying the BCT with components (1), the normalized training set has a mean vector Z̄ = {0.70932; 0.66184; 0.86764; -0.57016; -0.47427; 0.81642; 0.62417; -0.81443; -0.47084}. The covariance matrix S_Z is presented in Table 4.

Table 4
The covariance matrix of the normalized training set

      Z1       Z2       Z3       Z4       Z5       Z6       Z7       Z8       Z9
Z1   0.0 29   0.0514   0.042    0.052    0.0513   0.0521   0.0539   0.0773  -0.0768
Z2   0.0514   0.0416  -0.0548  -0.0721   0.0771   0.0527   0.0543  -0.0516   0.0731
Z3   0.042   -0.0548   0.0318  -0.0642   0.0643   0.0542  -0.0411   0.0551   0.0665
Z4   0.052   -0.0721  -0.0642   0.0531   0.0615  -0.0615   0.0514  -0.052   -0.0732
Z5   0.0513   0.0771   0.0643   0.0615   0.0633   0.0658   0.0886  -0.0617   0.0613
Z6   0.0521   0.0527   0.0542  -0.0615   0.0658   0.0455   0.0515   0.0692   0.0672
Z7   0.0539   0.0543  -0.0411   0.0514   0.088    0.0515   0.0422  -0.041    0.0712
Z8   0.0773  -0.0516   0.0551  -0.052   -0.0617   0.0692  -0.041    0.0311   0.0689
Z9  -0.0768   0.0731   0.0666  -0.0732   0.0713   0.0672   0.0712   0.0689   0.0657

The Mardia test performed on the normalized training set indicates conformity with multivariate normality. The test statistic for multivariate skewness Nβ1/6 is 175.47, which is below the critical value of 215.53.
Similarly, the test statistic for multivariate kurtosis β2 is 99.76, which does not exceed the critical value of 104.19, confirming that the normalized set follows a multivariate normal distribution.

4.3. Implementation of machine learning algorithms

This section outlines the implementation of the machine learning algorithms used to recognize the keystroke dynamics data. Specifically, we explore the one-class support vector machine, isolation forest, and autoencoder models, each selected for their unique capabilities in anomaly detection and one-class classification.

The OCSVM is implemented in Python using the OneClassSVM object from the scikit-learn library. This implementation allows for the customization of several critical parameters, including nu, which determines the acceptable proportion of training errors and establishes an upper limit for the fraction of outliers in the training dataset. The radial basis function kernel was chosen for its flexibility in modeling non-linear relationships among data points. The gamma parameter is set to "auto", allowing its value to be computed automatically based on the inverse of the number of features, which influences the range of influence of each training example; lower values extend the influence while higher values localize it.

The IF algorithm, also implemented through scikit-learn [27], provides several tunable parameters for optimizing performance. A key parameter is the contamination level, which defines the threshold for categorizing new data points as either target or anomalous. After experimentation, a contamination value of 0.05 was determined to effectively balance the detection of true anomalies against false positives.
Additional significant parameters include n_estimators, which denotes the number of decision trees in the forest (set to 100); max_samples, indicating the maximum number of samples per tree (set to 256); and max_features, specifying the maximum number of features for splitting each node (set to 1.0 to utilize all features). To classify a sample as either target or anomalous, we compare the anomaly score against a defined threshold. In scikit-learn's convention, the scores can range from negative to positive values; positive scores indicate a higher likelihood of being a target, while negative scores suggest a greater probability of being anomalous. The selection of the threshold value is application-dependent; in this analysis, a threshold of 0 yielded optimal results.

For the AE model, we utilized TensorFlow and Keras, leveraging their combined strengths in flexibility, scalability, and ease of use. Keras, as a high-level API for building neural networks atop TensorFlow, simplifies the process of constructing and training models. Meanwhile, TensorFlow provides the essential computational framework, ensuring efficient performance during training and inference. Before passing the data into the neural network, min-max normalization is applied to each feature individually, scaling all features to a range of [0, 1]. This technique standardizes the features, promoting stable and efficient learning processes.

The AE architecture consists of an input layer configured to accept a nine-variate representation of the data. The model includes fully connected layers for encoding and decoding operations. During the encoding phase, the input data is compressed into a lower-dimensional representation, progressively reducing dimensionality from 9 to 8 and then to 6, creating a bottleneck in the network structure. This bottleneck layer compels the model to capture essential features of the input data while minimizing redundancy [28].
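A minimal Keras sketch of this autoencoder, anticipating the training settings reported below (ReLU and sigmoid activations, Adam optimizer, binary cross-entropy, 25 epochs, batch size 16), might look as follows; the synthetic training data is a stand-in for the min-max normalized keystroke vectors, not the real CMU features:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Architecture described in the text: 9 -> 8 -> 6 -> 8 -> 9.
autoencoder = keras.Sequential([
    layers.Input(shape=(9,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(6, activation="relu"),   # bottleneck layer
    layers.Dense(8, activation="relu"),
    layers.Dense(9, activation="sigmoid"),
])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# Synthetic stand-in data, already scaled to [0, 1].
X_train = np.random.default_rng(0).random((195, 9))
autoencoder.fit(X_train, X_train, epochs=25, batch_size=16,
                shuffle=True, verbose=0)

# Per-sample reconstruction error; a threshold on it flags anomalies.
recon = autoencoder.predict(X_train, verbose=0)
errors = np.mean((recon - X_train) ** 2, axis=1)
```

Thresholding `errors` then separates target instances (low error) from anomalies (high error), as described in Section 3.4.3.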
Each encoding layer employs rectified linear unit (ReLU) activation functions, introducing non-linearity that facilitates the extraction of complex features. The decoding phase reverses this process, expanding the dimensionality back to 8 and ultimately to the original 9 dimensions, again using ReLU activations to retain the learned non-linear relationships. The final layer uses a sigmoid activation function to constrain output values to the range [0, 1], a common choice for reconstruction and binary classification tasks that require smooth, interpretable outputs. The structure of the AE is illustrated in Figure 1.

Figure 1: Autoencoder structure.

To train the model, we employed the Adam optimizer in conjunction with binary cross-entropy loss, a standard choice for reconstruction tasks aimed at minimizing the discrepancy between the original and reconstructed data. The Adam optimizer combines the strengths of AdaGrad and RMSProp [29], dynamically adjusting the learning rate during training for faster convergence and improved performance. The binary cross-entropy loss effectively measures the difference between the inputs and the reconstructed outputs. Training runs for 25 epochs with a batch size of 16. Shuffling the data at each epoch introduces variability, preventing the model from memorizing the training sequence and thus enhancing generalization.

5. Results

Table 5 displays a comparison of the recognition performance of the evaluated methods: the Prediction Ellipsoid for Non-Gaussian Data (PENGD) (1), the Prediction Ellipsoid for Normalized Data (PEND) (7), the One-Class Support Vector Machine (OCSVM), the Isolation Forest (IF), and the Autoencoder (AE).
Table 5
Comparison of models

Model    Specificity   Recall   Precision   F1 score   Accuracy
PENGD    0.9795        0.9225   0.9893      0.9547     0.9412
PEND     0.9949        0.9700   0.9974      0.9835     0.9782
OCSVM    0.9744        0.9675   0.9872      0.9773     0.9697
IF       0.9333        0.9500   0.9669      0.9584     0.9445
AE       0.9641        0.9625   0.9821      0.9722     0.9630

All models evaluated in this study demonstrate commendable performance in keystroke dynamics recognition. However, the PENGD shows the lowest accuracy among the models assessed, indicating that while it can capture some patterns, it struggles with more complex datasets, particularly because of the challenges posed by non-Gaussian data distributions. Both the OCSVM and the AE exhibit very good performance across multiple metrics, reflecting their ability to identify true anomalies with high precision and recall; these models effectively leverage their respective architectures to capture intricate relationships within the data. In contrast, the IF did not perform as well as the other models. Ultimately, the PEND emerged as the best-performing model, achieving the highest scores across all evaluation metrics. This reinforces the significance of normalizing transformations in enhancing prediction ellipsoid models for recognition tasks, particularly in scenarios involving non-Gaussian data distributions.

6. Discussion

All models in this study exhibit strong performance in keystroke dynamics recognition, but the PEND stands out as the best performer. Its precision, recall, and F1 score are the highest, demonstrating its ability to handle keystroke dynamics recognition tasks with remarkable accuracy. The performance of the OCSVM and AE is also notable, while the IF lags slightly behind the others. These findings underscore that applying the nine-variate BCT played a critical role in boosting model performance, particularly by improving how the models handle non-Gaussian data.
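As a simplified, component-wise illustration of the normalization step, the sketch below applies scipy's univariate Box-Cox transformation to a single synthetic, positively skewed feature (a stand-in for one keystroke timing variable). Note that this univariate form transforms each feature in isolation, unlike the multivariate BCT used in the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Toy positive-valued stand-in for one keystroke timing feature
# (Box-Cox requires strictly positive data).
x = rng.lognormal(mean=0.0, sigma=0.5, size=500)

# scipy estimates the Box-Cox lambda by maximum likelihood and
# returns the transformed sample together with the fitted lambda.
z, lam = stats.boxcox(x)

# The transformation should pull the distribution toward normality,
# e.g. reducing the skewness of the sample.
skew_before, skew_after = stats.skew(x), stats.skew(z)
```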
Multivariate transformations like the BCT take into account the correlations between variables, allowing for a more accurate and comprehensive prediction ellipsoid. This, in turn, enhances the model's ability to identify intricate patterns in the data, improving both its accuracy and reliability. However, there are certain disadvantages to using a prediction ellipsoid for normalized data. A robust model typically requires a dataset of at least 100 instances, which can be a challenge for smaller datasets. Additionally, selecting the most appropriate normalizing transformation can be complex, especially for datasets with intricate distributions or a large number of outliers. Another important factor is the choice of significance level, as this can influence the efficiency and reliability of the prediction ellipsoid. Limitations also arise from the outlier removal process: deleting 10 outliers during preprocessing may cause the model to miss some underlying patterns in the data. To mitigate this, more advanced normalizing techniques, such as the Johnson transformation, could be considered to better align the model with the dataset's distribution, improving its ability to generalize across all relevant data points. In this paper, the primary aim was to address the challenge posed by non-Gaussian data distributions in the context of biometric identification based on keystroke dynamics, emphasizing the importance of normalization techniques, specifically the multivariate Box-Cox transformation, in enhancing model accuracy with such data. The dataset used in this study represents a 10-character password length, which may not be optimal for real-world applications. A password length of 20-22 characters, without the use of uppercase characters, is generally considered preferable, as it allows for more comprehensive feature extraction. Beyond keystroke length and character variety, several contextual factors [30] were not considered in this research.
However, these factors could play an important role in biometric identification based on keystroke dynamics. In future research, a broader dataset that includes data reflecting the impact of environmental factors could be used, along with extended key sequences. The inclusion of these factors would provide a more realistic representation of user behavior. Additionally, the application of other normalizing techniques, such as the Johnson transformation, could further enhance model accuracy by addressing the complexity of non-Gaussian distributions.

7. Conclusions

The focus of this paper was to address the challenges associated with non-Gaussian data distributions in the context of keystroke dynamics recognition. The study compared the performance of prediction ellipsoid models and machine learning algorithms, including the OCSVM, IF, and AE. All models demonstrated a high probability of recognition. Notably, the prediction ellipsoid for non-Gaussian data had the lowest accuracy, highlighting the challenges posed by complex datasets. However, by applying the multivariate BCT, the prediction ellipsoid model for normalized data showed significant performance improvements, emphasizing the critical role of normalization when addressing non-Gaussian data distributions. The BCT not only improved the overall accuracy but also deepened the understanding of data patterns by considering correlations between variables, ultimately leading to a more precise prediction ellipsoid. Despite these advancements, the study identified certain limitations and challenges. One significant drawback is the necessity for a large dataset, as constructing a reliable prediction ellipsoid model generally requires at least 100 instances. Furthermore, selecting the optimal normalizing transformation remains a complex task, especially when dealing with datasets that contain outliers or exhibit highly intricate distributions.
Another challenge lies in determining the appropriate significance level, which directly affects the reliability and efficiency of the prediction ellipsoid. Looking ahead, future research could expand the dataset to include environmental factors as well as extended key sequences, to provide a more realistic representation of user behavior. The incorporation of alternative normalizing techniques, such as the Johnson transformation, could further enhance model accuracy by addressing the impact of non-Gaussian data. Further investigation into model complexity and feature selection for both prediction ellipsoid models and machine learning algorithms could offer valuable insights for improving keystroke dynamics recognition.

Declaration on Generative AI

The authors have not employed any Generative AI tools.

References

[1] L. de-Marcos, J. MartΓ­nez-HerrΓ‘iz, J. Junquera-SΓ‘nchez, C. Cilleruelo, C. Pages-ArΓ©valo, Comparing machine learning classifiers for continuous authentication on mobile devices by keystroke dynamics. Electronics, 2021. DOI: 10.3390/electronics10141622
[2] A. Alshehri, F. Coenen, D. Bollegala, Accurate continuous and non-intrusive user authentication with multivariate keystroke streaming, in: 9th International Conference on Knowledge Discovery and Information Retrieval, pp. 61-70, 2017. DOI: 10.5220/0006497200610070
[3] M. G. Ismail, M. A. Salem, M. A. Abd El Ghany, E. A. Aldakheel, S. Abbas, Outlier detection for keystroke biometric user authentication. PeerJ Computer Science, 2024. DOI: 10.7717/peerj-cs.2086
[4] M. Choi, S. Lee, M. Jo, J. S. Shin, Keystroke dynamics-based authentication using unique keypad. Sensors, 21(6):2242, 2021. DOI: 10.3390/s21062242
[5] H. Marques, L. Swersky, J. Sander, R. Campello, A. Zimek, On the evaluation of outlier detection and one-class classification: a comparative study of algorithms, model selection, and ensembles. Data Mining and Knowledge Discovery, 37, 1473-1517, 2023. DOI: 10.1007/s10618-023-00931-x
[6] S. Kim, D. Park, J.
Jung, Evaluation of one-class classifiers for fault detection: Mahalanobis classifiers and the Mahalanobis-Taguchi system. Processes, 9(8), 1450, 2021. DOI: 10.3390/pr9081450
[7] H. Chang, J. Li, C. Wu, M. Stamp, Machine learning and deep learning for fixed-text keystroke dynamics. Artificial Intelligence for Cybersecurity, pp. 309-329, 2022. DOI: 10.48550/arXiv.2107.00507
[8] B. Saini, P. Singh, A. Nayyar, N. Kaur, K. Bhatia, S. El-Sappagh, J. Hu, A three-step authentication model for mobile phone user using keystroke dynamics. IEEE Access, 8, 125909-125922, 2020. DOI: 10.1109/ACCESS.2020.3008019
[9] Q. Li, H. Chen, CDAS: A continuous dynamic authentication system, in: Proceedings of the 2019 8th International Conference on Software and Computer Applications, pp. 447-452, 2019. DOI: 10.1145/3316615.3316691
[10] arXiv preprint arXiv:2307.05529, 2023. DOI: 10.48550/arXiv.2307.05529
[11] N. Raul, R. Shankarmani, P. Joshi, A comprehensive review of keystroke dynamics-based authentication mechanism, in: International Conference on Innovative Computing and Communications. Advances in Intelligent Systems and Computing, Vol. 1059. Springer, Singapore, 2020. DOI: 10.1007/978-981-15-0324-5_13
[12] R. Toosi, M. Akhaee, Time-frequency analysis of keystroke dynamics for user authentication. Future Generation Computer Systems, 115, 438-447, 2021. DOI: 10.1016/j.future.2020.09.027
[13] M. L. Ali, K. Thakur, M. A. Obaidat, A hybrid method for keystroke biometric user identification. Electronics, 11(17):2782, 2022. DOI: 10.3390/electronics11172782
[14] I. Meenakshisundaram, I. Karunanithi, U. Sahana, Enhancing user authentication through keystroke dynamics analysis using isolation forest algorithm, in: 2024 Second International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), pp. 1-5, 2024. DOI: 10.1109/ic-ETITE58242.2024.10493648
[15] F. Trad, A. Hussein, A.
Chehab, Free text keystroke dynamics-based authentication with continuous learning: a case study, in: 2022 IEEE 21st International Conference on Ubiquitous Computing and Communications (IUCC/CIT/DSCI/SmartCNS), Chongqing, China, pp. 125-131, 2022. DOI: 10.1109/IUCC-CIT-DSCI-SmartCNS57392.2022.00031
[16] Y. Patel, K. Ouazzane, V. Vassilev, I. Faruqi, G. Walker, Keystroke dynamics using auto encoders, in: 2019 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), Oxford, UK, pp. 1-8, 2019. DOI: 10.1109/CyberSecPODS.2019.8885203
[17] S. Prykhodko, L. Makarova, K. Prykhodko, A. Pukhalevych, Application of transformed prediction ellipsoids for outlier detection in multivariate non-Gaussian data, in: 2020 IEEE 15th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET), pp. 359-362, 2020. DOI: 10.1109/TCSET49122.2020.235454
[18] O. Oyebola, Examining the distribution of keystroke dynamics features on computer, tablet and mobile phone platforms, in: Mobile Computing and Sustainable Informatics: Proceedings of ICMCSI 2023, pp. 613-620. Springer Nature Singapore, Singapore. DOI: 10.1007/978-981-99-0835-6_43
[19] K. Lam, K. Meijer, F. Loonstra, E. Coerver, J. Twose, E. Redeman, J. Killestein, Real-world keystroke dynamics are a potentially valid biomarker for clinical disability in multiple sclerosis. Multiple Sclerosis Journal, 27(9), 1421-1431, 2021. DOI: 10.1177/1352458520968797
[20] S. Prykhodko, A. Prykhodko, I. Shutko, Estimating the size of web apps created using the CakePHP framework by nonlinear regression models with three predictors, in: IEEE 16th International Conference on Computer Sciences and Information Technologies (CSIT), Lviv, Ukraine, pp. 333-336, 2021. DOI: 10.1109/CSIT52700.2021.9648680
[21] S. Prykhodko, A. Trukhov, Application of a ten-variate prediction ellipsoid for normalized data and machine learning algorithms for face recognition,
in: Selected Papers of the Seventh International Workshop on Computer Modeling and Intelligent Systems (CMIS-2024), Zaporizhzhia, Ukraine, May 3, 2024. CEUR Workshop Proceedings, Vol. 3702, pp. 362-375, 2024. https://ceur-ws.org/Vol-3702/paper30.pdf
[22] T. Etherington, Mahalanobis distances for ecological niche modelling and outlier detection: implications of sample size, error, and bias for selecting and parameterising a multivariate location and scatter method. PeerJ, 9, 2021. DOI: 10.7717/peerj.11436
[23] S. Todkar, V. Baltazart, A. Ihamouten, X. DΓ©robert, D. Guilbert, One-class SVM based outlier detection strategy to detect thin interlayer debondings within pavement structures using Ground Penetrating Radar data. Journal of Applied Geophysics, 192, 104392, 2021. DOI: 10.1016/j.jappgeo.2021.104392
[24] J. Lesouple, C. Baudoin, M. Spigai, J. Y. Tourneret, Generalized isolation forest for anomaly detection. Pattern Recognition Letters, Vol. 149, pp. 109-119, 2021. DOI: 10.1016/j.patrec.2021.05.022
[25] W. Xu, J. Jang-Jaccard, A. Singh, Y. Wei, F. Sabrina, Improving performance of autoencoder-based network anomaly detection on NSL-KDD dataset. IEEE Access, Vol. 9, pp. 140136-140146, 2021. DOI: 10.1109/ACCESS.2021.3116612
[26] W. Hilal, S. A. Gadsden, J. Yawney, Financial fraud: A review of anomaly detection techniques and recent advances. Expert Systems with Applications, Vol. 193, 116429, 2022. DOI: 10.1016/j.eswa.2021.116429
[27] M. U. Togbe, M. Barry, A. Boly, Y. Chabchoub, R. Chiky, J. Montiel, T. V. Tran, Anomaly detection for data streams based on isolation forest using scikit-multiflow, in: Computational Science and Its Applications - ICCSA 2020: 20th International Conference, Cagliari, Italy, July 1-4, 2020, Proceedings, Part IV, pp. 15-30. Springer International Publishing. DOI: 10.1007/978-3-030-58811-3_2
[28] M. Sewak, S. K. Sahay, H.
Rathore, An overview of deep learning architecture of deep neural networks and autoencoders. Journal of Computational and Theoretical Nanoscience, Vol. 17, No. 4, pp. 182-188, 2020. DOI: 10.1166/jctn.2020.8648
[29] I. H. Kartowisastro, J. Latupapua, A comparison of adaptive moment estimation and RMSProp optimisation techniques for wildlife animal classification using convolutional neural networks. Revue d'Intelligence Artificielle, Vol. 37, No. 4, pp. 1023-1030, 2023. DOI: 10.18280/ria.370424
[30] S. Bilan, M. Bilan, A. Bilan, Interactive biometric identification system based on the keystroke dynamic, in: S. Bilan, M. Elhoseny, D. J. Hemanth (Eds.), Biometric Identification Technologies Based on Modern Data Mining Methods. Springer, Cham, pp. 39-58, 2021. DOI: 10.1007/978-3-030-48378-4_3