<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>X (S. Prykhodko);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Application of a Nine-Variate Prediction Ellipsoid for Normalized Data and Machine Learning Algorithms for Keystroke Dynamics Recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergiy Prykhodko</string-name>
          <email>sergiy.prykhodko@nuos.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Artem Trukhov</string-name>
          <email>artem.trukhov@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Admiral Makarov National University of Shipbuilding, Heroes of Ukraine Ave.</institution>
          ,
          <addr-line>9, Mykolaiv, 54007</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Odesa Polytechnic National University</institution>
          ,
          <addr-line>Shevchenko Ave., 1, Odesa, 65044</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Keystroke dynamics recognition is a crucial element in enhancing security, enabling personalized user authentication, and supporting various identity verification systems. This study offers a comparative analysis of a nine-variate prediction ellipsoid for normalized data and machine learning algorithms, specifically an autoencoder, isolation forest, and one-class support vector machine, for keystroke dynamics recognition. Traditional methods often assume a multivariate normal distribution. However, real-world keystroke data typically deviate from this assumption, negatively impacting model performance. To address this, the dataset was normalized using the multivariate Box-Cox transformation, allowing the construction of a decision rule based on a nine-variate prediction ellipsoid for normalized data. The results revealed that the application of the Box-Cox transformation significantly enhanced both the accuracy and robustness of the prediction ellipsoid. Although all models demonstrated strong performance, the nine-variate prediction ellipsoid for normalized data consistently outperformed the machine learning algorithms. The study highlights the importance of careful feature selection and multivariate normalizing transformations in keystroke dynamics recognition. Future studies could benefit from broader datasets that include a wider range of user behaviors, such as variations in environmental factors and longer key sequences.</p>
      </abstract>
      <kwd-group>
        <kwd>keystroke dynamics</kwd>
        <kwd>multivariate normal distribution</kwd>
        <kwd>Box-Cox transformation</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, keystroke dynamics recognition has become an effective method for biometric
authentication. By analyzing the unique patterns and rhythms individuals demonstrate while typing,
such as keystroke duration and the intervals between key presses [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], it becomes possible to create
a distinctive typing profile for each user. Unlike traditional biometric methods like fingerprint or
facial recognition, keystroke dynamics offers a non-intrusive and continuous form of authentication
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This makes it especially appealing for secure applications such as online banking, login systems,
and access control.
      </p>
      <p>
        The keystroke recognition process involves several essential stages to ensure accurate user
authentication. It begins with the collection of a dataset, typically consisting of timestamps for
keypress and key release events. From this raw data, key attributes such as hold times and inter-key
intervals are extracted, which reveal the unique typing behavior of the user. A critical preprocessing
step is the detection and removal of outliers - data points that significantly deviate from the expected
behavior and could otherwise distort the results [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This step is vital for creating a cleaner dataset
and improving model accuracy. Once preprocessing is complete, classification models are applied to
recognize new data inputs.
      </p>
      <p>
        In traditional recognition tasks, classification typically involves assigning an object to one of
several predefined categories. However, in the context of keystroke dynamics, one-class
classification is more frequently employed. Unlike standard classification methods, which rely on a
balanced dataset with both positive and negative examples, one-class classification focuses on
modeling the target class alone, without the need for negative samples. This approach is particularly beneficial in authentication
systems, where the goal is to continuously verify that the current user matches the known profile,
rather than distinguishing between multiple users [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Closely related to outlier detection, one-class
classification evaluates new data to determine if it aligns with the target profile, flagging any
deviations as potential anomalies [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Prediction ellipsoids [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and machine learning algorithms [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] are commonly utilized in the field
of pattern recognition. The study aims to compare these models in keystroke dynamics recognition,
assessing their performance, robustness, and applicability in real-world settings.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature review</title>
      <p>
        Mathematical modeling techniques are pivotal in the field of keystroke dynamics recognition, aimed
at improving accuracy and reliability. Recent advancements have integrated a range of approaches.
Tree-based models, like random forests [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], classify data by constructing hierarchical structures, and
learning feature splits that effectively differentiate between classes. Support vector-based methods
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] focus on maximizing the margin between classes to create optimal decision boundaries, while
neural network models [10] capture complex patterns in keystroke data by processing information
through multiple interconnected layers of nodes.
      </p>
      <p>
        However, for user authentication systems, one-class classification is more commonly employed
[
        <xref ref-type="bibr" rid="ref10">11</xref>
        ]. Among the leading techniques are prediction ellipsoids and machine learning algorithms such
as one-class support vector machine (OCSVM) [
        <xref ref-type="bibr" rid="ref11 ref12">12, 13</xref>
        ], isolation forest (IF) [
        <xref ref-type="bibr" rid="ref13">14</xref>
        ], and autoencoder
(AE) [
        <xref ref-type="bibr" rid="ref14 ref15">15, 16</xref>
        ]. OCSVM learns a decision boundary that separates target data points from outliers
while maximizing the margin within the feature space. IF is an ensemble method that isolates
anomalies by randomly selecting features and partitioning the data until anomalous points are
isolated in smaller partitions, requiring fewer splits for target points. AE, as a neural network, learns
an efficient representation of data by encoding inputs into a lower-dimensional space and
reconstructing them. Anomalies are flagged by evaluating reconstruction errors, where higher
discrepancies suggest potential outliers.
      </p>
      <p>
        The use of prediction ellipsoids relies on the assumption that data conforms to a multivariate
normal distribution [
        <xref ref-type="bibr" rid="ref16">17</xref>
        ]. In practice, however, this assumption often does not hold for real-world
keystroke data [
        <xref ref-type="bibr" rid="ref17">18</xref>
        ]. To address this, normalization transformations are applied, adjusting the data
to more closely align with a multivariate normal distribution and thereby improving the model's
accuracy and robustness [
        <xref ref-type="bibr" rid="ref18 ref19">19-20</xref>
        ]. Techniques like univariate transformations (e.g., logarithmic or
Box-Cox transformation) operate on individual features, while multivariate transformations, such as
the multivariate Box-Cox transformation, consider relationships between features for a more holistic
normalization approach.
      </p>
      <p>This study focuses on comparing a prediction ellipsoid for normalized data with machine learning
algorithms such as OCSVM, IF, and AE, which are widely used and offer distinct approaches to
one-class classification. In the context of keystroke dynamics recognition, accuracy and efficiency are
critical, making it essential to evaluate the effectiveness of different approaches. While the prediction
ellipsoid offers interpretability and computational efficiency, it can encounter limitations when
dealing with non-Gaussian data distributions. On the other hand, machine learning algorithms such
as OCSVM, IF, and AE provide alternative techniques, each with its own advantages and challenges
when applied to keystroke dynamics.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and methods</title>
      <sec id="sec-3-1">
        <title>3.1. Keystroke dynamics dataset</title>
        <p>In keystroke dynamics recognition, the quality and structure of the dataset play a crucial role in
determining the performance and accuracy of the applied algorithms. A typical keystroke dynamics
dataset records various temporal characteristics of an individual's typing behavior, including metrics
such as the duration of key presses and the intervals between consecutive key events.</p>
        <p>This study utilizes the CMU keystroke dynamics dataset, which captures detailed typing data
from 51 subjects. The dataset records various keystroke timing features in seconds, including how long each key is pressed
and the intervals between key presses. Data collection was conducted over eight distinct sessions
per subject, with at least one day between sessions. Each session required subjects to type the
password 50 times, resulting in 400 samples per individual and a total of 20,400 samples across all
participants.</p>
        <p>The dataset is organized by subject identifier, session number, repetition count, and 31 timing
features. Columns are labeled to reflect specific keystroke metrics: H.key denotes the hold time for
a particular key, measuring the duration from key press to release. DD.key1.key2 represents the
keydown-keydown interval, i.e., the time between pressing two consecutive keys, while
UD.key1.key2 indicates the keyup-keydown interval, measuring the time between releasing one key
and pressing the next. Notably, UD times can be negative in some cases, and the sum of H times and
UD times corresponds to the DD time for a given digraph.</p>
        <p>To simplify the modeling process, this study focuses on nine features, the hold times of individual
keys, forming the feature vector: X = { H.t, H.i, H.e, H.5, H.R, H.o, H.a, H.n, H.l }.</p>
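        <p>As an illustration, the nine hold-time columns can be pulled from the dataset with pandas; the in-memory rows below stand in for the real CSV file, and the exact column labels may differ in the published data:</p>

```python
import pandas as pd

# Minimal sketch of selecting the nine hold-time features. The rows and
# column names below are illustrative stand-ins for the real CMU CSV.
rows = [
    {"subject": "s015", "sessionIndex": 1, "rep": 1,
     "H.t": 0.091, "H.i": 0.083, "H.e": 0.102, "H.5": 0.110, "H.R": 0.098,
     "H.o": 0.087, "H.a": 0.095, "H.n": 0.089, "H.l": 0.093},
    # ... one row per password repetition
]
df = pd.DataFrame(rows)

# Keep only the hold-time columns for one subject, as a (n_samples, 9) array
HOLD_COLS = ["H.t", "H.i", "H.e", "H.5", "H.R", "H.o", "H.a", "H.n", "H.l"]
X = df.loc[df["subject"] == "s015", HOLD_COLS].to_numpy()
print(X.shape)
```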
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Outlier removal</title>
        <p>After extracting feature vectors, the subsequent step involves detecting and removing outliers. This
process is crucial because outliers can distort the analysis and undermine the performance of
recognition models. By eliminating these anomalies, the dataset is refined, ensuring that the data
better reflects typical user behavior, which in turn enhances model training.</p>
        <p>
          One commonly used method for outlier detection is based on the squared Mahalanobis distance
(SMD). However, SMD assumes that the data follows a multivariate normal distribution, which might
not always be the case. To verify this assumption, it is necessary to assess the data's normality
through statistical tests like the Mardia test, which is used in the study [
          <xref ref-type="bibr" rid="ref20">21</xref>
          ]. This test evaluates two
aspects of multivariate normality: skewness β1 and kurtosis β2.
        </p>
        <p>The Mardia test calculates the skewness statistic scaled by N/6, so Nβ1/6 follows a chi-square distribution with
p(p + 1)(p + 2)/6 degrees of freedom, where p is the number of variables and N is the sample size.
Kurtosis β2 is compared to the normal distribution with a mean of p(p + 2) and a variance of
8p(p + 2)/N. By comparing the calculated skewness and kurtosis values with those expected under
a normal distribution, the test helps identify significant deviations from multivariate normality. If
the data deviates significantly, normalization is required to transform a non-Gaussian vector P =
{P1, P2, …, P9} into a Gaussian vector T = {T1, T2, …, T9}.</p>
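        <p>As a sketch, both Mardia statistics and their critical values can be computed with numpy and scipy; the function name mardia_test and the demo data are illustrative:</p>

```python
import numpy as np
from scipy import stats

def mardia_test(X, alpha=0.005):
    """Mardia's multivariate skewness and kurtosis tests.

    Returns the scaled skewness statistic N*b1/6 (chi-square distributed
    with p(p+1)(p+2)/6 degrees of freedom) and the kurtosis statistic b2
    (asymptotically normal with mean p(p+2) and variance 8p(p+2)/N).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                  # center the data
    S = (Xc.T @ Xc) / n                      # MLE covariance matrix
    D = Xc @ np.linalg.inv(S) @ Xc.T         # generalized inner products
    b1 = (D ** 3).sum() / n ** 2             # multivariate skewness
    b2 = (np.diag(D) ** 2).sum() / n         # multivariate kurtosis
    skew_stat = n * b1 / 6
    skew_crit = stats.chi2.ppf(1 - alpha, df=p * (p + 1) * (p + 2) / 6)
    kurt_mean = p * (p + 2)
    kurt_var = 8 * p * (p + 2) / n
    kurt_crit = kurt_mean + stats.norm.ppf(1 - alpha) * np.sqrt(kurt_var)
    return {"skewness_stat": skew_stat, "skewness_critical": skew_crit,
            "kurtosis_stat": b2, "kurtosis_critical": kurt_crit,
            "normal": skew_stat <= skew_crit and b2 <= kurt_crit}

# Strongly skewed (exponential) data should be flagged as non-normal
rng = np.random.default_rng(0)
res = mardia_test(rng.exponential(size=(400, 3)))
print(res["normal"])
```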
        <p>Normalization transformations are essential in data analysis and machine learning, as they help
stabilize variance, reduce skewness, and better align data with a multivariate Gaussian distribution.
Univariate transformations, such as logarithmic transformations and the univariate BCT, are typically
applied to individual features. The logarithmic transformation is effective for stabilizing variance in
positively skewed data, while the univariate BCT can handle both positive and negative skewness
by optimizing a transformation parameter for each feature. However, univariate methods ignore the fact that features are often
interdependent, and the BCT can be sensitive to outliers due to the complexity of parameter
estimation.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Prediction ellipsoid</title>
        <p>
          A prediction ellipsoid is a multivariate tool used to assess whether a data point belongs to a specific
target class. It operates by calculating the SMD for each point, which forms the left side of the
comparison equation. This distance is then measured against a critical value derived from the
chi-square distribution, which serves as the right side of the equation [
          <xref ref-type="bibr" rid="ref21">22</xref>
          ]:
(T − T̄)′S⁻¹(T − T̄) = χ²(9, 0.005),
where T̄ is the mean vector, S is the covariance matrix of the normalized data, and χ²(9, 0.005) is the chi-square quantile for 9 degrees of freedom at the 0.005 significance level.
        </p>
        <p>The SMD follows a chi-square distribution with degrees of freedom corresponding to the number
of features in the data, which in this case is 9. This allows for the calculation of a critical value based
on the desired significance level, commonly set at 0.005 for one-class classification tasks. If a data
point's SMD exceeds this critical value, it is flagged as an anomaly, meaning it is likely part of a
different class. If the SMD falls below the threshold, the point lies inside the prediction ellipsoid, signifying its membership in the target class.</p>
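        <p>As a minimal sketch, this decision rule can be implemented with numpy and scipy; the helper name ellipsoid_classify and the synthetic data are illustrative, and in practice the normalized training set would come from the Box-Cox step:</p>

```python
import numpy as np
from scipy import stats

def ellipsoid_classify(T_new, T_train, alpha=0.005):
    """Target-class membership test via the squared Mahalanobis distance.

    A point belongs to the target class when its SMD does not exceed the
    chi-square quantile for p features at significance level alpha.
    """
    T_train = np.asarray(T_train, dtype=float)
    p = T_train.shape[1]
    mean = T_train.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(T_train, rowvar=False))
    diff = np.atleast_2d(T_new) - mean
    smd = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis
    threshold = stats.chi2.ppf(1 - alpha, df=p)          # ~23.59 for p = 9
    return smd <= threshold, smd

# Synthetic stand-in for a normalized training set
rng = np.random.default_rng(7)
T_train = rng.normal(size=(195, 9))
inside, _ = ellipsoid_classify(T_train, T_train)
outside, _ = ellipsoid_classify(np.full((1, 9), 6.0), T_train)
print(inside.mean(), outside)
```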
        <p>In contrast, multivariate transformations like the multivariate BCT consider the relationships
between multiple features. The multivariate Box-Cox transformation builds upon the principles of
the univariate Box-Cox transformation but applies it across multiple variables at once:
Tj = ψ(Pj, λj) = (Pj^λj − 1)/λj if λj ≠ 0, and Tj = ln(Pj) if λj = 0,   (1)
for j = 1, 2, …, 9.</p>
        <p>
          While it is more computationally demanding, this transformation preserves correlations between
variables, offering a more robust approach to normalizing complex datasets. The multivariate BCT
improves the alignment of data with a multivariate normal distribution by optimizing parameters
through methods such as maximizing the log-likelihood of transformed data, as discussed in the
study [
          <xref ref-type="bibr" rid="ref20">21</xref>
          ]. Once the multivariate BCT is applied, the Mardia test should be repeated to verify the
success of the normalization.
        </p>
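        <p>The maximum-likelihood estimation described above can be sketched as follows; the profile log-likelihood expression and the optimizer choice (Nelder-Mead) are standard tools rather than details taken from the paper, and the log-normal demo data is illustrative:</p>

```python
import numpy as np
from scipy.optimize import minimize

def boxcox_vec(x, lam):
    """Apply one Box-Cox component elementwise (x must be positive)."""
    return np.log(x) if abs(lam) < 1e-8 else (x ** lam - 1.0) / lam

def neg_profile_loglik(lams, X):
    """Negative profile log-likelihood of the multivariate Box-Cox
    transformation (additive constants dropped)."""
    n, _ = X.shape
    T = np.column_stack([boxcox_vec(X[:, j], lam) for j, lam in enumerate(lams)])
    _, logdet = np.linalg.slogdet(np.cov(T, rowvar=False, bias=True))
    jacobian = sum((lam - 1.0) * np.log(X[:, j]).sum()
                   for j, lam in enumerate(lams))
    return 0.5 * n * logdet - jacobian

def fit_multivariate_boxcox(X):
    """Estimate the vector of lambda parameters by maximum likelihood."""
    res = minimize(neg_profile_loglik, x0=np.ones(X.shape[1]),
                   args=(X,), method="Nelder-Mead")
    return res.x

# Log-normal toy data: the true lambda for every feature is 0
rng = np.random.default_rng(3)
X_demo = np.exp(rng.normal(size=(500, 2)))
lam_hat = fit_multivariate_boxcox(X_demo)
```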
        <p>After normalization, outlier removal is performed iteratively using the SMD method, removing
one data point per iteration based on the largest distance. This ensures that the most extreme values
are eliminated first, leading to a cleaner and more representative dataset for subsequent analysis.
</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Machine learning algorithms</title>
      </sec>
      <sec id="sec-3-5">
        <title>3.4.1. One-class support vector machine</title>
        <p>
          OCSVM constructs a decision boundary that separates target data from the rest of the feature space
by finding a hyperplane with the maximum margin. This boundary is optimized by maximizing the
distance between the hyperplane and the origin within a high-dimensional feature space. The
OCSVM employs an implicit transformation function, denoted as φ(∙), which is a non-linear
projection evaluated through a kernel function. This kernel function maps the original feature space
into a potentially higher-dimensional one: K(x, y) = φ(x) ∙ φ(y) [
          <xref ref-type="bibr" rid="ref22">23</xref>
          ].
        </p>
        <p>Several kernel functions are commonly used in OCSVM. The linear kernel computes dot products
in the original feature space, making it ideal for linearly separable data. The polynomial kernel
captures non-linear relationships by raising dot products to specific powers, allowing it to model
more complex decision boundaries. The radial basis function kernel, using a Gaussian function,
effectively captures intricate relationships, particularly in cases where data is not linearly separable.
The sigmoid kernel, based on the hyperbolic tangent function, excels at capturing non-linear
patterns, making it useful for handling complex relationships between features and classes.</p>
        <p>The decision boundary that OCSVM learns is defined by the following equation:
f(x) = (w ∙ φ(x)) − ρ,
where w represents the normal vector of the hyperplane, and ρ is the bias term.</p>
        <p>OCSVM is formulated as a quadratic optimization problem, aiming to minimize the weight vector
w while maximizing the margin, subject to specific constraints. The optimization problem can be
expressed as: minimize (1/2)‖w‖² + (1/(νn)) ∑i ξi − ρ over w, ξ, and ρ, subject to
(w ∙ φ(xi)) ≥ ρ − ξi and ξi ≥ 0,
where ξi are slack variables that account for separation errors, and ν ∈ (0, 1] is the regularization
parameter, which controls the balance between the number of outliers and the number of support
vectors.</p>
        <p>The optimization problem is typically solved in its dual form, producing a decision function that
classifies new data points as either belonging to the target class or as anomalies. The final decision
function is: g(x) = sgn(f(x)).</p>
        <p>The function returns a positive value for data points belonging to the target class and a negative
value for anomalies.</p>
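        <p>A minimal scikit-learn sketch of this classifier, with nu and the RBF kernel configured as described later in the implementation section; the synthetic hold-time data is illustrative:</p>

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
# Illustrative stand-ins for nine hold-time features (seconds)
X_target = rng.normal(loc=0.10, scale=0.02, size=(195, 9))
X_impostor = rng.normal(loc=0.30, scale=0.05, size=(50, 9))

# nu bounds the fraction of training errors; gamma="auto" sets
# gamma = 1 / n_features
model = OneClassSVM(kernel="rbf", gamma="auto", nu=0.05)
model.fit(X_target)

pred_target = model.predict(X_target)        # +1 = target class, -1 = anomaly
pred_impostor = model.predict(X_impostor)
print((pred_target == 1).mean(), (pred_impostor == -1).mean())
```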
      </sec>
      <sec id="sec-3-6">
        <title>3.4.2. Isolation forest</title>
        <p>
          Unlike traditional methods that rely on modeling target points, IF takes a distinctive approach by
focusing directly on isolating anomalies. This technique works by constructing isolation trees, where
internal nodes represent features and their split values, and leaf nodes represent individual data
points. The construction of isolation trees begins by randomly selecting a feature and a
corresponding split value within its range. This random process continues until each data point is
isolated in its own leaf node or until a specified maximum tree depth is reached [
          <xref ref-type="bibr" rid="ref23">24</xref>
          ]. The strength
of this approach lies in the resulting path-length distribution. Anomalies, being easier to isolate since they reside in sparser regions of the feature
space, require fewer splits from root to leaf nodes compared to normal data points. As a result, the
average path length from the root to the leaf node for each data point is calculated across all trees in
the forest.
        </p>
        <p>The anomaly score for each data point is derived based on its average path length using the
following formula:
s(x, n) = 2^(−E(h(x))/c(n)),
where E(h(x)) is the average path length of data point x across t isolation trees:
E(h(x)) = (1/t) ∑i=1..t hi(x),
and c(n) represents the average path length of an unsuccessful search in a binary tree:
c(n) = 2H(n − 1) − 2(n − 1)/n,
where H(i) = ln(i) + γ, and γ ≈ 0.5772 is the Euler–Mascheroni constant.</p>
        <p>Data points with shorter path lengths, closer to the root of the tree, are more likely to be
anomalies, while those with longer paths are considered targets. Based on the anomaly scores, a
threshold is set to classify data points as either anomalies or normal. Points with scores above the
threshold are flagged as anomalies, while those below are classified as normal data points.</p>
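        <p>A minimal scikit-learn sketch of this procedure; the parameter values mirror those reported in the implementation section, and the synthetic data is illustrative:</p>

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_target = rng.normal(0.10, 0.02, size=(195, 9))    # illustrative hold times
X_impostor = rng.normal(0.40, 0.05, size=(20, 9))

# 100 trees, all features per split; max_samples is capped at the
# training-set size to avoid requesting more rows than exist
forest = IsolationForest(n_estimators=100,
                         max_samples=min(256, len(X_target)),
                         max_features=1.0, contamination=0.05,
                         random_state=0)
forest.fit(X_target)

pred_target = forest.predict(X_target)       # +1 = target, -1 = anomaly
pred_impostor = forest.predict(X_impostor)
print((pred_target == 1).mean(), (pred_impostor == -1).mean())
```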
      </sec>
      <sec id="sec-3-7">
        <title>3.4.3. Autoencoder</title>
        <p>An autoencoder is a type of artificial neural network designed for learning efficient data
representations, dimensionality reduction, and anomaly detection. As an unsupervised learning
method, it consists of two key components: an encoder and a decoder.</p>
        <p>
          The primary goal of an autoencoder is to learn a compressed and meaningful representation of
the input data. The encoder's function is to map the input data into latent space, effectively
compressing the data into a lower-dimensional form. This is typically achieved through a series of
layers, where each layer applies non-linear transformations to the input data. The resulting latent
space captures the most relevant features and patterns of the input, condensing its essential
information. The decoder, on the other hand, is tasked with reconstructing the original input data
from its latent space representation. Its architecture generally mirrors that of the encoder, but in
reverse, and it applies a series of non-linear transformations to transform the latent representation
back into the original data format [
          <xref ref-type="bibr" rid="ref24">25</xref>
          ].
        </p>
        <p>During training, the autoencoder aims to minimize reconstruction error, which quantifies the
difference between the original input and the reconstructed output. This is typically achieved by
optimizing a loss function, such as mean squared error or binary cross-entropy, using gradient-based
methods like backpropagation.</p>
        <p>In recognition tasks, the autoencoder is trained using only instances of the target class, allowing
it to learn the typical patterns and structure of normal data. When the autoencoder encounters new
data, it will reconstruct the input with a low error if it belongs to the target class. However, if the
input represents an anomaly, the reconstruction error will be higher, as the autoencoder is not
well-equipped to accurately reconstruct unfamiliar instances. By establishing a threshold for the
reconstruction error, anomalies can be detected and distinguished from normal instances.</p>
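        <p>A lightweight sketch of this one-class scheme: the study uses Keras/TensorFlow, but for brevity this example trains scikit-learn's MLPRegressor to reproduce its own input through a 9-8-6-8-9 bottleneck; the synthetic data and the 95th-percentile error threshold are assumptions for illustration:</p>

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# Min-max-scaled stand-in for the nine-feature target-class data
X_train = rng.normal(0.5, 0.05, size=(300, 9)).clip(0.0, 1.0)
x_anomaly = np.full((1, 9), 0.95)            # far outside the typing profile

# Bottleneck network trained to reconstruct its input (a stand-in for
# the paper's Keras autoencoder)
autoencoder = MLPRegressor(hidden_layer_sizes=(8, 6, 8), activation="relu",
                           max_iter=2000, random_state=0)
autoencoder.fit(X_train, X_train)

def reconstruction_error(model, X):
    """Mean squared reconstruction error per sample."""
    return ((model.predict(X) - X) ** 2).mean(axis=1)

# Flag the top 5% of training errors as the anomaly threshold
threshold = np.quantile(reconstruction_error(autoencoder, X_train), 0.95)
print(reconstruction_error(autoencoder, x_anomaly)[0] > threshold)
```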
      </sec>
      <sec id="sec-3-8">
        <title>3.5. Evaluation metrics</title>
        <p>
          In one-class classification, where the objective is to differentiate between target instances and
anomalies, evaluation metrics such as specificity, recall, precision, F1 score, and accuracy are crucial
for assessing model performance [
          <xref ref-type="bibr" rid="ref25">26</xref>
          ].
        </p>
        <p>These metrics are derived from the classification outcomes, which can be categorized into four
groups: true positives (TP), representing correctly identified anomalies; false positives (FP),
indicating instances mistakenly classified as anomalies; true negatives (TN), denoting correctly
identified target instances; and false negatives (FN), reflecting actual anomalies that were
misclassified as target instances. Specificity measures the
proportion of accurately identified target instances out of all target instances:
Specificity = TN / (TN + FP).</p>
        <p>Recall gauges the model's ability to detect all actual anomalies, measuring the proportion of true
anomalies correctly identified out of all existing anomalies:
Recall = TP / (TP + FN).</p>
        <p>Precision assesses the model's reliability when identifying anomalies, showing the proportion of
true anomalies among all instances classified as anomalies:
Precision = TP / (TP + FP).</p>
        <p>F1 score provides a balanced evaluation by calculating the harmonic mean of precision and recall,
offering a single metric that accounts for both aspects:
F1 = 2 ∙ (Precision ∙ Recall) / (Precision + Recall).</p>
        <p>Finally, the accuracy metric measures the overall correctness of the classification, taking both
target instances and anomalies into account:
Accuracy = (TP + TN) / (TP + FP + TN + FN).</p>
        <p>After constructing the models, they will be evaluated using these metrics, enabling a
comprehensive analysis of their performance.</p>
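        <p>The five metrics follow directly from the four outcome counts; the counts in this example are illustrative:</p>

```python
def one_class_metrics(tp, fp, tn, fn):
    """Compute the evaluation metrics from the four outcome counts."""
    specificity = tn / (tn + fp)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"specificity": specificity, "recall": recall,
            "precision": precision, "f1": f1, "accuracy": accuracy}

# Example: 90 anomalies caught, 10 missed, 95 targets kept, 5 misflagged
print(one_class_metrics(tp=90, fp=5, tn=95, fn=10))
```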
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Data preparation and outlier removal</title>
        <p>For the experiments, data with the identifier s015 was randomly selected for analysis, while data
from s004 was used as a test set to evaluate the recognition of keystroke dynamics from a different
individual. Outlier detection began by assessing whether the s015 dataset adhered to a multivariate
normal distribution. The Mardia test revealed significant deviations, as the test statistic for
multivariate skewness, Nβ1/6, at 391.54, exceeded the chi-square critical value of 215.53 for 165
degrees of freedom at a significance level of 0.005. Similarly, the multivariate kurtosis statistic β2,
with a value of 113.32, surpassed the critical value of 102.62 for a mean of 99, a variance of 1.98, and a
significance level of 0.005, indicating non-normality and necessitating further normalization.</p>
        <p>Normalization parameters were estimated using the maximum likelihood method, yielding the
following estimates for the multivariate BCT: λ̂1 = 0.9939, λ̂2 = 1.3605, λ̂3 = 1.2202, λ̂4 = 1.7521,
λ̂5 = 2.2965, λ̂6 = 1.0447, λ̂7 = 1.6466, λ̂8 = 1.3512, λ̂9 = 2.0599.</p>
        <p>After applying the nine-variate Box-Cox transformation with components (1), the Mardia test
was performed again. The skewness statistic Nβ1/6 was reduced to 212.07, which is below the
chi-square threshold of 215.53, but the kurtosis statistic β2 remained slightly elevated at 109.01, still
above the critical value of 102.62. Despite some remaining non-normality, primarily due to outliers,
the transformed dataset better approximated a multivariate normal distribution, improving the
conditions for using SMD.</p>
        <p>Subsequently, SMD was computed for each feature vector to identify potential outliers. These
distances were compared to the chi-square critical value of 23.59 for 9 degrees of freedom at a 0.005
significance level. Any vectors with SMD exceeding this value were classified as outliers. The most
extreme outlier, vector number 295 with an SMD of 37.44, was removed.</p>
        <p>This process of outlier removal was iteratively repeated until all extreme points were excluded.
After eliminating 6 outliers, the multivariate kurtosis statistic finally fell below the critical value,
confirming that outliers had a substantial impact on the dataset's distribution.</p>
        <p>Table 1 lists the SMD values and the corresponding indices for each outlier that was removed.
This iterative process continued until no further significant outliers were detected, resulting in a
refined dataset that was less affected by extreme values.</p>
        <p>To mitigate any potential bias related to the order of the data, the final sample was randomly
shuffled to ensure an even distribution across the training and test sets. The shuffled data was then
split into two equal parts, with 195 vectors in each set.</p>
        <p>The training set was utilized to build both the prediction ellipsoid and the machine learning
models, allowing them to capture the underlying patterns and relationships within the data.
Meanwhile, the test set was reserved to assess the performance of the models on data not previously
encountered during training.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Prediction ellipsoid construction</title>
        <p>The prediction ellipsoid should be constructed using data that follows a normal distribution, so
verifying the data's normality is a necessary first step. Based on the Mardia test results, the
multivariate distribution of this training sample deviates from normality. The test statistic for
multivariate skewness Nβ1/6 is 286.99, exceeding the critical value of 215.53 from the chi-square
distribution for 165 degrees of freedom at a 0.005 significance level. Additionally, the test statistic for
multivariate kurtosis β2 is 105.43, also exceeding the critical value of 104.19, given a mean of 99, a
variance of 4.062, and a 0.005 significance level.</p>
        <p>To address this non-normality, the training set is normalized using a nine-variate BCT. The
optimal parameters for this transformation were estimated using the maximum likelihood method:
λ̂1 = 1.3676, λ̂2 = 1.4807, λ̂3 = 1.078, λ̂4 = 1.7393, λ̂5 = 2.1004, λ̂6 = 1.1498, λ̂7 = 1.566, λ̂8 = 1.1685,
λ̂9 = 2.1146.</p>
        <p>After applying the BCT with components (1), the normalized training set has a mean vector T̄ =
{0.70932; 0.66184; 0.86764; -0.57016; -0.47427; 0.81642; 0.62417; -0.81443; -0.47084}. The covariance
matrix S is presented in Table 4.</p>
        <p>The Mardia test performed on the normalized training set indicates conformity with multivariate
normality. The test statistic for multivariate skewness Nβ1/6 is 175.47, which is below the critical
value of 215.53. Similarly, the test statistic for multivariate kurtosis β2 is 99.76, which does not exceed
the critical value of 104.19, confirming that the normalized set follows a multivariate normal
distribution.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Implementation of machine learning algorithms</title>
        <p>This section outlines the implementation of various machine learning algorithms used to recognize
the keystroke dynamics data. Specifically, we explore the One-Class Support Vector Machine,
Isolation Forest, and autoencoder models, each selected for their unique capabilities in anomaly
detection and one-class classification.</p>
        <p>The OCSVM is implemented in Python using the OneClassSVM object from the scikit-learn
library. This implementation allows for the customization of several critical parameters, including
the regularization parameter ν, which determines the acceptable proportion of training errors and establishes an upper limit for the fraction
of outliers in the training dataset. The radial basis function kernel was chosen for its flexibility in
modeling non-linear relationships among data points. The gamma parameter is set to "auto,"
allowing its value to be computed automatically based on the inverse of the number of features,
influencing the range of influence of each training example; lower values extend the influence while higher
values localize it.</p>
        <p>
          The IF algorithm, also implemented through scikit-learn [
          <xref ref-type="bibr" rid="ref26">27</xref>
          ], provides several tunable
parameters for optimizing performance. A key parameter is the contamination level, which defines
the threshold for categorizing new data points as either target or anomalous. After experimentation,
a contamination value of 0.05 was determined to effectively balance the detection of true anomalies
against false positives.
        </p>
        <p>Additional significant parameters include n_estimators, which denotes the number of decision
trees in the forest (set at 100), max_samples, indicating the maximum number of samples per tree
(set to 256), and max_features, specifying the maximum number of features for splitting each node
(set to 1.0 to utilize all features). To classify a sample as either target or anomalous, we compare its
score against a defined threshold. The scores can range from negative to positive values; in
scikit-learn's convention, positive scores indicate a higher likelihood of being a target, while negative
scores suggest a greater probability of being anomalous. The selection of the threshold value is
application-dependent; in this analysis, a threshold of 0 yielded optimal results.</p>
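        <p>The IF configuration described above might look as follows in scikit-learn, on synthetic stand-in data; thresholding decision_function at 0 reproduces the library's own predict rule:</p>

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
X_train = rng.normal(size=(300, 9))   # synthetic stand-in data

# 100 trees, 256 samples per tree, all features at each split,
# and a contamination level of 0.05
iso = IsolationForest(n_estimators=100, max_samples=256,
                      max_features=1.0, contamination=0.05,
                      random_state=0).fit(X_train)

# decision_function is positive for likely targets and negative for
# likely anomalies; thresholding at 0 matches iso.predict
scores = iso.decision_function(X_train)
labels = np.where(scores >= 0.0, 1, -1)
```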
        <p>For the AE model, we utilized TensorFlow and Keras, leveraging their combined strengths in
flexibility, scalability, and ease of use. Keras, as a high-level API for building neural networks atop
TensorFlow, simplifies the process of constructing and training models. Meanwhile, TensorFlow
provides the essential computational framework, ensuring efficient performance during training and
inference.</p>
        <p>
          Before passing the data into the neural network, min-max normalization is applied to each feature
individually, scaling all features to the range [0, 1]. This technique standardizes the features,
promoting stable and efficient learning processes.
        </p>
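        <p>This per-feature scaling can be sketched with scikit-learn's MinMaxScaler; the data here is a synthetic stand-in for the keystroke features:</p>

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(3)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 9))  # stand-in features

# scale each of the nine features independently to the range [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X_train)
```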
        <p>
          The AE architecture consists of an input layer configured to accept a nine-variate representation
of the data. The model includes fully connected layers for encoding and decoding operations. During
the encoding phase, the input data is compressed into a lower-dimensional representation,
progressively reducing dimensionality from 9 to 8 and then to 6, creating a bottleneck in the network
structure. This bottleneck layer compels the model to capture essential features of the input data
while minimizing redundancy [
          <xref ref-type="bibr" rid="ref27">28</xref>
          ].
        </p>
        <p>
          Each encoding layer employs rectified linear unit (ReLU) activation functions, introducing
nonlinearity that facilitates the extraction of complex features. The decoding phase reverses this process,
expanding the dimensionality back to 8 and ultimately to the original 9 dimensions, using ReLU
activation functions to retain the learned non-linear relationships. The final layer utilizes a sigmoid
activation function to constrain output values within the range [0, 1], a common choice for
reconstruction and binary classification tasks that require smooth and interpretable outputs. The
structure of the AE is illustrated in Figure 1.
        </p>
        <p>
          To train the model, we employed the Adam optimizer in conjunction with binary cross-entropy
loss, a standard metric for reconstruction tasks aimed at minimizing the discrepancy between the
original and reconstructed data. The Adam optimizer combines the strengths of AdaGrad and
RMSProp [
          <xref ref-type="bibr" rid="ref28">29</xref>
          ], dynamically adjusting the learning rate during training for faster convergence and
improved performance. The binary cross-entropy loss effectively measures the difference between
the input and reconstructed outputs, making it suitable for binary classification problems. The
training process encompasses 25 epochs, with a batch size of 16 instances per batch. Shuffling the
data at each epoch introduces variability, preventing the model from memorizing the training
sequence, and thus enhancing generalization.
        </p>
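        <p>Under these choices, a minimal Keras sketch of the architecture and training setup could look as follows; the uniform training matrix is a stand-in for the min-max normalized keystroke features:</p>

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# encoder 9 -> 8 -> 6, decoder 6 -> 8 -> 9, sigmoid output in [0, 1]
autoencoder = keras.Sequential([
    layers.Input(shape=(9,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(6, activation="relu"),     # bottleneck layer
    layers.Dense(8, activation="relu"),
    layers.Dense(9, activation="sigmoid"),
])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# stand-in for min-max normalized keystroke features
rng = np.random.default_rng(4)
X = rng.uniform(size=(128, 9)).astype("float32")

history = autoencoder.fit(X, X, epochs=25, batch_size=16,
                          shuffle=True, verbose=0)
```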
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>Table 5 displays a comparison of the recognition performance among various methods, including the
Prediction Ellipsoid for Non-Gaussian Data (PENGD) (1), the Prediction Ellipsoid for Normalized
Data (PEND) (7), One-Class Support Vector Machine (OCSVM), Isolation Forest (IF), and
Autoencoder (AE).</p>
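      <p>For orientation, a prediction-ellipsoid decision rule of this kind compares a squared Mahalanobis distance with a quantile bound; the chi-square bound below is a simplified sketch, not the exact form of the paper's equations (1) and (7):</p>

```python
import numpy as np
from scipy import stats

def outside_ellipsoid(X_train, x_new, alpha=0.005):
    """True when x_new falls outside the prediction ellipsoid built
    from the (normalized) training sample."""
    n, p = X_train.shape
    mean = X_train.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(X_train, rowvar=False))
    diff = x_new - mean
    d2 = diff @ Sinv @ diff                 # squared Mahalanobis distance
    return bool(d2 > stats.chi2.ppf(1.0 - alpha, p))

# illustrative check on synthetic 9-variate data
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 9))
near = np.zeros(9)        # a point deep inside the data cloud
far = np.full(9, 8.0)     # a point far outside the data cloud
```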
      <sec id="sec-5-1">
        <title>5.1. Analysis of the results</title>
        <p>Table 5 reports accuracy, precision, recall, and F1 score for the PENGD, PEND, OCSVM, IF,
and AE models.</p>
        <p>All models evaluated in this study demonstrate commendable performance in keystroke dynamics
recognition. However, the PENGD yields the lowest accuracy among the models assessed,
indicating that while it can capture some patterns, it may struggle with more complex datasets,
particularly due to the challenges posed by non-Gaussian data distributions. On the other hand, both
the OCSVM and AE exhibit very good performance across multiple metrics, reflecting their
capabilities in identifying true anomalies with high precision and recall. These models effectively
leverage their respective architectures to capture intricate relationships within the data, contributing
to their robust performance. In contrast, the IF did not perform as well as the other models.</p>
        <p>Ultimately, the PEND emerged as the best-performing model, achieving the highest scores across
key evaluation metrics. This reinforces the significance of normalization transformations in
enhancing prediction ellipsoid models for recognition tasks, particularly in scenarios involving
non-Gaussian data distributions.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>All models in this study exhibit strong performance in keystroke dynamics recognition, but PEND
stands out as the best performer. The precision, recall, and F1 score of this model are the highest,
demonstrating its ability to handle keystroke dynamics recognition tasks with remarkable accuracy.
The performance of OCSVM and AE is also notable, offering very good results, while IF lags slightly
behind the others.</p>
      <p>The findings underscore that applying the nine-variate BCT played a critical role in boosting
model performance, particularly by improving how the models handle non-Gaussian data.
Multivariate transformations like BCT take into account the correlations between variables, allowing
for a more accurate and comprehensive prediction ellipsoid. This, in turn, enhances the model's
ability to identify intricate patterns in the data, improving both its accuracy and reliability.</p>
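      <p>SciPy ships only the univariate Box-Cox transformation, so the sketch below normalizes each of the nine features separately as a first approximation; the multivariate BCT used in the paper instead estimates all transformation parameters jointly, accounting for the correlations between variables:</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
X = rng.lognormal(mean=0.0, sigma=0.5, size=(200, 9))  # skewed stand-in data

# component-wise Box-Cox: each lambda estimated by maximum likelihood
X_norm = np.empty_like(X)
lambdas = np.empty(9)
for j in range(9):
    X_norm[:, j], lambdas[j] = stats.boxcox(X[:, j])
```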
      <p>However, there are certain disadvantages to using prediction ellipsoid for normalized data. A
robust model typically requires a dataset of at least 100 instances, which can be a challenge for
smaller datasets. Additionally, selecting the most appropriate normalization transformation can be
complex, especially for datasets with intricate distributions or a large number of outliers. Another
important factor is the choice of significance level, as this can influence the efficiency and reliability
of the prediction ellipsoid.</p>
      <p>Limitations also arise from the outlier removal process, as deleting 10 outliers during
preprocessing may cause the model to miss some underlying patterns in the data. To mitigate this,
more advanced normalization techniques, such as the Johnson transformation, could be considered
to better align the model with the dataset's distribution, improving its ability to generalize across all
relevant data points.</p>
      <p>In this paper, the primary aim was to address the challenge posed by non-Gaussian data
distributions in the context of biometric identification based on keystroke dynamics, emphasizing
the importance of normalization techniques, specifically the multivariate Box-Cox transformation,
for enhancing model accuracy with such data.</p>
      <p>The dataset used in this study represents a 10-character password length, which may not be
optimal for real-world applications. A password length of 20-22 characters, without the use of
uppercase characters, is generally considered preferable, as it allows for more comprehensive feature
extraction. Beyond keystroke length and character variety, several contextual factors, such as those
discussed in [30], were not considered in this research. However, these factors could play an
important role in biometric identification based on keystroke dynamics.</p>
      <p>In future research, a broader dataset that includes data reflecting the impact of environmental
factors could be used, along with extended key sequences. The inclusion of these factors would
provide a more realistic representation of user behavior. Additionally, the application of other
normalization techniques, such as the Johnson transformation, could further enhance model
accuracy by addressing the complexity of non-Gaussian distributions.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>The focus of this paper was to address the challenges associated with non-Gaussian data distributions
in the context of keystroke dynamics recognition. The study compared the performance of prediction
ellipsoid models and machine learning algorithms, including OCSVM, IF, and AE. All models
demonstrated a high probability of recognition. Notably, the prediction ellipsoid for non-Gaussian
data had the lowest accuracy, highlighting the challenges posed by complex datasets. However, by
applying the multivariate BCT, the prediction ellipsoid model for normalized data showed significant
performance improvements, emphasizing the critical role of normalization when addressing
non-Gaussian data distributions. The BCT not only improved the overall accuracy but also deepened the
understanding of data patterns by considering correlations between variables, ultimately leading to
a more precise prediction ellipsoid.</p>
      <p>Despite these advancements, the study identified certain limitations and challenges. One
significant drawback is the necessity for a large dataset, as constructing a reliable prediction ellipsoid
model generally requires at least 100 instances. Furthermore, selecting the optimal normalization
transformation remains a complex task, especially when dealing with datasets that contain outliers
or exhibit highly intricate distributions. Another challenge lies in determining the appropriate
significance level, which directly affects the reliability and efficiency of the prediction ellipsoid.</p>
      <p>Looking ahead, future research could expand the dataset to include environmental factors, as
well as extended key sequences, to provide a more realistic representation of user behavior.</p>
      <p>The incorporation of alternative normalization techniques, such as the Johnson transformation,
could further enhance model accuracy by addressing the impact of non-Gaussian data. Further
investigation into model complexity and feature selection for both prediction ellipsoid models and
machine learning algorithms could offer valuable insights for improving keystroke dynamics
recognition.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] L. de-Marcos, J. Martínez-Herráiz, J. Junquera-Sánchez, C. Cilleruelo, C. Pages-Arévalo, Comparing machine learning classifiers for continuous authentication on mobile devices by keystroke dynamics, Electronics, 2021. DOI: 10.3390/electronics10141622</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] A. Alshehri, F. Coenen, D. Bollegala, Accurate Continuous and Non-intrusive User Authentication with Multivariate Keystroke Streaming, in: 9th International Conference on Knowledge Discovery and Information Retrieval, pp. 61-70, 2017. DOI: 10.5220/0006497200610070</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] G. Ismail, M. Salem, Abd El Ghany, E. A. Aldakheel, S. Abbas, Outlier detection for keystroke biometric user authentication, PeerJ Computer Science, 2024. DOI: 10.7717/peerj-cs.2086</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] M. Choi, S. Lee, M. Jo, J. S. Shin, Keystroke dynamics-based authentication using unique keypad, Sensors, 21(6), 2242, 2021. DOI: 10.3390/s21062242</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] H. Marques, L. Swersky, J. Sander, R. Campello, A. Zimek, On the evaluation of outlier detection and one-class classification: a comparative study of algorithms, model selection, and ensembles, Data Mining and Knowledge Discovery, 37, 1473-1517, 2023. DOI: 10.1007/s10618-023-00931-x</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] S. Kim, D. Park, J. Jung, Evaluation of one-class classifiers for fault detection: Mahalanobis classifiers and the Mahalanobis-Taguchi system, Processes, 9(8), 1450, 2021. DOI: 10.3390/pr9081450</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] H. Chang, J. Li, C. Wu, M. Stamp, Machine learning and deep learning for fixed-text keystroke dynamics, Artificial Intelligence for Cybersecurity, pp. 309-329, 2022. DOI: 10.48550/arXiv.2107.00507</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] B. Saini, P. Singh, A. Nayyar, N. Kaur, K. Bhatia, S. El-Sappagh, J. Hu, A three-step authentication model for mobile phone user using keystroke dynamics, IEEE Access, 8, 125909-125922, 2020. DOI: 10.1109/ACCESS.2020.3008019</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Q. Li, H. Chen, CDAS: A continuous dynamic authentication system, in: Proceedings of the 2019 8th International Conference on Software and Computer Applications, pp. 447-452, 2019. DOI: 10.1145/3316615.3316691 [10] arXiv:2307.05529, 2023. DOI: 10.48550/arXiv.2307.05529</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[11] N. Raul, R. Shankarmani, P. Joshi, A comprehensive review of keystroke dynamics-based authentication mechanism, in: International Conference on Innovative Computing and Communications, Advances in Intelligent Systems and Computing, Vol. 1059, Springer, Singapore, 2020. DOI: 10.1007/978-981-15-0324-5_13</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[12] R. Toosi, M. Akhaee, Time-frequency analysis of keystroke dynamics for user authentication, Future Generation Computer Systems, 115, 438-447, 2021. DOI: 10.1016/j.future.2020.09.027</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[13] M. L. Ali, K. Thakur, M. A. Obaidat, A hybrid method for keystroke biometric user identification, Electronics, 11(17), 2782, 2022. DOI: 10.3390/electronics11172782</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[14] I. Meenakshisundaram, I. Karunanithi, U. Sahana, Enhancing user authentication through keystroke dynamics analysis using isolation forest algorithm, in: 2024 Second International Conference on Emerging Trends in Information Technology and Engineering (ICETITE), pp. 1-5, 2024. DOI: 10.1109/ic-ETITE58242.2024.10493648</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[15] F. Trad, A. Hussein, A. Chehab, Free text keystroke dynamics-based authentication with continuous learning: a case study, in: 2022 IEEE 21st International Conference on Ubiquitous Computing and Communications (IUCC/CIT/DSCI/SmartCNS), Chongqing, China, pp. 125-131, 2022. DOI: 10.1109/IUCC-CIT-DSCI-SmartCNS57392.2022.00031</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[16] Y. Patel, K. Ouazzane, V. Vassilev, I. Faruqi, G. Walker, Keystroke dynamics using auto encoders, in: 2019 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), Oxford, UK, pp. 1-8, 2019. DOI: 10.1109/CyberSecPODS.2019.8885203</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[17] S. Prykhodko, L. Makarova, K. Prykhodko, A. Pukhalevych, Application of transformed prediction ellipsoids for outlier detection in multivariate non-Gaussian data, in: 2020 IEEE 15th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET), pp. 359-362, 2020. DOI: 10.1109/TCSET49122.2020.235454</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[18] O. Oyebola, Examining the distribution of keystroke dynamics features on computer, tablet and mobile phone platforms, in: Mobile Computing and Sustainable Informatics: Proceedings of ICMCSI 2023, pp. 613-620, Springer Nature Singapore, Singapore. DOI: 10.1007/978-981-99-0835-6_43</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[19] K. Lam, K. Meijer, F. Loonstra, E. Coerver, J. Twose, E. Redeman, J. Killestein, Real-world keystroke dynamics are a potentially valid biomarker for clinical disability in multiple sclerosis, Multiple Sclerosis Journal, 27(9), 1421-1431, 2021. DOI: 10.1177/1352458520968797</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[20] S. Prykhodko, A. Prykhodko, I. Shutko, Estimating the size of web apps created using the CakePHP framework by nonlinear regression models with three predictors, in: IEEE 16th International Conference on Computer Sciences and Information Technologies (CSIT), Lviv, Ukraine, pp. 333-336, 2021. DOI: 10.1109/CSIT52700.2021.9648680</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Prykhodko</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trukhov</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Application of a ten-variate prediction ellipsoid for normalized data and machine learning algorithms for face recognition</article-title>
          .
          <source>in: Selected Papers of the Seventh International Workshop on Computer Modeling and Intelligent Systems (CMIS-2024). Workshop Proceedings (CMIS-2024)</source>
          , Zaporizhzhia, Ukraine, May 3,
          <year>2024</year>
          .
          <source>CEUR Workshop Proceedings</source>
          , Vol.
          <volume>3702</volume>
          , pp.
          <fpage>362</fpage>
          -
          <lpage>375</lpage>
          ,
          <year>2024</year>
          . https://ceur-ws.org/Vol-3702/paper30.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>T.</given-names>
            <surname>Etherington</surname>
          </string-name>
          ,
          <article-title>Mahalanobis distances for ecological niche modelling and outlier detection: implications of sample size, error, and bias for selecting and parameterising a multivariate location and scatter method</article-title>
          .
          <source>PeerJ</source>
          ,
          <volume>9</volume>
          ,
          <year>2021</year>
          . DOI: 10.7717/peerj.11436
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Todkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Baltazart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ihamouten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dérobert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Guilbert</surname>
          </string-name>
          ,
          <article-title>One-class SVM based outlier detection strategy to detect thin interlayer debondings within pavement structures using Ground Penetrating Radar data</article-title>
          .
          <source>Journal of Applied Geophysics</source>
          ,
          <volume>192</volume>
          ,
          <fpage>104392</fpage>
          ,
          <year>2021</year>
          . DOI: 10.1016/j.jappgeo.2021.104392
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lesouple</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Baudoin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spigai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Tourneret</surname>
          </string-name>
          ,
          <article-title>Generalized isolation forest for anomaly detection</article-title>
          .
          <source>Pattern Recognition Letters</source>
          , Vol.
          <volume>149</volume>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>119</lpage>
          ,
          <year>2021</year>
          . DOI: 10.1016/j.patrec.2021.05.022
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jang-Jaccard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sabrina</surname>
          </string-name>
          ,
          <article-title>Improving performance of autoencoder-based network anomaly detection on NSL-KDD dataset</article-title>
          .
          <source>IEEE Access</source>
          , Vol.
          <volume>9</volume>
          , pp.
          <fpage>140136</fpage>
          -
          <lpage>140146</lpage>
          ,
          <year>2021</year>
          . DOI: 10.1109/ACCESS.2021.3116612
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hilal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Gadsden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yawney</surname>
          </string-name>
          ,
          <article-title>Financial fraud: A review of anomaly detection techniques and recent advances</article-title>
          .
          <source>Expert Systems with Applications</source>
          , Vol.
          <volume>193</volume>
          ,
          <year>2022</year>
          , 116429. DOI: 10.1016/j.eswa.2021.116429
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>M. U.</given-names>
            <surname>Togbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Barry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Boly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chabchoub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chiky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Montiel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.V.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <article-title>Anomaly detection for data streams based on isolation forest using scikit-multiflow</article-title>
          ,
          <source>in: Computational Science and Its Applications - ICCSA 2020: 20th International Conference, Cagliari, Italy, July 1-4, 2020, Proceedings, Part IV</source>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>30</lpage>
          . Springer International Publishing,
          <year>2020</year>
          . DOI: 10.1007/978-3-030-58811-3_2
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sewak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Sahay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rathore</surname>
          </string-name>
          ,
          <article-title>An overview of deep learning architecture of deep neural networks and autoencoders</article-title>
          .
          <source>Journal of Computational and Theoretical Nanoscience</source>
          , Vol.
          <volume>17</volume>
          , No.
          <issue>4</issue>
          , pp.
          <fpage>182</fpage>
          -
          <lpage>188</lpage>
          .
          <year>2020</year>
          . DOI: 10.1166/jctn.2020.8648.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Kartowisastro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Latupapua</surname>
          </string-name>
          ,
          <article-title>A comparison of adaptive moment estimation and RMSProp optimisation techniques for wildlife animal classification using convolutional neural networks</article-title>
          .
          <source>Revue d'Intelligence Artificielle</source>
          , Vol.
          <volume>37</volume>
          , No.
          <issue>4</issue>
          , pp.
          <fpage>1023</fpage>
          -
          <lpage>1030</lpage>
          .
          <year>2023</year>
          . DOI: 10.18280/ria.370424
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bilan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bilan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bilan</surname>
          </string-name>
          ,
          <article-title>Interactive biometric identification system based on the keystroke dynamic</article-title>
          , in: S. Bilan,
          <string-name>
            <given-names>M.</given-names>
            <surname>Elhoseny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Hemanth</surname>
          </string-name>
          (Eds.),
          <source>Biometric Identification Technologies Based on Modern Data Mining Methods</source>
          . Springer, Cham, pp.
          <fpage>39</fpage>
          -
          <lpage>58</lpage>
          ,
          <year>2021</year>
          . DOI: 10.1007/978-3-030-48378-4_3
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>