=Paper= {{Paper |id=Vol-3003/short9 |storemode=property |title=Method of user authentication by keyboard handwriting based on neural networks and genetic algorithm |pdfUrl=https://ceur-ws.org/Vol-3003/short9.pdf |volume=Vol-3003 |authors=Andrii Pryimak,Yurii Yaremchuk,Olha Salieva,Vasyl Karpinets,Nataliia Kunanets |dblpUrl=https://dblp.org/rec/conf/profitai/PryimakYSKK21 }} ==Method of user authentication by keyboard handwriting based on neural networks and genetic algorithm== https://ceur-ws.org/Vol-3003/short9.pdf

Method of User Authentication by Keyboard Handwriting based
on Neural Networks and Genetic Algorithm
Andrii Pryimak (a), Yurii Yaremchuk (a), Olha Salieva (a), Vasyl Karpinets (a) and Nataliia Kunanets (b)
(a) Vinnytsia National Technical University, Khmelnytsky highway 95, Vinnytsia, 21000, Ukraine
(b) Lviv Polytechnic National University, 12 Bandera street, Lviv, 79013, Ukraine

Abstract
A method of user authentication based on keyboard handwriting with error injection is
proposed. It is based on a two-layer neural network architecture that uses five timing
features and a sigmoid activation function to increase the efficiency of the neural network.
Error injection into the code word is also introduced, which makes it possible to collect more
accurate data on the user's handwriting and to increase the accuracy of correct user
recognition and successful authentication by 3-11% compared to existing methods. The use
of a hash function based on a genetic algorithm is proposed, which secures the storage of
the code word in the database.

Keywords
Information security, user authentication, neural network, keyboard handwriting, genetic
algorithm.

1. Introduction
Given the rapid pace of development of information technology, the growing number of
information threats, the uncertainty of their origin and implementation, as well as the
complexity of information security systems and their specialized focus, the task of building an
information security system becomes relevant. One of the methods of information protection is user
authentication. User authentication is the verification that the user being authenticated is who he or she
claims to be [1]. For correct authentication, the user must present some unique
information that is owned only by that user and by no one else.
Traditional methods of identification and authentication, based on smart cards, USB keys,
electronic keys or other portable identifiers, as well as passwords and access codes, have significant
disadvantages: the item can be stolen from the user; special equipment is needed
to work with magnetic cards, smart cards and similar devices; and a unique item can be copied.
In general, the main disadvantage of such methods is that authentication is not always reliable [2]. This
shortcoming can be eliminated by using biometric authentication methods, such as the dynamics of
the user's keystrokes. Biometric characteristics are an integral part of a human being and therefore
cannot be falsified, lost or forgotten.
Keystroke dynamics, that is, the rhythm with which the user types on the
keyboard, provide a high level of security and are also practical to apply: the method is
inexpensive to implement, whereas scanning fingerprints
or irises requires additional equipment [3].
Another modern approach to solving the problem of authentication is the use of neural networks.
A number of architectures have already become classic: the maximum search network, instar
and outstar, the single-layer perceptron, BSP, the network with radial basis functions (RBF), and the Hopfield,
Hamming, Kosko, McCulloch-Pitts, Kohonen, Grossberg and Ymovir APT networks. Thus, for
each class of applied problems a suitable neural network architecture is used. Applying a neural network
approach to the authentication problem can improve the accuracy of user authentication, because this
approach filters out random interference present in the input data, which makes it possible to
abandon the algorithms for smoothing experimental dependencies required for statistical data
processing [4].

International Workshop of IT-professionals on Artificial Intelligence (ProfIT AI 2021), September 20–21, 2021, Kharkiv, Ukraine
EMAIL: andrii.pryimak@live.com (A. Pryimak); yurevyar@gmail.com (Y. Yaremchuk); salieva8257@gmail.com (O. Salieva);
karpinets@gmail.com (V. Karpinets); nek.lviv@gmail.com (N. Kunanets)
ORCID: 0000-0001-9695-0462 (A. Pryimak); 0000-0002-6303-7703 (Y. Yaremchuk); 0000-0003-2388-7321 (O. Salieva);
0000-0001-8148-2002 (V. Karpinets); 0000-0003-3007-2462 (N. Kunanets)
©️ 2021 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

2. Related works
Methods of user authentication by keyboard handwriting have been studied only relatively recently.
Leggett, Umphress and Williams [5] conducted an experiment with 17 programmers. They
measured the timing of pairs of consecutive keystrokes, known as digraphs. The first set consisted of 1400 characters and was used
at the learning stage; the second set consisted of 300 characters and was used for verification. In his
report, Leggett specified an authentication probability of 89.5%. In their experiments, the authors
allowed a deviation of 0.5 from the mean digraph latency, and the user was
considered recognized if 60% or more of the measured latencies fell within the allowable deviation of the
sample.
Other work in this area was carried out by Garris, Young and Hammon [6]. In their
approach, a matrix of changes in the associated delay vectors was used as the parameter that
carries the data on individual handwriting. The Mahalanobis distance function was then used to determine
the similarity between the handwriting being identified and the user profile. Unlike others, Young and
Hammon used the Euclidean distance between two vectors to compare the attributes.
Another well-known work was presented by Rastorguev [7]. In his monograph, the author divided
the authentication procedure into two types. The first type is password authentication, where the user
is authenticated by typing a password; the second type is the authentication of users
by a set of random phrases. He also singled out two modes of operation: the system
setup mode and the authentication mode.
When users were identified by typing free text, in the setup mode the
keyboard was divided into four parts. While the user worked on the keyboard, the time intervals between
keystrokes in these four parts were calculated, regardless of which key was pressed within each part. In the authentication
mode, the current values were compared with the reference values and the system made its decision. The main
assumption was that the temporal characteristics of users follow a normal (Gaussian)
distribution. Algorithms for excluding gross errors, for system setup and for authentication were also presented in
the monograph.
Another paper was presented by Saket Maheshwary, Soumyajit Ganguly and Vikram Pudi [8], who propose a method for
identifying a user by keyboard handwriting based on a five-layer neural network. The authors
investigated in detail three techniques for building the neural network
architecture (dropout, rectified linear units and batch normalization). Their studies
showed a maximum user identification accuracy of 85.22-93.59%. However, it should be noted
that the best accuracy was achieved with a five-layer neural network, which significantly
increases the training time of the network (up to 9 minutes per user).
The work of Harun, Woo and Dlay [9] focuses on using the time interval between keystrokes as a feature of a set
of individual characters to identify authentic users. A four-layer multilayer
perceptron (MLP) trained with error backpropagation (BP) is used to train and test on these features. The
results of this study showed that the accuracy of user identification is in the range of 90-92%.
The previous work of the authors, "Method of user identification by keyboard handwriting based
on neural networks" [10], studied the possibility of using a neural network in the form of a two-layer
feedforward network with 70 sigmoid hidden neurons and 10 sigmoid output neurons.
The user identification accuracy obtained with that method is 88-93%.
Most of the considered works are based on geometric recognition methods, which use various measures
of closeness between the handwriting sample and its reference (Euclidean, Mahalanobis, etc.). The
maximum reported probability of authentication for such systems is 90%. Methods based on
neural networks have higher accuracy (85.22-93.59%), but it should be noted that this research was
conducted using multilayer neural networks, which significantly slowed their training.
The results of comparing the accuracy of user authentication by existing methods are presented in
Table 1 below.

Table 1
Comparative characteristics of existing methods
Method name                                                   Accuracy, %
The method of Leggett, Umphress and Williams [5]              89.5
The method of Rastorguev [7]                                  90
The method of Maheshwary, Ganguly and Pudi [8]                85.22–93.59
The method of Harun, Woo and Dlay [9]                         90–92
The method proposed in the previous work of the authors [10]  88–93

The analysis of existing methods of user identification by keyboard handwriting shows
that the accuracy of identification is in the range from 85.22% to 93.59%, so increasing
the accuracy of identification and developing an appropriate method remain important.
Therefore, it makes sense to consider injecting an error into the word used to verify the user,
which can make it possible to collect more accurate data on the user's handwriting and reduce the risk of forgery. It
should also be borne in mind that a person's keyboard handwriting, like normal handwriting, can change,
which leads to incorrect validation, whereas the user's reaction to an error remains relatively constant,
regardless of changes in typing speed or in how accurately the user types. To increase the security of code word
storage, a genetic approach to generating a hash function of the word can be considered.

3. Problem Statement
The task is to study the possibility of using a neural network and a hash function based on a genetic algorithm
to improve the accuracy of user authentication based on keyboard handwriting with error injection,
to propose a method based on this mathematical apparatus, and to compare the proposed method
of authentication with existing ones.

4. Proposed method
It is proposed to inject a generated error into the code word in order to collect the necessary information
on the user's keyboard handwriting while the user corrects the word, and
to use a hash function based on a genetic algorithm to increase the security of code word storage. A
two-layer feedforward neural network with 70 sigmoid
hidden neurons and 10 sigmoid output neurons was chosen to
analyze the keyboard handwriting itself.
The proposed method of user authentication (Figure 1) consists of the following steps:
1. The user enters a code word (the code word is pre-stored in the database as a hash 𝐻1
generated with the genetic algorithm).
2. A random position in the word is generated; the character at this position will be changed.
3. A second random position is generated; its character will be exchanged with the character
from the previous step.
4. The two characters are swapped, which creates an error in the code word.
5. The user corrects the word. A hash 𝐻2 is generated from the corrected word
with the genetic algorithm.
6. The two hashes are compared. If they match, the user has corrected the error
and the method continues. If the hashes do not match, the user returns to step 5.
7. The user is identified by a trained neural network.
8. If all data match, the user has successfully passed authentication; if not, the user returns to
step 1.
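
Steps 2-6 can be illustrated with a minimal sketch. It assumes that the two random positions must hold different characters (otherwise the swap would not create an error), and it uses SHA-256 from the Python standard library purely as a stand-in, because the genetic-algorithm hash of the method is not specified in enough detail to reproduce. The function names are hypothetical.

```python
import hashlib
import random


def inject_error(code_word, rng):
    """Swap two characters at random positions (steps 2-4).

    Positions are redrawn until the two characters differ, so the
    returned word always contains a visible error.
    """
    if len(set(code_word)) < 2:
        raise ValueError("code word needs at least two distinct characters")
    while True:
        i, j = rng.sample(range(len(code_word)), 2)
        if code_word[i] != code_word[j]:
            break
    chars = list(code_word)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars), (i, j)


def stand_in_hash(word):
    # Assumption: SHA-256 replaces the genetic-algorithm hash of the paper.
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def correction_accepted(stored_hash, corrected_word):
    """Step 6: compare the stored hash H1 with the hash H2 of the corrected word."""
    return stand_in_hash(corrected_word) == stored_hash


rng = random.Random(0)
h1 = stand_in_hash("keyboard")            # stored at enrolment (step 1)
altered, positions = inject_error("keyboard", rng)
print(altered, positions)                 # word shown to the user with the error
print(correction_accepted(h1, "keyboard"))
```

In a full implementation the keystroke timestamps recorded while the user corrects the word would be passed on to the neural network in step 7.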
This method uses a hash function whose distinguishing feature is that it is based on a genetic algorithm. The
main operators of the algorithm are crossover on the worst gene, mutation of the two worst genes, and a fitness
function that relies on five statistical tests (monobit test, poker test, runs test, long runs
test and autocorrelation test). This approach to generating a hash function increases
the security of code word storage.
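
As an illustration of one ingredient of such a fitness function, the sketch below implements a monobit (frequency) test in the form of the NIST SP 800-22 frequency test. This is an assumption: the paper does not state which variant of the test or which thresholds are used, and the helper names are hypothetical.

```python
import math


def bytes_to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> k) & 1 for byte in data for k in range(7, -1, -1)]


def monobit_p_value(bits):
    """Frequency test: a p-value close to 0 means the counts of 0s and 1s are unbalanced."""
    n = len(bits)
    if n == 0:
        raise ValueError("empty bit sequence")
    s = sum(1 if b else -1 for b in bits)      # +1 for each 1, -1 for each 0
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


candidate = bytes.fromhex("a5c3" * 8)           # hypothetical candidate hash value
print(monobit_p_value(bytes_to_bits(candidate)))
```

A fitness function could, for example, reward candidates whose p-value stays above a chosen significance level in each of the five tests.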
The stage of user authentication based on a trained neural network includes nine main stages of
collecting information with the timing functions and processing it further. The main stages are: collection
of all necessary data; data preparation and normalization; operation of the timing functions; principal
component analysis; automatic selection of learning parameters; network training; checking the
correctness of training; adjustment of parameters; readiness for further use.

[Figure 1 is a flowchart with the following blocks: Begin; code word entry and word hash generation; generate a random word character number to be changed; generate a random number of the word character to which the character from the previous step will be changed; replace two characters; the user makes corrections in the word and the word hash is generated; checking hashes for equality (No / Yes); user identification based on a trained neural network; verification of the received user data (No / Yes); authentication has occurred; End.]
Figure 1: Flowchart of the proposed method

To collect information, the proposed method uses five timing functions (hold time, up-down
delay, down-down delay, up-up delay, total time) that gather the data needed for comparison
and identification of the user by the neural network. The five timing functions are:
1. Hold time - the duration of a keystroke, which is determined by the following equation
𝐾𝑒𝑦𝐷𝑢𝑟𝑎𝑡𝑖𝑜𝑛 = 𝑅𝑖 − 𝑃𝑖 , (1)
where 𝑅𝑖 is the release time of the 𝑖-th key; 𝑃𝑖 is the time of pressing the 𝑖-th key.
2. «Up-down» delay - the time difference between releasing a key and pressing the next key
𝑈𝑝𝐷𝑜𝑤𝑛𝐿𝑎𝑡𝑒𝑛𝑐𝑦 = 𝑃𝑖+1 − 𝑅𝑖 , (2)
where 𝑅𝑖 is the release time of the 𝑖-th key; 𝑃𝑖+1 is the time of pressing the (𝑖+1)-th key.

3. «Down-down» delay - the time difference between pressing a key and pressing the next key
𝐷𝑜𝑤𝑛𝐷𝑜𝑤𝑛𝐿𝑎𝑡𝑒𝑛𝑐𝑦 = 𝑃𝑖+1 − 𝑃𝑖 , (3)
where 𝑃𝑖 is the time of pressing the 𝑖-th key.

4. «Up-up» delay - the time difference between releasing a key and releasing the next key
𝑈𝑝𝑈𝑝𝐿𝑎𝑡𝑒𝑛𝑐𝑦 = 𝑅𝑖+1 − 𝑅𝑖 , (4)
where 𝑅𝑖 is the release time of the 𝑖-th key.
5. Total time - the time required to enter all the text
𝑇𝑜𝑡𝑎𝑙𝑇𝑦𝑝𝑖𝑛𝑔 = 𝑅𝑁 − 𝑃1 , (5)
where 𝑅𝑁 is the release time of the last key; 𝑃1 is the time of pressing the first key; 𝑁 is the number
of characters in the text.
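
Equations (1)-(5) can be computed directly from the recorded press and release timestamps. The following is a minimal sketch, assuming the timestamps are given as two equally long lists in milliseconds; the function name and the dictionary keys are hypothetical.

```python
def timing_features(press, release):
    """Compute the five timing functions from per-key press/release timestamps."""
    if not press or len(press) != len(release):
        raise ValueError("press and release must be non-empty and equally long")
    n = len(press)
    return {
        # (1) hold time of every key: R_i - P_i
        "hold": [release[i] - press[i] for i in range(n)],
        # (2) up-down latency: P_{i+1} - R_i
        "up_down": [press[i + 1] - release[i] for i in range(n - 1)],
        # (3) down-down latency: P_{i+1} - P_i
        "down_down": [press[i + 1] - press[i] for i in range(n - 1)],
        # (4) up-up latency: R_{i+1} - R_i
        "up_up": [release[i + 1] - release[i] for i in range(n - 1)],
        # (5) total typing time: R_N - P_1
        "total": release[-1] - press[0],
    }


# Three keystrokes with timestamps in milliseconds (illustrative values).
features = timing_features(press=[0, 150, 320], release=[90, 260, 400])
print(features)
```

Note that the up-down latency can be negative when the next key is pressed before the previous one is released.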
It is proposed to use a neural network for further processing of the collected information.
To solve the problem of authentication, it is necessary to calculate the expressions for the given
parameters. Mathematically, a neuron is a weighted adder whose single output is determined
by its inputs and the weight matrix so that
\[ y = f(u), \quad u = \sum_{i=1}^{2} w_i x_i + w_0 x_0, \tag{6} \]
where 𝑓(𝑢) is the activation function; 𝑢 is the induced local field; 𝑤𝑖 is the weight of the 𝑖-th input; 𝑥𝑖 is the signal at the
𝑖-th input of the neuron; 𝑥0 is the additional input; 𝑤0 is the weight corresponding to it.
Let the number of input parameters be two. To begin with, it is necessary to investigate one hidden
layer of neurons. The number of elements in it should be determined using the Arnold-Kolmogorov-Hecht-Nielsen
formula
\[ \frac{N_y Q}{1 + \log_2(Q)} \le N_w \le N_y \left( \frac{Q}{N_x} + 1 \right) (N_x + N_y + 1) + N_y, \tag{7} \]
where 𝑁𝑦 is the dimension of the output signal; 𝑄 is the number of elements in the set of training
examples; 𝑁𝑤 is the required number of synaptic connections; 𝑁𝑥 is the dimension of the input signal.
After performing the calculations, we can conclude that the required number of synaptic connections
is in the range of 7 < 𝑁𝑤 < 20. The number of required neurons in the hidden layer is found
from the formula
\[ N = \frac{N_w}{N_x + N_y}. \tag{8} \]
Thus, the number of neurons in the hidden layer will be in the range of 1 < 𝑁 < 70. Within
this range, the number of hidden neurons that gives the smallest learning error
should be selected. In this case, it is advisable to select 70 neurons in the hidden layer.
Activation functions must also be selected for each layer. Neurons of the input and output layers
are responsible only for data input and output, so their functions can be left linear. The main computational
load falls on the neurons of the hidden layer, so its activation function should be sigmoid.
In feedforward neural networks, synaptic connections are organized in such a way that each
neuron at a given level of the hierarchy receives information only from some non-empty set of neurons
located at a lower level. The name of these networks indicates that they have a dedicated direction
of signal propagation: signals move from the input through one or more hidden layers to the output
layer. It is easy to see that a multilayer neural network can be obtained by cascading single-layer
networks with weight matrices
𝑊¹, 𝑊², … , 𝑊ᵖ, (9)
where 𝑝 is the number of layers of the neural network.
If the activation functions are linear, a multilayer neural network can be reduced to an
equivalent single layer with the weight matrix 𝑊 = 𝑊¹ ∗ 𝑊² ∗ … ∗ 𝑊ᵖ, so forming such
structures makes sense only if nonlinear activation functions are used in the neurons.
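
The reduction of a linear multilayer network to a single layer can be checked numerically. The sketch below assumes a row-vector convention (the input multiplies the weight matrices from the left) and uses small illustrative integer matrices that are not taken from the paper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


x = [[1, 2]]                      # one input sample as a 1x2 row vector
w1 = [[1, 0, 2], [-1, 3, 1]]      # first linear layer, 2x3
w2 = [[2, 1], [0, -1], [1, 4]]    # second linear layer, 3x2

two_layers = matmul(matmul(x, w1), w2)   # pass through both layers in turn
one_layer = matmul(x, matmul(w1, w2))    # single layer with W = W1 * W2

print(two_layers, one_layer)
```

Both computations give the same output, which is why the hidden layer only adds expressive power once a nonlinear activation such as the sigmoid is applied between the layers.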
The proposed neural network, presented in Figure 2, is a two-layer feedforward
network with 70 sigmoid hidden neurons and 10 sigmoid output neurons.

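[Figure 2 is a block diagram: Input (2) → Hidden layer (weights w, bias b, 70 neurons) → Output layer (weights w, bias b, 10 neurons) → Output (10).]
Figure 2: The offered neural network architecture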
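
The forward pass through a network of this shape (2 inputs, 70 sigmoid hidden neurons, 10 sigmoid output neurons) can be sketched as follows. The weights here are random placeholders rather than trained values, the input values are illustrative, and all names are hypothetical.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def init_layer(n_in, n_out, rng, scale=0.1):
    """Create small random weights (one row per neuron) and biases."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-scale, scale) for _ in range(n_out)]
    return weights, biases


def layer(inputs, weights, biases):
    """Each neuron computes y = f(u), u = sum(w_i * x_i) + bias, as in equation (6)."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, hidden, output):
    return layer(layer(x, *hidden), *output)


rng = random.Random(1)
hidden = init_layer(2, 70, rng)     # 2 inputs -> 70 hidden neurons
output = init_layer(70, 10, rng)    # 70 hidden -> 10 output neurons

# Illustrative input: a key hold time and an inter-key interval in seconds.
scores = forward([0.09, 0.06], hidden, output)
print(len(scores), max(range(10), key=scores.__getitem__))
```

With trained weights, each of the ten outputs could be read as a score for one enrolled user, which is an interpretation assumed here rather than stated in the paper.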

A detailed diagram of the proposed neural network is presented in Figure 3.
The inputs to the network are the key hold time and the time intervals between keystrokes.
Training is carried out as follows:
1. All weights of the network are initialized to small random values.
2. The input training vector 𝑋 is fed to the network input and the 𝑁𝐸𝑇 signal of each neuron
is calculated using the standard expression
\[ NET_j = \sum_i x_i w_{ij}. \tag{10} \]
3. The value of the threshold activation function is calculated for the 𝑁𝐸𝑇 signal of each neuron,
which gives the neuron output 𝑂𝑈𝑇𝑗.
4. The error for each neuron is calculated by subtracting the output from the desired output
\[ error_j = target_j - OUT_j. \tag{11} \]

5. Each weight is modified as follows
\[ w_{ij}(t + 1) = w_{ij}(t) + a \, x_i \, error_j, \tag{12} \]
where 𝑎 is the learning rate.
6. Steps two through five are repeated until the error is small enough.
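
The six training steps can be sketched for a single layer of threshold neurons as below. This is a minimal illustration of the update rule in equations (10)-(12) on a toy problem (the logical AND with a constant bias input), not the training code used in the experiments; the learning rate, the stopping criterion and the function names are assumptions.

```python
import random


def step(net):
    """Threshold activation: 1 if the NET signal is non-negative, else 0."""
    return 1.0 if net >= 0 else 0.0


def predict(x, w):
    n_out = len(w[0])
    return [step(sum(x[i] * w[i][j] for i in range(len(x)))) for j in range(n_out)]


def train(samples, n_in, n_out, rate=0.1, epochs=200, seed=0):
    rng = random.Random(seed)
    # Step 1: small random initial weights w[i][j] (input i -> neuron j).
    w = [[rng.uniform(-0.05, 0.05) for _ in range(n_out)] for _ in range(n_in)]
    for _ in range(epochs):
        total_error = 0.0
        for x, target in samples:
            out = predict(x, w)                        # steps 2-3: NET and OUT
            for j in range(n_out):
                error = target[j] - out[j]             # step 4, equation (11)
                total_error += abs(error)
                for i in range(n_in):
                    w[i][j] += rate * x[i] * error     # step 5, equation (12)
        if total_error == 0:                           # step 6: stop when error-free
            break
    return w


# Toy data: first input is a constant bias of 1, the target is AND of the other two.
and_samples = [([1, 0, 0], [0]), ([1, 0, 1], [0]), ([1, 1, 0], [0]), ([1, 1, 1], [1])]
weights = train(and_samples, n_in=3, n_out=1)
print([predict(x, weights) for x, _ in and_samples])
```

For the two-layer sigmoid network of the method, the same idea is extended by propagating the error back through the hidden layer.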

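[Figure 3 shows the network layer by layer: an input layer with inputs x1 and x2, a hidden layer with 70 neurons (labelled up to z70), and an output layer with outputs y1 to y10.]
Figure 3: Scheme of the proposed neural network architecture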

The user recognition accuracy and a comparison with existing methods are
presented in the next section of this work.
5. Results and discussion

Let us estimate the probability 𝑝 of correctly recognizing the user from the observed frequency in 𝑛 independent
experiments. An experiment was conducted with the developed software: 5
users each entered their code word 100 times in succession. The number of access denials is
shown in Table 2.

Table 2
Data on the number of denials of access for a real user
User No.   Number of false authentications   Frequency of denied access   Frequency of confirmed access
1          6                                 0.06                         0.94
2          3                                 0.03                         0.97
3          3                                 0.03                         0.97
4          7                                 0.07                         0.93
5          5                                 0.05                         0.95

The average frequency of correct user recognition in a series of 100 experiments is 0.96.
To check the applicability of the normal distribution law, the values of 𝑛𝑝 and 𝑛𝑞 are estimated.
Assuming that 𝑝 ≈ 𝑝∗ we obtain:
𝑛𝑝 ≈ 𝑛𝑝∗ = 96, (13)
𝑛𝑞 ≈ 𝑛(1 − 𝑝∗ ) = 4,
where 𝑝∗ is the average frequency of confirmed access.
The obtained values give grounds to believe that in this case the normal distribution law can be
applied. From the tables, we find 𝑡𝛽 = 1.652 for 𝛽 = 0.9. Next, the confidence bounds 𝑝1 and 𝑝2 are calculated by the
following formulas:
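\[ p_1 = \frac{p^* + \frac{1}{2}\frac{t_\beta^2}{n} - t_\beta \sqrt{\frac{p^*(1 - p^*)}{n} + \frac{1}{4}\frac{t_\beta^2}{n^2}}}{1 + \frac{t_\beta^2}{n}}, \tag{14} \]

\[ p_1 = \frac{0.96 + 0.0135 - 0.0366}{1.027} = 0.92, \]

\[ p_2 = \frac{p^* + \frac{1}{2}\frac{t_\beta^2}{n} + t_\beta \sqrt{\frac{p^*(1 - p^*)}{n} + \frac{1}{4}\frac{t_\beta^2}{n^2}}}{1 + \frac{t_\beta^2}{n}}, \tag{15} \]

\[ p_2 = \frac{0.96 + 0.0135 + 0.0366}{1.027} = 0.98. \]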

Thus, the probability of correct user recognition is in the range from 92% to 98%.
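
The confidence bounds in equations (14) and (15) can be evaluated with a few lines of code. The sketch below plugs in the values used in the text (𝑝∗ = 0.96, 𝑛 = 100, 𝑡𝛽 = 1.652); the function name is hypothetical, and small differences from the rounded figures above are to be expected because the intermediate terms are not rounded here.

```python
import math


def confidence_bounds(p_hat, n, t):
    """Lower and upper bounds p1, p2 from equations (14) and (15)."""
    center = p_hat + 0.5 * t * t / n
    spread = t * math.sqrt(p_hat * (1.0 - p_hat) / n + 0.25 * t * t / (n * n))
    denominator = 1.0 + t * t / n
    return (center - spread) / denominator, (center + spread) / denominator


p1, p2 = confidence_bounds(p_hat=0.96, n=100, t=1.652)
print(round(p1, 3), round(p2, 3))
```

The interval is asymmetric around 𝑝∗, which is typical for this formula when the observed frequency is close to 1.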
A comparison of the proposed method with the existing methods (Figure 4) showed that the accuracy of the proposed
method is 2.5-8.5% higher than that of the method of Leggett, Umphress and
Williams [5], 2-8% higher than that of the method of Rastorguev [7], 4.41-6.78% higher than that of the
method of Maheshwary, Ganguly and Pudi [8], and 2-6% higher than that of the method of Harun, Woo and Dlay [9]. The
recognition accuracy of the method proposed in the previous work [10] was also improved by 4-5%.
Figure 4: Recognition accuracy comparison of proposed and existing methods

Thus, the scientific novelty of the work is the proposed method of user authentication by keyboard
handwriting based on a neural network and a genetic algorithm. Its distinguishing features are the use of a
two-layer feedforward neural network with 70 sigmoid hidden neurons and
10 sigmoid output neurons, the use of error injection into the code
word, and the use of a hash function based on a genetic algorithm with crossover on the worst gene,
mutation of the two worst genes and a fitness function that relies on five statistical tests.
This made it possible to increase the recognition accuracy of the user's keyboard handwriting to 92-98%, which
is 3-11% better than existing methods.

6. Conclusions
An experimental study was made of the possibility of using a neural network and a genetic algorithm to
improve the accuracy of user identification based on keyboard handwriting with error injection.
As a result, a method of user authentication was proposed, which is based on a two-layer neural
network architecture that uses five timing functions and a sigmoid activation function to increase the
efficiency of the neural network. Error injection was also introduced, which made it possible to collect more
accurate data on the user's handwriting and to increase the accuracy of correct user recognition and
successful authentication by 3-11% compared to existing methods.
The use of a hash function based on a genetic algorithm is proposed, which is aimed at increasing
the security of storing the code word in the database and at preventing any changes to it,
because during the authentication process the hash values are compared rather than the code word itself.

7. References
[1] Gavan Leonard Tredoux, Steven J. Harrington. Method and system for providing authentication
through aggregate analysis of behavioral and time patterns. Xerox Corporation, Norwalk, CT,
2016.
[2] El-Hajj, M., Chamoun, M., Fadlallah, A., & Serhrouchni, A. “Analysis of authentication
techniques in Internet of Things (IoT).” In: 2017 1st Cyber Security in Networking Conference
(CSNet). IEEE, 2017, pp. 1-3. doi: 10.1109/CSNET.2017.8242006.
[3] Salminen, J., Jung, S. G., Chowdhury, S., Sengün, S., & Jansen, B. J. “Personas and analytics: A
comparative user study of efficiency and effectiveness for a user identification task.”
In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 2020, pp.
1-13. doi: 10.1145/3313831.3376770.
[4] Luiz G. Hafemann, Robert Sabourin, Luiz S. Oliveira. Learning features for offline handwritten
signature verification using deep convolutional neural networks. Computer Vision and Pattern
Recognition, 2017, pp 163–176.
[5] Umphress David & Williams Glen. Identity verification through keyboard characteristics.
International Journal of Man-Machine Studies, 1985, pp. 263-273.
doi: 10.1016/S0020-7373(85)80036-5.
[6] Young J.R. and Hammon R.W. Method and Apparatus for Verifying an Individual's Identity.
Patent Number 4,805,222, U.S. Patent and Trademark Office, Washington, D.C., Feb., 1989.
[7] Rastorguev S.P. Software methods for protecting information in computers and networks. Moscow:
«Yakhtsmen» Agency Publishing House, 1993, 188 p.
[8] Saket Maheshwary, Soumyajit Ganguly, Vikram Pudi. Deep Secure: A Fast and Simple Neural
Network based approach for User Authentication and Identification via Keystroke Dynamics.
In: 2017 International Joint Conference on Artificial Intelligence (IJCAI), Melbourne,
Australia, 2017.
[9] Harun N., Woo W.L. and Dlay S.S. Performance of Keystroke Biometrics Authentication System
Using Artificial Neural Network (ANN) and Distance Classifier Method. International Conference
on Computer and Communication Engineering (ICCCE 2010). 11–13 May 2010, Kuala Lumpur,
Malaysia, 2010.
[10] Danilyuk I.I., Karpinets V. V., Pryimak A.V., Yaremchuk, Y. Y., Kostyuchenko O.I. Neural
network based method of a user identification by keyboard handwriting. Data recording, storage
& processing, 20(2), 2018, pp. 68-76. doi: 10.35681/1560-9189.2018.20.2.142913.