1. Introduction

Rastorguev S.P. Software methods for protecting information in computers and networks. Moscow: «Yakhtsmen» Agency Publishing House

10.1145/3313831.3376770

Method of User Authentication by Keyboard Handwriting based on Neural Networks and Genetic Algorithm

Andrii Pryimak

andrii.pryimak@live.com 1

Yurii Yaremchuk

Olha Salieva

salieva8257@gmail.com 1

Vasyl Karpinets

karpinets@gmail.com 1

Nataliia Kunanets

0 0 Lviv Polytechnic National University , 12 Bandera street, Lviv, 79013 , Ukraine 1 Vinnytsia National Technical University , Khmelnytsky highway 95, Vinnytsia, 21000 , Ukraine

1993

188 4 0000 0003

A method of user authentication based on keyboard handwriting with error injection was proposed. It is based on a two-level neural network architecture using five-time functions and built-in sigmoid activation function to increase the efficiency of the neural network. An error code injection was also introduced, which allowed to collect more accurate data on human handwriting and increase the accuracy of correct recognition of the user and his successful authentication by 3-11% compared to existing methods. The use of a hash function based on a genetic algorithm is proposed, which provides the security of storing a code word in the database. Information security, user authentication, neural network, keyboard handwriting, genetic International Workshop of IT-professionals on Artificial Intelligence (ProfIT AI 2021), September 20-21, 2021, Kharkiv, Ukraine ORCID: 0000-0001-9695-0462 (A. Pryimak); 0000-0002-6303-7703 (Y. Yaremchuk); 0000-0003-2388-7321 (O. Salieva); 0000-0001-8148-

1. Introduction

Given the rapid pace of development of information technology, increasing the number of information threats, the degree of uncertainty of their origin and implementation, as well as the complexity of information security systems and their specialized focus, the task of building an information security system becomes relevant. One of the methods of information protection is user authentication. User authentication is the verification that the user being authentication is who he or she claims to be [1]. For correct user authentication, it is necessary for the user to present some unique information, which should be owned only by him and no one else.

Traditional methods of identification and authentication, based on the use of smart cards, USB keys, electronic keys or other portable identifiers, as well as passwords and access codes, have significant disadvantages, such as: the possibility of stealing the item from the user; the need for special equipment for working with magnetic cards, smart cards and others; the ability to make a copy of a unique item. In general, the main disadvantage of such methods is not always reliable authentication [2]. This shortcoming can be eliminated by using biometric authentication methods, such as the dynamics of keystrokes by the user. Biometric characteristics are an integral part of human beings and therefore cannot be falsified, lost or forgotten.

Keystroke dynamics, which represent the typing rhythms that the user performs while typing on the keyboard, provide a high level of security and also have advantages in practical application, as inexpensive implementation of this method is an important indicator compared to scanning fingerprints or irises eyelids that require additional equipment to achieve authentication [3].

Another modern approach to solving the problem of authentication is the use of neural networks. There are a number of architectures that have already become classic - maximum search network, input

2021 Copyright for this paper by its authors. and output star, single-layer perspective, BSP, network with radial basis function (RBF), Hopfield network, Hemming, Cosco, McCulloch-Pitts, Kohonen, Grosnign, and Ymovir APT network. Thus for each class of applied problems the architecture of a neural network is used. Applying a neural network approach to the authentication problem can improve the accuracy of user authentication, because this approach has the property of filtering random interference present in the input data, which allows to abandon the algorithms for smoothing experimental dependencies required for statistical data processing [4].

2. Related works

Methods of user authentication by keyboard handwriting have been practiced relatively recently. In their work Leggett, Umphress and Williams [5] conducted an experiment with 17 programmers. They used measured keystrokes, known as digraphs. The first set consisted of 1400 characters and was used at the learning stage, the second set consisted of 300 characters and was used for verification. In his report, Leggett specified an authentication probability of 89.5%. In their experiments, the authors suggested a possible deviation of the mean retention time of bigram equal to 0.5, the user was considered recognizable if 60% or more of the time delay coincided with the allowable deviation of the sample.

Other work in this area was carried out by Garris, Young and Hammon [6]. In their proposed approach, a matrix of changes in the associated delay vectors was used as a quantity (parameter) that contains data on individual handwriting. The Mahalanobis distance function was then used to determine the similarity between the handwriting identified and the user profile. Unlike others, Young and Hammon used the Euclidean distance between two vectors to compare the number of attributes.

Another well-known work was presented by Rastorguev [7]. In his monograph, the author divided the authentication procedure into two types. The first type is password authentication, where the user goes through the authentication procedure by password, the second type is the authentication of users by a set of random phrases. He also singled out two modes of the authentication procedure: the system setup procedure and the authentication procedure.

When identifying users by typing free text in the mode of setting up the authentication system, the keyboard was divided into four parts. When the user worked on the keyboard, the time intervals between these four parts were calculated, regardless of which key was pressed in these parts. In the authentication mode, the current values were compared with the reference and the system made decisions. The main assumption was that the distribution of temporal characteristics of users seemed to be a normal Gaussian law. Algorithms for excluding gross errors, system settings and authentication were also presented in the paper.

Another paper was presented by Saket Maheshwari and Vikram Pudi [8], who propose a method for identifying a user using keyboard handwriting based on a five-level neural network. The authors investigated in detail the possibility of using three different methods of building a neural network architecture (exclusion method, rectification method, packet normalization method). Studies have shown that the maximum accuracy of user identification is 85.22-93.59%. However, it should be noted that the best accuracy result was achieved by using a five-level neural network, which significantly increases the learning time of this network (up to 9 minutes) per user.

Nura Hanura's [9] work focuses on using the time interval between keystrokes as a feature of a set of individual characters to identify authentic users. A four-level neural network with a multilinear perceptron (MLP) with a built-in error propagation method (BP) is used to train and test functions. The results of this study showed that the accuracy of user identification is in the range of 90-92%.

In the previous work of the authors "Method of user identification by keyboard handwriting based on neural networks" [10] a study of the possibility of using a neural network, which is a two-layer system of direct access to a network with 70 sigmoid hidden neurons and 10 sigmoid source neurons. The obtained indicators of user identification accuracy using the proposed method are 88-93%.

Most of the considered works are based on geometrical methods of recognition, using various degree of closeness between the handwriting sample and its standard (Euclidean, Mahalanobis, etc.). The maximum found probability of authentication of such systems is 90%. Methods based on the use of neural networks have higher accuracy (85.22-93.59%), but it should be noted that the research was conducted using multilevel neural networks, which significantly affected the speed of their learning.

The results of comparing the accuracy of user authentication by existing methods are presented in the Table 1 below.

Based on the analysis of existing methods of user identification by keyboard handwriting, it is seen that the accuracy of identification is in the range from 85.22% to 93.59%, so it remains important to increase the accuracy of identification and development of the appropriate method.

Therefore, it makes sense to pay attention to the injection of an error in the word to verify the user, which can allow to collect more accurate data on human handwriting and reduce the risk of forgery. It can also be borne in mind that a person's keyboard handwriting, like normal handwriting, can change, leading to incorrect validation methods, and the user's reaction to the error remains relatively constant, regardless of changes in speed or correct character set by the user. To increase the security of codeword storage, it can be considered using a genetic approach to generate a hash function of the word.

3. Problem Statement

To study the possibility of using a neural network and a hash function based on a genetic algorithm to improve the accuracy of user authentication based on keyboard handwriting with error injection, as well as to propose a method based on this mathematical apparatus. Also compare the proposed method of authentication with existing ones.

4. Proposed method

It is proposed to use the injection in the form of error generation to collect the necessary information on the user's keyboard handwriting to verify the correctness of the entered word by the user, as well as to use the hash function based on a genetic algorithm to increase the security of code word storage. The neural network architecture, which is a two-tier system of direct access to a network with 70 sigmoid hidden neurons and 10 sigmoid output neurons, using the sigmoid activation function, was chosen to study keyboard writing directly.

The proposed method of user authentication (Figure 1) consists of the following steps: 1. Entering a code word by the user (code word pre-stored in the database as a hash with a use of genetic algorithm - 1). 2. Generate a random word character number to be changed. 3. Generate a random number of the word symbol to which the symbol from the previous step will be changed. 4. Replacing two characters with places - creating an error in the code word. 5. The user makes corrections in the word. Based on its correction, a hash is generated with a use of genetic algorithm - 2. 6. Comparison of two hashes with each other. If they are the same, then the user has made a fix and the method continues to work. If the hashes do not match, the user returns to step 5. 7. User identification based on a trained neural network. 8. If all data converges, the user successfully passed authentication process, if not, he returns to step 1.

This method uses a hash function, the feature of which is that it is based on a genetic algorithm, the main operators of which are crossover on the worst gene, mutation of the two worst genes and fitness function, which uses analysis based on five statistical tests (monobit test, poker test, start test, longruns test and autocorrelation test). Using this approach to generating a hash function allows you to increase the security of code word storage.

The stage of user authentication based on a trained neural network includes nine main stages of collecting information using time functions and its further processing. The main stages are: collection of all necessary data; data preparation and normalization; operation of synchronization functions; basic component analysis; automatic selection of learning parameters; network training; checking the correctness of training; adjustment of parameters; readiness for further use.

Begin

Code word entry and word hash generation Generate a random word character number to be

changed Generate a random number of the word character to which the character from the previous step will be

changed

Replace two characters The user makes corrections in the word

and word hash generation

Checking hashes for equality User identification based on a trained neural network

Verification of the received user data

Authentication has occurred

End

Yes Yes

To collect information, the proposed method contains five-time functions (delay time, up-down delay, down-down delay, up-up delay, total time) that collect the information needed for comparison and identification of the user by the neural network. These are the five-time functions: 1. Delay time - the time of keystroke, which is determined by the following equation = − , (1) where - release time of the -th key; is the time of pressing the -th key. = ( ), = ∑ + 0 0,

=1 where ( ) - activation function; - induced local field; - entrance weight; - signal at the input of the neuron; 0 - additional entrance; 0 - the weight corresponding to it.

Let the number of input parameters be two. To begin with, it is necessary to investigate one hidden layer of neurons. The number of elements on it should be determined using the Arnold-KolmogorovHecht Nelson formula

1+ lNogyQ2(Q)  Nw  N y  NQx +1 ( Nx + N y +1) + N y (7) where is the dimension of the output signal; is the number of elements of the set of educational examples; - the required number of synaptic connections; - the dimension of the input signal.

After performing the calculations, we can conclude that the required number of synaptic connections is in the range of 7 < < 20. To find out the number of required neurons in the hidden layer, you must use the formula (2) (3) (6) (8)

«Up-down» delay - the time difference between releasing a key and pressing the next key = +1 − , where - release time of the -th key; is the time of pressing the -th key. «Down-down» delay - the time difference between releasing the same key twice

= +1 − , where is the time of pressing the -th key.

Thus, the number of neurons in the hidden layer will be in the range of 1 < < 70. To study the entire range, you need to select the number of neurons in the hidden layer at which the learning error will be less. In this case, it would be advisable to select 70 neurons on the hidden layer.

You also need to select activation functions for each layer. Neurons of the input and output layers are responsible only for data input and output, their functions can be left linear. The main calculated load falls on the neurons of the hidden layer, so its activation function should be made sigmoid.

In direct propagation neural networks, synaptic connections are organized in such a way that each neuron in a given level of the hierarchy receives information only from some non-empty set of neurons that are located at a lower level. The name of the networks indicates that they have a dedicated direction of propagation of signals that move from the input through one or more hidden layers to the output layer. It is easy to see that a multilayer neural network can be obtained by cascading single-layer networks with weights matrices

1, 2, … , , (9) where is the number of layers of the neural network.

In the case of linearity of activation functions, a multilayer neural network can be reduced to an equivalent single layer with a matrix of weights = 1 ∗ 2 ∗ … ∗ , so the formation of such structures makes sense only if nonlinear activation functions are used in neurons.

A neural network is proposed and presented in Figure 2, which is a two-layer system of direct access to a network with 70 sigmoid hidden neurons and 10 sigmoid output neurons.

Input

2 w b

Hidden Output Output 10

A detailed diagram of the proposed neural network is presented in Figure 3. The input for the network will be the key hold time and the time intervals between keystrokes.

Training is carried out as follows:

All weights of the network are randomized to small values. 2. The input training vector is fed to the network input and the is calculated using the standard expression signal from each neuron = ∑ .

= −

. ( + 1) = ( ) + .

calculated. 3. The value of the activation threshold function for the signal from each neuron is 4. The error for each neuron is calculated by subtracting the output from the desired output 5. Each weight is modified as follows 6. Repeat steps two through five until the error is small enough. (10) (11) (12) x1 x2 z1 z2 z3 z3

The results of the accuracy of the user recognition as well as comparison with existing methods are presented in the next chapter of this work.

5. Results and discussion

Let us estimate the probability of correctly recognizing the user by his frequency in independent experiments. With the help of the developed software, we will conduct an experiment. To do this, 5 users consistently entered their code word 100 times. The number of access denials is indicated and shown in Table 2. Data on the number of denials of access for a real user To check the applicability of the normal distribution law, the values of and are estimated.

Assuming that

≈ ∗ we obtain: where ∗ - average access frequency.

The obtained values give grounds to believe that in this case the normal distribution law can be applied. According to the tables, we find = 1.652 for = 0,9. Next, calculate 1 and 2 by the following formulas: 6 3 3 7 5 1 = 1 = 2 = 2 = access 1 2 − √

∗(1 − ∗) Thus, the probability of correct user recognition is in the range from 92% to 98%.

A comparison of the proposed method with the existing methods (Figure 4) showed that the proposed method has better accuracy by 2.5-8.5% than the accuracy of the method of Leggett, Umphress and Williams, by 2-8% better accuracy than the method of Rastorguev, by 4.41-6.78% better than the method of Maheshwari and Pudi and 2-6% better than the method of Hanura. The performance of the recognition accuracy of the method proposed in the previous work was also improved by 4-5%.

So the scientific novelty of the work is the proposed method of user authentication by keyboard handwriting based on neural network and genetic algorithm, the feature of which is the use of a neural network in the form of a two-level system of direct access to a network with 70 sigmoid hidden neurons, 10 sigmoid source neurons and sigmoid activation function, the use of error injection into the code word, and the use of a hash function based on a genetic algorithm with crossover on the worst gene, mutation of the two worst genes and fitness function, which uses analysis based on five statistical tests. This allowed to increase the recognition accuracy of the user's keyboard handwriting to 92-98%, which is better by 3-11% compared to existing methods.

6. Conclusions

An experimental study of the possibility of using a neural network and a genetic algorithm to improve the accuracy of user identification based on keyboard handwriting with error injection was made and as a result a method of user authentication was proposed, which is based on a two-level neural network architecture using five-time functions and built-in sigmoid activation function to increase the efficiency of the neural network. An error injection was also introduced, which allowed to collect more accurate data on human handwriting and increase the accuracy of correct recognition of the user and his successful authentication by 3-11% compared to existing methods.

The use of a hash function based on a genetic algorithm is proposed, which is aimed at increasing the security of storing a code word in the database and the impossibility of making any changes to it, as during the authentication process the code word is not compared by itself, but the hash values. 7. References [1] Gavan Leonard Tredoux, Steven J. Harrington. Method and system for providing authentication through aggregate analysis of behavioral and time patterns. Xerox Corporation, Norwalk, CT, 2016. [2] El-Hajj, M., Chamoun, M., Fadlallah, A., & Serhrouchni, A. “Analysis of authentication techniques in Internet of Things (IoT).” In: 2017 1st Cyber Security in Networking Conference (CSNet). IEEE, 2017, pp. 1-3. doi: 10.1109/CSNET.2017.8242006.